prime_data/guide/
mod.rs

1/*!
2
3# Guide
4
5This guide explains and gives more details about this crate and its items.
6
7**Hint**: Start with the [Introduction](self::introduction)
8
9## Map/Summary:
10
11* [**Introduction**](self::introduction)
12    1. [Background](self::introduction::_1_background)
13    2. [Prime Sieves](self::introduction::_2_prime_sieves)
14    3. [Memory Efficiency](self::introduction::_3_memory_efficiency)
15    4. [Time Efficiency](self::introduction::_4_time_efficiency)
16* [**Data Structure**](self::data_structure)
17    1. [Prime Bytes](self::data_structure::_1_prime_byte)
18    2. [Prime Data](self::data_structure::_2_prime_data)
19* [**Iterators**](self::iterators)
20    1. [30-Coprime Numbers](self::iterators::_1_coprime);
21    2. [Prime Numbers](self::iterators::_2_prime);
22* [**Sieving**](self::sieving)
23    1. [Expansion](self::sieving::_1_expansion)
24* [**Estimates**](self::estimates)
25    1. [Pi(n)](self::estimates::_1_bounds)
26    2. [p(n)](self::estimates::_2_nth_prime)
27* [**Plans for the Future**](self::future)
28*/
29
30
31/// # Introduction
32/// 
33/// This covers the basic information you need to understand the what/why/how of this library.
34pub mod introduction {
35    /// # Background
36    /// 
37    /// **Prime Numbers** are positive integers that are only divisible by 1 or themselves, excluding
38    /// 1 itself. Some of the first prime numbers are {2, 3, 5, 7, ...}.
39    /// 
40    /// We'll refer to them as prime numbers, or primes.
41    /// 
42    /// Prime numbers tend to get sparser the bigger they are. 25% of numbers below 100 are prime, but
43    /// when we go to 10 thousand, it becomes 12.29%.
44    /// 
45    /// Another important fact is that there isn't a "prime generating function" that can give you all
46    /// prime numbers in some constant time complexity. The only "safe way" for us to generate every
47    /// prime we want is to list them out as we go. But how?
48    pub mod _1_background {}
49
50    /// # Prime Sieves
51    /// 
52    /// One simple way of generating all prime numbers from 1 to N, is to first start with all the numbers
53    /// from 2 to N (because 1 is not prime by definition), and for each one, verify if their only divisors
54    /// are 1 and themselves. That is, for each N, verify that it is not divisible by any M where 1 < M < N.
55    /// 
56    /// However, that is incredibly inefficient. That's because, if N is not divisible by 2, it can't be
57    /// divisible by 4, 6, or any multiple of 2. Same goes for 3, 5, and the other primes. So in order for us
58    /// to see if N is prime, we should only look for primes below N.
59    /// 
60    /// Secondly, let's say there is an M between 1 and N that divides N. That means N is **composite**, or
61    /// also called, "nonprime". Therefore, N/M = K, where K is some integer. Now let's refer to √N. By the
62    /// definition of square roots, √N×√N = N = M×K. Therefore, either M or K must be less than or equal to √N.
63    /// Otherwise, their product would be greater than N.
64    /// 
65    /// Therefore, when we want to know if N is prime, we only need to look for primes P, where 2 < P ≤ √N.
66    /// 
67    /// Unfortunately, that is still inefficient for generating lists of primes. It's a perfectly valid and
68    /// good solution if we only care about one N, but if we want to list out a load of them, not really.
69    /// 
70    /// # The Sieve of Eratosthenes
71    /// 
72    /// Okay, can I please pause the whole formality of this documentation for a sec? Eratosthenes was a greek
73    /// dude that was born like 2300 years ago, and his algorithm for generating prime number lists is *the*
74    /// most common way we do it today. Whaaaaaaaa...
75    /// 
76    /// *Cough*. Anyway. Eratosthenes designed a simple algorithm, that starts out with all numbers from 2 to N.
77    /// 
78    /// ```text
79    ///  2  3  4  5  6  ...
80    /// [ ][ ][ ][ ][ ]
81    /// ```
82    /// 
83    /// We start with 2, and mark it as a prime number.
84    /// 
85    /// ```text
86    ///  2  3  4  5  6  ...
87    /// [√][ ][ ][ ][ ]
88    /// ```
89    /// 
90    /// From there, we keep adding two, crossing off everything we find in the way.
91    /// 
92    /// ```text
93    ///  2  3  4  5  6  ...
94    /// [√][ ][X][ ][X]
95    /// ```
96    /// 
97    /// Then we keep going. We mark 3 as prime, and from there, keep adding 3. If we stumble upon anything
98    /// that wasn't crossed off, we cross it off.
99    /// 
100    /// If we do this until we reach √N, everything that is not crossed off will be a prime number! This is
101    /// because, for every prime number, we crossed anything divisible by that prime number. And since the
102    /// only candidates for dividing N are at most √N, we know everything up to N that wasn't crossed off
103    /// is prime.
104    pub mod _2_prime_sieves {}
105
106    /// # Memory Efficiency
107    /// 
108    /// As we discussed previously, primes get sparser over time. So if we start with 10 thousand primes,
109    /// we know after this whole process, we'll be left with only 1229 as prime. That is, more than 80%
110    /// of the memory we created went in vain.
111    /// 
112    /// However, there's another lingering problem that we haven't tackled. What do we do with our prime
113    /// sieve after we crossed off all nonprimes? We need to store those primes somewhere.
114    /// 
115    /// # What's the most efficient way to store raw prime numbers?
116    /// 
117    /// That's a really vague question, which includes lots of factors. One of them will always arise from
118    /// any question regarding "memory efficiency", which is the most efficient way to store any
119    /// information. That involves compression, which will be dealt with later on. Right now, we only
120    /// care about listing prime numbers.
121    /// 
122    /// We can also optimize data storage by, instead of storing the data itself, we store a function
123    /// that can generate the data. That will also be dealt with in the future. Right now, we only care
124    /// about storing the data itself.
125    /// 
126    /// The most straightforward way is to list the primes, perhaps as a [`u32`], and simply store that
127    /// in a file. It's easy to encode and decode. Unfortunately, if we decided to store the first 200
128    /// million primes, the file size will be ~800MB. 
129    /// 
130    /// The first optimization idea could be to notice that the first of these primes are small, and don't
131    /// need to be stored as a 4 byte number. Maybe 1 byte or 2 bytes suffice. The first 54 primes can
132    /// be stored in 1 byte, then the next 6488 can be stored in 2 bytes. We can also encode the next
133    /// 1,071,329 primes in 3 bytes, leaving the remaining 198,928,671 to be stored in 4 bytes.
134    /// Unfortunately, that only compresses ~800MB into ~775MB.
135    /// 
136    /// ## The Law of Excluded Middle
137    /// 
138    /// If you've ever studied logic, you may have come across that law. It states something really
139    /// intuitive, but incredibly powerful for mathematical theorems:
140    /// 
141    /// > Every statement is either true or false. Never both, never neither.
142    /// 
143    /// With this, we can determine that every integer is either prime or not. Because every time we
144    /// say "N is prime", that statement is either true or false. Within the context of programming,
145    /// we can associate the state of being prime or not as a 1 or a 0.
146    /// 
147    /// This way, we can make a huge array (or, in a more dynamic context, a vector) of bits, where
148    /// the position `i` will store wether the number `i` is prime or not.
149    /// 
150    /// ```text
151    /// 0 1 2 3 4 5 6 7 8 9 ...
152    /// 0 0 1 1 0 1 0 1 0 0 ...
153    /// ```
154    /// 
155    /// In this context, it's a little harder to ask "how much memory it takes to store the first N primes"
156    /// because now we're storing according to all numbers, not just primes. So we need to first ask
157    /// *when* will we reach the Nth prime, then divide that by 8 to get the amount of bytes it'll take.
158    /// 
159    /// The 200 millionth prime is around 4.2 billion. Which means, in order to reach there, we'll need 4.2
160    /// billion bits, which is ~525MB. Already a big improvement.
161    /// 
162    /// ## We're not done!
163    /// 
164    /// Hopefully you already know that 2 is the only even prime. That means, after 2, every bit in an even
165    /// position will be guaranteed to be a zero. That's a lot of useless information! Specifically, close to
166    /// half of our data is unnecessary! So we can heavily improve our system by only listing odd primes,
167    /// and when we want to decode our data, we hardcode an extra 2 that will be missing!
168    /// 
169    /// ```text
170    /// 1 3 5 7 9 11 13 ...
171    /// 0 1 1 1 0 1  1  ...
172    /// ```
173    /// 
174    /// However, we can *generalize* this idea. Obviously, 3 is the only prime divisible by 3. So after 3,
175    /// all positions divisible by 3 (9, 15, 29769...) will also be a guaranteed zero!
176    /// 
177    /// So we can ignore all positions that aren't divisible by either 2 or 3! First, we multiply these two 
178    /// to get 6. Now, we notice that for any number N, `N % 6 == k`, where k is in the range `(0..6)`. If
179    /// k is 0, 2, or 4, N is definitely divisible by 2. If it's 3, N is definitely divisible by 3. That means 
180    /// the only possible primes, after 3, are the numbers where k results in 1 or 5.
181    ///  
182    /// ```text
183    /// 1 5 7 11 13 17 19 ...
184    /// 0 1 1 1  1  1  1  ...
185    /// ```
186    ///  
187    /// How much are we improving here? Well, in the beginning, we were storing every number as a bit. Which
188    /// means the bit to number ratio was 100%. When we excluded even numbers, it became `50%`, because it
189    /// takes 1 bit to store info about two numbers (since the second one is even, we know it's not prime).
190    /// Once we excluded numbers divisible by 2 and 3, we went from 6 possible values of k to just 2. Meaning
191    /// the bit to number ratio became ~33%.
192    ///  
193    /// And we can keep going. Adding 5 into the mix gives us a multiple of 30 (2*3*5). Out of the 30 possible
194    /// values of `N % 30`, 15 are even, 5 are odds multiple of 3, and other 2 are only multiples of 5. So we
195    /// have 8 possible prime candidates out of a 30 number range. ~26%.
196    /// 
197    /// However, as the range increases, we're increasing the size of the chunks of bits we need to decode.
198    /// For the previous example, we had to encode it in chunks of 2. One for k = 1, and the other for k = 5.
199    /// 
200    /// By adding 5 and increasing the range to 30, we have 8 possible primes, meaning the chunk size is now
201    /// an entire byte. As we increase the amount of primes to discard, we also increase the chunk size.
202    /// 
203    /// | Primes  | Range | bit/N | Chunk Size |
204    /// |---------|-------|-------|------------|
205    /// |  None   |  1    | 100%  | -          |
206    /// | 2       |  2    |  50%  | 1 bit      |
207    /// | 2,3     |  6    |  33%  | 2 bit      |
208    /// | 2,3,5   |  30   |  26%  | 8 bits     |
209    /// | 2,3,5,7 |  210  |  23%  | 48 bits    |
210    /// 
211    /// As you can see, when we add 7, the chunk size becomes 6 times larger, while we go from 26% to 23%.
212    /// We want to avoid big changes for small improvements. Not only that, but using just 2, 3, and 5 lets
213    /// us have exactly 1 byte of chunk size. Neat and tidy.
214    /// 
215    /// ## Recap
216    /// 
217    /// How will our data look like? We know that every byte is a chunk of prime candidates, that are either
218    /// prime (1) or not prime (0). Each bit represents a number N, so that when you evaluate `N % 30`, it'll
219    /// return some `k` that is not divisible by 2, 3 or 5. Specifically, these k values:
220    /// 
221    /// `[ 01 07 11 13 17 19 23 29 ]`
222    /// 
223    /// A byte by itself doesn't give us the full picture though. We can't extract N from k. So if we want
224    /// to find out the numbers in a given chunk, we need some "offset" value, such as 29. Then, we can
225    /// multiply the offset by 30, and we can sum each k value to that result, yielding N. Example:
226    /// 
227    /// `{ offset: 29, byte: [01111000] }`
228    /// 
229    /// First, we multiply the offset. `result = offset * 30 = 870`. Then, we notice the k values that are
230    /// ones (a.k.a, prime numbers) are 7, 11, 13, and 17. Summing those by 870 gives us the numbers:
231    /// 
232    /// `877, 881, 883, 887`
233    /// 
234    /// If our data isn't wrong, we officially decoded all prime numbers between 870 and 900!
235    pub mod _3_memory_efficiency {}
236
237    /// # Time Efficiency
238    /// 
239    /// A sieving algorithm for this data structure will consider "crossing it off as nonprime" be
240    /// flipping the 1 into a 0. That is, the starter data will contain all ones, and we'll cross them
241    /// off as nonprime by flipping them to zero.
242    /// 
243    /// With the Sieve of Eratosthenes, when we encountered a prime (2, for example), we then kept
244    /// adding it by itself and crossing off those results as nonprime. (4, 6, 8, ...).
245    /// 
246    /// However, this data that we're defining does not store numbers divisible by 2, 3, or 5. That is,
247    /// if we find some prime P, it makes no sense to "flip the 2P bit" because there is no bit in our
248    /// data structure that represents 2P because 2P is divisible by 2. If we add P to that again, we
249    /// get 3P, which also doesn't have a bit to it because it's divisible by 3.
250    /// 
251    /// The only numbers that have a bit matching their value is numbers of the form XP, where X is
252    /// not divisible by 2, 3, or 5 (saying this over and over is getting repetitive. From now on,
253    /// we'll call those "coprime with 30", or simply "30-coprime").
254    /// 
255    /// If we only iterate over 30-coprime numbers, multiply them by P, and flipping the bit matching
256    /// the result to 0, we're making things 3.75x faster (30/8).
257    pub mod _4_time_efficiency {}
258}
259
260/// # Data Structure
261/// 
262/// As discussed in the introduction, we need to develop a special data structure for efficiently storing
263/// and generating prime numbers. We need to first, create an abstraction of those "bytes" that contain the
264/// 8 30-coprime k-values that we mentioned. Then, our data structure will be a list (vector) of those
265/// bytes.
266pub mod data_structure {
267    /// # Prime Bytes
268    /// 
269    /// A bit is a 1 or a 0. A byte is a list of 8 bits. Example: `[01101001]`
270    /// 
271    /// A "prime byte" is what we'll name our abstraction over k-values. Reiterating, each bit in the 
272    /// prime byte will correspond to the k-value:
273    /// 
274    /// `[ 01 07 11 13 17 19 23 29 ]`
275    /// 
276    /// So, if we want to retrieve "the bit that corresponds to the k-value 13", that's the same as asking
277    /// "please give me the 4th bit". We can do this by performing a right shift, then taking the result
278    /// modulo 2. For instance, if we want to retrieve the third bit, we remove the last 5 bits
279    /// by shifting the byte to the right by 5. Then, we take the result modulo 2 to ignore the first 2
280    /// bits.
281    /// 
282    /// ```
283    /// // the third bit  >|<  is 1
284    /// let byte: u8 = 0b01101010;
285    /// assert_eq!((byte >> 5) % 2, 1);
286    /// ```
287    /// 
288    /// What if we want to extract the k values of a prime byte that are set to 1? In the previous
289    /// example, we had `01101010`. If we map each bit to a boolean array, where 0 maps to false and 1
290    /// maps to true, we can then zip it to the array of k values, yielding the following tuples:
291    /// 
292    /// `[ (false, 1), (true, 7), (true, 11), ... (false, 19), (true, 23), (false, 29) ]`
293    /// 
294    /// We can then filter all tuples with false, and extract the k value out of the remaining tuples.
295    /// The resulting list, contains all k-values that are prime, according to the byte. Of course, we can
296    /// only get their true value by determining the byte's offset.
297    /// 
298    /// There are [other methods](crate::PrimeByte) implemented in this library, which are all tweaked
299    /// versions of what I showed above. For example, maybe we want to restrict to prime bytes where the
300    /// k-values belong to some range. That's easily done by applying a second condition to the filter which
301    /// is that the given range contains the k-value.
302    /// 
303    /// ## Visibility
304    /// 
305    /// I decided to keep Prime Bytes public, in case anyone else feels like developing a data structure
306    /// using prime bytes. But if not, they can just simply ignore it.
307    pub mod _1_prime_byte {}
308
309    /// # Prime Data
310    /// 
311    /// Onto the actual data structure for storing prime numbers.
312    /// 
313    /// The structure contains two fields. `data` and `range`.
314    /// 
315    /// ## `data: Vec<PrimeByte>`
316    /// 
317    /// The data is straightforward. It's simply a vector of [prime bytes](crate::data::PrimeByte). However,
318    /// let's say we want to store the primes up to 1 million. We can't exactly divide that into our data
319    /// because `1_000_000 % 30 = 10`. So we'll end up using 33_334 bytes, but that last byte will not
320    /// contain the full information about primes in the range (999_990, 1_000_020). 
321    /// 
322    /// Therefore, this data would produce undefined behaviour if we tried to retrieve the primes in that
323    /// last byte of data. In order for us to rigorously make sure to return an error when it happens,
324    /// we need to store a range.
325    /// 
326    /// ## `range: RangeInclusive<u64>`
327    /// 
328    /// An inclusive range will determine what are the exact bounds of the given data. For example, if
329    /// we had only one byte of data, and the range was (33..=58), we can infer that the byte of data
330    /// represents prime candidates from (30..60), however, the first bit is not part of the data, as it
331    /// represents 31. Same goes for the last bit of data, which represents 59. Trying to access those
332    /// bits should result in an error.
333    /// 
334    /// The range start is especially important because it also determines the byte offset we discussed
335    /// in the introduction. The offset is basically the range start divided by 30. If the range starts
336    /// at 0, 20, or anything below 30, the first byte corresponds to (0..30). As soon as the range starts
337    /// at 30, this means the first byte corresponds to (30..60) instead.
338    /// 
339    /// The range will also be an important factor for expanding data. But in order to expand data, we need
340    /// to understand how to [iterate over them](crate::guide::iterators).
341    pub mod _2_prime_data {}
342}
343
344/// # Iterators
345pub mod iterators {
346    /// # Iterating over 30-Coprime Numbers
347    /// 
348    /// As discussed in the introduction, if we want to create a prime sieve using our data structure, we
349    /// need to iterate over numbers coprime with 30, to multiply them by some prime, and retrieve the
350    /// bit that corresponds to that result to flip it to zero.
351    /// 
352    /// This is quite straightforward to do. We start with some offset and some value between 0 and 7,
353    /// which is a "bit position" that corresponds to a k-value. Calling `next()` will increase that
354    /// bit by one (yielding the next k-value), or if that goes above 7, we reset it to 0 and increase
355    /// the offset by 1. We stop once `30*offset + k` is greater than some bound.
356    /// 
357    /// If you need more intricate details, you can see my implementation [here](crate::CoprimeIter).
358    pub mod _1_coprime {}
359
360    /// # Iterating over Prime Numbers
361    /// 
362    /// The prime sieve requires us to iterate over primes, so that we then create a nested iterator
363    /// that will iterate over 30-coprime numbers. Those primes will come from an already known data.
364    /// 
365    /// This is much trickier than simply iterating over 30-Coprimes so I'll include the details of
366    /// my implementation below.
367    /// 
368    /// The struct's fields are as follows:
369    /// 
370    /// ```no_run
371    /// # use prime_data::PrimeByte;
372    /// struct PrimeIter<'a> {
373    ///     data: &'a [PrimeByte],
374    ///     primes: Option<Vec<u64>>,
375    ///     current: (u64, usize),
376    ///     data_offset: u64,
377    ///     stop_at: u64,
378    /// }    
379    /// ```
380    /// 
381    /// Let's say we have some prime data. It has 7 [bytes](crate::PrimeByte) of data, with the range
382    /// `(73..=262)`. This tells us the data offset is 2, because `73 / 30 = 2`. Technically, the data
383    /// bytes range over (60..270), but the actual range tells us we cannot verify if 61, 67, 71, 263,
384    /// or 269 are primes or not. Even though their bits are in the data, they're not valid bits because
385    /// they fall out of the verified range.
386    /// 
387    /// If we try to create an [iterator](crate::data::PrimeIter) of primes in the range `(72..=260)`,
388    /// it will send out an error, since that range is not contained in our original data's range. Even
389    /// if it's trivially known that 72 is not prime, and is the only number out of the range. However,
390    /// I could change that in the [future](super::future).
391    /// 
392    /// Let's say we wanted to iterate over primes in the range `(101..=240)` instead. That range is
393    /// contained in the original range, so it'll not return an error. From that range, we can already
394    /// determine `stop_at = 240`. 
395    /// 
396    /// We don't have to take a slice reference to the entire data. The first byte is the range (60..90),
397    /// so we can start the iterator on the byte after, because 101 is contained in the next byte.
398    /// Same applies for the ending. The 7th byte contains the range (240..270). However, since 240 is
399    /// directly in between two bytes (we can argue the 6th byte contains the range (210..=240)), we can
400    /// stop at the 6th byte. So we can determine `data = &prime_data[1..=6]`. If the range end was 241
401    /// instead, then we would need the slice end to be at 7, not 6.
402    /// 
403    /// Bytes by themselves, as we saw, cannot give you prime numbers. They need an offset. Same goes
404    /// for any slice of prime bytes. We need the data's offset. Since the first byte (`data[0]`) is
405    /// the range (90..120), we can determine the `data_offset = 90 / 30 = 3`
406    /// 
407    /// Now, we can retreive primes from the data by applying the offset. However, notice that we can't
408    /// simply [get all primes](crate::data::PrimeByte::as_primes) because the range starts at 101, not
409    /// 90. So we need to [restrict ourselves](crate::PrimeByte::as_primes_in_range) to the range
410    /// (101..=120). The result of that will be the field`primes`. The reason why it needs to be an
411    /// option is that there are two situations in which we won't be able to retrieve that vector.
412    /// 
413    /// 1. If the range has no primes, the given vector will be empty. This means if we try to access
414    /// `self.primes[self.current.1]`, that will panic. Therefore, as a safety measure, we store it
415    /// as an option and set it to None when empty.
416    /// 
417    /// 2. If we're iterating through the data, we could encounter a prime byte that has no primes.
418    /// This means, when we iterate through the data, we need to keep incrememting `current.0` until
419    /// we hit a prime byte that is non empty. However, there's a chance we increase it to the point where
420    /// we hit the end of our slice. We return it as None and move on.
421    /// 
422    /// In both situations, that's equivalent of "ending the iterator". From that, we will always
423    /// yield None.
424    ///
425    /// # Retrieving `next()`
426    /// 
427    /// Let's say we're at some step of the iterator. When calling the next function, the first thing we
428    /// do is check if `self.primes` is [None](Option::None) and return that if so. Otherwise, we retrieve
429    /// a prime number using `primes[self.current.1]`. This will never panic because we won't allow the
430    /// iterator to reach the given vector's length.
431    /// 
432    /// If not, we gotta look for the next prime. First, we check if we hit the ending of the current
433    /// vector. If we did, we'll increment `self.current.0` to reach out for the next piece of data.
434    /// If that is equal to the data's length, it means we hit the actual end of the whole data. That's
435    /// where we set `self.primes = None` and return the last prime number we got previously.
436    /// 
437    /// Of course, if we didn't hit the vector's length, we just increment the index and move on. If we
438    /// did, but we didn't hit the data's length, we'll just grab the next byte and retrieve its
439    /// prime numbers using [as_primes](crate::PrimeByte::as_primes). The offset that should be given
440    /// as a parameter is `self.data_offset + self.current.0`.
441    pub mod _2_prime {}
442}
443
444/// # Sieving
445/// 
446/// Sieving requires some already know data. Hence, PrimeData has a handy method which is a hard-coded
447/// piece of data with all primes from 0 to 30. Can you guess how the prime byte looks like? If you need,
448/// refer to the [k-values](crate::K_VALUES)
449/// 
450/// Answer: `[01111111]` (1 isn't prime)
451pub mod sieving {
452        /// # Sieving = Expansion
453        /// 
454        /// Data Expansion is essentially, grabbing an initial piece of data, generating a new piece with a wider range,
455        /// then sieving it to filter out nonprime numbers, just like an Eratosthenes sieve. However, there's a very
456        /// intricate reason I chose to call this an "expansion": Because it sounds cool 😎.
457        /// 
458        /// Anyway, if we have some piece of data and we want to expand it to the prime numbers from 30 thousand to 50
459        /// thousand, we first need to check if the original piece of data can expand that much. `isqrt(50_000) = 223`,
460        /// so as long as the original data goes from 1 to 223 or more, we're set.
461        /// 
462        /// If so, the first thing we do is create a starter data. The range is, naturally, `30_000..=50_000`, the data
463        /// size will be `ceil(50_000 / 30) - floor(30_000 / 30) = 667` and the entire data is filled with ones.
464        /// 
465        /// The next step is so iterate over prime numbers from 1 to 223 (I mean, we can start anywhere from 0 to 7,
466        /// same results). We know how to do that. Each iteration yields a prime number **p**. For each **p**, we need
467        /// to calculate the biggest number, that when multiplied by it, will be less than or equal to 50 thousand.
468        /// If we consider the first prime in the iteration, 7, that is `floor(50_000 / 7) = 7142`.
469        /// 
470        /// From that, we can now iterate over numbers coprime with 30 (not just primes) that are between 1 and 7142
471        /// (again, it doesn't have to be 1, anything from 0 to 7 works). Each iteration will yield a **c**.
472        /// 
473        /// Then, all we have to do is calculate **pc** and the result of that multiplication is not a prime number!
474        /// We just need to use some handy `data.set_nonprime(p*c)` function to flip the bit to zero. For a
475        /// safer user experience, that function is private for PrimeData, but it [is](crate::PrimeByte::set_nonprime)
476        /// public for prime bytes, if you wish to create your own data struct using those.
477        ///
478        /// That's it. That's all there is to it. Once we iterate over all **p** values, the expanded data is a valid
479        /// data of prime numbers from 30 thousand to 50 thousand!
480        /// 
481        /// Of course, as previously said, we cannot *join* those two bits of data together unless the original data
482        /// ranges to 30 thousand. But since the iteration process does not consume data, just references it, we can
483        /// expand to as many new PrimeData as we want, and join them all together once their starts and ends match.
484        /// Joining two PrimeData into one has not yet been implemented, but [is planned to](super::future).
485    pub mod _1_expansion {}
486}
487
488/// # Estimates
489/// 
490/// As discussed in the introduction, our only hope to safely generate and count prime numbers is by
491/// sieving-like methods. However, sometimes, a good approximation is all we need.
492pub mod estimates {
493    /// # Approximating π(x)
494    /// 
495    /// Don't be fooled! That "π" represents a function, not the famous constant.
496    /// 
497    /// Mathematicians refer to π(N) as "how many primes, less than or equal to N, are there?".
498    /// 
499    /// We can generate some prime data from 1 to N and count its primes. But that takes a good amount
500    /// of time if N is big enough. If we want to approximate it (we'll see later why this is useful),
501    /// we can use a very interesting theorem.
502    ///
503    /// > π(x) ~ x / ln(x)
504    /// 
505    /// This is known as the Prime Number Theorem, and unfortunately, even the most elementary proof for
506    /// this is quite the hassle for this guide. But, if you're interested, you can give it a try. Heads
507    /// up, you'll find yourself reading about complex analysis (including the Riemann Zeta Function),
508    /// some crazy logarithmic sums, perhaps even integrals.
509    /// 
510    /// I'm also not gonna deeply explain what ln(x) is. It's the inverse of an exponential function.
511    /// Exponentials have the properties of growing incredibly quickly, and defined for all x values.
512    /// This means that its inverse, the logarithm, will grow incredibly **slowly**, but it never stops
513    /// growing. If it did, that would mean the logarithm stops at a certain bound. And that would mean
514    /// the exponential wasn't defined for anything above that bound.
515    /// 
516    /// That's kinda like prime numbers. The further we go in the number line, the less primes we'll find,
517    /// as they get sparser, but we'll never stop finding primes. So that theorem does make sense.
518    /// 
519    /// # Approximations
520    /// 
521    /// The problem here, is that ⌊ x / ln(x) ⌋ is not a great approximation for π(x). When x = 1000, the
522    /// approximation is off by 13%. If x = 1 billion, it's off by 5%.
523    /// 
524    /// Don't get me wrong, that's okay, but 5% of 1 billion still is still 50 million. I think we need
525    /// a better way to approximate things.
526    /// 
527    /// There's a very cool alternative, which is (x / ln(x)) × Σ ( n! / ln^n(x) )
528    /// 
529    /// That is, we multiply the original approximation by a sum. That sum, theoretically goes to infinity,
530    /// but since we want an approximation, we can stop at a bound B. So, for each n in the range (0..=B),
531    /// we evaluate ( n! / ln^n(x) ), then we sum everything.
532    /// 
533    /// Just to make sure everyone understands, n! refers to the factorial of n. And ln^n(x) refers to
534    /// `x.ln().powi(n)` in Rust code.
535    /// 
536    /// However, if we want to stop at a certain bound B, we should probably tweak those n values a bit.
537    /// Not the ones in ln^n(x), but the ones in n factorial.
538    /// 
539    /// Pierre Dusart, a french mathematician, found out that if we only take the range (0..2), it's
540    /// a good approximation to use the n values: (0, 1, 2.51).
541    /// 
542    /// This yields the function (x / ln(x)) × (1 + 1 / ln(x) + 2.51 / ln^2(x)). Spoiler alert, at
543    /// x = 1 billion, the approximation is off by 0.035%. Nice.
544    /// 
545    /// That function above is actually an upper bound for π(x) after ~356k. We can tweak the n values
546    /// to get a nice lower bound as well. I plan to implement that in the [future](super::future).
547    pub mod _1_bounds {}
548
549    /// # Approximating p(n)
550    /// 
551    /// p(n) refers to the "nth prime number". For example, as we discussed, the first prime numbers are
552    /// {2, 3, 5, 7, ...}. So p(1) = 2, p(2) = 3, p(4) = 7, and so on. p(0) is undefined. I know, I know,
553    /// within programming we commonly start at zero, but this is mathematician land. And that's how
554    /// mathematicians chose to do this.
555    /// 
556    /// Anyway, if we approximate π(1 billion) ~ 50865514, this means there are roughly 50 million primes
557    /// under 1 billion. If we listed them out in a vector, the last entry would be very close to 1 billion
558    /// and its index would close to 50865514. See what we're doing here?
559    /// 
560    /// We can flip the question around. We can instead say that the 50865514th prime is ~ 1 billion.
561    /// It's actually quite accurate, the real answer is 1,000,373,273.
562    /// 
563    /// However, I found myself fond of a [formula](https://en.wikipedia.org/wiki/Prime_number_theorem#Approximations_for_the_nth_prime_number)
564    /// I found on Wikipedia.
565    /// 
566    /// What's fun is that, with an approximation and a known error bound for that approximation, we can
567    /// create a lower and upper bound for where the nth prime number lies. Then, we can simply generate
568    /// data, count the primes from 1 to the lower bound, then look for nth prime between those bounds.
569    pub mod _2_nth_prime {}
570}
571
572/// # Plans for the Future
573/// 
574/// This section isn't really a "guide", and more of a list of things I plan to add in the future, to my
575/// library. 
576/// 
577/// * High Priority = Expect to see it in the next versions
578/// * Medium Priority = I will do it eventually.
579/// * Low Priority = I could do it, but I don't see why would I.
580/// 
581/// # High Priority
582/// 
583/// ## Faster `is_prime(x)`
584/// 
585/// Currently, this function creates prime data up to the square root of `x`, then checking if none of the
586/// generated primes divide x. Unfortunately, this is way slower than it needs to be. Primality tests
587/// like Fermat or, the one I intend to implement, the Miller-Rabin test, are way faster.
588/// 
589/// ## More Rigorous Tests
590/// 
591/// Currently I'm running integration tests to make sure the data generation is working properly.
592/// But most of the other methods PrimeData and PrimeByte have have not been thoroughly tested.
593/// 
594/// # Medium Priority
595/// 
596/// ## Add a `lower_bound()` method
597/// 
598/// Currently, the [estimate](crate::estimate) module only has an [`upper_bound`](crate::estimate::upper_bound)
599/// method.
600/// 
601/// ## Better approximatin function for π(x)
602/// 
603/// Currently, π(x) approximations are never off by more than 0.5%. They're actually super good for values
604/// greater than [`u32::MAX`], but below that, I'm probably better off using combinatorial methods.
605/// 
606/// # Low Priority
607/// 
608/// ## Smarter Structs
609/// 
610/// If your data ranges from 73 to 144, you can't retrieve primes from 72 to 144, even if it's obvious
611/// that 72 isn't prime. I could implement a check to see if the values that lie outside of the data range
612/// are "obviously prime" (by checking if they're not coprime with 30).
613pub mod future {}
prime_data/guide/mod.rs

prime_data/guide/
mod.rs