prime_data/guide/mod.rs
1/*!
2
3# Guide
4
5This guide explains and gives more details about this crate and its items.
6
7**Hint**: Start with the [Introduction](self::introduction)
8
9## Map/Summary:
10
11* [**Introduction**](self::introduction)
12 1. [Background](self::introduction::_1_background)
13 2. [Prime Sieves](self::introduction::_2_prime_sieves)
14 3. [Memory Efficiency](self::introduction::_3_memory_efficiency)
15 4. [Time Efficiency](self::introduction::_4_time_efficiency)
16* [**Data Structure**](self::data_structure)
17 1. [Prime Bytes](self::data_structure::_1_prime_byte)
18 2. [Prime Data](self::data_structure::_2_prime_data)
19* [**Iterators**](self::iterators)
20 1. [30-Coprime Numbers](self::iterators::_1_coprime);
21 2. [Prime Numbers](self::iterators::_2_prime);
22* [**Sieving**](self::sieving)
23 1. [Expansion](self::sieving::_1_expansion)
24* [**Estimates**](self::estimates)
25 1. [Pi(n)](self::estimates::_1_bounds)
26 2. [p(n)](self::estimates::_2_nth_prime)
27* [**Plans for the Future**](self::future)
28*/
29
30
31/// # Introduction
32///
33/// This covers the basic information you need to understand the what/why/how of this library.
34pub mod introduction {
35 /// # Background
36 ///
37 /// **Prime Numbers** are positive integers that are only divisible by 1 or themselves, excluding
38 /// 1 itself. Some of the first prime numbers are {2, 3, 5, 7, ...}.
39 ///
40 /// We'll refer to them as prime numbers, or primes.
41 ///
42 /// Prime numbers tend to get sparser the bigger they are. 25% of numbers below 100 are prime, but
43 /// when we go to 10 thousand, it becomes 12.29%.
44 ///
45 /// Another important fact is that there isn't a "prime generating function" that can give you all
46 /// prime numbers in some constant time complexity. The only "safe way" for us to generate every
47 /// prime we want is to list them out as we go. But how?
48 pub mod _1_background {}
49
50 /// # Prime Sieves
51 ///
52 /// One simple way of generating all prime numbers from 1 to N, is to first start with all the numbers
53 /// from 2 to N (because 1 is not prime by definition), and for each one, verify if their only divisors
54 /// are 1 and themselves. That is, for each N, verify that it is not divisible by any M where 1 < M < N.
55 ///
56 /// However, that is incredibly inefficient. That's because, if N is not divisible by 2, it can't be
57 /// divisible by 4, 6, or any multiple of 2. Same goes for 3, 5, and the other primes. So in order for us
58 /// to see if N is prime, we should only look for primes below N.
59 ///
60 /// Secondly, let's say there is an M between 1 and N that divides N. That means N is **composite**, or
61 /// also called, "nonprime". Therefore, N/M = K, where K is some integer. Now let's refer to √N. By the
62 /// definition of square roots, √N×√N = N = M×K. Therefore, either M or K must be less than or equal to √N.
63 /// Otherwise, their product would be greater than N.
64 ///
65 /// Therefore, when we want to know if N is prime, we only need to look for primes P, where 2 < P ≤ √N.
66 ///
67 /// Unfortunately, that is still inefficient for generating lists of primes. It's a perfectly valid and
68 /// good solution if we only care about one N, but if we want to list out a load of them, not really.
69 ///
70 /// # The Sieve of Eratosthenes
71 ///
72 /// Okay, can I please pause the whole formality of this documentation for a sec? Eratosthenes was a greek
73 /// dude that was born like 2300 years ago, and his algorithm for generating prime number lists is *the*
74 /// most common way we do it today. Whaaaaaaaa...
75 ///
76 /// *Cough*. Anyway. Eratosthenes designed a simple algorithm, that starts out with all numbers from 2 to N.
77 ///
78 /// ```text
79 /// 2 3 4 5 6 ...
80 /// [ ][ ][ ][ ][ ]
81 /// ```
82 ///
83 /// We start with 2, and mark it as a prime number.
84 ///
85 /// ```text
86 /// 2 3 4 5 6 ...
87 /// [√][ ][ ][ ][ ]
88 /// ```
89 ///
90 /// From there, we keep adding two, crossing off everything we find in the way.
91 ///
92 /// ```text
93 /// 2 3 4 5 6 ...
94 /// [√][ ][X][ ][X]
95 /// ```
96 ///
97 /// Then we keep going. We mark 3 as prime, and from there, keep adding 3. If we stumble upon anything
98 /// that wasn't crossed off, we cross it off.
99 ///
100 /// If we do this until we reach √N, everything that is not crossed off will be a prime number! This is
101 /// because, for every prime number, we crossed anything divisible by that prime number. And since the
102 /// only candidates for dividing N are at most √N, we know everything up to N that wasn't crossed off
103 /// is prime.
104 pub mod _2_prime_sieves {}
105
106 /// # Memory Efficiency
107 ///
108 /// As we discussed previously, primes get sparser over time. So if we start with 10 thousand primes,
109 /// we know after this whole process, we'll be left with only 1229 as prime. That is, more than 80%
110 /// of the memory we created went in vain.
111 ///
112 /// However, there's another lingering problem that we haven't tackled. What do we do with our prime
113 /// sieve after we crossed off all nonprimes? We need to store those primes somewhere.
114 ///
115 /// # What's the most efficient way to store raw prime numbers?
116 ///
117 /// That's a really vague question, which includes lots of factors. One of them will always arise from
118 /// any question regarding "memory efficiency", which is the most efficient way to store any
119 /// information. That involves compression, which will be dealt with later on. Right now, we only
120 /// care about listing prime numbers.
121 ///
122 /// We can also optimize data storage by, instead of storing the data itself, we store a function
123 /// that can generate the data. That will also be dealt with in the future. Right now, we only care
124 /// about storing the data itself.
125 ///
126 /// The most straightforward way is to list the primes, perhaps as a [`u32`], and simply store that
127 /// in a file. It's easy to encode and decode. Unfortunately, if we decided to store the first 200
128 /// million primes, the file size will be ~800MB.
129 ///
130 /// The first optimization idea could be to notice that the first of these primes are small, and don't
131 /// need to be stored as a 4 byte number. Maybe 1 byte or 2 bytes suffice. The first 54 primes can
132 /// be stored in 1 byte, then the next 6488 can be stored in 2 bytes. We can also encode the next
133 /// 1,071,329 primes in 3 bytes, leaving the remaining 198,928,671 to be stored in 4 bytes.
134 /// Unfortunately, that only compresses ~800MB into ~775MB.
135 ///
136 /// ## The Law of Excluded Middle
137 ///
138 /// If you've ever studied logic, you may have come across that law. It states something really
139 /// intuitive, but incredibly powerful for mathematical theorems:
140 ///
141 /// > Every statement is either true or false. Never both, never neither.
142 ///
143 /// With this, we can determine that every integer is either prime or not. Because every time we
144 /// say "N is prime", that statement is either true or false. Within the context of programming,
145 /// we can associate the state of being prime or not as a 1 or a 0.
146 ///
147 /// This way, we can make a huge array (or, in a more dynamic context, a vector) of bits, where
148 /// the position `i` will store wether the number `i` is prime or not.
149 ///
150 /// ```text
151 /// 0 1 2 3 4 5 6 7 8 9 ...
152 /// 0 0 1 1 0 1 0 1 0 0 ...
153 /// ```
154 ///
155 /// In this context, it's a little harder to ask "how much memory it takes to store the first N primes"
156 /// because now we're storing according to all numbers, not just primes. So we need to first ask
157 /// *when* will we reach the Nth prime, then divide that by 8 to get the amount of bytes it'll take.
158 ///
159 /// The 200 millionth prime is around 4.2 billion. Which means, in order to reach there, we'll need 4.2
160 /// billion bits, which is ~525MB. Already a big improvement.
161 ///
162 /// ## We're not done!
163 ///
164 /// Hopefully you already know that 2 is the only even prime. That means, after 2, every bit in an even
165 /// position will be guaranteed to be a zero. That's a lot of useless information! Specifically, close to
166 /// half of our data is unnecessary! So we can heavily improve our system by only listing odd primes,
167 /// and when we want to decode our data, we hardcode an extra 2 that will be missing!
168 ///
169 /// ```text
170 /// 1 3 5 7 9 11 13 ...
171 /// 0 1 1 1 0 1 1 ...
172 /// ```
173 ///
174 /// However, we can *generalize* this idea. Obviously, 3 is the only prime divisible by 3. So after 3,
175 /// all positions divisible by 3 (9, 15, 29769...) will also be a guaranteed zero!
176 ///
177 /// So we can ignore all positions that aren't divisible by either 2 or 3! First, we multiply these two
178 /// to get 6. Now, we notice that for any number N, `N % 6 == k`, where k is in the range `(0..6)`. If
179 /// k is 0, 2, or 4, N is definitely divisible by 2. If it's 3, N is definitely divisible by 3. That means
180 /// the only possible primes, after 3, are the numbers where k results in 1 or 5.
181 ///
182 /// ```text
183 /// 1 5 7 11 13 17 19 ...
184 /// 0 1 1 1 1 1 1 ...
185 /// ```
186 ///
187 /// How much are we improving here? Well, in the beginning, we were storing every number as a bit. Which
188 /// means the bit to number ratio was 100%. When we excluded even numbers, it became `50%`, because it
189 /// takes 1 bit to store info about two numbers (since the second one is even, we know it's not prime).
190 /// Once we excluded numbers divisible by 2 and 3, we went from 6 possible values of k to just 2. Meaning
191 /// the bit to number ratio became ~33%.
192 ///
193 /// And we can keep going. Adding 5 into the mix gives us a multiple of 30 (2*3*5). Out of the 30 possible
194 /// values of `N % 30`, 15 are even, 5 are odds multiple of 3, and other 2 are only multiples of 5. So we
195 /// have 8 possible prime candidates out of a 30 number range. ~26%.
196 ///
197 /// However, as the range increases, we're increasing the size of the chunks of bits we need to decode.
198 /// For the previous example, we had to encode it in chunks of 2. One for k = 1, and the other for k = 5.
199 ///
200 /// By adding 5 and increasing the range to 30, we have 8 possible primes, meaning the chunk size is now
201 /// an entire byte. As we increase the amount of primes to discard, we also increase the chunk size.
202 ///
203 /// | Primes | Range | bit/N | Chunk Size |
204 /// |---------|-------|-------|------------|
205 /// | None | 1 | 100% | - |
206 /// | 2 | 2 | 50% | 1 bit |
207 /// | 2,3 | 6 | 33% | 2 bit |
208 /// | 2,3,5 | 30 | 26% | 8 bits |
209 /// | 2,3,5,7 | 210 | 23% | 48 bits |
210 ///
211 /// As you can see, when we add 7, the chunk size becomes 6 times larger, while we go from 26% to 23%.
212 /// We want to avoid big changes for small improvements. Not only that, but using just 2, 3, and 5 lets
213 /// us have exactly 1 byte of chunk size. Neat and tidy.
214 ///
215 /// ## Recap
216 ///
217 /// How will our data look like? We know that every byte is a chunk of prime candidates, that are either
218 /// prime (1) or not prime (0). Each bit represents a number N, so that when you evaluate `N % 30`, it'll
219 /// return some `k` that is not divisible by 2, 3 or 5. Specifically, these k values:
220 ///
221 /// `[ 01 07 11 13 17 19 23 29 ]`
222 ///
223 /// A byte by itself doesn't give us the full picture though. We can't extract N from k. So if we want
224 /// to find out the numbers in a given chunk, we need some "offset" value, such as 29. Then, we can
225 /// multiply the offset by 30, and we can sum each k value to that result, yielding N. Example:
226 ///
227 /// `{ offset: 29, byte: [01111000] }`
228 ///
229 /// First, we multiply the offset. `result = offset * 30 = 870`. Then, we notice the k values that are
230 /// ones (a.k.a, prime numbers) are 7, 11, 13, and 17. Summing those by 870 gives us the numbers:
231 ///
232 /// `877, 881, 883, 887`
233 ///
234 /// If our data isn't wrong, we officially decoded all prime numbers between 870 and 900!
235 pub mod _3_memory_efficiency {}
236
237 /// # Time Efficiency
238 ///
239 /// A sieving algorithm for this data structure will consider "crossing it off as nonprime" be
240 /// flipping the 1 into a 0. That is, the starter data will contain all ones, and we'll cross them
241 /// off as nonprime by flipping them to zero.
242 ///
243 /// With the Sieve of Eratosthenes, when we encountered a prime (2, for example), we then kept
244 /// adding it by itself and crossing off those results as nonprime. (4, 6, 8, ...).
245 ///
246 /// However, this data that we're defining does not store numbers divisible by 2, 3, or 5. That is,
247 /// if we find some prime P, it makes no sense to "flip the 2P bit" because there is no bit in our
248 /// data structure that represents 2P because 2P is divisible by 2. If we add P to that again, we
249 /// get 3P, which also doesn't have a bit to it because it's divisible by 3.
250 ///
251 /// The only numbers that have a bit matching their value is numbers of the form XP, where X is
252 /// not divisible by 2, 3, or 5 (saying this over and over is getting repetitive. From now on,
253 /// we'll call those "coprime with 30", or simply "30-coprime").
254 ///
255 /// If we only iterate over 30-coprime numbers, multiply them by P, and flipping the bit matching
256 /// the result to 0, we're making things 3.75x faster (30/8).
257 pub mod _4_time_efficiency {}
258}
259
260/// # Data Structure
261///
262/// As discussed in the introduction, we need to develop a special data structure for efficiently storing
263/// and generating prime numbers. We need to first, create an abstraction of those "bytes" that contain the
264/// 8 30-coprime k-values that we mentioned. Then, our data structure will be a list (vector) of those
265/// bytes.
266pub mod data_structure {
267 /// # Prime Bytes
268 ///
269 /// A bit is a 1 or a 0. A byte is a list of 8 bits. Example: `[01101001]`
270 ///
271 /// A "prime byte" is what we'll name our abstraction over k-values. Reiterating, each bit in the
272 /// prime byte will correspond to the k-value:
273 ///
274 /// `[ 01 07 11 13 17 19 23 29 ]`
275 ///
276 /// So, if we want to retrieve "the bit that corresponds to the k-value 13", that's the same as asking
277 /// "please give me the 4th bit". We can do this by performing a right shift, then taking the result
278 /// modulo 2. For instance, if we want to retrieve the third bit, we remove the last 5 bits
279 /// by shifting the byte to the right by 5. Then, we take the result modulo 2 to ignore the first 2
280 /// bits.
281 ///
282 /// ```
283 /// // the third bit >|< is 1
284 /// let byte: u8 = 0b01101010;
285 /// assert_eq!((byte >> 5) % 2, 1);
286 /// ```
287 ///
288 /// What if we want to extract the k values of a prime byte that are set to 1? In the previous
289 /// example, we had `01101010`. If we map each bit to a boolean array, where 0 maps to false and 1
290 /// maps to true, we can then zip it to the array of k values, yielding the following tuples:
291 ///
292 /// `[ (false, 1), (true, 7), (true, 11), ... (false, 19), (true, 23), (false, 29) ]`
293 ///
294 /// We can then filter all tuples with false, and extract the k value out of the remaining tuples.
295 /// The resulting list, contains all k-values that are prime, according to the byte. Of course, we can
296 /// only get their true value by determining the byte's offset.
297 ///
298 /// There are [other methods](crate::PrimeByte) implemented in this library, which are all tweaked
299 /// versions of what I showed above. For example, maybe we want to restrict to prime bytes where the
300 /// k-values belong to some range. That's easily done by applying a second condition to the filter which
301 /// is that the given range contains the k-value.
302 ///
303 /// ## Visibility
304 ///
305 /// I decided to keep Prime Bytes public, in case anyone else feels like developing a data structure
306 /// using prime bytes. But if not, they can just simply ignore it.
307 pub mod _1_prime_byte {}
308
309 /// # Prime Data
310 ///
311 /// Onto the actual data structure for storing prime numbers.
312 ///
313 /// The structure contains two fields. `data` and `range`.
314 ///
315 /// ## `data: Vec<PrimeByte>`
316 ///
317 /// The data is straightforward. It's simply a vector of [prime bytes](crate::data::PrimeByte). However,
318 /// let's say we want to store the primes up to 1 million. We can't exactly divide that into our data
319 /// because `1_000_000 % 30 = 10`. So we'll end up using 33_334 bytes, but that last byte will not
320 /// contain the full information about primes in the range (999_990, 1_000_020).
321 ///
322 /// Therefore, this data would produce undefined behaviour if we tried to retrieve the primes in that
323 /// last byte of data. In order for us to rigorously make sure to return an error when it happens,
324 /// we need to store a range.
325 ///
326 /// ## `range: RangeInclusive<u64>`
327 ///
328 /// An inclusive range will determine what are the exact bounds of the given data. For example, if
329 /// we had only one byte of data, and the range was (33..=58), we can infer that the byte of data
330 /// represents prime candidates from (30..60), however, the first bit is not part of the data, as it
331 /// represents 31. Same goes for the last bit of data, which represents 59. Trying to access those
332 /// bits should result in an error.
333 ///
334 /// The range start is especially important because it also determines the byte offset we discussed
335 /// in the introduction. The offset is basically the range start divided by 30. If the range starts
336 /// at 0, 20, or anything below 30, the first byte corresponds to (0..30). As soon as the range starts
337 /// at 30, this means the first byte corresponds to (30..60) instead.
338 ///
339 /// The range will also be an important factor for expanding data. But in order to expand data, we need
340 /// to understand how to [iterate over them](crate::guide::iterators).
341 pub mod _2_prime_data {}
342}
343
344/// # Iterators
345pub mod iterators {
346 /// # Iterating over 30-Coprime Numbers
347 ///
348 /// As discussed in the introduction, if we want to create a prime sieve using our data structure, we
349 /// need to iterate over numbers coprime with 30, to multiply them by some prime, and retrieve the
350 /// bit that corresponds to that result to flip it to zero.
351 ///
352 /// This is quite straightforward to do. We start with some offset and some value between 0 and 7,
353 /// which is a "bit position" that corresponds to a k-value. Calling `next()` will increase that
354 /// bit by one (yielding the next k-value), or if that goes above 7, we reset it to 0 and increase
355 /// the offset by 1. We stop once `30*offset + k` is greater than some bound.
356 ///
357 /// If you need more intricate details, you can see my implementation [here](crate::CoprimeIter).
358 pub mod _1_coprime {}
359
360 /// # Iterating over Prime Numbers
361 ///
362 /// The prime sieve requires us to iterate over primes, so that we then create a nested iterator
363 /// that will iterate over 30-coprime numbers. Those primes will come from an already known data.
364 ///
365 /// This is much trickier than simply iterating over 30-Coprimes so I'll include the details of
366 /// my implementation below.
367 ///
368 /// The struct's fields are as follows:
369 ///
370 /// ```no_run
371 /// # use prime_data::PrimeByte;
372 /// struct PrimeIter<'a> {
373 /// data: &'a [PrimeByte],
374 /// primes: Option<Vec<u64>>,
375 /// current: (u64, usize),
376 /// data_offset: u64,
377 /// stop_at: u64,
378 /// }
379 /// ```
380 ///
381 /// Let's say we have some prime data. It has 7 [bytes](crate::PrimeByte) of data, with the range
382 /// `(73..=262)`. This tells us the data offset is 2, because `73 / 30 = 2`. Technically, the data
383 /// bytes range over (60..270), but the actual range tells us we cannot verify if 61, 67, 71, 263,
384 /// or 269 are primes or not. Even though their bits are in the data, they're not valid bits because
385 /// they fall out of the verified range.
386 ///
387 /// If we try to create an [iterator](crate::data::PrimeIter) of primes in the range `(72..=260)`,
388 /// it will send out an error, since that range is not contained in our original data's range. Even
389 /// if it's trivially known that 72 is not prime, and is the only number out of the range. However,
390 /// I could change that in the [future](super::future).
391 ///
392 /// Let's say we wanted to iterate over primes in the range `(101..=240)` instead. That range is
393 /// contained in the original range, so it'll not return an error. From that range, we can already
394 /// determine `stop_at = 240`.
395 ///
396 /// We don't have to take a slice reference to the entire data. The first byte is the range (60..90),
397 /// so we can start the iterator on the byte after, because 101 is contained in the next byte.
398 /// Same applies for the ending. The 7th byte contains the range (240..270). However, since 240 is
399 /// directly in between two bytes (we can argue the 6th byte contains the range (210..=240)), we can
400 /// stop at the 6th byte. So we can determine `data = &prime_data[1..=6]`. If the range end was 241
401 /// instead, then we would need the slice end to be at 7, not 6.
402 ///
403 /// Bytes by themselves, as we saw, cannot give you prime numbers. They need an offset. Same goes
404 /// for any slice of prime bytes. We need the data's offset. Since the first byte (`data[0]`) is
405 /// the range (90..120), we can determine the `data_offset = 90 / 30 = 3`
406 ///
407 /// Now, we can retreive primes from the data by applying the offset. However, notice that we can't
408 /// simply [get all primes](crate::data::PrimeByte::as_primes) because the range starts at 101, not
409 /// 90. So we need to [restrict ourselves](crate::PrimeByte::as_primes_in_range) to the range
410 /// (101..=120). The result of that will be the field`primes`. The reason why it needs to be an
411 /// option is that there are two situations in which we won't be able to retrieve that vector.
412 ///
413 /// 1. If the range has no primes, the given vector will be empty. This means if we try to access
414 /// `self.primes[self.current.1]`, that will panic. Therefore, as a safety measure, we store it
415 /// as an option and set it to None when empty.
416 ///
417 /// 2. If we're iterating through the data, we could encounter a prime byte that has no primes.
418 /// This means, when we iterate through the data, we need to keep incrememting `current.0` until
419 /// we hit a prime byte that is non empty. However, there's a chance we increase it to the point where
420 /// we hit the end of our slice. We return it as None and move on.
421 ///
422 /// In both situations, that's equivalent of "ending the iterator". From that, we will always
423 /// yield None.
424 ///
425 /// # Retrieving `next()`
426 ///
427 /// Let's say we're at some step of the iterator. When calling the next function, the first thing we
428 /// do is check if `self.primes` is [None](Option::None) and return that if so. Otherwise, we retrieve
429 /// a prime number using `primes[self.current.1]`. This will never panic because we won't allow the
430 /// iterator to reach the given vector's length.
431 ///
432 /// If not, we gotta look for the next prime. First, we check if we hit the ending of the current
433 /// vector. If we did, we'll increment `self.current.0` to reach out for the next piece of data.
434 /// If that is equal to the data's length, it means we hit the actual end of the whole data. That's
435 /// where we set `self.primes = None` and return the last prime number we got previously.
436 ///
437 /// Of course, if we didn't hit the vector's length, we just increment the index and move on. If we
438 /// did, but we didn't hit the data's length, we'll just grab the next byte and retrieve its
439 /// prime numbers using [as_primes](crate::PrimeByte::as_primes). The offset that should be given
440 /// as a parameter is `self.data_offset + self.current.0`.
441 pub mod _2_prime {}
442}
443
444/// # Sieving
445///
446/// Sieving requires some already know data. Hence, PrimeData has a handy method which is a hard-coded
447/// piece of data with all primes from 0 to 30. Can you guess how the prime byte looks like? If you need,
448/// refer to the [k-values](crate::K_VALUES)
449///
450/// Answer: `[01111111]` (1 isn't prime)
451pub mod sieving {
452 /// # Sieving = Expansion
453 ///
454 /// Data Expansion is essentially, grabbing an initial piece of data, generating a new piece with a wider range,
455 /// then sieving it to filter out nonprime numbers, just like an Eratosthenes sieve. However, there's a very
456 /// intricate reason I chose to call this an "expansion": Because it sounds cool 😎.
457 ///
458 /// Anyway, if we have some piece of data and we want to expand it to the prime numbers from 30 thousand to 50
459 /// thousand, we first need to check if the original piece of data can expand that much. `isqrt(50_000) = 223`,
460 /// so as long as the original data goes from 1 to 223 or more, we're set.
461 ///
462 /// If so, the first thing we do is create a starter data. The range is, naturally, `30_000..=50_000`, the data
463 /// size will be `ceil(50_000 / 30) - floor(30_000 / 30) = 667` and the entire data is filled with ones.
464 ///
465 /// The next step is so iterate over prime numbers from 1 to 223 (I mean, we can start anywhere from 0 to 7,
466 /// same results). We know how to do that. Each iteration yields a prime number **p**. For each **p**, we need
467 /// to calculate the biggest number, that when multiplied by it, will be less than or equal to 50 thousand.
468 /// If we consider the first prime in the iteration, 7, that is `floor(50_000 / 7) = 7142`.
469 ///
470 /// From that, we can now iterate over numbers coprime with 30 (not just primes) that are between 1 and 7142
471 /// (again, it doesn't have to be 1, anything from 0 to 7 works). Each iteration will yield a **c**.
472 ///
473 /// Then, all we have to do is calculate **pc** and the result of that multiplication is not a prime number!
474 /// We just need to use some handy `data.set_nonprime(p*c)` function to flip the bit to zero. For a
475 /// safer user experience, that function is private for PrimeData, but it [is](crate::PrimeByte::set_nonprime)
476 /// public for prime bytes, if you wish to create your own data struct using those.
477 ///
478 /// That's it. That's all there is to it. Once we iterate over all **p** values, the expanded data is a valid
479 /// data of prime numbers from 30 thousand to 50 thousand!
480 ///
481 /// Of course, as previously said, we cannot *join* those two bits of data together unless the original data
482 /// ranges to 30 thousand. But since the iteration process does not consume data, just references it, we can
483 /// expand to as many new PrimeData as we want, and join them all together once their starts and ends match.
484 /// Joining two PrimeData into one has not yet been implemented, but [is planned to](super::future).
485 pub mod _1_expansion {}
486}
487
488/// # Estimates
489///
490/// As discussed in the introduction, our only hope to safely generate and count prime numbers is by
491/// sieving-like methods. However, sometimes, a good approximation is all we need.
492pub mod estimates {
493 /// # Approximating π(x)
494 ///
495 /// Don't be fooled! That "π" represents a function, not the famous constant.
496 ///
497 /// Mathematicians refer to π(N) as "how many primes, less than or equal to N, are there?".
498 ///
499 /// We can generate some prime data from 1 to N and count its primes. But that takes a good amount
500 /// of time if N is big enough. If we want to approximate it (we'll see later why this is useful),
501 /// we can use a very interesting theorem.
502 ///
503 /// > π(x) ~ x / ln(x)
504 ///
505 /// This is known as the Prime Number Theorem, and unfortunately, even the most elementary proof for
506 /// this is quite the hassle for this guide. But, if you're interested, you can give it a try. Heads
507 /// up, you'll find yourself reading about complex analysis (including the Riemann Zeta Function),
508 /// some crazy logarithmic sums, perhaps even integrals.
509 ///
510 /// I'm also not gonna deeply explain what ln(x) is. It's the inverse of an exponential function.
511 /// Exponentials have the properties of growing incredibly quickly, and defined for all x values.
512 /// This means that its inverse, the logarithm, will grow incredibly **slowly**, but it never stops
513 /// growing. If it did, that would mean the logarithm stops at a certain bound. And that would mean
514 /// the exponential wasn't defined for anything above that bound.
515 ///
516 /// That's kinda like prime numbers. The further we go in the number line, the less primes we'll find,
517 /// as they get sparser, but we'll never stop finding primes. So that theorem does make sense.
518 ///
519 /// # Approximations
520 ///
521 /// The problem here, is that ⌊ x / ln(x) ⌋ is not a great approximation for π(x). When x = 1000, the
522 /// approximation is off by 13%. If x = 1 billion, it's off by 5%.
523 ///
524 /// Don't get me wrong, that's okay, but 5% of 1 billion still is still 50 million. I think we need
525 /// a better way to approximate things.
526 ///
527 /// There's a very cool alternative, which is (x / ln(x)) × Σ ( n! / ln^n(x) )
528 ///
529 /// That is, we multiply the original approximation by a sum. That sum, theoretically goes to infinity,
530 /// but since we want an approximation, we can stop at a bound B. So, for each n in the range (0..=B),
531 /// we evaluate ( n! / ln^n(x) ), then we sum everything.
532 ///
533 /// Just to make sure everyone understands, n! refers to the factorial of n. And ln^n(x) refers to
534 /// `x.ln().powi(n)` in Rust code.
535 ///
536 /// However, if we want to stop at a certain bound B, we should probably tweak those n values a bit.
537 /// Not the ones in ln^n(x), but the ones in n factorial.
538 ///
539 /// Pierre Dusart, a french mathematician, found out that if we only take the range (0..2), it's
540 /// a good approximation to use the n values: (0, 1, 2.51).
541 ///
542 /// This yields the function (x / ln(x)) × (1 + 1 / ln(x) + 2.51 / ln^2(x)). Spoiler alert, at
543 /// x = 1 billion, the approximation is off by 0.035%. Nice.
544 ///
545 /// That function above is actually an upper bound for π(x) after ~356k. We can tweak the n values
546 /// to get a nice lower bound as well. I plan to implement that in the [future](super::future).
547 pub mod _1_bounds {}
548
549 /// # Approximating p(n)
550 ///
551 /// p(n) refers to the "nth prime number". For example, as we discussed, the first prime numbers are
552 /// {2, 3, 5, 7, ...}. So p(1) = 2, p(2) = 3, p(4) = 7, and so on. p(0) is undefined. I know, I know,
553 /// within programming we commonly start at zero, but this is mathematician land. And that's how
554 /// mathematicians chose to do this.
555 ///
556 /// Anyway, if we approximate π(1 billion) ~ 50865514, this means there are roughly 50 million primes
557 /// under 1 billion. If we listed them out in a vector, the last entry would be very close to 1 billion
558 /// and its index would close to 50865514. See what we're doing here?
559 ///
560 /// We can flip the question around. We can instead say that the 50865514th prime is ~ 1 billion.
561 /// It's actually quite accurate, the real answer is 1,000,373,273.
562 ///
563 /// However, I found myself fond of a [formula](https://en.wikipedia.org/wiki/Prime_number_theorem#Approximations_for_the_nth_prime_number)
564 /// I found on Wikipedia.
565 ///
566 /// What's fun is that, with an approximation and a known error bound for that approximation, we can
567 /// create a lower and upper bound for where the nth prime number lies. Then, we can simply generate
568 /// data, count the primes from 1 to the lower bound, then look for nth prime between those bounds.
569 pub mod _2_nth_prime {}
570}
571
572/// # Plans for the Future
573///
574/// This section isn't really a "guide", and more of a list of things I plan to add in the future, to my
575/// library.
576///
577/// * High Priority = Expect to see it in the next versions
578/// * Medium Priority = I will do it eventually.
579/// * Low Priority = I could do it, but I don't see why would I.
580///
581/// # High Priority
582///
583/// ## Faster `is_prime(x)`
584///
585/// Currently, this function creates prime data up to the square root of `x`, then checking if none of the
586/// generated primes divide x. Unfortunately, this is way slower than it needs to be. Primality tests
587/// like Fermat or, the one I intend to implement, the Miller-Rabin test, are way faster.
588///
589/// ## More Rigorous Tests
590///
591/// Currently I'm running integration tests to make sure the data generation is working properly.
592/// But most of the other methods PrimeData and PrimeByte have have not been thoroughly tested.
593///
594/// # Medium Priority
595///
596/// ## Add a `lower_bound()` method
597///
598/// Currently, the [estimate](crate::estimate) module only has an [`upper_bound`](crate::estimate::upper_bound)
599/// method.
600///
601/// ## Better approximatin function for π(x)
602///
603/// Currently, π(x) approximations are never off by more than 0.5%. They're actually super good for values
604/// greater than [`u32::MAX`], but below that, I'm probably better off using combinatorial methods.
605///
606/// # Low Priority
607///
608/// ## Smarter Structs
609///
610/// If your data ranges from 73 to 144, you can't retrieve primes from 72 to 144, even if it's obvious
611/// that 72 isn't prime. I could implement a check to see if the values that lie outside of the data range
612/// are "obviously prime" (by checking if they're not coprime with 30).
613pub mod future {}