Module _2_prime

Module _2_prime 

Source
Expand description

§Iterating over Prime Numbers

The prime sieve requires us to iterate over primes, so that we then create a nested iterator that will iterate over 30-coprime numbers. Those primes will come from an already known data.

This is much trickier than simply iterating over 30-Coprimes so I’ll include the details of my implementation below.

The struct’s fields are as follows:

struct PrimeIter<'a> {
    data: &'a [PrimeByte],
    primes: Option<Vec<u64>>,
    current: (u64, usize),
    data_offset: u64,
    stop_at: u64,
}    

Let’s say we have some prime data. It has 7 bytes of data, with the range (73..=262). This tells us the data offset is 2, because 73 / 30 = 2. Technically, the data bytes range over (60..270), but the actual range tells us we cannot verify if 61, 67, 71, 263, or 269 are primes or not. Even though their bits are in the data, they’re not valid bits because they fall out of the verified range.

If we try to create an iterator of primes in the range (72..=260), it will send out an error, since that range is not contained in our original data’s range. Even if it’s trivially known that 72 is not prime, and is the only number out of the range. However, I could change that in the future.

Let’s say we wanted to iterate over primes in the range (101..=240) instead. That range is contained in the original range, so it’ll not return an error. From that range, we can already determine stop_at = 240.

We don’t have to take a slice reference to the entire data. The first byte is the range (60..90), so we can start the iterator on the byte after, because 101 is contained in the next byte. Same applies for the ending. The 7th byte contains the range (240..270). However, since 240 is directly in between two bytes (we can argue the 6th byte contains the range (210..=240)), we can stop at the 6th byte. So we can determine data = &prime_data[1..=6]. If the range end was 241 instead, then we would need the slice end to be at 7, not 6.

Bytes by themselves, as we saw, cannot give you prime numbers. They need an offset. Same goes for any slice of prime bytes. We need the data’s offset. Since the first byte (data[0]) is the range (90..120), we can determine the data_offset = 90 / 30 = 3

Now, we can retreive primes from the data by applying the offset. However, notice that we can’t simply get all primes because the range starts at 101, not 90. So we need to restrict ourselves to the range (101..=120). The result of that will be the fieldprimes. The reason why it needs to be an option is that there are two situations in which we won’t be able to retrieve that vector.

  1. If the range has no primes, the given vector will be empty. This means if we try to access self.primes[self.current.1], that will panic. Therefore, as a safety measure, we store it as an option and set it to None when empty.

  2. If we’re iterating through the data, we could encounter a prime byte that has no primes. This means, when we iterate through the data, we need to keep incrememting current.0 until we hit a prime byte that is non empty. However, there’s a chance we increase it to the point where we hit the end of our slice. We return it as None and move on.

In both situations, that’s equivalent of “ending the iterator”. From that, we will always yield None.

§Retrieving next()

Let’s say we’re at some step of the iterator. When calling the next function, the first thing we do is check if self.primes is None and return that if so. Otherwise, we retrieve a prime number using primes[self.current.1]. This will never panic because we won’t allow the iterator to reach the given vector’s length.

If not, we gotta look for the next prime. First, we check if we hit the ending of the current vector. If we did, we’ll increment self.current.0 to reach out for the next piece of data. If that is equal to the data’s length, it means we hit the actual end of the whole data. That’s where we set self.primes = None and return the last prime number we got previously.

Of course, if we didn’t hit the vector’s length, we just increment the index and move on. If we did, but we didn’t hit the data’s length, we’ll just grab the next byte and retrieve its prime numbers using as_primes. The offset that should be given as a parameter is self.data_offset + self.current.0.