# Prime Number Implementation Strategy
## Core Design: Hybrid Approach
### 1. Pre-computed Prime Table
- Store first 100,000 primes (up to 1,299,709) as a static array
- Compile-time generation using `build.rs` or `include_bytes!`
- Instant O(1) lookup for common cases
- ~400KB memory footprint (100K × 4 bytes)
### 2. On-Demand Generation
- For primes beyond the table, use optimized sieve
- Cache results in runtime HashMap
- Miller-Rabin test for very large numbers
## Implementation Architecture
```rust
// src/sql/functions/mathematics/primes.rs
use once_cell::sync::Lazy;
use std::collections::HashSet;
// Pre-computed at compile time
const PRIMES_10K: &[u32] = &include!("../../../data/primes_10k.rs");
// Lazy-loaded larger table
static PRIMES_100K: Lazy<Vec<u32>> = Lazy::new(|| {
include_bytes!("../../../data/primes_100k.bin")
.chunks(4)
.map(|chunk| u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
.collect()
});
// HashSet for O(1) primality testing
static PRIME_SET: Lazy<HashSet<u32>> = Lazy::new(|| {
PRIMES_100K.iter().cloned().collect()
});
pub struct PrimeEngine {
// Runtime cache for primes beyond pre-computed range
extended_cache: std::sync::RwLock<Vec<u64>>,
}
impl PrimeEngine {
/// Get the nth prime (1-indexed)
pub fn nth_prime(&self, n: usize) -> Result<u64, String> {
if n == 0 {
return Err("Prime index must be >= 1".to_string());
}
// Fast path: pre-computed
if n <= PRIMES_100K.len() {
return Ok(PRIMES_100K[n - 1] as u64);
}
// Slow path: generate on demand
self.generate_nth_prime(n)
}
/// Check if a number is prime
pub fn is_prime(&self, n: u64) -> bool {
if n < 2 { return false; }
if n == 2 { return true; }
if n % 2 == 0 { return false; }
// Fast path: check pre-computed set
if n <= 1_299_709 {
return PRIME_SET.contains(&(n as u32));
}
// Medium numbers: trial division with prime list
if n < 1_000_000_000 {
let sqrt_n = (n as f64).sqrt() as u32;
for &p in PRIMES_100K.iter() {
if p > sqrt_n { return true; }
if n % (p as u64) == 0 { return false; }
}
return true;
}
// Large numbers: Miller-Rabin test
miller_rabin(n)
}
/// Count primes up to n (prime counting function π(n))
pub fn prime_count(&self, n: u64) -> usize {
if n < 2 { return 0; }
// Fast path: binary search in pre-computed list
if n <= 1_299_709 {
match PRIMES_100K.binary_search(&(n as u32)) {
Ok(idx) => idx + 1,
Err(idx) => idx,
}
} else {
// Use approximation or sieve for large n
self.count_primes_sieve(n)
}
}
/// Find next prime >= n
pub fn next_prime(&self, n: u64) -> u64 {
if n <= 2 { return 2; }
let mut candidate = if n % 2 == 0 { n + 1 } else { n };
while !self.is_prime(candidate) {
candidate += 2;
}
candidate
}
/// Get prime factorization
pub fn factor(&self, mut n: u64) -> Vec<(u64, u32)> {
let mut factors = Vec::new();
// Trial division with small primes
for &p in PRIMES_10K.iter() {
if p as u64 * p as u64 > n { break; }
let mut count = 0;
while n % (p as u64) == 0 {
n /= p as u64;
count += 1;
}
if count > 0 {
factors.push((p as u64, count));
}
}
// If n > 1, it's a prime factor
if n > 1 {
factors.push((n, 1));
}
factors
}
}
```
## Build-Time Prime Generation
```rust
// build.rs
use std::env;
use std::fs::File;
use std::io::Write;
use std::path::Path;
fn sieve_of_eratosthenes(limit: usize) -> Vec<u32> {
let mut is_prime = vec![true; limit + 1];
is_prime[0] = false;
is_prime[1] = false;
for i in 2..=((limit as f64).sqrt() as usize) {
if is_prime[i] {
for j in ((i * i)..=limit).step_by(i) {
is_prime[j] = false;
}
}
}
is_prime.iter()
.enumerate()
.filter(|(_, &prime)| prime)
.map(|(num, _)| num as u32)
.collect()
}
fn main() {
let out_dir = env::var("OUT_DIR").unwrap();
// Generate first 100,000 primes
let mut primes = Vec::with_capacity(100_000);
let mut candidate = 2u32;
while primes.len() < 100_000 {
if is_prime_simple(candidate) {
primes.push(candidate);
}
candidate += if candidate == 2 { 1 } else { 2 };
}
// Write as Rust array (for small sets)
let dest_path = Path::new(&out_dir).join("primes_10k.rs");
let mut f = File::create(&dest_path).unwrap();
write!(f, "[").unwrap();
for (i, p) in primes[..10_000].iter().enumerate() {
if i > 0 { write!(f, ",").unwrap(); }
if i % 20 == 0 { write!(f, "\n ").unwrap(); }
write!(f, "{}", p).unwrap();
}
write!(f, "\n]").unwrap();
// Write as binary file (for large sets)
let dest_path = Path::new(&out_dir).join("primes_100k.bin");
let mut f = File::create(&dest_path).unwrap();
for p in primes.iter() {
f.write_all(&p.to_le_bytes()).unwrap();
}
println!("cargo:rerun-if-changed=build.rs");
}
```
## SQL Functions
```sql
-- Basic prime functions
SELECT PRIME(100) as p100; -- 100th prime = 541
SELECT IS_PRIME(17) as check; -- true
SELECT PRIME_COUNT(100) as pi_100; -- 25 primes <= 100
SELECT NEXT_PRIME(100) as next; -- 101
SELECT PREV_PRIME(100) as prev; -- 97
-- Prime factorization
SELECT FACTOR(60) as factors; -- [(2,2), (3,1), (5,1)]
SELECT FACTOR_STRING(60) as str; -- "2^2 × 3 × 5"
-- Prime gaps
SELECT PRIME_GAP(100) as gap; -- PRIME(101) - PRIME(100)
SELECT MAX_GAP(1000) as max_gap; -- Largest gap in first 1000 primes
-- Twin primes
SELECT IS_TWIN_PRIME(11) as twin; -- true (11 and 13)
SELECT TWIN_PRIME(5) as fifth_twin; -- 5th twin prime pair
-- Prime testing utilities
SELECT IS_MERSENNE(31) as mersenne; -- true (2^31-1 is prime)
SELECT IS_SOPHIE_GERMAIN(11) as sg; -- true (11 and 23 are both prime)
```
## Memory/Performance Trade-offs
### Option 1: Minimal (10K primes, 40KB)
- Covers up to 104,729
- Good for most common cases
- Very fast load time
### Option 2: Standard (100K primes, 400KB)
- Covers up to 1,299,709
- Handles 99% of use cases
- Still instant load with Lazy
### Option 3: Extended (1M primes, 4MB)
- Covers up to 15,485,863
- For heavy prime computation
- Could load on first use
### Option 4: Compressed (100K primes, 200KB)
- Use bit compression techniques
- Slightly slower access
- Better memory usage
## Optimization Techniques
### 1. Wheel Factorization
```rust
// Skip multiples of 2, 3, 5
const WHEEL: [u8; 8] = [1, 7, 11, 13, 17, 19, 23, 29];
```
### 2. Segmented Sieve for Ranges
```rust
pub fn primes_in_range(start: u64, end: u64) -> Vec<u64> {
// Use segmented sieve for memory efficiency
}
```
### 3. Prime Number Theorem Approximations
```rust
pub fn approx_nth_prime(n: usize) -> u64 {
// n * ln(n) + n * ln(ln(n)) for large n
let n_f = n as f64;
let ln_n = n_f.ln();
(n_f * (ln_n + ln_n.ln())) as u64
}
```
## Integration with SQL CLI
```rust
// src/sql/functions/mod.rs
impl FunctionRegistry {
pub fn register_prime_functions(&mut self) {
let engine = Arc::new(PrimeEngine::new());
// Register all prime functions
self.register_fn("PRIME", move |args| {
let n = args[0].as_integer()?;
Ok(DataValue::Integer(engine.nth_prime(n as usize)? as i64))
});
self.register_fn("IS_PRIME", move |args| {
let n = args[0].as_integer()?;
Ok(DataValue::Boolean(engine.is_prime(n as u64)))
});
// ... more functions
}
}
```
## Usage Examples
### Find Goldbach Pairs
```sql
-- Verify Goldbach's conjecture for even numbers
WITH evens AS (
SELECT n * 2 as even_num FROM generate_series(2, 100) as n
)
SELECT
even_num,
p1.val as prime1,
even_num - p1.val as prime2
FROM evens
CROSS JOIN LATERAL (
SELECT PRIME(i) as val
FROM generate_series(1, PRIME_COUNT(even_num/2)) as i
) p1
WHERE IS_PRIME(even_num - p1.val)
AND p1.val <= even_num / 2
LIMIT 1;
```
### Analyze Prime Gaps
```sql
SELECT
n,
PRIME(n) as p,
PRIME(n+1) - PRIME(n) as gap,
ROUND(PRIME(n+1) / PRIME(n), 6) as ratio
FROM generate_series(1, 1000) as n
WHERE PRIME(n+1) - PRIME(n) > 10
ORDER BY gap DESC;
```
### Prime Factorization
```sql
SELECT
number,
FACTOR_STRING(number) as factorization,
IS_PRIME(number) as is_prime
FROM (VALUES (60), (97), (1001), (65537)) as t(number);
```
## Performance Characteristics
| PRIME(n) | n ≤ 100K | O(1) | < 1μs |
| PRIME(n) | n > 100K | O(n log n) | ~10ms per 1K |
| IS_PRIME(n) | n ≤ 1.3M | O(1) | < 1μs |
| IS_PRIME(n) | n < 10^9 | O(√n/ln n) | < 1ms |
| IS_PRIME(n) | n > 10^9 | O(k log³n) | < 10ms |
| PRIME_COUNT(n) | n ≤ 1.3M | O(log n) | < 10μs |
| FACTOR(n) | n < 10^12 | O(√n/ln n) | < 100ms |
## Summary
This approach gives us:
1. **Instant access** to first 100K primes
2. **Fast primality testing** up to 10^9
3. **Efficient factorization** for most numbers
4. **Extensible design** for additional prime functions
5. **Minimal memory footprint** (400KB for massive capability)