zipora 3.1.2

High-performance Rust implementation providing advanced data structures and compression algorithms with memory safety guarantees. Features LRU page cache, sophisticated caching layer, fiber-based concurrency, real-time compression, secure memory pools, SIMD optimizations, and complete C FFI for migration from C++.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
# Zipora

[中文](README_cn.md)

[![Build Status](https://github.com/infinilabs/zipora/workflows/CI/badge.svg)](https://github.com/infinilabs/zipora/actions)
[![License](https://img.shields.io/badge/license-BDL--1.0-blue.svg)](LICENSE)
[![Rust Version](https://img.shields.io/badge/rust-1.88+-orange.svg)](https://www.rust-lang.org)

High-performance Rust data structures and compression algorithms with memory safety guarantees.

## Key Features

- **High Performance**: Zero-copy operations, SIMD optimizations (AVX2, AVX-512), cache-friendly layouts
- **Memory Safety**: 99.8% unsafe block documentation coverage, all production unsafe blocks annotated with `// SAFETY:` comments
- **Secure Memory Management**: Production-ready memory pools with thread safety and RAII
- **Blob Storage**: 8 specialized stores with trie-based indexing and compression
- **Succinct Data Structures**: 12 rank/select variants, Rank9 (Vigna 2008), Elias-Fano / Partitioned / DP-Optimal Partitioned Elias-Fano, HybridPostingList (auto-select encoding), AMD-safe PDEP with `has_fast_bmi2`
- **BM25 Scoring**: FieldnormEncoder (Lucene SmallFloat, 1-byte fieldnorms) + Bm25BatchScorer (AVX2 SIMD batch, prefetch)
- **Specialized Containers**: 13+ containers (VecTrbSet/Map, MinimalSso, SortedUintVec, LruMap, etc.)
- **Hash Maps**: Golden ratio optimized, string-optimized, cache-optimized implementations
- **Advanced Tries**: Double-Array (DoubleArrayTrie, XOR transitions), LOUDS, Critical-Bit (BMI2), Patricia tries with rank/select, NestTrieDawg
- **Compression**: PA-Zip, Huffman O0/O1/O2, FSE, rANS, ZSTD integration
- **C FFI Support**: Complete C API (`--features ffi`)

## Quick Start

```toml
[dependencies]
zipora = "3.1.2"

# With C FFI bindings
zipora = { version = "3.1.2", features = ["ffi"] }

# AVX-512 (nightly only)
zipora = { version = "3.1.2", features = ["avx512"] }
```

### Basic Usage

```rust
use zipora::*;

// High-performance vector
let mut vec = FastVec::new();
vec.push(42).unwrap();

// Zero-copy strings with SIMD hashing
let s = FastStr::from_string("hello world");
println!("Hash: {:x}", s.hash_fast());

// Intelligent rank/select with automatic optimization
let mut bv = BitVector::new();
for i in 0..1000 { bv.push(i % 7 == 0).unwrap(); }
let adaptive_rs = AdaptiveRankSelect::new(bv).unwrap();
let rank = adaptive_rs.rank1(500);

// Unified Trie - Strategy-based configuration
use zipora::fsa::{ZiporaTrie, ZiporaTrieConfig, Trie};

let mut trie = ZiporaTrie::new();
trie.insert(b"hello").unwrap();
assert!(trie.contains(b"hello"));

// Unified Hash Map - Strategy-based configuration
use zipora::hash_map::{ZiporaHashMap, ZiporaHashMapConfig};

let mut map = ZiporaHashMap::new();
map.insert("key", "value").unwrap();

// Blob storage with compression
let config = ZipOffsetBlobStoreConfig::performance_optimized();
let mut builder = ZipOffsetBlobStoreBuilder::with_config(config).unwrap();
builder.add_record(b"Compressed data").unwrap();
let store = builder.finish().unwrap();

// Entropy coding
let encoder = HuffmanEncoder::new(b"sample data").unwrap();
let compressed = encoder.encode(b"sample data").unwrap();

// String utilities
use zipora::string::{join_str, hex_encode, hex_decode, words, decimal_strcmp};
let joined = join_str(", ", &["hello", "world"]);
assert_eq!(joined, "hello, world");
```

## Documentation

### Core Components
- **[Containers]docs/CONTAINERS.md** - Specialized containers (FastVec, ValVec32, IntVec, LruMap, etc.)
- **[Hash Maps]docs/HASH_MAPS.md** - ZiporaHashMap, GoldHashMap with strategy-based configuration
- **[Blob Storage]docs/BLOB_STORAGE.md** - 8 blob store variants with trie indexing and compression
- **[Memory Management]docs/MEMORY_MANAGEMENT.md** - SecureMemoryPool, MmapVec, five-level pools

### Algorithms & Processing
- **[Algorithms]docs/ALGORITHMS.md** - Radix sort, suffix arrays, set operations, cache-oblivious algorithms, SIMD popcount
- **[Compression]docs/COMPRESSION.md** - PA-Zip, Huffman, FSE, rANS, real-time compression
- **[String Processing]docs/STRING_PROCESSING.md** - SIMD string operations, pattern matching

### System Architecture
- **[Concurrency]docs/CONCURRENCY.md** - Pipeline processing, work-stealing, parallel trie building
- **[Error Handling]docs/ERROR_HANDLING.md** - Error classification, automatic recovery strategies
- **[Configuration]docs/CONFIGURATION.md** - Rich configuration APIs, presets, validation
- **[SIMD Framework]docs/SIMD.md** - 6-tier SIMD with AVX2/BMI2/POPCNT support

### Integration
- **[I/O & Serialization]docs/IO_SERIALIZATION.md** - Stream processing, endian handling, varint encoding
- **[C FFI]docs/FFI.md** - C API for interoperability

### Reference
- **[Porting Status]docs/PORTING_STATUS.md** - Feature implementation status

## Features

| Feature | Default | Description |
|---------|---------|-------------|
| `simd` | Yes | SIMD optimizations (AVX2, SSE4.2) |
| `mmap` | Yes | Memory-mapped file support |
| `zstd` | Yes | ZSTD compression |
| `serde` | Yes | Serialization support (serde, serde_json, bincode) |
| `lz4` | Yes | LZ4 compression |
| `async` | Yes | Async runtime (tokio) for concurrency, pipeline, real-time compression |
| `ffi` | No | C FFI bindings |
| `avx512` | No | AVX-512 (nightly only) |
| `nightly` | No | Nightly-only optimizations |

## Build & Test

```bash
# Build (default features)
cargo build --release

# Build with all features including FFI
cargo build --release --all-features

# Test
cargo test --lib

# Sanity check (all feature combinations, debug + release)
make sanity

# Benchmark (release only)
cargo bench

# Lint
cargo clippy --all-targets --all-features -- -D warnings
```

## Verified Performance

> **Test Machine**: AMD EPYC 7B13 (Zen 3), 64 vCPUs, 117 GB RAM, AVX2/BMI2/POPCNT, rustc 1.91.1, Linux 6.17.
> Results vary across hardware — Intel may differ on BMI2 (native vs microcode), ARM lacks x86 SIMD paths.
> Run `cargo bench` to reproduce on your own hardware.

### Trie / Term Dictionary (DoubleArrayTrie)

| Operation (5000 terms) | Time | Per-op |
|------------------------|------|--------|
| Lookup hit | 103 µs | 20.6 ns/lookup |
| Lookup miss | 19 µs | 3.8 ns/lookup |
| Prefix search (5 queries) | 14 µs | 2.8 µs/query |
| Insert (incremental) | 967 µs | 193 ns/term |

XOR transitions, terminal bit in NInfo, unsafe `get_unchecked` — 3 ops, 1 branch per transition.
Supports arbitrary binary keys including `\x00` bytes.

### BitVector (scatter + popcount)

| Operation (1M bits) | Zipora | Scalar Vec\<u64\> | Ratio |
|---------------------|--------|-------------------|-------|
| Scatter + popcount (20×5K docs) | **1.08 ms** | 1.35 ms | **0.80x (faster)** |
| Allocation (`with_size(1M, false)`) | **155 µs** | 247 µs | **0.63x (faster)** |
| Popcount only (50% density) | 9.25 µs | 9.26 µs | Tied |

`alloc_zeroed` (calloc), zero-copy `from_blocks`, SIMD `popcount_slice` (AVX-512 / POPCNT / AVX2 / NEON).

### popcount_slice (SIMD population count)

| Slice size | Throughput | Rate |
|------------|-----------|------|
| 16 words (128B) | 4.4 ns | 3.7 Gwords/s |
| 781 words (6KB, engine union buffer) | 150 ns | 5.2 Gwords/s |
| 10K words (80KB) | 1.9 µs | 5.4 Gwords/s |

Multi-tier dispatch: AVX-512 VPOPCNTDQ → hardware POPCNT → AVX2 vpshufb → NEON → scalar.
Used internally by `BitVector::count_ones()` and available as `zipora::algorithms::popcount_slice`.

### Succinct Data Structures

| Operation | Zipora | Baseline | Speedup |
|-----------|--------|----------|---------|
| Rank1 query (100K bits) | 192 ns || ~5.2 Gops/s |
| Select1 query (100K bits) | 5.4 ms / 100K queries || ~18.5 Mops/s |
| Bulk rank (SIMD, 50K) | 8.4 µs | 84.1 µs (individual) | **10x** |
| Bulk bitwise ops (SIMD, 50K) | 3.1 µs | 128.4 µs (individual) | **41x** |
| Range set (SIMD, 50K) | 3.2 µs | 17.9 µs (individual) | **5.6x** |

### Containers vs std

| Operation | Zipora | std | Ratio |
|-----------|--------|-----|-------|
| ValVec32 push (100K) | 119 µs | 120 µs | 1.0x |
| ValVec32 random access (100K) | 706 ns | 729 ns | **0.97x** |
| ValVec32 iteration (10K) | 778 ns | 783 ns | 1.0x |
| ValVec32 bulk extend (100K) | 21.8 µs | 28.7 µs | **0.76x** |
| SmallMap insert+lookup (8 keys) | 444 ns | 805 ns (HashMap) | **1.8x** |
| SmallMap lookup-intensive | 36.9 µs | 141.7 µs (HashMap) | **3.8x** |
| CircularQueue push+pop (100K) | 326 µs | 381 µs (VecDeque) | **0.86x** |
| FixedStr16Vec push (100K) | 755 µs | 5,906 µs (Vec\<String\>) | **7.8x** |
| SortableStrVec sort (5K) | 390 µs | 448 µs (Vec\<String\>) | **1.15x** |

### Entropy Coding (65KB input)

| Algorithm | Entropy 0.5 | Entropy 2.0 | Entropy 6.0 |
|-----------|-------------|-------------|-------------|
| Huffman O0 | 1,124 µs | 1,235 µs | 1,720 µs |
| Huffman O1 (x1 stream) | 188 µs | 173 µs | 188 µs |
| rANS64 | 405 µs | 351 µs | 426 µs |

### Cache (LRU vs HashMap)

| Operation | LruMap | HashMap | Note |
|-----------|--------|---------|------|
| Hot get (cap=64, 10K ops) | 5.7 µs | 152 µs | **26x** faster (hot-set fits in cache) |
| Hot get (cap=1024, 10K ops) | 94.6 µs | 152 µs | **1.6x** faster |
| Insert (cap=64, 10K ops) | 1,897 µs | 1,177 µs | 0.62x (eviction overhead) |

### BM25 Scoring (FieldnormEncoder + Bm25BatchScorer)

| Operation (1M docs) | UintVecMin0 + float | FieldnormEncoder | Improvement |
|---------------------|---------------------|------------------|-------------|
| Memory | 1.13 MB | **1.00 MB** | **2.0x smaller** |
| Random access | 190 µs | **92 µs** | **2.06x faster** |
| BM25 pre-compute (scalar) | 5.13 ms | **2.78 ms** | **1.85x faster** |
| BM25 pre-compute (AVX2 SIMD) | 5.13 ms | **381 µs** | **13.5x faster** |
| Phrase query scoring (1K random) | 3.63 µs | **1.22 µs** | **2.98x faster** |

Lucene SmallFloat encoding (1 byte/doc), 256-entry norm table (zero per-posting division), AVX2 batch scorer (8 scores/iteration).

## Dependencies

Minimal dependency footprint by design:
- **Core**: `bytemuck`, `thiserror`, `log`, `ahash`, `rayon`, `libc`, `once_cell`, `raw-cpuid`
- **Default**: `memmap2` (mmap), `zstd`, `lz4_flex`, `serde`/`serde_json`/`bincode`, `tokio` (async)
- **Optional**: `cbindgen` (ffi)
- **Removed**: `crossbeam-utils`, `parking_lot`, `uuid`, `num_cpus`, `async-trait`, `futures` (all replaced with std or eliminated)

## Building a Search Engine with Zipora

Zipora provides the core building blocks for high-performance search engines: succinct posting lists, compressed document storage, trie-based term dictionaries, SIMD-accelerated query processing, and multi-threaded indexing pipelines.

### Architecture Overview

```
 Documents                    Query
     |                          |
     v                          v
 [Tokenizer]              [Query Parser]
     |                          |
     v                          v
 [Term Dictionary]  --->  [Term Lookup]        DoubleArrayTrie
     |                          |
     v                          v
 [Inverted Index]  --->  [Posting Lists]       HybridPostingList
     |                          |
     v                          v
 [Doc Lengths]      --->  [BM25 Scoring]       FieldnormEncoder + Bm25BatchScorer
     |                          |
     v                          v
 [Document Store]  --->  [Doc Retrieval]       DictZipBlobStore
     |                          |
     v                          v
 [Compression]            [Encoding]           StreamVByte
```

### 1. Term Dictionary (Trie-based)

Use `DoubleArrayTrie` (double-array trie with XOR transitions) for maximum performance — 8 bytes per state with O(1) transitions per byte. Supports arbitrary binary keys including `\x00` bytes. For large vocabularies, it's 3-5x more memory-efficient than `HashMap<String, u32>` while providing faster lookups.

```rust
use zipora::DoubleArrayTrie;

// Build term dictionary during indexing
let mut dict = DoubleArrayTrie::new();

for term in terms.iter() {
    dict.insert(term.as_bytes()).unwrap();
}

// Query-time lookup: O(|key|) with O(1) per-byte transitions
assert!(dict.contains(b"search"));

// For key-value storage (term → term_id)
// DoubleArrayTrieMap<V> requires V: MapValue (configurable sentinel for zero-cost Option<V> elimination)
// Built-in impls: i32 (MIN), u32 (MAX), i64 (MIN), u64 (MAX), usize (MAX)
use zipora::DoubleArrayTrieMap;
let mut term_ids: DoubleArrayTrieMap<u32> = DoubleArrayTrieMap::new();
for (term_id, term) in terms.iter().enumerate() {
    term_ids.insert(term.as_bytes(), term_id as u32).unwrap();
}
let id = term_ids.get(b"search");
```

`DoubleArrayTrieMap<V>` uses the `MapValue` trait with a compile-time sentinel constant instead of `Option<V>`, halving the values array memory footprint for primitive types (e.g., 4 bytes vs 8 bytes per slot for `i32`). The sentinel is monomorphized to a single `cmp` instruction — zero runtime cost.

For alternative trie strategies (LOUDS, Patricia, CritBit), use `ZiporaTrie` with explicit config. For compressed term storage with prefix sharing, use `NestLoudsTrieBlobStore`.

### 2. Inverted Index (Posting Lists)

Choose the right container based on posting list characteristics:

```rust
use zipora::containers::{UintVecMin0, ZipIntVec};
use zipora::blob_store::SortedUintVec;
use zipora::BitVector;

// Option A: UintVecMin0 — variable-width packed integers (2-58 bits per value)
// Best for: medium-length posting lists with bounded doc IDs
let mut postings = UintVecMin0::new();
for doc_id in matching_docs {
    postings.push(doc_id);
}
// Access: postings.get(i) — O(1), cache-friendly sequential layout

// Option B: SortedUintVec — delta + block compression for sorted doc IDs
// Best for: long posting lists (60-80% space reduction vs raw u32)

// Option C: BitVector + RankSelect — bitmap representation
// Best for: high-frequency terms (>10% of docs), boolean queries
let mut bitmap = BitVector::new();
for i in 0..num_docs {
    bitmap.push(doc_ids.contains(&i)).unwrap();
}
```

### 3. Boolean Query Processing (Set Operations)

SIMD-accelerated set operations on posting lists — **up to 41x faster** than element-by-element processing for bitwise operations.

```rust
use zipora::algorithms::set_ops::{
    multiset_intersection,   // AND queries
    multiset_union,          // OR queries
    multiset_difference,     // NOT queries
    multiset_fast_intersection, // adaptive: picks best algo by size ratio
};

// AND query: "rust" AND "search"
let result = multiset_intersection(&postings_rust, &postings_search);

// For skewed sizes (one term rare, one common), use adaptive intersection
// Automatically picks linear merge vs binary search based on |A|/|B| ratio
let result = multiset_fast_intersection(&rare_term, &common_term);

// Bulk bitwise on rank/select bitvectors (41x faster with SIMD)
use zipora::AdaptiveRankSelect;
let rs = AdaptiveRankSelect::new(bitmap).unwrap();
let rank = rs.rank1(doc_id);   // count docs before this ID — O(1)
let pos = rs.select1(rank);    // find N-th matching doc — O(log n)
```

### 4. Document Storage (Compressed Blob Stores)

Store and retrieve documents with dictionary compression (PA-Zip):

```rust
use zipora::DictZipBlobStore;
use zipora::blob_store::{MixedLenBlobStore, PlainBlobStore, BlobStore};

// DictZipBlobStore: best compression for similar documents (web pages, logs)
// Learns a shared dictionary from training data, then compresses each record
let store = DictZipBlobStore::builder()
    .build_from_records(&documents)
    .unwrap();

// Retrieve: zero-copy access via mmap
let doc = store.get(doc_id).unwrap();

// MixedLenBlobStore: optimal for mixed fixed/variable-length records
// Automatically selects storage strategy based on record size distribution

// PlainBlobStore: uncompressed, fastest retrieval for hot data
```

### 5. Entropy Coding (Posting List Compression)

Compress posting list deltas with Huffman or rANS:

```rust
use zipora::HuffmanEncoder;
use zipora::Rans64Encoder;

// Huffman O0: simple, fast encoding (1.1 µs per 65KB)
let encoder = HuffmanEncoder::new(&training_data).unwrap();
let compressed = encoder.encode(&delta_encoded_postings).unwrap();

// Huffman O1: context-aware, better compression for structured data
// Particularly effective for posting list deltas with skewed distributions

// rANS: highest compression ratio, slightly slower
let rans = Rans64Encoder::new(&training_data).unwrap();
let compressed = rans.encode(&data).unwrap();
```

### 6. Multi-threaded Indexing

Parallelize index building with rayon and zipora's pipeline processing:

```rust
use rayon::prelude::*;
use zipora::algorithms::MultiWayMerge;

// Parallel document processing: each thread builds a segment
let segments: Vec<_> = document_batches
    .par_iter()
    .map(|batch| {
        let mut segment_index = SegmentIndex::new();
        for doc in batch {
            let terms = tokenize(doc);
            for term in terms {
                segment_index.add(term, doc.id);
            }
        }
        segment_index
    })
    .collect();

// Merge segments using k-way merge (loser tree)
use zipora::EnhancedLoserTree;
// EnhancedLoserTree provides O(log k) per element for k-way merge
// Ideal for merging sorted posting lists from parallel index segments
```

For async pipeline processing (requires `async` feature):

```rust
use zipora::Pipeline;
// Pipeline stages: parse → tokenize → index → compress → flush
// Each stage runs concurrently with work-stealing load balancing
```

### 7. Memory-Mapped Index Files

Serve large indices directly from disk without loading into RAM:

```rust
use zipora::memory::MmapVec;

// Memory-map an index file — OS manages paging
let index: MmapVec<u32> = MmapVec::open("postings.idx").unwrap();

// Random access is backed by the page cache
let doc_id = index[position];

// For blob stores, use mmap-backed storage
// DictZipBlobStore and NestLoudsTrieBlobStore support mmap natively
```

### 8. Query Result Caching

LRU cache for frequently accessed posting lists — **26x faster** hot-set retrieval vs HashMap:

```rust
use zipora::containers::specialized::LruMap;

// Cache hot posting lists
let mut cache: LruMap<String, Vec<u32>> = LruMap::new(1024);

fn get_postings(term: &str, cache: &mut LruMap<String, Vec<u32>>) -> Vec<u32> {
    if let Some(cached) = cache.get(term) {
        return cached.clone(); // 26x faster than HashMap for hot keys
    }
    let postings = load_from_disk(term);
    cache.insert(term.to_string(), postings.clone());
    postings
}
```

### 9. String Processing for Tokenization

```rust
use zipora::SortableStrVec;
use zipora::string::{decimal_strcmp, words};

// Arena-based string storage: 7.8x faster than Vec<String> for push (100K strings)
let mut terms = SortableStrVec::new();
for token in document.split_whitespace() {
    terms.push(token);
}
terms.sort(); // In-place sort, 1.15x faster than Vec<String>::sort

// For small lookup tables (field names, stop words), SmallMap is 3.8x faster
use zipora::SmallMap;
let mut stop_words = SmallMap::new();
stop_words.insert("the", true);
stop_words.insert("and", true);
```

### 10. BM25 Scoring (Document Length Normalization)

Compact doc-length storage with pre-computed BM25 scoring — **13.5x faster** than UintVecMin0 + float math, **2x smaller** memory footprint.

`FieldnormEncoder` uses Lucene-compatible SmallFloat encoding to compress document lengths into a single byte (3-bit mantissa + 5-bit exponent, same as Lucene/Tantivy). A 256-entry `[f32; 256]` norm table eliminates per-posting float division entirely. `Bm25BatchScorer` processes 8 postings per iteration with AVX2 SIMD.

```rust
use zipora::scoring::{FieldnormEncoder, Bm25BatchScorer};

// Index time: encode doc lengths to single bytes (1 byte/doc vs 2+ bytes for raw u16)
let doc_lengths = vec![50u32, 100, 150, 200, 300];
let fieldnorm_bytes: Vec<u8> = doc_lengths.iter()
    .map(|&l| FieldnormEncoder::encode(l))
    .collect();

// Build time: pre-compute BM25 norm table (256 floats, done once per segment)
let avg_dl = doc_lengths.iter().sum::<u32>() as f32 / doc_lengths.len() as f32;
let norm_table = FieldnormEncoder::build_norm_table(avg_dl, /*k1=*/1.2, /*b=*/0.75);

// Query time: batch-score a posting list (AVX2 SIMD, 8 scores per iteration)
let idf = 3.5f32;
let scorer = Bm25BatchScorer::new(&norm_table, idf, /*k1=*/1.2);
let tfs = vec![2u16, 3, 1, 5, 2];
let mut scores = vec![0.0f32; tfs.len()];
scorer.batch_score(&fieldnorm_bytes, &tfs, &mut scores);

// Phrase queries: single-doc scoring with next-doc prefetch
let score = scorer.score_with_prefetch(&fieldnorm_bytes, /*doc_id=*/42, /*tf=*/3, Some(100));

// Full 2D score table for quantized TF values (eliminates ALL query-time math)
let score_table = FieldnormEncoder::build_score_table(avg_dl, 1.2, 0.75, idf, /*max_tf=*/255);
let precomputed = score_table[3][fieldnorm_bytes[0] as usize]; // score(tf=3, doc=0)
```

**Engine benchmark results** (replaces `UintVecMin0` for doc-length storage):

| Metric | UintVecMin0 (old) | FieldnormEncoder (new) | Improvement |
|--------|-------------------|------------------------|-------------|
| Memory (1M docs) | 1.13 MB | **1.00 MB** | **2.0x smaller** |
| Random access (1M) | 190 µs | **92 µs** | **2.06x faster** |
| BM25 pre-compute (1M) | 5.13 ms | **381 µs** (SIMD) | **13.5x faster** |
| Phrase query scoring (1K random) | 3.63 µs | **1.22 µs** | **2.98x faster** |

### Component Selection Guide

| Search Engine Component | Zipora Type | When to Use |
|------------------------|-------------|-------------|
| Term dictionary | `DoubleArrayTrie` | Default choice, 8 bytes/state, XOR transitions |
| Term dictionary (alternatives) | `ZiporaTrie` | LOUDS/Patricia/CritBit via config |
| Short posting lists | `UintVecMin0` | Variable-width, <1M doc IDs |
| Long posting lists | `SortedUintVec` | Delta-compressed sorted IDs |
| Compressed posting lists | `HybridPostingList` | Auto-selects: Dense/EF/Partitioned/Optimal by list size |
| Rank/Select (large bitvecs) | `Rank9` | O(1) rank, O(log n) select, 25% overhead, hardware-independent |
| Boolean posting lists | `BitVector` + `AdaptiveRankSelect` | High-frequency terms, bitwise ops |
| AND/OR/NOT queries | `set_ops::multiset_*` | Sorted posting list intersection |
| Bulk bitwise queries | SIMD rank/select | 10-41x faster than scalar |
| Doc-length storage | `FieldnormEncoder` | 1-byte fieldnorms, replaces UintVecMin0, 2x smaller |
| BM25 scoring | `Bm25BatchScorer` | AVX2 SIMD batch (13.5x faster), prefetch for phrase queries |
| BM25 score table | `FieldnormEncoder::build_score_table` | Full 2D precomputed scores, zero query-time math |
| Document storage | `DictZipBlobStore` | Best compression for similar docs |
| Document storage (fast) | `PlainBlobStore` | Uncompressed, fastest retrieval |
| Posting compression | `HuffmanEncoder` | Fast encode/decode |
| Posting compression | `Rans64Encoder` | Best compression ratio |
| Query cache | `LruMap` | 26x faster hot-set access |
| Small lookups | `SmallMap` | 3.8x faster for ≤8 keys |
| String storage | `SortableStrVec` / `FixedStr16Vec` | Arena-based, 7.8x vs Vec\<String\> |
| Index files | `MmapVec` | Disk-backed, OS-managed paging |
| Segment merge | `MultiWayMerge` / `EnhancedLoserTree` | K-way merge of sorted lists |
| Parallel indexing | `rayon` + `Pipeline` | Multi-threaded segment building |

## License

Business Source License 1.0 - See [LICENSE](LICENSE) for details.