๐ pgm_index โ Learned Index for Sorted Keys
PGM-Index is a space-efficient data structure for fast lookup in sorted sequences.
It approximates the distribution of keys with piecewise linear models, allowing searches in O(log ฮต) with a guaranteed error bound.
๐ Algorithm
Based on the work by Paolo Ferragina & Giorgio Vinciguerra:
The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds (2020)
๐ Paper ยท ๐ Official site
๐ฆ Installation
Add to your Cargo.toml
:
[]
= "*"
๐ Usage
use PGMIndex;
๐ Benchmarks by ฮต
Dataset: 1,000,000 elements, 100,000 random queries CPU: Intel Core i7 12700, Windows 11, single-threaded
ฮต | Build Time | Mem Usage | Segments | Single Lookup | Batch Lookup | Avg ns/query |
---|---|---|---|---|---|---|
16 | 2.19 ms | 7.99 MB | 3906 | 20.84 M/s | 25.35 M/s | 48.0 / 39.4 |
32 | 2.22 ms | 7.87 MB | 1953 | 24.09 M/s | 24.98 M/s | 41.5 / 40.0 |
64 | 2.05 ms | 7.75 MB | 976 | 21.96 M/s | 26.06 M/s | 45.5 / 38.4 |
128 | 2.07 ms | 7.69 MB | 488 | 19.64 M/s | 25.56 M/s | 50.9 / 39.1 |
Binary Search (baseline): 4.32 ms, 23.13 M/s, 43.2 ns/query.
๐ Comparison to Other Indexes (1M elements) *
Using ฮต = 32 as a balanced configuration:
Structure | Memory Usage | Build Time | Lookup Speed (single) | Batch Lookup Speed |
---|---|---|---|---|
PGM-Index (ฮต=32) | 7.87 MB | 2.22 ms | 24.09 M/s | 24.98 M/s |
Binary Search | ~8.0 MB | โ (no build) | 23.13 M/s (0.96ร) | 23.13 M/s (0.93ร) |
BTreeMap | ~24 MB | ~50 ms | ~4.0 M/s (0.17ร) | ~4.0 M/s (0.16ร) |
HashMap | ~64 MB | ~15 ms | ~40.0 M/s (1.66ร) | ~40.0 M/s (1.60ร) |
- our benchmarks
๐ Relative Performance *
Metric | vs Binary Search | vs BTreeMap | vs HashMap |
---|---|---|---|
Memory | 1.02ร better | 3.05ร better | 8.13ร better |
Build Time | โ | 22.5ร faster | 6.8ร faster |
Single Lookup | 1.04ร faster | 6.0ร faster | 0.6ร slower |
Batch Lookup | 1.08ร faster | 6.2ร faster | 0.62ร slower |
- our benchmarks
๐ Potential Use Cases
- Indexing large sorted numeric datasets
- Time-series databases
- Read-optimized storage engines
- Scientific & bioinformatics data search
- Columnar store secondary indexes
๐ License
MIT