# ๐ pgm_index โ Learned Index for Sorted Keys

> PGM-Index is a space-efficient data structure for fast lookup in sorted sequences.
> It approximates the distribution of keys with piecewise linear models, allowing searches in **O(log ฮต)** with a guaranteed error bound.
---
## ๐ Algorithm
Based on the work by Paolo Ferragina & Giorgio Vinciguerra:
> *The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds* (2020)
๐ [Paper](https://doi.org/10.1145/3373718.3394764) ยท ๐ [Official site](https://pgm.di.unipi.it/)
---
## ๐ฆ Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
pgm_index = "*"
````
---
## ๐ Usage
```rust
use pgm_index::PGMIndex;
fn main() {
let data: Vec<u64> = (0..1_000_000).collect();
let pgm = PGMIndex::new(&data, 32); // ฮต = 32
let key = 123456;
if let Some(pos) = pgm.search(key) {
println!("Found at position {}", pos);
} else {
println!("Not found");
}
}
```
---
## ๐ Benchmarks by ฮต
Dataset: **1,000,000 elements**, 100,000 random queries
CPU: Intel Core i7 12700, Windows 11, single-threaded
| 16 | 2.19 ms | 7.99 MB | 3906 | 20.84 M/s | 25.35 M/s | 48.0 / 39.4 |
| 32 | 2.22 ms | 7.87 MB | 1953 | 24.09 M/s | 24.98 M/s | 41.5 / 40.0 |
| 64 | 2.05 ms | 7.75 MB | 976 | 21.96 M/s | 26.06 M/s | 45.5 / 38.4 |
| 128 | 2.07 ms | 7.69 MB | 488 | 19.64 M/s | 25.56 M/s | 50.9 / 39.1 |
**Binary Search (baseline)**:
4.32 ms, 23.13 M/s, 43.2 ns/query.
---
## ๐ Comparison to Other Indexes (1M elements) *
Using **ฮต = 32** as a balanced configuration:
| **PGM-Index (ฮต=32)** | **7.87 MB** | **2.22 ms** | **24.09 M/s** | **24.98 M/s** |
| **Binary Search** | \~8.0 MB | โ (no build) | 23.13 M/s *(0.96ร)* | 23.13 M/s *(0.93ร)* |
| **BTreeMap** | \~24 MB | \~50 ms | \~4.0 M/s *(0.17ร)* | \~4.0 M/s *(0.16ร)* |
| **HashMap** | \~64 MB | \~15 ms | \~40.0 M/s *(1.66ร)* | \~40.0 M/s *(1.60ร)* |
* our benchmarks
---
### ๐ Relative Performance *
| Memory | **1.02ร better** | **3.05ร better** | **8.13ร better** |
| Build Time | โ | **22.5ร faster** | **6.8ร faster** |
| Single Lookup | **1.04ร faster** | **6.0ร faster** | 0.6ร slower |
| Batch Lookup | **1.08ร faster** | **6.2ร faster** | 0.62ร slower |
* our benchmarks
---
## ๐ Potential Use Cases
* Indexing large sorted numeric datasets
* Time-series databases
* Read-optimized storage engines
* Scientific & bioinformatics data search
* Columnar store secondary indexes
---
## ๐ License
MIT