QJL sketch — fast approximate attention scoring via sign-based vector compression.
Compresses key/value vectors using random projection sign hashing (QJL) and
min-max scalar quantization, then stores them in append-only mmap-backed stores.
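As a rough illustration of min-max scalar quantization, the sketch below maps each component of a vector onto 8-bit codes spanning the vector's own minimum and maximum. The `MinMaxQuantized` type, the `quantize`/`dequantize` functions, and the 8-bit width are assumptions made for this example; the crate's real types and bit widths may differ.

```rust
/// Hypothetical container for a min-max quantized vector (8 bits per value).
#[derive(Debug, Clone, PartialEq)]
struct MinMaxQuantized {
    /// Smallest value in the original vector.
    min: f32,
    /// Step between adjacent codes: (max - min) / 255, or 0.0 for a constant vector.
    scale: f32,
    /// One code per component, in 0..=255.
    codes: Vec<u8>,
}

/// Quantize `values` onto 256 evenly spaced levels between their min and max.
fn quantize(values: &[f32]) -> MinMaxQuantized {
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 255.0 } else { 0.0 };
    let codes = values
        .iter()
        .map(|&v| {
            if scale > 0.0 {
                // Float-to-u8 casts saturate, so rounding noise cannot overflow.
                ((v - min) / scale).round() as u8
            } else {
                0
            }
        })
        .collect();
    MinMaxQuantized { min, scale, codes }
}

/// Reconstruct approximate values; each is within scale / 2 of the original.
fn dequantize(q: &MinMaxQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + q.scale * f32::from(c)).collect()
}
```

Storing only `min`, `scale`, and one byte per component keeps each record at a fixed size, which suits an append-only layout.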
Scoring is approximate inner product via packed sign bits; batched store-level
scoring can be GPU-accelerated with the gpu feature.
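The scoring idea can be sketched in a few dozen lines. The example assumes the standard QJL estimator: keys are projected with a Gaussian matrix and reduced to packed sign bits plus their norm, queries are projected but left in floating point, and the inner product is estimated as sqrt(pi/2) / m * ||k|| * sum_i (s_i · q) * sign(s_i · k). Every name here (`Projection`, `KeySketch`, `sketch_key`, `approx_score`, the built-in `SplitMix64` generator) is hypothetical and is not this crate's API.

```rust
/// Tiny deterministic generator (SplitMix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1].
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f32 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        ((-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()) as f32
    }
}

/// Hypothetical m x dim Gaussian projection matrix, stored row-major.
struct Projection {
    dim: usize,
    rows: Vec<f32>,
}

impl Projection {
    fn new(m: usize, dim: usize, seed: u64) -> Self {
        assert!(dim > 0, "dimension must be non-zero");
        let mut rng = SplitMix64(seed);
        let rows = (0..m * dim).map(|_| rng.next_gaussian()).collect();
        Projection { dim, rows }
    }

    /// Returns the m projected coordinates s_i · v.
    fn project(&self, v: &[f32]) -> Vec<f32> {
        assert_eq!(v.len(), self.dim);
        self.rows
            .chunks(self.dim)
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>())
            .collect()
    }
}

/// Hypothetical compressed key: its L2 norm plus one sign bit per projection.
struct KeySketch {
    norm: f32,
    bits: Vec<u64>,
}

fn sketch_key(p: &Projection, key: &[f32]) -> KeySketch {
    let proj = p.project(key);
    let mut bits = vec![0u64; (proj.len() + 63) / 64];
    for (i, &x) in proj.iter().enumerate() {
        if x >= 0.0 {
            bits[i / 64] |= 1u64 << (i % 64); // bit set means non-negative
        }
    }
    let norm = key.iter().map(|x| x * x).sum::<f32>().sqrt();
    KeySketch { norm, bits }
}

/// Estimates <query, key> from the float query projection and the key's sign bits.
fn approx_score(p: &Projection, query: &[f32], key: &KeySketch) -> f32 {
    let proj = p.project(query);
    let m = proj.len() as f32;
    let signed_sum: f32 = proj
        .iter()
        .enumerate()
        .map(|(i, &x)| if (key.bits[i / 64] >> (i % 64)) & 1 == 1 { x } else { -x })
        .sum();
    (std::f32::consts::PI / 2.0).sqrt() / m * key.norm * signed_sum
}
```

The estimate is noisy for small m and tightens roughly as 1 / sqrt(m). In a batched setting the query is projected once and the signed sum is evaluated against many stored sketches, which is the float × sign pattern that lends itself to GPU execution.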
§Feature flags
- serde: enables Serialize/Deserialize on all public structs and streaming store export/import.
- gpu: enables WGPU GPU-accelerated KeyStore::scores (batched float × sign).