Front-coded string dictionary: Fast and compact indexed string set
This is a Rust library to store an indexed set of strings and support fast queires. The data structure is a plain front-coded string dictionary described in Martínez-Prieto et al., Practical compressed string dictionaries, INFOSYS 2016.
Features
- Indexed set. Fcsd implements an indexed set of strings in a compressed format.
nstrings in the set are indexed with integers from[0..n-1]and assigned in the lexicographical order. - Simple and fast compression/decompression. Fcsd maintains a set of strings in a compressed space through front coding, a differential compression technique for strings, allowing for fast decompression operations.
- Random access. Fcsd maintains strings through a bucketization technique enabling to directly decompress arbitrary strings and perform binary search for strings.
Example
use Set;
// Input string keys should be sorted and unique.
let keys = ;
// Builds an indexed set.
let set = new.unwrap;
assert_eq!;
// Gets indexes associated with given keys.
let mut locator = set.locator;
assert_eq!;
assert_eq!;
assert_eq!;
// Decodes string keys from given indexes.
let mut decoder = set.decoder;
assert_eq!;
assert_eq!;
// Enumerates indexes and keys stored in the set.
let mut iter = set.iter;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
// Enumerates indexes and keys starting with a prefix.
let mut iter = set.predictive_iter;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
// Serialization / Deserialization
let mut data = Vec::new;
set.serialize_into.unwrap;
assert_eq!;
let other = deserialize_from.unwrap;
assert_eq!;
Todo
- Add benchmarking codes.
- Add RePair compressed veriants.
Licensing
This library is free software provided under MIT.