weighted_trie
🦀 Rust crate that allows creating weighted prefix trees that can be used in autocomplete
Features
- Speed-optimized: Single insert in 272ns, lookup in 244ns, bulk build (100K) in 33.7ms
- Memory-efficient: Only 243 bytes per word at 1M scale
- In-memory: Pure memory-based data structure, no disk I/O
- Dynamic: Supports incremental inserts on-the-fly
- Prefix-based: Returns all matches for a given prefix, sorted by weight
- Weight-sorted: Results pre-sorted by descending weight
- Configurable limits:
- Max suggestions per query (default: 10)
- Max word length (default: 100 characters)
Quickstart
To use weigthed-trie, add the following to your Cargo.toml file:
[]
= "0.1.0" # NOTE: Replace to latest minor version.
Usage overview
use WeightedTrie;
let mut trie = new;
// build trie with words and associated weights
trie.insert;
trie.insert;
trie.insert;
trie.insert;
// get prefix based suggestions sorted by weight
let suggestions = trie.search;
assert_eq!;
let suggestions = trie.search;
assert_eq!;
// out of vocabulary
let suggestions = trie.search;
assert_eq!;
Alternatively you can use .build method
use ;
let weighted_strings = vec!;
let trie = build;
Custom Configuration
use WeightedTrie;
// Create trie with custom max word length of 50 characters
let mut trie = with_max_word_length;
// This succeeds
assert!;
// This fails - word too long
let very_long_word = "a".repeat;
assert!;
// Create trie with custom max suggestions limit
let mut trie = with_max_suggestions;
for i in 0..20
assert_eq!; // Only top 5 returned
// Configure both word length and suggestions limit
let mut trie = with_config;
Benchmarks
Performance
Single insert: 272 ns
Lookup (per query): 244 ns
Build (100K words): 33.7 ms
Insert 100K (incremental): 32.6 ms
Memory Footprint
Dataset Memory Bytes/Word
------------------------------------
10K 2.4 MB 254
50K 13.0 MB 273
100K 29.5 MB 309
500K 130.8 MB 274
1M 231.4 MB 243
Run detailed memory analysis:
Guidelines
README.md is generated from cargo readme command.
Do not manually update README.md instead edit src/lib.rs
and then run cargo readme > README.md.
Optimizations
- String interning: Each word stored once, nodes use indices
- Packed suggestions: Weight+index packed into u64 (bit manipulation)
- compact_str: 12-byte strings vs 24-byte std String
- SmallVec: Stack allocation for small collections (≤2 suggestions, ≤4 children)
- Arena allocation: All nodes in Vec, use indices instead of Box pointers
- hashbrown HashMap: Faster than std HashMap
- Pre-allocation: Vec capacity = words × 2 (avoids reallocations)
- shrink_to_fit: Removes over-allocation after build
- Top-K limiting: 10 suggestions per node max
- Deduplication: Automatic on insert
License
License: Apache-2.0