[−][src]Crate hyperminhash
Very fast, constant memory-footprint cardinality approximation, including intersection and union operation. A straight port of Hyperminhash.
As with other cardinality estimators, Hyperminhash
has two advantages when counting very large
sets or streams of elements:
- It uses a single data structure that never grows while counting elements. The structure currently consumes 32kb of memory, allocated on the stack.
- Because there are no indirections due to allocations, the amount of work done for counting a marginal element stays constant.
Given that, a Vec
or HashSet
is usually faster when counting small sets. When counting
streams of millions of elements, Hyperminhash
is much faster and uses much less memory.
use hyperminhash::Sketch; // A `Sketch` can approximate the unique count of elements it has seen over it's lifetime. let mut sk = Sketch::default(); // After initialization, a `Sketch` will never allocate. // Elements added to Sketch need to implement `std::hash::Hash` sk.add("foobar"); sk.add(1); sk.add(2); sk.add(vec![3, 4, 5]); for i in 0..5000 { sk.add(i); } // There will be some error in the count assert!(sk.cardinality() > 4_900.0 && sk.cardinality() < 5_100.0); // Using `std::iter::FromIterator` let sketch1 = (0..10_000).collect::<Sketch>(); let sketch2 = (5_000..15_000).collect::<Sketch>(); assert!(sketch1.cardinality() > 9_800.0 && sketch1.cardinality() < 10_200.0); assert!(sketch2.cardinality() > 9_800.0 && sketch2.cardinality() < 10_200.0); // The intersection of two sketches yields the approximate number of unique // elements that are in both sets. let i = sketch1.intersection(&sketch2); assert!(i > 4_800.0 && i < 5_200.0); // Merge sketch1 with sketch2 let mut sketch1 = sketch1; sketch1.union(&sketch2); assert!(sketch1.cardinality() > 14_800.0 && sketch1.cardinality() < 15_200.0); // If the `serialize`-feature is used, Sketches can be serialized/deserialized // from any Reader/Writer. #[cfg(feature = "serialize")] { let mut buffer = Vec::new(); sketch1.save(&mut buffer).expect("Failed to write"); let sketch2 = Sketch::load(&buffer[..]).expect("Failed to read"); assert_eq!(sketch1.cardinality(), sketch2.cardinality()); }
Structs
Sketch | Records the approximate number of unique elements it has seen over it's lifetime. |