Crate ironstorm_lookup [−] [src]
Overview
This library contains the internal data structure used by the ironstrom project
Design goals
- Lightning fast auto completion / type ahead lookups (~200 microseconds! per lookup)
- Not too much searchable text per entry, e.g: street names for locations or movie titles for movies
- High number of possible candidates (multiple gigabytes)
- It can be recommended, but must not be rquired to fit the whole data set into physical memory
- The LookupTable should use virtual memory and OS level optimization to handle larger data sets
- Full text search capability
- Optimized for hardly ever changing data sets, e.g.: All streets in a country
- No mulithreading if not absolutely required => Buy lookup speed with memory, not processing power!
- Optimize for returning a small number of matches, e.g: Find first 10 of 2 million movies that contain 'hero'
- Only one dimensional coarse sorting required, e.g: Fantasy books should be returnd before science fiction books
- Lazy stream/iterator based lookup implementation
Accepted drawbacks
- Creating a
LookupTable
for multiple gigabytes of data can take a few minutes - A
LookupTable
can not be modified, only recreated - No fine granular sorting possible: e.g: by lexicographical order
Basic Usage
- Create a custom type for the data you want to seacrh for, e.g.: a
Movie
struct - Implement the
Lookup
trait for your custom type. - Create an
Iterator
that will iterate over all the elements you would like to put into theLookupTable
- Create a new
LookupTable
by callingLookupTable::from_iter(myMoviesIterator)
- Call
myMoviesLookupTable.find("hero")
to get an lazy 'Iterator' over all matching elements
Example
Let's build a LookupTable
to find restaurants by name.
use std::iter::FromIterator; use ironstorm_lookup::{LookupTable, Lookup, Bucket}; // 1. Create a custom struct representing a restaurant struct Restaurant<'a> { name: &'a str, cuisine: &'a str } // 2. Implement the `Lookup` trait for `Restaurant` references impl <'a> Lookup for &'a Restaurant<'a> { // Make the restaurant name searchable fn searchable_text(&self) -> String { self.name.to_string() } // Decide, based on cuisine, to which `Bucket` a restaurant belongs. // `Bucket` is just a type alias for an unsigned integer aka usize. // Matches in lower buckets will be returned before matches in higher buckets. fn bucket(&self) -> Bucket { match self.cuisine { "italian" => 0, "german" => 0, "chinese" => 1, _ => 5 } } } // 3. Create some restaurants and the according iterator let restaurants = vec![ Restaurant{name:"India Man", cuisine:"indian"}, Restaurant{name:"Ami Guy", cuisine:"american"}, Restaurant{name:"Italiano Pizza", cuisine:"italian"}, Restaurant{name:"Sushi House", cuisine:"chinese"}, Restaurant{name:"Brezel Hut", cuisine:"german"} ]; let iter = restaurants.iter(); // 4. Create the `LookupTable` let lookup_table = ironstorm_lookup::LookupTable::from_iter(iter); // 5. Find restaurants containing `i` let mut result_iter = lookup_table.find("i"); // two times 'Italiano pizza', because it's in the lowest bucket // two times because it has two lower case `i` in the name assert_eq!(result_iter.next().unwrap().name, "Italiano Pizza"); assert_eq!(result_iter.next().unwrap().name, "Italiano Pizza"); // 'Sushi House', because it's in the second lowest bucket assert_eq!(result_iter.next().unwrap().name, "Sushi House"); // 'Ami Guy' or ' India Man' // They are in the same bucket and there is no order within the same bucket let indian_or_american_1 = result_iter.next().unwrap().name; assert!(indian_or_american_1=="India Man" || indian_or_american_1=="Ami Guy"); // The other one of 'Ami Guy' or ' India Man' let indian_or_american_2 = result_iter.next().unwrap().name; assert!(indian_or_american_2=="India Man" || indian_or_american_2=="Ami Guy"); assert!(indian_or_american_1 != indian_or_american_2); // No more matches // "Brezel Hut" doesn't contain an "i" and was not part of the result. assert!(result_iter.next().is_none());
Structs
LookupTable |
This is the actual |
Traits
Lookup |
Implement this trait for types that are going be put into a |
Type Definitions
Bucket |
Every value that is inserted into the lookup table must be assigned to a bucket. Values, assigned to a lower bucket, will be returned before values from a higher bucket. This bucket mechanism is used instead a full blown sorting algorithm to boost performance. |