Extremely fast phrase search implementation.
§Overview
This implementation follows some of the ideas proposed in this blog post by Doug Turnbull. A full explanation of how the internals work can be found here.
This crate uses the log crate for logging during indexing.
It’s highly recommended to compile this crate with -C llvm-args=-align-all-functions=6.
§Usage
use phrase_search::{CommonTokens, Indexer, SimdIntersect};
// Creates a reusable indexer. It indexes 300_000 documents per batch
// and merges the 50 most common tokens to speed up searches.
let indexer = Indexer::new(Some(300_000), Some(CommonTokens::FixedNum(50)));
let docs = vec![
("look at my beautiful cat", 0),
("this is a document", 50),
("look at my dog", 25),
("look at my beautiful hamster", 35),
];
let index_name = "./index";
let db_size = 1024 * 1024;
// Indexes the documents yielded by `docs`.
// The index will be created at `index_name` with the given `db_size`.
let (searcher, num_indexed_documents) = indexer.index(docs, index_name, db_size)?;
// Searches for the phrase "at my beautiful".
let result = searcher.search::<SimdIntersect>("at my beautiful")?;
// This should return `[0, 35]`
let documents = result.get_documents()?;
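To make concrete what a phrase query such as "at my beautiful" has to compute, here is a minimal, self-contained sketch of phrase matching over a positional inverted index. It is not this crate's implementation and uses none of its types: the names `PositionalIndex`, `build_index` and `naive_phrase_search` are hypothetical, whitespace tokenization is an assumption, and the crate's real data layout and intersection algorithms are not shown here.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical layout: token -> (document id -> sorted token positions).
type PositionalIndex = HashMap<String, BTreeMap<u32, Vec<u32>>>;

/// Builds a positional inverted index from `(text, id)` pairs.
/// Assumes whitespace tokenization and that each id appears only once.
fn build_index(docs: &[(&str, u32)]) -> PositionalIndex {
    let mut index = PositionalIndex::new();
    for &(text, id) in docs {
        for (pos, token) in text.split_whitespace().enumerate() {
            index
                .entry(token.to_string())
                .or_default()
                .entry(id)
                .or_default()
                .push(pos as u32);
        }
    }
    index
}

/// Returns the ids of documents that contain `phrase` as consecutive tokens,
/// in ascending id order.
fn naive_phrase_search(index: &PositionalIndex, phrase: &str) -> Vec<u32> {
    let tokens: Vec<&str> = phrase.split_whitespace().collect();
    let Some((first, rest)) = tokens.split_first() else {
        return Vec::new(); // empty query matches nothing
    };
    let Some(first_postings) = index.get(*first) else {
        return Vec::new(); // first token never indexed
    };

    let mut hits = Vec::new();
    for (&doc_id, starts) in first_postings {
        // A start position matches if the i-th remaining token of the
        // phrase occurs at `start + i + 1` in the same document.
        let matched = starts.iter().any(|&start| {
            rest.iter().enumerate().all(|(i, token)| {
                index
                    .get(*token)
                    .and_then(|postings| postings.get(&doc_id))
                    .is_some_and(|positions| {
                        positions.binary_search(&(start + i as u32 + 1)).is_ok()
                    })
            })
        });
        if matched {
            hits.push(doc_id);
        }
    }
    hits
}
```

Calling `build_index` on the four documents from the usage example and then `naive_phrase_search(&index, "at my beautiful")` yields the ids `0` and `35`, matching the result shown above.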
Structs§
- Indexer
- Responsible for indexing documents.
- NaiveIntersect
- Naive intersection algorithm.
- SearchResult
- Final result of a search operation.
- Searcher
- Object responsible for searching the database.
- Stats
- Time stats collected during search.
Enums§
- CommonTokens
- Specifies how the common tokens are treated during indexing.
- DbError
- Possible errors that can occur while interacting with the database.
- GetDocumentError
- Possible errors when trying to retrieve documents by their internal ID.
- SearchError
- Possible errors that can occur while searching.
Traits§
- Document
- Represents all types that can be stored in the database.
- Intersection
- Allows a type to be used as an intersection algorithm when searching.