§searchy: an embeddable in-memory search engine
An in-memory search index that is built and queried using a bag-of-words model, with cosine-similarity scoring based on tf-idf. It supports multiple search terms, permissions, text suggestions (for spelling errors), and evaluating common arithmetic expressions directly to values. The design tries to keep the memory footprint small.
Features:
- expression evaluation (1+2 will result in 3);
- small in-memory representation using two allocations (after building the index);
- easy delta updates;
- spelling suggestions based on the search index;
- summary sentence in search results (based on Luhn);
- filtering of results based on group (role-based access control).
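The scoring model named in the description (bag of words, tf-idf weights, cosine similarity) can be illustrated with a small standalone sketch. This is not searchy's implementation: the helper names (`tokenize`, `term_counts`, `idf`, `tfidf`, `cosine`, `rank`) and the smoothed idf formula are assumptions chosen for illustration, and the code uses only the standard library.

```rust
use std::collections::HashMap;

/// Split text into lowercase alphanumeric tokens (a bag of words ignores order).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Raw term counts for one text.
fn term_counts(text: &str) -> HashMap<String, f64> {
    let mut counts = HashMap::new();
    for token in tokenize(text) {
        *counts.entry(token).or_insert(0.0) += 1.0;
    }
    counts
}

/// Smoothed inverse document frequency (an assumed variant):
/// ln((1 + N) / (1 + df)) + 1, where df is the number of documents containing the term.
fn idf(term: &str, docs: &[HashMap<String, f64>]) -> f64 {
    let df = docs.iter().filter(|d| d.contains_key(term)).count() as f64;
    ((1.0 + docs.len() as f64) / (1.0 + df)).ln() + 1.0
}

/// Weight each term count by its idf.
fn tfidf(counts: &HashMap<String, f64>, docs: &[HashMap<String, f64>]) -> HashMap<String, f64> {
    counts.iter().map(|(t, tf)| (t.clone(), tf * idf(t, docs))).collect()
}

/// Cosine of the angle between two sparse vectors; 0.0 if either is empty.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, w)| b.get(t).map(|v| w * v)).sum();
    let norm = |m: &HashMap<String, f64>| m.values().map(|w| w * w).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

/// Score every document against the query and return matches, best first.
fn rank(query: &str, texts: &[&str]) -> Vec<(usize, f64)> {
    let docs: Vec<_> = texts.iter().map(|t| term_counts(t)).collect();
    let q = tfidf(&term_counts(query), &docs);
    let mut scored: Vec<(usize, f64)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(&q, &tfidf(d, &docs))))
        .filter(|(_, s)| *s > 0.0)
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}

fn main() {
    let texts = ["foo bar text", "bar baz", "unrelated words"];
    for (i, score) in rank("foo text", &texts) {
        println!("doc {} -> {:.3}", i, score);
    }
}
```

Only documents sharing at least one term with the query receive a non-zero score, which is why the sketch filters out zero scores before sorting.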
§Minimal example
use searchy::*;
use expry::*;
// MemoryPool avoids small allocations.
pool!(scope);
// Add documents
let mut builder = SearchBuilder::new();
let doc_id = builder.add_document("url".into(), "foo bar text", &mut scope);
// doc_id can be used to later remove the document from the search index
// Build search index
let index : SearchIndex = builder.into();
const MAX_DOCS : usize = 1024;
let results = index.search("query text", MAX_DOCS);
eprintln!("RESULTS: {}x in {}ms", results.docs, results.duration);
for ScoredDocumentInfo{doc_id: _, score, info} in results.entries {
    eprintln!("{} -> {}", score, info);
}
// do some mutations to the search index
let mut builder = SearchBuilder::from(index);
builder.remove_document(doc_id);
Macros§
Structs§
- Builder to construct a SearchIndex. Can be constructed fresh or from an existing SearchIndex.
Functions§
- Implementation based on H.P. Luhn: The Automatic Creation of Literature Abstracts published in 1957.
- Escaping for inside the script HTML tag.
- Encodes a part of a URL so it can be used in any part of a URL. Not to be confused with escaping a whole URL (e.g. the string http://example.org); use url_escape_u8 for that.
- Escapes a whole URL so only valid URL characters remain in the result. Not to be confused with encoding an arbitrary string as part of a URL (e.g. in a form); use url_encode_u8 for that.
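The difference between encoding a URL part and escaping a whole URL can be shown with a minimal sketch. The functions below (`encode_component`, `escape_whole_url`, `percent_encode`) are hypothetical stand-ins written for this illustration, not the crate's `url_encode_u8` or `url_escape_u8`, whose signatures and exact character sets are not shown here. The sketch assumes the RFC 3986 reserved and unreserved character classes.

```rust
/// RFC 3986 "unreserved" bytes, safe anywhere in a URL.
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~')
}

/// RFC 3986 "reserved" bytes, which give a URL its structure.
fn is_reserved(b: u8) -> bool {
    matches!(
        b,
        b':' | b'/' | b'?' | b'#' | b'[' | b']' | b'@' | b'!' | b'$' | b'&'
            | b'\'' | b'(' | b')' | b'*' | b'+' | b',' | b';' | b'='
    )
}

/// Percent-encode every byte for which `keep` returns false.
fn percent_encode(input: &[u8], keep: impl Fn(u8) -> bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input {
        if keep(b) {
            out.push(b as char); // `keep` only accepts ASCII bytes here
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Hypothetical: encode an arbitrary string so it can sit inside any URL part.
fn encode_component(input: &[u8]) -> String {
    percent_encode(input, is_unreserved)
}

/// Hypothetical: escape a whole URL, leaving its structural characters intact.
fn escape_whole_url(input: &[u8]) -> String {
    percent_encode(input, |b| is_unreserved(b) || is_reserved(b) || b == b'%')
}

fn main() {
    // Encoding a component destroys URL structure: ':' and '/' are encoded.
    println!("{}", encode_component(b"http://example.org"));
    // Escaping a whole URL keeps the structure and only fixes invalid bytes.
    println!("{}", escape_whole_url(b"http://example.org/a b"));
}
```

Applying the component encoder to a full URL turns `http://example.org` into `http%3A%2F%2Fexample.org`, which is why the two operations should not be interchanged.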