Module trigram


Byte trigrams: the atomic unit of the candidate index.

A trigram is any three consecutive bytes of a file’s contents. A file is represented in the index by the set of distinct trigrams it contains; a query is represented by a boolean formula over trigrams that every matching file must satisfy. See docs/index-and-storage.md.
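The relationship between a file's trigram set and a query can be sketched as follows. This is a minimal illustration, not the crate's implementation: it assumes `Trigram` is `[u8; 3]`, and `may_contain` is a hypothetical helper covering only the simplest query, a single literal, whose formula is the conjunction of its trigrams.

```rust
use std::collections::BTreeSet;

/// Assumed representation of a trigram: three raw bytes.
type Trigram = [u8; 3];

/// The set of distinct 3-byte windows in `bytes`.
fn distinct(bytes: &[u8]) -> BTreeSet<Trigram> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Hypothetical helper: a file can contain `literal` only if its trigram
/// set contains every trigram of the literal. Literals shorter than three
/// bytes impose no constraint, so every file stays a candidate.
fn may_contain(file: &BTreeSet<Trigram>, literal: &[u8]) -> bool {
    literal.windows(3).all(|w| file.contains(&[w[0], w[1], w[2]]))
}

fn main() {
    let file = distinct(b"fn main() { println!(\"hi\"); }");
    assert!(may_contain(&file, b"println"));
    assert!(!may_contain(&file, b"eprintln"));

    // The filter is necessary, not sufficient: this file holds the
    // trigrams "abc", "bcd", and "cde" but never the string "abcde".
    let tricky = distinct(b"abcd_bcde");
    assert!(may_contain(&tricky, b"abcde"));
}
```

Because the trigram test can admit false positives, files that pass it are only candidates and still need to be checked against the actual query.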

Functions

distinct
The set of distinct trigrams in bytes. A BTreeSet keeps results deterministic for tests; the indexer will use a faster set.
for_each
Invoke f once per (not-necessarily-distinct) trigram window over bytes.
of_literal
The trigrams of a literal byte string, in order (with repeats). Empty if lit.len() < 3.
pack
Pack a trigram into the low 24 bits of a u32, for compact keys.
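
The four functions above could plausibly fit together as in the sketch below. The signatures are assumptions, since this page lists only names and summaries. The byte order used by `pack` (first byte in the most significant of the 24 bits) is also an assumption, not something the page states.

```rust
use std::collections::BTreeSet;

/// Assumed representation of a trigram: three raw bytes.
type Trigram = [u8; 3];

/// Calls `f` once per 3-byte window over `bytes`, repeats included.
fn for_each(bytes: &[u8], mut f: impl FnMut(Trigram)) {
    for w in bytes.windows(3) {
        f([w[0], w[1], w[2]]);
    }
}

/// The distinct trigrams in `bytes`, in a deterministic (sorted) order.
fn distinct(bytes: &[u8]) -> BTreeSet<Trigram> {
    let mut set = BTreeSet::new();
    for_each(bytes, |t| {
        set.insert(t);
    });
    set
}

/// The trigrams of a literal, in order and with repeats.
/// Empty when `lit.len() < 3`.
fn of_literal(lit: &[u8]) -> Vec<Trigram> {
    let mut out = Vec::with_capacity(lit.len().saturating_sub(2));
    for_each(lit, |t| out.push(t));
    out
}

/// Packs a trigram into the low 24 bits of a `u32`.
/// Assumed byte order: the first byte lands in bits 16..24.
fn pack(t: Trigram) -> u32 {
    (u32::from(t[0]) << 16) | (u32::from(t[1]) << 8) | u32::from(t[2])
}

fn main() {
    assert_eq!(of_literal(b"aaaa"), vec![*b"aaa", *b"aaa"]);
    assert_eq!(distinct(b"aaaa").len(), 1);
    assert!(of_literal(b"ab").is_empty());
    assert_eq!(pack(*b"abc"), 0x0061_6263);
}
```

With this byte order, packed keys sort the same way as the underlying byte triples, which is one reason to prefer it for index keys. Any fixed order gives an equally compact key.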

Type Aliases

Trigram
A single trigram: three raw bytes, so it works for arbitrary content, including non-UTF-8.
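
To illustrate why a byte-level alias handles arbitrary content, the sketch below extracts trigrams from input that is not valid UTF-8 and from a string with a multi-byte character. It assumes `Trigram` is `[u8; 3]`, and `trigrams` is a hypothetical helper written for this example only.

```rust
/// Assumed definition of the alias: three raw bytes.
type Trigram = [u8; 3];

/// Hypothetical helper: every 3-byte window of `bytes`, in order.
fn trigrams(bytes: &[u8]) -> Vec<Trigram> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

fn main() {
    // 0xFF never occurs in valid UTF-8, so this is not a valid `str`.
    let raw: &[u8] = &[0xFF, 0x00, b'a', 0xC3];
    assert!(std::str::from_utf8(raw).is_err());
    assert_eq!(
        trigrams(raw),
        vec![[0xFF, 0x00, b'a'], [0x00, b'a', 0xC3]]
    );

    // "é" encodes as two bytes (0xC3 0xA9), so "héllo" is six bytes long
    // and yields four trigrams, some of which split the character.
    let text = "héllo".as_bytes();
    assert_eq!(text.len(), 6);
    assert_eq!(trigrams(text)[0], [b'h', 0xC3, 0xA9]);
    assert_eq!(trigrams(text).len(), 4);
}
```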