Byte trigrams: the atomic unit of the candidate index.
A trigram is any three consecutive bytes of a file’s contents. A file is represented in the
index by the set of distinct trigrams it contains; a query is represented by a boolean
formula over trigrams that every matching file must satisfy. See docs/index-and-storage.md.
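The sketch below illustrates the two representations described above. A file is reduced to its set of distinct trigrams. A query is a boolean formula evaluated against that set. The `Query` enum, `file_trigrams`, and `matches` are hypothetical names for illustration only. The real formula representation is described in docs/index-and-storage.md and may differ. Satisfying the formula is necessary but not sufficient, so a matching file is only a candidate.

```rust
use std::collections::BTreeSet;

/// Three raw bytes (assumed to match the module's `Trigram` alias).
type Trigram = [u8; 3];

/// Hypothetical query formula over trigrams.
enum Query {
    /// The file must contain this trigram.
    Has(Trigram),
    /// Every sub-formula must hold.
    And(Vec<Query>),
    /// At least one sub-formula must hold.
    Or(Vec<Query>),
}

/// Reduce a file's contents to its set of distinct trigrams.
fn file_trigrams(bytes: &[u8]) -> BTreeSet<Trigram> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Does a file whose trigram set is `set` satisfy the formula `q`?
fn matches(q: &Query, set: &BTreeSet<Trigram>) -> bool {
    match q {
        Query::Has(t) => set.contains(t),
        Query::And(qs) => qs.iter().all(|q| matches(q, set)),
        Query::Or(qs) => qs.iter().any(|q| matches(q, set)),
    }
}

fn main() {
    let set = file_trigrams(b"hello world");
    // The literal "hello" requires hel, ell, llo; "help" requires hel, elp.
    let hello = Query::And(vec![Query::Has(*b"hel"), Query::Has(*b"ell"), Query::Has(*b"llo")]);
    let help = Query::And(vec![Query::Has(*b"hel"), Query::Has(*b"elp")]);
    assert!(matches(&hello, &set));
    assert!(!matches(&help, &set));
    // An alternation "help|hello" becomes an OR of the two conjunctions.
    assert!(matches(&Query::Or(vec![help, hello]), &set));
}
```

Only the set is consulted, never the byte offsets. That is why a hit must still be verified against the file's actual contents.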
Functions
- `distinct`: The set of distinct trigrams in `bytes`. A `BTreeSet` keeps results deterministic for tests; the indexer will use a faster set.
- `for_each`: Invoke `f` once per (not-necessarily-distinct) trigram window over `bytes`.
- `of_literal`: The trigrams of a literal byte string, in order (with repeats). Empty if `lit.len() < 3`.
- `pack`: Pack a trigram into the low 24 bits of a `u32`, for compact keys.
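Here is a minimal sketch of the four functions listed above, built on the standard library's `slice::windows`. The exact signatures are assumptions based on the parameter names in the descriptions (`bytes`, `f`, `lit`). The byte order used by `pack` (first byte most significant) is also an assumption, since the description only fixes the low 24 bits.

```rust
use std::collections::BTreeSet;

/// Three raw bytes, as in the module's `Trigram` alias (assumed `[u8; 3]`).
pub type Trigram = [u8; 3];

/// Call `f` once per trigram window over `bytes`, repeats included.
/// Inputs shorter than 3 bytes produce no windows.
pub fn for_each(bytes: &[u8], mut f: impl FnMut(Trigram)) {
    for w in bytes.windows(3) {
        f([w[0], w[1], w[2]]);
    }
}

/// Distinct trigrams in `bytes`; `BTreeSet` iterates in sorted, deterministic order.
pub fn distinct(bytes: &[u8]) -> BTreeSet<Trigram> {
    let mut set = BTreeSet::new();
    for_each(bytes, |t| {
        set.insert(t);
    });
    set
}

/// Trigrams of a literal, in order with repeats; empty if `lit.len() < 3`.
pub fn of_literal(lit: &[u8]) -> Vec<Trigram> {
    let mut out = Vec::with_capacity(lit.len().saturating_sub(2));
    for_each(lit, |t| out.push(t));
    out
}

/// Pack a trigram into the low 24 bits of a `u32` (assumed big-endian byte order).
pub fn pack(t: Trigram) -> u32 {
    (u32::from(t[0]) << 16) | (u32::from(t[1]) << 8) | u32::from(t[2])
}

fn main() {
    // "aaaa" has two overlapping windows, both "aaa".
    assert_eq!(of_literal(b"aaaa"), vec![*b"aaa", *b"aaa"]);
    assert_eq!(distinct(b"aaaa").len(), 1);
    assert!(of_literal(b"ab").is_empty());
    assert_eq!(pack(*b"abc"), 0x61_62_63);
}
```

An input of n ≥ 3 bytes yields exactly n − 2 windows, which is the capacity `of_literal` reserves. A big-endian `pack` also preserves the lexicographic order of the byte triples, so sorted packed keys line up with a `BTreeSet<Trigram>`.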
Type Aliases
- `Trigram`: A single trigram: three raw bytes, so it works for arbitrary (including non-UTF-8) content.
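Because a trigram is three raw bytes rather than three characters, content never needs to be decoded. The example below assumes `Trigram` is `[u8; 3]`; `byte_trigrams` is a hypothetical helper. It takes byte windows over a buffer that is not valid UTF-8 and would be rejected by `std::str::from_utf8`.

```rust
/// Same shape as the module's `Trigram` alias (assumed `[u8; 3]`).
type Trigram = [u8; 3];

/// Hypothetical helper: every 3-byte window of `bytes`, in order.
fn byte_trigrams(bytes: &[u8]) -> Vec<Trigram> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

fn main() {
    // Latin-1 "caf\xE9" followed by NUL: 0xE9 is not followed by a
    // UTF-8 continuation byte, so this is not valid UTF-8.
    let bytes: &[u8] = &[b'c', b'a', b'f', 0xE9, 0x00];
    assert!(std::str::from_utf8(bytes).is_err());

    // Byte windows still cover the whole buffer.
    assert_eq!(
        byte_trigrams(bytes),
        vec![*b"caf", [b'a', b'f', 0xE9], [b'f', 0xE9, 0x00]]
    );
}
```

The flip side is that a multi-byte UTF-8 character can straddle trigram boundaries. This is harmless as long as files and query literals are decomposed the same way.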