Crate nutrimatic

An API for reading Nutrimatic index files.

See the Nutrimatic source code for a full description of the file format or instructions for creating an index file.

An index file is passed in as a &[u8] containing the contents of the file. Typically this slice comes from memory-mapping a file on disk, though reading the index file fully into memory also works if it fits.
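As a minimal sketch of the read-into-memory route, assuming the index is small enough to fit in RAM: the helper name `load_index_bytes` is hypothetical and not part of this crate, and it uses only the standard library. Memory-mapping needs an external crate or platform APIs and is not shown here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a whole index file into memory.
///
/// Hypothetical helper for illustration; it is not provided by the crate.
fn load_index_bytes(path: &Path) -> io::Result<Vec<u8>> {
    // `fs::read` opens the file, sizes the buffer, and reads to the end.
    fs::read(path)
}
```

The returned `Vec<u8>` can be borrowed as a `&[u8]` (for example `let buf: &[u8] = &bytes;`) and handed to `Node::new`. The vector must outlive any nodes created from it.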

An index file describes a trie of strings; edges are labeled with characters (ASCII space, digits, and letters) and each node stores the total frequency in some corpus of all phrases starting with the sequence of characters leading up to the node.
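To make the frequency rule concrete, here is a small in-memory model of such a trie. It is only a sketch: `ToyNode`, `insert`, and `prefix_freq` are hypothetical names, and the layout has nothing to do with the serialized index format or with this crate's `Node` type. It assumes each phrase is stored with a trailing space marking the word boundary.

```rust
use std::collections::BTreeMap;

/// A toy in-memory trie node (hypothetical; not the crate's on-disk format).
#[derive(Default)]
struct ToyNode {
    /// Total frequency of all phrases that start with the path to this node.
    freq: u64,
    /// Outgoing edges keyed by character byte; BTreeMap keeps them sorted.
    children: BTreeMap<u8, ToyNode>,
}

impl ToyNode {
    /// Adds `phrase` with a corpus frequency, bumping every node on its path.
    fn insert(&mut self, phrase: &str, freq: u64) {
        let mut node = self;
        node.freq += freq;
        for &b in phrase.as_bytes() {
            node = node.children.entry(b).or_default();
            node.freq += freq;
        }
    }

    /// Returns the total frequency of phrases starting with `prefix`,
    /// or 0 if no stored phrase has that prefix.
    fn prefix_freq(&self, prefix: &str) -> u64 {
        let mut node = self;
        for b in prefix.as_bytes() {
            match node.children.get(b) {
                Some(child) => node = child,
                None => return 0,
            }
        }
        node.freq
    }
}
```

After inserting `"ru "` with frequency 17 and `"st "` with frequency 18, the node reached by `"r"` holds 17, the node reached by `"s"` holds 18, and the root holds their sum, since every phrase starts with the empty prefix.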

This library does no consistency checking of index files. If you attempt to use an invalid file, you will see random panics or garbage results (but no unsafety). Don’t do that!

Examples

use nutrimatic::Node;

// Collect all phrases in the trie in alphabetical order along with their
// frequencies.
fn collect(node: &Node, word: &mut String, out: &mut Vec<(String, u64)>) {
    for child in &node.children() {
        // The space indicates that this transition corresponds to a word
        // boundary.
        if child.ch() == b' ' {
            out.push((word.clone(), child.freq()));
        }
        word.push(child.ch() as char);
        collect(&child, word, out);
        word.pop();
    }
}

fn main() {
    // This buffer describes a trie containing the words "ru" and "st"; a
    // trie would normally be generated ahead of time by external tools. The
    // byte values are written a bit oddly to hint at each one's purpose in
    // the serialization.
    let buf: &[u8] = &[
        b' ', 17, 0x00 | 1,
        b'u', 17, 0, 0x80 | 1,
        b' ', 18, 0x00 | 1,
        b't', 18, 0, 0x80 | 1,
        b'r', 17, 7, b's', 18, 0, 0x80 | 2,
    ];

    let root = Node::new(buf);

    let mut words = vec![];
    collect(&root, &mut String::new(), &mut words);
    assert_eq!(words, vec![("ru".to_owned(), 17), ("st".to_owned(), 18)]);
}

Structs

ChildIter

An iterator over the children of a node.

ChildReader

A lazy reader of the children of a node.

Node

A node in a trie.

ThinNode

A “thin” representation of a node in a trie.

Enums

SearchResult

The result of searching for a sequence of characters.