[][src]Struct fst::map::Map

pub struct Map<D>(_);

Map is a lexicographically ordered map from byte strings to integers.

A Map is constructed with the MapBuilder type. Alternatively, a Map can be constructed in memory from a lexicographically ordered iterator of key-value pairs (Map::from_iter).

A key feature of Map is that it can be serialized to disk compactly. Its underlying representation is built such that the Map can be memory mapped and searched without necessarily loading the entire map into memory.

It supports most common operations associated with maps, such as key lookup and search. It also supports set operations on its keys along with the ability to specify how conflicting values are merged together. Maps also support range queries and automata based searches (e.g. a regular expression).

Maps are represented by a finite state transducer where inputs are the keys and outputs are the values. As such, maps have the following invariants:

  1. Once constructed, a Map can never be modified.
  2. Maps must be constructed with lexicographically ordered byte sequences. There is no restricting on the ordering of values.

Differences with sets

Maps and sets are represented by the same underlying data structure: the finite state transducer. The principal difference between them is that sets always have their output values set to 0. This has an impact on the representation size and is reflected in the type system for convenience. A secondary but subtle difference is that duplicate values can be added to a set, but it is an error to do so with maps. That is, a set can have the same key added sequentially, but a map can't.

The future

It is regrettable that the output value is fixed to u64. Indeed, it is not necessary, but it was a major simplification in the implementation. In the future, the value type may become generic to an extent (outputs must satisfy a basic algebra).

Keys will always be byte strings; however, we may grow more conveniences around dealing with them (such as a serialization/deserialization step, although it isn't clear where exactly this should live).

Implementations

impl Map<Vec<u8>>[src]

pub fn from_iter<K, I>(iter: I) -> Result<Map<Vec<u8>>> where
    K: AsRef<[u8]>,
    I: IntoIterator<Item = (K, u64)>, 
[src]

Create a Map from an iterator of lexicographically ordered byte strings and associated values.

If the iterator does not yield unique keys in lexicographic order, then an error is returned.

Note that this is a convenience function to build a map in memory. To build a map that streams to an arbitrary io::Write, use MapBuilder.

impl<D: AsRef<[u8]>> Map<D>[src]

pub fn new(data: D) -> Result<Map<D>>[src]

Creates a map from its representation as a raw byte sequence.

This accepts anything that can be cheaply converted to a &[u8]. The caller is responsible for guaranteeing that the given bytes refer to a valid FST. While memory safety will not be violated by invalid input, a panic could occur while reading the FST at any point.

Example

use fst::Map;

// File written from a build script using MapBuilder.
static FST: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/map.fst"));

let map = Map::new(FST).unwrap();

pub fn contains_key<K: AsRef<[u8]>>(&self, key: K) -> bool[src]

Tests the membership of a single key.

Example

use fst::Map;

let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();

assert_eq!(map.contains_key("b"), true);
assert_eq!(map.contains_key("z"), false);

pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<u64>[src]

Retrieves the value associated with a key.

If the key does not exist, then None is returned.

Example

use fst::Map;

let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();

assert_eq!(map.get("b"), Some(2));
assert_eq!(map.get("z"), None);

pub fn stream(&self) -> Stream[src]

Return a lexicographically ordered stream of all key-value pairs in this map.

While this is a stream, it does require heap space proportional to the longest key in the map.

If the map is memory mapped, then no further heap space is needed. Note though that your operating system may fill your page cache (which will cause the resident memory usage of the process to go up correspondingly).

Example

Since streams are not iterators, the traditional for loop cannot be used. while let is useful instead:

use fst::{IntoStreamer, Streamer, Map};

let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.stream();

let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
    kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
    (b"a".to_vec(), 1),
    (b"b".to_vec(), 2),
    (b"c".to_vec(), 3),
]);

pub fn keys(&self) -> Keys[src]

Return a lexicographically ordered stream of all keys in this map.

Memory requirements are the same as described on Map::stream.

Example

use fst::{IntoStreamer, Streamer, Map};

let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.keys();

let mut keys = vec![];
while let Some(k) = stream.next() {
    keys.push(k.to_vec());
}
assert_eq!(keys, vec![b"a", b"b", b"c"]);

pub fn values(&self) -> Values[src]

Return a stream of all values in this map ordered lexicographically by each value's corresponding key.

Memory requirements are the same as described on Map::stream.

Example

use fst::{IntoStreamer, Streamer, Map};

let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.values();

let mut values = vec![];
while let Some(v) = stream.next() {
    values.push(v);
}
assert_eq!(values, vec![1, 2, 3]);

pub fn range(&self) -> StreamBuilder[src]

Return a builder for range queries.

A range query returns a subset of key-value pairs in this map in a range given in lexicographic order.

Memory requirements are the same as described on Map::stream. Notably, only the keys in the range are read; keys outside the range are not.

Example

Returns only the key-value pairs in the range given.

use fst::{IntoStreamer, Streamer, Map};

let map = Map::from_iter(vec![
    ("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5),
]).unwrap();
let mut stream = map.range().ge("b").lt("e").into_stream();

let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
    kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
    (b"b".to_vec(), 2),
    (b"c".to_vec(), 3),
    (b"d".to_vec(), 4),
]);

pub fn search<A: Automaton>(&self, aut: A) -> StreamBuilder<A>[src]

Executes an automaton on the keys of this map.

Note that this returns a StreamBuilder, which can be used to add a range query to the search (see the range method).

Memory requirements are the same as described on Map::stream.

Example

An implementation of regular expressions for Automaton is available in the regex-automata crate with the fst1 feature enabled, which can be used to search maps.

Example

An implementation of subsequence search for Automaton can be used to search maps:

use fst::automaton::Subsequence;
use fst::{IntoStreamer, Streamer, Map};

fn example() -> Result<(), Box<dyn std::error::Error>> {
    let map = Map::from_iter(vec![
        ("a foo bar", 1),
        ("foo", 2),
        ("foo1", 3),
        ("foo2", 4),
        ("foo3", 5),
        ("foobar", 6),
    ]).unwrap();

    let matcher = Subsequence::new("for");
    let mut stream = map.search(&matcher).into_stream();

    let mut kvs = vec![];
    while let Some((k, v)) = stream.next() {
        kvs.push((String::from_utf8(k.to_vec())?, v));
    }
    assert_eq!(kvs, vec![
        ("a foo bar".to_string(), 1), ("foobar".to_string(), 6),
    ]);

    Ok(())
}

pub fn search_with_state<A: Automaton>(
    &self,
    aut: A
) -> StreamWithStateBuilder<A>
[src]

Executes an automaton on the keys of this map and yields matching keys along with the corresponding matching states in the given automaton.

Note that this returns a StreamWithStateBuilder, which can be used to add a range query to the search (see the range method).

Memory requirements are the same as described on Map::stream.

Example

An implementation of fuzzy search using Levenshtein automata can be used to search maps:

use fst::automaton::Levenshtein;
use fst::{IntoStreamer, Streamer, Map};

fn example() -> Result<(), Box<dyn std::error::Error>> {
    let map = Map::from_iter(vec![
        ("foo", 1),
        ("foob", 2),
        ("foobar", 3),
        ("fozb", 4),
    ]).unwrap();

    let query = Levenshtein::new("foo", 2)?;
    let mut stream = map.search_with_state(&query).into_stream();

    let mut kvs = vec![];
    while let Some((k, v, s)) = stream.next() {
        kvs.push((String::from_utf8(k.to_vec())?, v, s));
    }
    // Currently, there isn't much interesting that you can do with the states.
    assert_eq!(kvs, vec![
        ("foo".to_string(), 1, Some(183)),
        ("foob".to_string(), 2, Some(123)),
        ("fozb".to_string(), 4, Some(83)),
    ]);

    Ok(())
}

pub fn len(&self) -> usize[src]

Returns the number of elements in this map.

pub fn is_empty(&self) -> bool[src]

Returns true if and only if this map is empty.

pub fn op(&self) -> OpBuilder[src]

Creates a new map operation with this map added to it.

The OpBuilder type can be used to add additional map streams and perform set operations like union, intersection, difference and symmetric difference on the keys of the map. These set operations also allow one to specify how conflicting values are merged in the stream.

Example

This example demonstrates a union on multiple map streams. Notice that the stream returned from the union is not a sequence of key-value pairs, but rather a sequence of keys associated with one or more values. Namely, a key is associated with each value associated with that same key in the all of the streams.

use fst::{Streamer, Map};
use fst::map::IndexedValue;

let map1 = Map::from_iter(vec![
    ("a", 1), ("b", 2), ("c", 3),
]).unwrap();
let map2 = Map::from_iter(vec![
    ("a", 10), ("y", 11), ("z", 12),
]).unwrap();

let mut union = map1.op().add(&map2).union();

let mut kvs = vec![];
while let Some((k, vs)) = union.next() {
    kvs.push((k.to_vec(), vs.to_vec()));
}
assert_eq!(kvs, vec![
    (b"a".to_vec(), vec![
        IndexedValue { index: 0, value: 1 },
        IndexedValue { index: 1, value: 10 },
    ]),
    (b"b".to_vec(), vec![IndexedValue { index: 0, value: 2 }]),
    (b"c".to_vec(), vec![IndexedValue { index: 0, value: 3 }]),
    (b"y".to_vec(), vec![IndexedValue { index: 1, value: 11 }]),
    (b"z".to_vec(), vec![IndexedValue { index: 1, value: 12 }]),
]);

pub fn as_fst(&self) -> &Fst<D>[src]

Returns a reference to the underlying raw finite state transducer.

pub fn into_fst(self) -> Fst<D>[src]

Returns the underlying raw finite state transducer.

pub fn map_data<F, T>(self, f: F) -> Result<Map<T>> where
    F: FnMut(D) -> T,
    T: AsRef<[u8]>, 
[src]

Maps the underlying data of the fst Map to another data type.

Example

This example shows that you can map an fst Map based on a Vec<u8> into an fst Map based on a Cow<[u8]>, it can also work with a reference counted type (e.g. Arc, Rc).

use std::borrow::Cow;

use fst::Map;

let map: Map<Vec<u8>> = Map::from_iter(
    [("hello", 12), ("world", 42)].iter().cloned(),
).unwrap();

let map_on_cow: Map<Cow<[u8]>> = map.map_data(Cow::Owned).unwrap();

Trait Implementations

impl<D: AsRef<[u8]>> AsRef<Fst<D>> for Map<D>[src]

Returns the underlying finite state transducer.

impl<D: Clone> Clone for Map<D>[src]

impl<D: AsRef<[u8]>> Debug for Map<D>[src]

impl Default for Map<Vec<u8>>[src]

impl<D: AsRef<[u8]>> From<Fst<D>> for Map<D>[src]

impl<'m, 'a, D: AsRef<[u8]>> IntoStreamer<'a> for &'m Map<D>[src]

type Item = (&'a [u8], u64)

The type of the item emitted by the stream.

type Into = Stream<'m>

The type of the stream to be constructed.

Auto Trait Implementations

impl<D> RefUnwindSafe for Map<D> where
    D: RefUnwindSafe

impl<D> Send for Map<D> where
    D: Send

impl<D> Sync for Map<D> where
    D: Sync

impl<D> Unpin for Map<D> where
    D: Unpin

impl<D> UnwindSafe for Map<D> where
    D: UnwindSafe

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized
[src]

impl<T> Borrow<T> for T where
    T: ?Sized
[src]

impl<T> BorrowMut<T> for T where
    T: ?Sized
[src]

impl<T> From<T> for T[src]

impl<T, U> Into<U> for T where
    U: From<T>, 
[src]

impl<T> ToOwned for T where
    T: Clone
[src]

type Owned = T

The resulting type after obtaining ownership.

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.