Struct tantivy_fst::map::MapBuilder

pub struct MapBuilder<W>(/* private fields */);

A builder for creating a map.

This is not your average everyday builder. It has two important qualities that set it apart from what you might expect:

  1. All keys must be added in lexicographic order. Adding a key out of order results in an error, as does adding a duplicate key. That is, once a key is associated with a value, that association can never be modified or deleted.
  2. The representation of a map is streamed to any io::Write as it is built. For an in-memory representation, this can be a Vec<u8>.

Point (2) is especially important because it means that a map can be constructed without storing the entire map in memory. In particular, because the builder works with any io::Write, the map can be streamed directly to a file.

With that said, the builder does use memory, but memory usage is bounded to a constant size. The amount of memory used trades off against the compression ratio. Currently, the implementation hard-codes this trade-off, which can result in about 5-20MB of heap usage during construction. (N.B. Guaranteeing the best possible compression requires memory proportional to the size of the map, which defeats some of the benefit of streaming it to disk. In practice, a small bounded amount of memory achieves compression close to the best possible.)

The algorithmic complexity of map construction is O(n) where n is the number of elements added to the map.
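
Because keys must arrive in lexicographic byte order with no duplicates, unsorted input has to be normalized before it reaches the builder. The following is a minimal sketch using only the standard library. The helper name `prepare_pairs` is hypothetical and not part of tantivy_fst, and the "last value wins" policy for repeated keys is an assumption chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of tantivy_fst): turns arbitrary (key, value)
/// pairs into a list sorted by key bytes with no duplicate keys.
/// Assumption: when a key repeats, the last value seen wins.
fn prepare_pairs<K: AsRef<[u8]>>(pairs: &[(K, u64)]) -> Vec<(Vec<u8>, u64)> {
    // BTreeMap orders Vec<u8> keys lexicographically by byte value.
    let mut sorted: BTreeMap<Vec<u8>, u64> = BTreeMap::new();
    for (key, value) in pairs {
        sorted.insert(key.as_ref().to_vec(), *value);
    }
    sorted.into_iter().collect()
}

fn main() {
    let raw: [(&str, u64); 4] = [("stevie", 3), ("bruce", 1), ("clarence", 2), ("bruce", 7)];
    let prepared = prepare_pairs(&raw);
    for (key, value) in &prepared {
        println!("{} => {}", String::from_utf8_lossy(key), value);
    }
}
```

The resulting pairs have the shape that extend_iter accepts. Sorting this way costs O(n log n) and holds all pairs in memory, so it is only appropriate when the input fits in memory.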

Example: build in memory

This shows how to use the builder to construct a map in memory. Note that Map::from_iter is a convenience function that achieves the same goal without needing to use MapBuilder explicitly.

use tantivy_fst::{IntoStreamer, Streamer, Map, MapBuilder};

let mut build = MapBuilder::memory();
build.insert("bruce", 1).unwrap();
build.insert("clarence", 2).unwrap();
build.insert("stevie", 3).unwrap();

// You could also call `finish()` here, but since we're building the map in
// memory, there would be no way to get the `Vec<u8>` back.
let bytes = build.into_inner().unwrap();

// At this point, the map has been constructed, but here's how to read it.
let map = Map::from_bytes(bytes).unwrap();
let mut stream = map.into_stream();
let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
    kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
    (b"bruce".to_vec(), 1),
    (b"clarence".to_vec(), 2),
    (b"stevie".to_vec(), 3),
]);

Implementations

impl MapBuilder<Vec<u8>>

pub fn memory() -> Self

Create a builder that builds a map in memory.

impl<W: Write> MapBuilder<W>

pub fn new(wtr: W) -> Result<MapBuilder<W>>

Create a builder that builds a map by writing it to wtr in a streaming fashion.

pub fn insert<K: AsRef<[u8]>>(&mut self, key: K, val: u64) -> Result<()>

Insert a new key-value pair into the map.

Keys must be convertible to byte strings. Values must be a u64, which is a restriction of the current implementation of finite state transducers. (Values may one day be expanded to other types.)
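
Since a value is a single u64, richer payloads have to be encoded into 64 bits or kept in a separate table that the u64 indexes. The following is a minimal sketch of the packing approach using only the standard library. The helpers `pack` and `unpack` and the two-field layout are hypothetical illustrations, not part of tantivy_fst.

```rust
/// Hypothetical helper: packs two 32-bit fields into one u64 value.
/// `high` occupies the upper 32 bits and `low` the lower 32 bits.
fn pack(high: u32, low: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}

/// Hypothetical helper: reverses `pack`, returning (high, low).
fn unpack(value: u64) -> (u32, u32) {
    // The `as u32` casts truncate to the low 32 bits, which is intended here.
    ((value >> 32) as u32, value as u32)
}

fn main() {
    let value = pack(7, 1024);
    let (high, low) = unpack(value);
    println!("{:#018x} -> ({}, {})", value, high, low);
}
```

A packed value like this can be passed as the val argument of insert and decoded again after a lookup.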

If a key is inserted that is less than or equal to any previous key added, then an error is returned. Similarly, if there was a problem writing to the underlying writer, an error is returned.
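
The ordering rule can be checked ahead of time, before any bytes are written. The following is a minimal sketch using only the standard library. It mirrors the rule as described here rather than calling into tantivy_fst, and the helper name `first_out_of_order` is hypothetical. The builder's own check remains authoritative.

```rust
/// Hypothetical helper: returns the index of the first key that is less than
/// or equal to its predecessor (compared byte-wise), or None if the keys are
/// strictly increasing.
fn first_out_of_order<K: AsRef<[u8]>>(keys: &[K]) -> Option<usize> {
    keys.windows(2)
        .position(|pair| pair[0].as_ref() >= pair[1].as_ref())
        .map(|i| i + 1)
}

fn main() {
    let ok = ["bruce", "clarence", "stevie"];
    let duplicate = ["bruce", "bruce", "stevie"];
    // Byte order is not numeric order: "10" sorts before "9".
    let numeric = ["9", "10"];

    println!("{:?}", first_out_of_order(&ok));
    println!("{:?}", first_out_of_order(&duplicate));
    println!("{:?}", first_out_of_order(&numeric));
}
```

Numeric keys rendered as decimal text are a common source of ordering errors. Zero-padding them, or encoding them as big-endian bytes, keeps byte order consistent with numeric order.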

pub fn extend_iter<K, I>(&mut self, iter: I) -> Result<()>
where
    K: AsRef<[u8]>,
    I: IntoIterator<Item = (K, u64)>,

Calls insert on each item in the iterator.

If an error occurred while adding an element, processing is stopped and the error is returned.

If a key is inserted that is less than or equal to any previous key added, then an error is returned. Similarly, if there was a problem writing to the underlying writer, an error is returned.

pub fn extend_stream<'f, I, S>(&mut self, stream: I) -> Result<()>
where
    I: for<'a> IntoStreamer<'a, Into = S, Item = (&'a [u8], u64)>,
    S: 'f + for<'a> Streamer<'a, Item = (&'a [u8], u64)>,

Calls insert on each item in the stream.

Note that, unlike extend_iter, this method is not generic over the items in the stream: each item must be a (&[u8], u64) pair.

If a key is inserted that is less than or equal to any previous key added, then an error is returned. Similarly, if there was a problem writing to the underlying writer, an error is returned.

pub fn finish(self) -> Result<()>

Finishes the construction of the map and flushes the underlying writer. After completion, the data written to W may be read using one of Map’s constructor methods.

pub fn into_inner(self) -> Result<W>

Just like finish, except it returns the underlying writer after flushing it.

pub fn get_ref(&self) -> &W

Gets a reference to the underlying writer.

pub fn bytes_written(&self) -> u64

Returns the number of bytes written to the underlying writer.

Auto Trait Implementations

impl<W> RefUnwindSafe for MapBuilder<W> where W: RefUnwindSafe,

impl<W> Send for MapBuilder<W> where W: Send,

impl<W> Sync for MapBuilder<W> where W: Sync,

impl<W> Unpin for MapBuilder<W> where W: Unpin,

impl<W> UnwindSafe for MapBuilder<W> where W: UnwindSafe,

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.