Struct fst::SetBuilder

pub struct SetBuilder<W>(_);

A builder for creating a set.

This is not your average everyday builder. It has two important qualities that set it apart from what you might expect:

  1. All keys must be added in lexicographic order. Adding a key out of order will result in an error.
  2. The representation of a set is streamed to any io::Write as it is built. For an in-memory representation, this can be a Vec<u8>.

Point (2) is especially important because it means that a set can be constructed without storing the entire set in memory. Namely, since it works with any io::Write, it can be streamed directly to a file.
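
The two qualities above can be illustrated with a small standard-library sketch. OrderedKeyWriter is a hypothetical type invented for this illustration: it writes newline-separated keys rather than the real set representation, and it is not how SetBuilder is implemented. It only shows the shape of the contract: keys arrive in order, each one goes straight to the writer, and only the previous key is kept in memory.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a streaming builder (not fst's format).
struct OrderedKeyWriter<W: Write> {
    wtr: W,
    /// Only the most recently written key is remembered.
    last: Option<Vec<u8>>,
}

impl<W: Write> OrderedKeyWriter<W> {
    fn new(wtr: W) -> Self {
        OrderedKeyWriter { wtr, last: None }
    }

    /// Writes `key` plus a newline, rejecting any key that sorts
    /// before the previously written key.
    fn insert(&mut self, key: &[u8]) -> io::Result<()> {
        if let Some(last) = &self.last {
            if key < last.as_slice() {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "key out of lexicographic order",
                ));
            }
        }
        self.wtr.write_all(key)?;
        self.wtr.write_all(b"\n")?;
        self.last = Some(key.to_vec());
        Ok(())
    }

    /// Flushes the writer and hands it back to the caller.
    fn into_inner(mut self) -> io::Result<W> {
        self.wtr.flush()?;
        Ok(self.wtr)
    }
}
```

Because the sketch is generic over io::Write, the same code works with a Vec<u8> or with a buffered file handle.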

With that said, the builder does use memory, but its memory usage is bounded by a constant. The amount of memory used trades off against the compression ratio. Currently, the implementation hard-codes this trade-off, which can result in about 5-20MB of heap usage during construction. (N.B. Guaranteeing a maximal compression ratio requires memory proportional to the size of the set, which defeats the benefit of streaming it to disk. In practice, a small bounded amount of memory achieves close-to-maximal compression.)

The algorithmic complexity of set construction is O(n) where n is the number of elements added to the set.

Example: build in memory

This shows how to use the builder to construct a set in memory. Note that Set::from_iter is a convenience function that achieves the same goal without needing to use SetBuilder explicitly.

use fst::{IntoStreamer, Streamer, Set, SetBuilder};

let mut build = SetBuilder::memory();
build.insert("bruce").unwrap();
build.insert("clarence").unwrap();
build.insert("stevie").unwrap();

// You could also call `finish()` here, but since we're building the set in
// memory, there would be no way to get the `Vec<u8>` back.
let bytes = build.into_inner().unwrap();

// At this point, the set has been constructed, but here's how to read it.
let set = Set::from_bytes(bytes).unwrap();
let mut stream = set.into_stream();
let mut keys = vec![];
while let Some(key) = stream.next() {
    keys.push(key.to_vec());
}
assert_eq!(keys, vec![
    "bruce".as_bytes(), "clarence".as_bytes(), "stevie".as_bytes(),
]);

Example: stream to file

This shows how to stream construction of a set to a file.

use std::fs::File;
use std::io;

use fst::{IntoStreamer, Streamer, Set, SetBuilder};

let wtr = io::BufWriter::new(File::create("set.fst").unwrap());
let mut build = SetBuilder::new(wtr).unwrap();
build.insert("bruce").unwrap();
build.insert("clarence").unwrap();
build.insert("stevie").unwrap();

// If you want the writer back, then call `into_inner`. Otherwise, this
// will finish construction and call `flush`.
build.finish().unwrap();

// At this point, the set has been constructed, but here's how to read it.
let set = unsafe { Set::from_path("set.fst").unwrap() };
let mut stream = set.into_stream();
let mut keys = vec![];
while let Some(key) = stream.next() {
    keys.push(key.to_vec());
}
assert_eq!(keys, vec![
    "bruce".as_bytes(), "clarence".as_bytes(), "stevie".as_bytes(),
]);

Methods

impl SetBuilder<Vec<u8>>

Create a builder that builds a set in memory.

impl<W: Write> SetBuilder<W>

Create a builder that builds a set by writing it to wtr in a streaming fashion.

Insert a new key into the set.

If the inserted key is less than a previously added key, an error is returned. Similarly, if there was a problem writing to the underlying writer, an error is returned.
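
One way to avoid the ordering error is to sort keys before inserting them. The sketch below uses only the standard library; prepare_keys and is_sorted_bytewise are hypothetical helper names, not part of this crate. Rust compares strings byte by byte, so an uppercase ASCII key such as "Zed" sorts before a lowercase one such as "bruce". Dropping duplicates is a choice made for this sketch, since the text above only describes an error for keys that are less than an earlier key.

```rust
/// Sorts keys in byte-wise lexicographic order and drops duplicates.
fn prepare_keys(mut keys: Vec<String>) -> Vec<String> {
    keys.sort();
    keys.dedup();
    keys
}

/// Returns true if no key is less than the key before it.
fn is_sorted_bytewise(keys: &[String]) -> bool {
    keys.windows(2).all(|w| w[0].as_bytes() <= w[1].as_bytes())
}
```

The prepared keys can then be passed to insert one at a time, in order.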

Calls insert on each item in the iterator.

If an error occurred while adding an element, processing is stopped and the error is returned.
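
The stop-at-first-error behaviour can be sketched with a plain loop and the ? operator. extend_with is a hypothetical free function written for illustration, with the insert step passed in as a closure; it is not the library's implementation.

```rust
/// Calls `insert` on each item and stops at the first error.
/// Items after the failing one are never visited.
fn extend_with<I, T, E, F>(iter: I, mut insert: F) -> Result<(), E>
where
    I: IntoIterator<Item = T>,
    F: FnMut(T) -> Result<(), E>,
{
    for item in iter {
        insert(item)?;
    }
    Ok(())
}
```

Under this sketch, items accepted before the failure stay written, so a caller that needs all-or-nothing behaviour would have to validate the input up front.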

Calls insert on each item in the stream.

Note that, unlike extend_iter, this is not generic over the items in the stream.

Finishes the construction of the set and flushes the underlying writer. After completion, the data written to W may be read using one of Set's constructor methods.

Just like finish, except it returns the underlying writer after flushing it.

Gets a reference to the underlying writer.

Returns the number of bytes written to the underlying writer.
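
As a rough illustration of how a byte count and a writer reference could be exposed together, here is a minimal standard-library sketch of a counting wrapper around io::Write. CountingWriter and its method names are assumptions made for this example; the text above does not say how the builder tracks its count.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper that counts bytes passed to the inner writer.
struct CountingWriter<W> {
    wtr: W,
    count: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(wtr: W) -> Self {
        CountingWriter { wtr, count: 0 }
    }

    /// Number of bytes the inner writer has accepted so far.
    fn bytes_written(&self) -> u64 {
        self.count
    }

    /// Shared reference to the inner writer.
    fn get_ref(&self) -> &W {
        &self.wtr
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only what the inner writer reports as written.
        let n = self.wtr.write(buf)?;
        self.count += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.wtr.flush()
    }
}
```

A count kept this way reflects bytes handed to the writer, which for a buffered writer may be more than what has reached the file until a flush occurs.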