mtbl/lib.rs
//! Rust bindings to the [mtbl](https://github.com/farsightsec/mtbl) library for
//! dealing with SSTables (immutable sorted map files).
//!
//! SSTables (Sorted String Tables) are constant on-disk maps from `[u8]` to
//! `[u8]`, like those used by
//! [CDB](http://www.corpit.ru/mjt/tinycdb.html) (which also has [Rust
//! bindings](https://github.com/andrew-d/tinycdb-rs)), except that they use
//! sorted maps instead of hashmaps. SSTables are suitable for storing the
//! output of mapreduces or other batch results for easy lookup later.
//!
//! Version 0.2.0 of this library is a rather literal translation of the mtbl C
//! API. Later versions may change the API to be friendlier and more idiomatic
//! Rust.
//!
//! # Usage
//!
//! ## Creating a database
//!
//! ```
//! // Create a database, using a Sorter instead of a Writer so we can add
//! // keys in arbitrary (non-sorted) order.
//! {
//!     use mtbl::{Sorter, Write};
//!     let mut writer = Sorter::create("data.mtbl");
//!     writer.add("key", "value");
//!     // Data is flushed to file when the writer/sorter is dropped.
//! }
//! ```
//!
//! ## Reading from a database
//!
//! ```
//! use mtbl::{Read, Reader};
//! let reader = Reader::open("data.mtbl");
//! // Get one element.
//! let val: Option<Vec<u8>> = reader.get("key");
//! assert_eq!(val, Some(b"value".to_vec()));
//! // Or iterate over all entries; `key` and `value` are both `Vec<u8>`.
//! for (key, value) in &reader {
//!     println!("{:?} => {:?}", key, value);
//! }
//! ```
//!
//! # More details about MTBL
//!
//! Quoting from the MTBL documentation:
//!
//! > mtbl is not a database library. It does not provide an updateable
//! > key-value data store, but rather exposes primitives for creating,
//! > searching and merging SSTable files. Unlike databases which use the
//! > SSTable data structure internally as part of their data store, management
//! > of SSTable files -- creation, merging, deletion, combining of search
//! > results from multiple SSTables -- is left to the discretion of the mtbl
//! > library user.
//!
//! > mtbl SSTable files consist of a sequence of data blocks containing sorted
//! > key-value pairs, where keys and values are arbitrary byte arrays. Data
//! > blocks are optionally compressed using zlib or the Snappy library. The
//! > data blocks are followed by an index block, allowing for fast searches
//! > over the keyspace.
//!
//! > The basic mtbl interface is the writer, which receives a sequence of
//! > key-value pairs in sorted order with no duplicate keys, and writes them
//! > to data blocks in the SSTable output file. An index containing offsets to
//! > data blocks and the last key in each data block is buffered in memory
//! > until the writer object is closed, at which point the index is written to
//! > the end of the SSTable file. This allows SSTable files to be written in a
//! > single pass with sequential I/O operations only.
//!
//! > Once written, SSTable files can be searched using the mtbl reader
//! > interface. Searches can retrieve key-value pairs based on an exact key
//! > match, a key prefix match, or a key range. Results are retrieved using a
//! > simple iterator interface.
//!
//! > The mtbl library also provides two utility interfaces which facilitate a
//! > sort-and-merge workflow for bulk data loading. The sorter interface
//! > receives arbitrarily ordered key-value pairs and provides them in sorted
//! > order, buffering to disk as needed. The merger interface reads from
//! > multiple SSTables simultaneously and provides the key-value pairs from
//! > the combined inputs in sorted order. Since mtbl does not allow duplicate
//! > keys in an SSTable file, both the sorter and merger interfaces require a
//! > caller-provided merge function which will be called to merge multiple
//! > values for the same key. These interfaces also make use of sequential I/O
//! > operations only.
//!
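//! The index described in the quoted text (the last key of each data block)
//! is what lets a reader narrow a search to the one block that could hold a
//! key. A minimal conceptual sketch of that lookup, using only the standard
//! library: `find_block` and the in-memory `last_keys` slice are hypothetical
//! stand-ins for illustration and are not part of mtbl or of this crate's API.
//!
//! ```rust
//! // `last_keys` holds the last key of each data block, in file order, so it
//! // is sorted. Returns the position of the only block that may contain `key`,
//! // or `None` if `key` sorts after every key in the file.
//! fn find_block(last_keys: &[&[u8]], key: &[u8]) -> Option<usize> {
//!     // Binary search for the first block whose last key is >= `key`.
//!     let idx = last_keys.partition_point(|last| *last < key);
//!     if idx < last_keys.len() {
//!         Some(idx)
//!     } else {
//!         None
//!     }
//! }
//! ```
//!
//! With last keys `apple`, `mango` and `zebra`, a search for `banana` selects
//! block 1, because `banana` sorts after `apple` but not after `mango`. The
//! block still has to be scanned to learn whether the key is actually present.
//!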
//! # Why prefer MTBL over CDB or other constant databases?
//!
//! * Storing data in sorted order makes merging files easy.
//! * Compression is built-in (options: [zlib](http://www.zlib.net/) and
//!   [snappy](https://github.com/google/snappy)).
//! * The library code is a little more modern and uses mmapped files to have
//!   a properly immutable (and therefore thread-safe) representation -- it
//!   doesn't go mucking about with file pointers.
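//!
//! To illustrate the first point: merging two sorted inputs takes a single
//! sequential pass, and a merge function only has to be consulted when both
//! inputs hold the same key. The sketch below uses only the standard library;
//! `Entry`, `merge_sorted` and the closure signature are assumptions made for
//! illustration, not this crate's `Merger` or `MergeFn`.
//!
//! ```rust
//! use std::cmp::Ordering;
//!
//! // A key-value pair of arbitrary bytes.
//! type Entry = (Vec<u8>, Vec<u8>);
//!
//! // Merge two key-sorted, duplicate-free inputs into one sorted output.
//! // `merge(key, value_a, value_b)` is called when a key is in both inputs.
//! fn merge_sorted<F>(a: &[Entry], b: &[Entry], merge: F) -> Vec<Entry>
//! where
//!     F: Fn(&[u8], &[u8], &[u8]) -> Vec<u8>,
//! {
//!     let mut out = Vec::with_capacity(a.len() + b.len());
//!     let (mut i, mut j) = (0, 0);
//!     while i < a.len() && j < b.len() {
//!         match a[i].0.cmp(&b[j].0) {
//!             Ordering::Less => {
//!                 out.push(a[i].clone());
//!                 i += 1;
//!             }
//!             Ordering::Greater => {
//!                 out.push(b[j].clone());
//!                 j += 1;
//!             }
//!             Ordering::Equal => {
//!                 let key = a[i].0.as_slice();
//!                 let merged = merge(key, a[i].1.as_slice(), b[j].1.as_slice());
//!                 out.push((a[i].0.clone(), merged));
//!                 i += 1;
//!                 j += 1;
//!             }
//!         }
//!     }
//!     // At most one input still has entries; they are already in order.
//!     out.extend_from_slice(&a[i..]);
//!     out.extend_from_slice(&b[j..]);
//!     out
//! }
//! ```
//!
//! For example, passing `|_key, v0, v1| [v0, v1].concat()` as `merge` keeps
//! both values for a duplicated key by concatenating them.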

#![crate_name = "mtbl"]
#![crate_type = "lib"]
#![warn(missing_docs)]
#![warn(non_upper_case_globals)]
#![warn(unused_qualifications)]

extern crate libc;
extern crate mtbl_sys;

mod fileset;
mod merger;
mod reader;
mod sorter;
mod writer;

pub use fileset::Fileset;
pub use fileset::FilesetOptions;
pub use merger::MergeFn;
pub use merger::Merger;
pub use reader::Iter;
pub use reader::ReaderOptions;
pub use reader::Read;
pub use reader::Reader;
pub use sorter::SorterOptions;
pub use sorter::Sorter;
pub use writer::WriterOptions;
pub use writer::CompressionType;
pub use writer::Write;
pub use writer::Writer;