Manipulate DJB's Constant Database files. These are 2 level disk-based hash tables that efficiently handle thousands of keys, while remaining space-efficient.
Constant databases have the desirable property of requiring low overhead to open.
cdb is a fast, reliable, simple package for creating and reading constant databases. Its database structure provides several features:
- Fast lookups: A successful lookup in a large database normally takes just two disk accesses. An unsuccessful lookup takes only one.
- Low overhead: A database uses 2048 bytes, plus 24 bytes per record, plus the space for keys and data.
- No random limits: cdb can handle any database up to 4 gigabytes. There are no other restrictions; records don't even have to fit into memory. Databases are stored in a machine-independent format.
- Fast atomic database replacement: cdbmake can rewrite an entire database two orders of magnitude faster than other hashing packages.
- Fast database dumps: cdbdump prints the contents of a database in cdbmake-compatible format.
cdb is designed to be used in mission-critical applications like e-mail. Database replacement is safe against system crashes. Readers don't have to pause during a rewrite.
D. J. Bernstein
A cdb is an associative array: it maps strings (
keys) to strings (
A cdb contains 256 pointers to linearly probed open hash tables. The hash tables contain pointers to
(key, data)pairs. A cdb is stored in a single file on disk:
+----------------+---------+-------+-------+-----+---------+ | p0 p1 ... p255 | records | hash0 | hash1 | ... | hash255 | +----------------+---------+-------+-------+-----+---------+
Each of the 256 initial pointers states a position and a length. The position is the starting byte position of the hash table. The length is the number of slots in the hash table.
Records are stored sequentially, without special alignment. A record states a key length, a data length, the key, and the data.
Each hash table slot states a hash value and a byte position. If the byte position is 0, the slot is empty. Otherwise, the slot points to a record whose key has that hash value.
Positions, lengths, and hash values are 32-bit quantities, stored in little-endian form in 4 bytes. Thus a cdb must fit into 4 gigabytes.
A record is located as follows. Compute the hash value of the key in the record. The hash value modulo 256 is the number of a hash table. The hash value divided by 256, modulo the length of that table, is a slot number. Probe that slot, the next higher slot, and so on, until you find the record or run into an empty slot.
The cdb hash function is
h = ((h << 5) + h) ^ c, with a starting hash of
To write to a new CDB:
use galvanize::Writer; use std::fs::File; let key = "key".as_bytes(); let value = "value".as_bytes(); let mut f = try!(File::create(filename)); let mut cdb_writer = try!(Writer::new(&mut f)); try!(cdb_writer.put(key, value));
To read from an existing CDB:
use galvanize::Reader; let mut f = try!(File::open(filename)); let mut cdb_reader = try!(Reader::new(&mut f)); let stored_vals = cdb_reader.get(key); assert_eq!(stored_vals.len(), 1); assert_eq!(&stored_vals[..], &value[..]); // "value".as_bytes() // The CDB contains only one entry: assert_eq!(cdb_reader.len(), 1); let non_existing_key = "non_existing_key".as_bytes(); let empty = cdb_reader.get(non_existing_key); assert_eq!(empty.len(), 0); // Accessing a key that isn't in the CDB: let non_existing_key = "non_existing_key".as_bytes(); let empty = cdb_reader.get(non_existing_key); assert_eq!(empty.len(), 0); assert!(cdb_reader.get_first(non_existing_key).is_err());
Various functions that are used across both the writer and reader modules.
This module allows you to read from a CDB.
This module contains the possible responses from the public interfaces in this crate.
This module allows you to write to a CDB.