| BlockBuilder generates blocks where keys are
| prefix-compressed:
|
| When we store a key, we drop the prefix shared
| with the previous string. This helps reduce
| the space requirement significantly.
| Furthermore, once every K keys, we do not apply
| the prefix compression and store the entire
| key. We call this a “restart point”. The tail
| end of the block stores the offsets of all of
| the restart points, and can be used to do
| a binary search when looking for a particular
| key. Values are stored as-is (without
| compression) immediately following the
| corresponding key.
|
| An entry for a particular key-value pair has the form:
| shared_bytes: varint32
| unshared_bytes: varint32
| value_length: varint32
| key_delta: char[unshared_bytes]
| value: char[value_length]
| shared_bytes == 0 for restart points.
|
| The trailer of the block has the form:
| restarts: uint32[num_restarts]
| num_restarts: uint32
|
| restarts[i] contains the offset within the
| block of the ith restart point.
| Memtables and sstables that make the DB
| representation contain (userkey,seq,type) =>
| uservalue entries. DBIter combines multiple
| entries for the same userkey found in the DB
| representation into a single entry while
| accounting for sequence numbers, deletion
| markers, overwrites, etc.
| An implementation of Env that forwards all
| calls to another Env.
|
| May be useful to clients who wish to override
| just part of the functionality of another Env.
| A FilterBlockBuilder is used to construct all
| of the filters for a particular Table. It
| generates a single string which is stored as
| a special block in the Table.
|
| The sequence of calls to FilterBlockBuilder
| must match the regexp: (StartBlock
| AddKey*)* Finish
| We provide our own simple hash table since it
| removes a whole bunch of porting hacks and is
| also faster than some of the built-in hash
| table implementations in some of the
| compiler/runtime combinations we have tested.
| E.g., readrandom speeds up by ~5% over the g++
| 4.4.3’s builtin hashtable.
| A internal wrapper class with an interface
| similar to Iterator that caches the valid() and
| key() results for an underlying iterator.
|
| This can help avoid virtual function calls and
| also gives better cache locality.
| Helper class to limit resource usage to avoid
| exhaustion.
|
| Currently used to limit read-only file
| descriptors and mmap file usage so that we do
| not run out of file descriptors or virtual
| memory, or run into kernel performance problems
| for very large databases.
| Tracks the files locked by
| PosixEnv::LockFile().
|
| We maintain a separate set instead of relying
| on fcntl(F_SETLK) because fcntl(F_SETLK) does
| not provide any protection against multiple
| uses from the same process.
|
| Instances are thread-safe because all member
| data is guarded by a mutex.
| Implements random read access in a file using
| mmap().
|
| Instances of this class are thread-safe, as
| required by the RandomAccessFile API. Instances
| are immutable and Read() only calls thread-safe
| library functions.
| Implements random read access in a file using
| pread().
|
| Instances of this class are thread-safe, as
| required by the RandomAccessFile API. Instances
| are immutable and Read() only calls thread-safe
| library functions.
| Implements sequential read access in a file
| using read().
|
| Instances of this class are thread-friendly but
| not thread-safe, as required by the
| SequentialFile API.
| Slice is a simple structure containing
| a pointer into some external storage and
| a size. The user of a Slice must ensure that
| the slice is not used after the corresponding
| external storage has been deallocated.
|
| Multiple threads can invoke const methods on
| a Slice without external synchronization, but
| if any of the threads may call a non-const
| method, all threads accessing the same Slice
| must use external synchronization.
| A Table is a sorted map from strings to
| strings. Tables are immutable and persistent.
| A Table may be safely accessed from multiple
| threads without external synchronization.
| TableBuilder provides the interface used to
| build a Table (an immutable and sorted map from
| keys to values).
|
| Multiple threads can invoke const methods on
| a TableBuilder without external
| synchronization, but if any of the threads may
| call a non-const method, all threads accessing
| the same TableBuilder must use external
| synchronization.
| An internal iterator. For a given
| version/level pair, yields information about
| the files in the level. For a given entry,
| key() is the largest key that occurs in the
| file, and value() is an 16-byte value
| containing the file number and file size, both
| encoded using EncodeFixed64.
| DB contents are stored in a set of blocks, each
| of which holds a sequence of key,value pairs.
| Each block may be compressed before being
| stored in a file. The following enum describes
| which compression method (if any) is used to
| compress a block.
| Which direction is the iterator currently
| moving?
|
| (1) When moving forward, the internal
| iterator is positioned at the exact entry
| that yields this->key(), this->value()
|
| (2) When moving backwards, the internal
| iterator is positioned just before all
| entries whose user key == this->key().
| A DB is a persistent ordered map from keys to
| values.
|
| A DB is safe for concurrent access from
| multiple threads without any external
| synchronization.
| in the c++, we had the following annotated
| empty Default impl:
|
| Return a default environment suitable for the
| current operating system. Sophisticated
| users may wish to provide their own Env
| implementation instead of relying on this
| default environment.
|
| The result of Default() belongs to leveldb
| and must never be deleted.
| A Comparator object provides a total order
| across slices that are used as keys in an
| sstable or a database. A Comparator
| implementation must be thread-safe since
| leveldb may invoke its methods concurrently
| from multiple threads.
| Abstract handle to particular state of a DB.
|
| A Snapshot is an immutable object and can
| therefore be safely accessed from multiple
| threads without any external synchronization.
| A file abstraction for sequential writing. The
| implementation must provide buffering since
| callers may append small fragments at a time to
| the file.
| Extracts the largest file b1 from
| |compaction_files| and then searches for a b2
| in |level_files| for which user_key(u1)
| = user_key(l2). If it finds such a file b2
| (known as a boundary file) it adds it to
| |compaction_files| and then searches again
| using this new upper bound.
|
| If there are two blocks, b1=(l1, u1) and
| b2=(l2, u2) and user_key(u1) = user_key(l2),
| and if we compact b1 but not b2 then
| a subsequent get operation will yield an
| incorrect result because it will return the
| record from b2 in level i rather than from b1
| because it searches level by level for records
| matching the supplied user key.
|
| parameters:
|
| in level_files: List of files to
| search for boundary files.
|
| in/out compaction_files: List of files to
| extend by adding boundary files.
| Build a Table file from the contents
| of *iter.
|
| The generated file will be named according
| to meta->number.
|
| On success, the rest of *meta will be
| filled with metadata about the generated
| table.
|
| If no data is present in *iter, meta->file_size
| will be set to zero, and no Table file
| will be produced.
|
| Store in dst a string of length “len” that
| will compress to “Ncompressed_fraction” bytes
| and return a Slice that references the
| generated data.
| Parse a human-readable number from “*in” into
| *value. On success, advances “*in” past the
| consumed number and sets “*val” to the numeric
| value. Otherwise, returns false and leaves *in
| in an unspecified state.
| Return the crc32c of concat(A, data[0,n-1])
| where init_crc is the crc32c of some string A.
| Extend() is often used to maintain the crc32c
| of a stream of data.
| Return a masked representation of crc.
|
| Motivation: it is problematic to compute the
| CRC of a string that contains embedded CRCs.
| Therefore we recommend that CRCs stored
| somewhere (e.g., in files) should be masked
| before being stored.
| Helper routine: decode the next block entry
| starting at “p”, storing the number of shared
| key bytes, non_shared key bytes, and the length
| of the value in “*shared”, “*non_shared”, and
| “*value_length”, respectively. Will not
| dereference past “limit”.
|
| If any errors are detected, returns nullptr.
| Otherwise, returns a pointer to the key delta
| (just past the three decoded values).
| Return the name of the descriptor file for the
| db named by “dbname” and the specified
| incarnation number. The result will be
| prefixed with “dbname”.
| Encode a suitable internal key target for
| “target” and return it.
|
| Uses *scratch as scratch space, and the
| returned pointer will point into this scratch
| space.
| Lower-level versions of Put… that write
| directly into a character buffer and return
| a pointer just past the last byte written.
|
| REQUIRES: dst has enough space for the value
| being written
| Maximum number of bytes in all compacted files.
| We avoid expanding the lower level file set of
| a compaction if it would make the total
| compaction cover more than this many bytes.
| Return the smallest index i such that
| files[i]->largest >= key.
|
| Return files.size() if there is no such file.
|
| REQUIRES: “files” contains a sorted list of
| non-overlapping files.
| Pointer-based variants of GetVarint… These
| either store a value in *v and return a pointer
| just past the parsed value, or return nullptr
| on error. These routines only look at bytes in
| the range [p..limit-1]
| Return a new filter policy that uses a bloom
| filter with approximately the specified number
| of bits per key. A good value for bits_per_key
| is 10, which yields a filter with ~ 1% false
| positive rate.
|
| Callers must delete the result after any
| database that is using the result has been
| closed.
|
| Note: if you are using a custom comparator that
| ignores some parts of the keys being compared,
| you must not use NewBloomFilterPolicy() and
| must provide your own FilterPolicy that also
| ignores the corresponding parts of the keys.
| For example, if the comparator ignores trailing
| spaces, it would be incorrect to use
| a FilterPolicy (like NewBloomFilterPolicy) that
| does not ignore trailing spaces in keys.
| Return a new iterator that converts internal
| keys (yielded by “*internal_iter”) that were
| live at the specified “sequence” number into
| appropriate user keys.
| Returns a new environment that stores its data
| in memory and delegates all non-file-storage
| tasks to base_env. The caller must delete the
| result when it is no longer needed. *base_env
| must remain live while the result is in use.
| Return an iterator that provided the union of
| the data in children[0,n-1]. Takes ownership
| of the child iterators and will delete them
| when the result iterator is deleted.
|
| The result does no duplicate suppression.
| I.e., if a particular key is present in K child
| iterators, it will be yielded K times.
|
| REQUIRES: n >= 0
| Return a new two level iterator. A two-level
| iterator contains an index iterator whose
| values point to a sequence of blocks where each
| block is itself a sequence of key,value pairs.
| The returned two-level iterator yields the
| concatenation of all key/value pairs in the
| sequence of blocks. Takes ownership of
| “index_iter” and will delete it when no longer
| needed.
|
| Uses a supplied function to convert an
| index_iter value into an iterator over the
| contents of the corresponding block.
| If filename is a leveldb file, store the type
| of the file in *type.
|
| The number encoded in the filename is stored in
| *number. If the filename was successfully
| parsed, returns true. Else return false.
|
| Return a randomization seed for this run.
| Typically returns the same number on repeated
| invocations of this binary, but automated runs
| may be able to vary the seed.
| Run some of the tests registered by the TEST()
| macro. If the environment variable
| “LEVELDB_TESTS” is not set, runs all tests.
|
| Otherwise, runs only the tests whose name
| contains the value of “LEVELDB_TESTS” as
| a substring. E.g., suppose the tests are:
|
| TEST(Foo, Hello) { … }
| TEST(Foo, World) { … }
|
| LEVELDB_TESTS=Hello will run the first test
| LEVELDB_TESTS=o will run both tests
| LEVELDB_TESTS=Junk will run no tests
|
| Returns 0 if all tests pass.
|
| Dies or returns a non-zero value if some test
| fails.
| Returns true iff some file in “files” overlaps
| the user key range [*smallest,*largest].
|
| smallest==nullptr represents a key smaller than
| all keys in the DB.
|
| largest==nullptr represents a key largest than
| all keys in the DB.
|
| REQUIRES: If disjoint_sorted_files, files[]
| contains disjoint ranges in sorted
| order.