pub struct DB {
    options: DbOptions,
    db_lock: Option<FileLock>,
    memtable_ptr: ArcSwap<Box<dyn MemTable>>,
    wal: AtomicPtr<UnsafeCell<LogWriter>>,
    table_cache: Arc<TableCache>,
    guarded_fields: Arc<Mutex<GuardedDbFields>>,
    file_name_handler: Arc<FileNameHandler>,
    is_shutting_down: Arc<AtomicBool>,
    has_immutable_memtable: Arc<AtomicBool>,
    compaction_worker: Arc<CompactionWorker>,
    background_work_finished_signal: Arc<Condvar>,
}
The primary database object that exposes the public API.
Fields§
options: DbOptions
Options for configuring the operation of the database.
db_lock: Option<FileLock>
A lock over the persistent (i.e. on disk) state of the database.
memtable_ptr: ArcSwap<Box<dyn MemTable>>
An in-memory table of key-value pairs to support quick access to recently changed values.
All operations (reads and writes) go through this in-memory representation first.
§Concurrency
We use ArcSwap because we need a combination of an AtomicPtr and an Arc. Putting an
Arc into an AtomicPtr doesn’t work because storing/loading through the AtomicPtr does not
change the Arc’s reference counts.
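The reference-count problem can be reproduced with the standard library alone. The sketch below is illustrative and is not RainDB code: it publishes the raw address of an Arc through an AtomicPtr and shows that loading the pointer leaves the strong count unchanged, so nothing keeps the pointee alive on behalf of the reader. The function name is made up for this example.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;

/// Returns the strong count before and after loading the raw pointer.
fn strong_counts_around_raw_load() -> (usize, usize) {
    let memtable = Arc::new(vec![1u8, 2, 3]);
    let before = Arc::strong_count(&memtable);

    // Publish only the raw address; no new `Arc` handle is created.
    let slot = AtomicPtr::new(Arc::as_ptr(&memtable) as *mut Vec<u8>);

    // A reader that loads the pointer does not touch the reference count,
    // so the `Arc` could be dropped while the reader still holds `_raw`.
    let _raw: *mut Vec<u8> = slot.load(Ordering::Acquire);
    let after = Arc::strong_count(&memtable);

    (before, after)
}
```

Both counts are 1 here. A type such as ArcSwap closes this gap by handing readers a real reference-counted handle on load.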
wal: AtomicPtr<UnsafeCell<LogWriter>>
The writer for the current write-ahead log file.
table_cache: Arc<TableCache>
A cache of table files.
guarded_fields: Arc<Mutex<GuardedDbFields>>
Database fields that require a lock for accesses (reads and writes).
file_name_handler: Arc<FileNameHandler>
Handler for file names used by the database.
is_shutting_down: Arc<AtomicBool>
Field indicating if the database is shutting down.
has_immutable_memtable: Arc<AtomicBool>
Field indicating if there is an immutable memtable.
A memtable is made immutable when it is undergoing the compaction process.
compaction_worker: Arc<CompactionWorker>
The worker managing the compaction thread.
This is used to schedule compaction-related tasks on a background thread.
background_work_finished_signal: Arc<Condvar>
A condition variable used to notify parked threads that background work (e.g. compaction) has finished.
Implementations§
impl DB
Public methods
pub fn open(options: DbOptions) -> RainDBResult<DB>
Open a database with the specified options.
pub fn get_snapshot(&self) -> Snapshot
Get a handle to the current state of the database.
Get requests and iterators created with this snapshot will have a stable view of the database
state. Callers must call DB::release_snapshot when the snapshot is no longer needed.
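One way to picture a stable view is a multi-version store in which every write carries a sequence number and a snapshot pins the sequence number that was current when it was taken. The sketch below assumes that LevelDB-style scheme. ToyDb, ToySnapshot, and get_at are hypothetical names, not RainDB types, and releasing a snapshot is left out.

```rust
use std::collections::BTreeMap;

/// Toy multi-version store: every write is tagged with a sequence number.
#[derive(Default)]
struct ToyDb {
    // (key, sequence number) -> value
    entries: BTreeMap<(Vec<u8>, u64), Vec<u8>>,
    last_sequence: u64,
}

/// Hypothetical snapshot handle: pins the sequence number at creation time.
#[derive(Clone, Copy)]
struct ToySnapshot {
    sequence: u64,
}

impl ToyDb {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.last_sequence += 1;
        self.entries
            .insert((key.to_vec(), self.last_sequence), value.to_vec());
    }

    fn get_snapshot(&self) -> ToySnapshot {
        ToySnapshot { sequence: self.last_sequence }
    }

    /// Returns the newest value for `key` written at or before the snapshot.
    fn get_at(&self, snapshot: ToySnapshot, key: &[u8]) -> Option<&[u8]> {
        let low = (key.to_vec(), 0);
        let high = (key.to_vec(), snapshot.sequence);
        self.entries
            .range(low..=high)
            .next_back()
            .map(|(_, value)| value.as_slice())
    }
}

/// Reads the same key through an old snapshot and through a fresh one.
fn snapshot_demo() -> (Vec<u8>, Vec<u8>) {
    let mut db = ToyDb::default();
    db.put(b"k", b"old");
    let snapshot = db.get_snapshot();
    db.put(b"k", b"new");

    let pinned = db.get_at(snapshot, b"k").unwrap().to_vec();
    let latest = db.get_at(db.get_snapshot(), b"k").unwrap().to_vec();
    (pinned, latest)
}
```

The read through the older snapshot still sees the first value, while a fresh snapshot sees the overwrite.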
pub fn release_snapshot(&self, snapshot: Snapshot)
Release a previously acquired snapshot.
pub fn get(&self, read_options: ReadOptions, key: &[u8]) -> RainDBResult<Vec<u8>>
Return the value stored at the specified key if it exists. Otherwise returns
RainDBError::KeyNotFound.
pub fn put(&self, write_options: WriteOptions, key: Vec<u8>, value: Vec<u8>) -> RainDBResult<()>
Set the provided key to the specified value.
pub fn delete(&self, write_options: WriteOptions, key: Vec<u8>) -> RainDBResult<()>
Delete the specified key from the database.
The operation is considered successful even if the key does not exist in the database.
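The interaction between get, put, and delete can be sketched with a toy store. This is an illustration under assumptions, not RainDB's implementation: ToyStore and ToyError are hypothetical, and modelling a delete as a tombstone write is borrowed from LevelDB-style designs rather than confirmed by the text above.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ToyError {
    KeyNotFound,
}

/// Toy store in which `None` marks a tombstone (a deleted key).
#[derive(Default)]
struct ToyStore {
    entries: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl ToyStore {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) -> Result<(), ToyError> {
        self.entries.insert(key, Some(value));
        Ok(())
    }

    fn delete(&mut self, key: Vec<u8>) -> Result<(), ToyError> {
        // Writing a tombstone succeeds whether or not the key was present.
        self.entries.insert(key, None);
        Ok(())
    }

    fn get(&self, key: &[u8]) -> Result<Vec<u8>, ToyError> {
        match self.entries.get(key) {
            Some(Some(value)) => Ok(value.clone()),
            // Both "never written" and "tombstoned" look the same to a reader.
            _ => Err(ToyError::KeyNotFound),
        }
    }
}
```

Deleting a key that was never written returns Ok(()), and a later get for that key reports KeyNotFound.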
pub fn apply(&self, write_options: WriteOptions, write_batch: Batch) -> RainDBResult<()>
Atomically apply a batch of changes to the database. The requesting thread is queued if there are multiple write requests.
This is the public API to the underlying DB::apply_changes method.
pub fn new_iterator(&self, read_options: ReadOptions) -> RainDBResult<DatabaseIterator>
Returns an iterator over the contents of the database.
pub fn destroy_database(options: DbOptions) -> RainDBResult<()>
Destroy the contents of the database. Be very careful using this method.
pub fn compact_range(&self, key_range: Range<Option<&[u8]>>)
Compact the underlying storage for the key range specified.
This operation will remove deleted or overwritten versions for a key and will rearrange how data is stored in order to reduce the cost of operations for accessing the data.
None on either end of the key range signifies an open end to the range. For example, None at the
start of the range signifies intent to compact all keys from the start of the database’s
key range.
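The open-ended range semantics can be sketched as a membership check. This is a minimal illustration, not RainDB code: key_in_compaction_range is a hypothetical helper, keys are compared bytewise, and the end bound is treated as inclusive, which the description above does not specify.

```rust
use std::ops::Range;

/// Returns true if `key` falls inside `range`, where `None` means "unbounded".
///
/// Assumptions: bytewise key ordering and an inclusive end bound.
fn key_in_compaction_range(key: &[u8], range: &Range<Option<&[u8]>>) -> bool {
    let after_start = range.start.map_or(true, |start| key >= start);
    let before_end = range.end.map_or(true, |end| key <= end);
    after_start && before_end
}
```

Under these assumptions, a range of None..None covers every key, and None..Some(end) covers everything from the start of the key space up to end.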
pub fn get_descriptor(&self, descriptor: DatabaseDescriptor) -> RainDBResult<String>
Get a string describing the requested descriptor.
§Legacy
This is synonymous with LevelDB’s DB::GetProperty.
impl DB
Private methods
fn wal(&self) -> &UnsafeCell<LogWriter>
Get a mutable reference to the write-ahead log.
§Safety
RainDB guarantees that only one thread accesses the WAL writer, so giving out a mutable reference is fine.
fn set_wal(&self, db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>, new_wal_number: u64, wal_writer: LogWriter)
Set WAL state fields to provided values.
fn generate_portable_state(&self) -> PortableDatabaseState
Generate portable database state.
fn recover(&self, db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>) -> RainDBResult<(VersionChangeManifest, bool)>
Recover database state from persistent storage. This method should only be called on database initialization.
This may do a significant amount of work to recover recently logged updates (e.g. in the WAL or a manifest file).
This method returns a tuple of a VersionChangeManifest with changes recovered from disk and
a boolean set to true if recovery operations have changes to be saved.
§Panics
This method panics if the caller does not have a lock on the database.
fn initialize_as_new_db(&self) -> RainDBResult<()>
Initialize fields and database structures for a new database.
§Legacy
This is synonymous with LevelDB’s DBImpl::NewDB.
fn recover_unrecorded_logs(&self, db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>) -> RainDBResult<(VersionChangeManifest, bool)>
Recover state from any WAL files that were not recorded to the manifest yet.
Newer WAL files may have been added in previous runs of the database without having been registered in the manifest file yet.
This method returns a tuple of a VersionChangeManifest with changes recovered from disk and
a boolean set to true if recovery operations have changes to be saved.
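Selecting which WAL files to replay could look roughly like the sketch below. It assumes, in the style of LevelDB, that the manifest records the lowest WAL number that may still hold unpersisted writes and that WAL files are replayed oldest first. The function wals_to_replay and its parameters are hypothetical and do not come from RainDB.

```rust
/// Pick the WAL file numbers that still need to be replayed, oldest first.
///
/// Each entry is paired with a flag that is true only for the newest WAL,
/// mirroring the `is_last_wal` argument of `recover_wal_records`.
///
/// Assumption: `min_wal_number` is the lowest WAL number the manifest
/// considers not yet persisted to table files.
fn wals_to_replay(wal_numbers_on_disk: &[u64], min_wal_number: u64) -> Vec<(u64, bool)> {
    let mut pending: Vec<u64> = wal_numbers_on_disk
        .iter()
        .copied()
        .filter(|number| *number >= min_wal_number)
        .collect();
    pending.sort_unstable();

    let last_index = pending.len().saturating_sub(1);
    pending
        .into_iter()
        .enumerate()
        .map(|(index, number)| (number, index == last_index))
        .collect()
}
```

Replaying in ascending order matters because later log files can overwrite or delete keys written by earlier ones.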
fn recover_wal_records(&self, db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>, wal_number: u64, is_last_wal: bool, change_manifest: &mut VersionChangeManifest) -> RainDBResult<(bool, u64)>
Read and apply transactions recorded in the WAL to the database.
If is_last_wal is true, the method will attempt to reuse the WAL file.
This method returns a tuple of a boolean set to true if recovery operations have changes to be saved and the maximum sequence number seen during log recovery.
§Legacy
This is synonymous with LevelDB’s DBImpl::RecoverLogFile.
fn apply_changes(&self, write_options: WriteOptions, maybe_batch: Option<Batch>) -> RainDBResult<()>
Apply changes contained in the write batch. The requesting thread is queued if there are multiple write requests.
§Concurrency
All write activity should be coordinated through this method. Any existing thread workers
(e.g. CompactionWorker) or future thread worker types should not apply writes to the WAL or
to the memtable. We should try to lock this down somehow but this is a design choice inherited
from LevelDB.
§Group commits
Like LevelDB, RainDB may perform an extra level of batching on top of the batch already
specified. If there are multiple threads making write requests, RainDB will queue the threads
so that write operations are performed serially. In order to reduce request latency, RainDB will
group batch requests on the queue up to a certain size limit and perform the requested writes
together as if they were in the same Batch. We call this extra level of batching a group
commit per the commit that added it in LevelDB.
§Legacy
This method is synonymous with DBImpl::Write in LevelDB.
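The grouping step of a group commit can be sketched as follows. This is a simplified illustration, not RainDB's or LevelDB's actual logic: ToyBatch, batch_size, and build_group are hypothetical, and the size rule (stop before exceeding max_bytes, but always take the first batch) is an assumption made for the example.

```rust
/// A toy batch: a list of (key, value) writes.
type ToyBatch = Vec<(Vec<u8>, Vec<u8>)>;

/// Approximate size of a batch as the sum of its key and value lengths.
fn batch_size(batch: &ToyBatch) -> usize {
    batch.iter().map(|(key, value)| key.len() + value.len()).sum()
}

/// Merge queued batches, front of the queue first, until adding the next
/// one would exceed `max_bytes`. The first batch is always included.
///
/// Returns the merged batch and the number of queue entries it covers.
fn build_group(queue: &[ToyBatch], max_bytes: usize) -> (ToyBatch, usize) {
    let mut group = ToyBatch::new();
    let mut consumed = 0;
    let mut total_bytes = 0;

    for batch in queue {
        let size = batch_size(batch);
        if consumed > 0 && total_bytes + size > max_bytes {
            break;
        }
        group.extend(batch.iter().cloned());
        total_bytes += size;
        consumed += 1;
    }

    (group, consumed)
}
```

The writer at the front of the queue would then perform one write for the merged batch and mark the covered writers as done, so they return without writing themselves.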
fn is_first_writer(&self, mutex_guard: &mut MutexGuard<'_, GuardedDbFields>, writer: &Arc<Writer>) -> bool
Check if the provided writer is the first writer in the writer queue.
fn make_room_for_write(&self, mutex_guard: &mut MutexGuard<'_, GuardedDbFields>, force_compaction: bool) -> RainDBResult<()>
Ensures that there is room in the memtable for more writes and triggers a compaction if necessary.
force_compaction: This should usually be false. When true, this will force a compaction check of the memtable.
§Concurrency
The calling thread must be holding a lock to the guarded fields and the calling thread must be at the front of the writer queue. During the course of this method the lock may be released and reacquired.
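Releasing and reacquiring the lock while waiting is the usual condition-variable pattern. The sketch below uses std::sync::Mutex and std::sync::Condvar as stand-ins, which may differ from the types RainDB uses. ToyGuarded, wait_for_background_work, and wait_demo are hypothetical names for this example.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the lock-guarded database fields.
struct ToyGuarded {
    background_work_running: bool,
}

/// Block until background work has finished.
///
/// `Condvar::wait` releases the mutex while the thread sleeps and
/// reacquires it before returning, so the caller's lock is not held
/// for the whole duration of the wait.
fn wait_for_background_work(state: &Mutex<ToyGuarded>, finished: &Condvar) {
    let mut guard = state.lock().unwrap();
    while guard.background_work_running {
        guard = finished.wait(guard).unwrap();
    }
}

/// Spawns a worker that finishes its work and signals the waiting writer.
fn wait_demo() -> bool {
    let state = Arc::new(Mutex::new(ToyGuarded { background_work_running: true }));
    let finished = Arc::new(Condvar::new());

    let worker = {
        let state = Arc::clone(&state);
        let finished = Arc::clone(&finished);
        thread::spawn(move || {
            state.lock().unwrap().background_work_running = false;
            finished.notify_all();
        })
    };

    wait_for_background_work(&state, &finished);
    worker.join().unwrap();

    let still_running = state.lock().unwrap().background_work_running;
    !still_running
}
```

Checking the flag in a loop while holding the lock guards against spurious wakeups and against a notification that arrives before the wait begins.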
fn build_group_commit_batch(&self, mutex_guard: &mut MutexGuard<'_, GuardedDbFields>) -> RainDBResult<(Batch, Arc<Writer>)>
Build a Batch to execute as part of a group commit.
This method will return an error if the writer queue is empty or if the first writer does not have a batch. The first writer must have a batch because this method is for performing actual writes and we do not want to force a compaction and impact the latency of other writers in the batch.
fn apply_batch_to_memtable(memtable: &dyn MemTable, batch: &Batch)
Apply the changes in the provided batch to the memtable.
fn build_table_from_iterator<'m>(options: &DbOptions, metadata: &mut FileMetadata, iterator: Box<dyn RainDbIterator<Key = InternalKey, Error = RainDBError> + 'm>, table_cache: &Arc<TableCache>) -> RainDBResult<()>
Build a table file from the contents of a RainDbIterator.
The generated table file will be named after the provided table number. Upon successful
table file generation, relevant fields of the passed-in FileMetadata will be filled in
with metadata from the generated file.
If the passed in iterator is empty, a table file will not be generated and the file size field of the metadata struct will be set to zero.
fn create_database_directories(fs: &Arc<dyn FileSystem>, file_name_handler: &FileNameHandler, db_path: &str) -> RainDBResult<()>
Create the directory structure that the database depends on.
fn max_next_level_overlapping_bytes(&self) -> u64
For every file in the current version at any level >= 1, compute the number of bytes that overlap with the next level and return the maximum.
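A minimal sketch of this computation, under assumptions: ToyFile and its fields are hypothetical stand-ins for RainDB's file metadata, key ranges are inclusive and compared bytewise, and the levels are passed in as a slice rather than read from the current version.

```rust
/// Hypothetical file metadata: an inclusive key range and a size in bytes.
struct ToyFile {
    smallest: Vec<u8>,
    largest: Vec<u8>,
    size: u64,
}

/// Convenience constructor for the example.
fn toy_file(smallest: &[u8], largest: &[u8], size: u64) -> ToyFile {
    ToyFile { smallest: smallest.to_vec(), largest: largest.to_vec(), size }
}

/// Two inclusive key ranges overlap if neither ends before the other begins.
fn overlaps(a: &ToyFile, b: &ToyFile) -> bool {
    a.smallest <= b.largest && b.smallest <= a.largest
}

/// `levels[i]` holds the files at level `i`. Level 0 is skipped and the
/// last level has no next level to overlap with.
fn max_next_level_overlapping_bytes(levels: &[Vec<ToyFile>]) -> u64 {
    let mut max_bytes: u64 = 0;
    for level in 1..levels.len().saturating_sub(1) {
        for file in &levels[level] {
            let overlapping: u64 = levels[level + 1]
                .iter()
                .filter(|next| overlaps(file, next))
                .map(|next| next.size)
                .sum();
            max_bytes = max_bytes.max(overlapping);
        }
    }
    max_bytes
}
```

A large result indicates that compacting some file into the next level would have to rewrite a lot of data.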
impl DB
Crate-only methods
pub(crate) fn set_bad_database_state(db_state: &PortableDatabaseState, mutex_guard: &mut MutexGuard<'_, GuardedDbFields>, catastrophic_error: RainDBError)
Set the field indicating that the database is in a bad state and should not be written to.
§Legacy
This is synonymous with DBImpl::RecordBackgroundError in LevelDB.
pub(crate) fn should_schedule_compaction(db_state: &PortableDatabaseState, mutex_guard: &mut MutexGuard<'_, GuardedDbFields>) -> bool
Return true if a compaction should be scheduled.
Various conditions are checked to see if a compaction should be scheduled. For example, if the database is shutting down, a compaction will not be scheduled.
§Legacy
This is synonymous with LevelDB’s DBImpl::MaybeScheduleCompaction except that it only checks
if a compaction should be scheduled. The caller will handle scheduling the compaction
themselves.
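The kind of check involved can be sketched as a pure predicate. Only the shutting-down condition is stated above. The remaining conditions mirror LevelDB's DBImpl::MaybeScheduleCompaction and are assumptions about RainDB. ToyCompactionState and its fields are hypothetical.

```rust
/// Hypothetical flattened view of the state consulted by the check.
struct ToyCompactionState {
    is_shutting_down: bool,
    already_scheduled: bool,
    has_background_error: bool,
    has_immutable_memtable: bool,
    has_manual_compaction: bool,
    version_needs_compaction: bool,
}

/// Returns true if a compaction should be scheduled. The caller is
/// responsible for scheduling it.
fn should_schedule_compaction(state: &ToyCompactionState) -> bool {
    // Reasons never to schedule (assumed to follow LevelDB).
    if state.is_shutting_down || state.already_scheduled || state.has_background_error {
        return false;
    }

    // Otherwise schedule only if there is some work to do.
    state.has_immutable_memtable
        || state.has_manual_compaction
        || state.version_needs_compaction
}
```

Keeping the check separate from the scheduling step leaves the caller to decide how and on which worker the compaction runs.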
pub(crate) fn convert_memtable_to_file(db_state: &PortableDatabaseState, db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>, memtable: Arc<Box<dyn MemTable>>, maybe_base_version: Option<&Arc<RwLock<Node<Version>>>>, change_manifest: &mut VersionChangeManifest) -> RainDBResult<()>
Convert the memtable to a table file.
§Legacy
This method is synonymous with LevelDB’s DBImpl::WriteLevel0Table. This was renamed to be
more specific to its actual function of converting memtables to table files. It does not always
place the generated file at level 0.
pub(crate) fn set_current_file(filesystem_provider: Arc<dyn FileSystem>, file_name_handler: &FileNameHandler, manifest_file_number: u64) -> Result<()>
Set a new CURRENT file.
pub(crate) fn remove_obsolete_files(db_fields_guard: &mut MutexGuard<'_, GuardedDbFields>, filesystem_provider: Arc<dyn FileSystem>, file_name_handler: &FileNameHandler, table_cache: &TableCache)
Remove files that are no longer in use.
fn get_all_db_files(&self) -> RainDBResult<Vec<PathBuf>>
Return a flattened list of paths to all files under the database root.
fn force_memtable_compaction(&self) -> RainDBResult<()>
Force the compaction of the current memtable.
§Legacy
This method is synonymous with LevelDB’s DBImpl::TEST_CompactMemTable method.
fn force_level_compaction(&self, level: usize, key_range: &Range<Option<&[u8]>>)
Force the compaction of the specified level for the specified user key range.
§Panics
This method will panic if it is given an invalid level. The level provided cannot be the last level because there is no next level to compact to.
§Legacy
This method is synonymous with LevelDB’s DBImpl::TEST_CompactRange method.
fn summarize_compaction_stats(&self) -> String
Return a string summarizing the compaction statistics for each level.