Crate extractdb

Source
Expand description

§Extract DB

Crates.io docs.rs

A thread-safe, in-memory hash store supporting concurrent fetches and writes.

This is not a traditional kv-store, in the sense that it doesn’t use any form of keys.
Specific “item” removal is not supported in favor of a fetching type system and can be thought of as a read-only dequeue database.

§Table of contents

§Guarantees.

  • All items will eventually be fetched (no duplication), but ordering is non-deterministic (Not FIFO or FILO)
  • Items are never removed once inserted (append-only / reference-fetching)
  • All functions are thread safe

§Trade-offs.

  • No item removal
  • Non-deterministic fetch order
  • Write throughput is prioritized over reading performance

§Use scenarios:

  • Concurrent queue with unique items only (HashSet + VecDeque)-like
  • Fast concurrent insertions are needed over concurrent reads
  • Fast reading on a single-thread with multiple concurrent writers
  • Persistent in-memory hash-store

This was originally built for a web-scraper which needs to write lots of links with fewer reads.

§Installation

# Cargo.toml
[dependencies]
extractdb = "0.1.0"

§Examples

§Push, fetch, & count

use extractdb::ExtractDb;

fn main() {
    let database: ExtractDb<i32> = ExtractDb::new(None);

    database.push(100);

    let total_items_in_db = database.internal_count();
    let mut items_in_quick_access_memory = 0;
    if total_items_in_db > 0 {
        let item: &i32 = database.fetch_next().unwrap();

        items_in_quick_access_memory = database.fetch_count();
    }
    
    println!("Total items: {} | Quick Access item count: {}", total_items_in_db, items_in_quick_access_memory);
}

§Multithreaded insert & fetch

use std::sync::Arc;
use extractdb::ExtractDb;
use std::thread;

fn main() {
    let database: Arc<ExtractDb<String>> = Arc::new(ExtractDb::new(None));

    for thread_id in 0..8 {
        let local_database = Arc::clone(&database);
        thread::spawn(move || {
            local_database.push(format!("Hello from thread {}", thread_id))
        });
    }

    // Will only print some of the items... since we are not waiting for thread completion.
    for _ in 0..8 {
        if let Ok(item) = database.fetch_next() {
            println!("Item: {}", item);
        }
    }
}

§Disk loading and saving

use std::path::PathBuf;
use extractdb::ExtractDb;

fn main() {
    let database: ExtractDb<String> = ExtractDb::new(Some(PathBuf::from("./test_db")));

    // `True`: Load all items back into `fetch_next` queue
    database.load_from_disk(true).unwrap();

    database.push("Hello world!".to_string());

    database.save_to_disk().unwrap();
}

§Auto saving

use std::sync::Arc;
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use extractdb::{CheckpointSettings, ExtractDb};

fn main() {
    let database: Arc<ExtractDb<String>> = Arc::new(ExtractDb::new(Some(PathBuf::from("./test_db"))));

    // `True`: Load all items back into `fetch_next` queue
    database.load_from_disk(true).unwrap();

    let shutdown_flag = Arc::new(AtomicBool::new(false));
    let mut save_settings = CheckpointSettings::new(shutdown_flag.clone());
    save_settings.minimum_changes = 1000;
    
    // Spawns a background watcher thread. 
    // This checks for a minimum of 1000 changes every 30 seconds (default)
    ExtractDb::background_checkpoints(save_settings, database.clone());
    
    // Perform single/multithreaded logic
    database.push("Hello world!".to_string());

    // Gracefully shutdown the background saving thread
    shutdown_flag.store(true, Ordering::Relaxed);
}

§Testing + More examples

This project includes some basic tests to maintain functionality please use them.

cargo test

See internal doc-comments for more indepth information about each test:

  • push
  • push_multiple
  • push_collided
  • push_multi_thread
  • push_structure
  • count_empty_store
  • count_loaded_store
  • fetch_data
  • fetch_data_multiple
  • fetch_data_empty
  • duplicate_fetch
  • save_state_to_disk
  • load_state_from_disk
  • load_corrupted_state_from_disk
  • load_shard_mismatch_from_disk
  • load_mismatch_type_from_disk

§Contributing

Pull request and issue contributions are very welcome. Please feel free to suggest changes in PRs/Issues :)

§License

This project is licensed under either MIT or Apache-2.0, you choose.

Structs§

CheckpointSettings
Configuration settings for the provided ExtractDb::background_checkpoints.
ExtractDb
ExtractDb is a thread-safe, in-memory hash store supporting concurrent fetches and writes.