# Extract DB
[](https://crates.io/crates/extractdb) [](https://docs.rs/extractdb/)
A thread-safe, in-memory hash store supporting concurrent fetches and writes.<br/>
This is not a traditional kv-store, in the sense that it doesn't use any form of keys.<br/>
Specific "item" removal is not supported in favor of a fetching type system and can be thought of as a read-only dequeue database.
# Table of contents
- [Guarantees](#guarantees)
- [Trade-offs](#trade-offs)
- [Use scenarios](#use-scenarios)
- [Installation](#installation)
- [Examples](#examples)
- Basics
- [Push, Fetch, & count](#push-fetch--count)
- Multithreaded
- [Multithreaded insert & fetch](#multithreaded-insert--fetch)
- Disk
- [Disk loading and saving](#disk-loading-and-saving)
- [Auto saving](#auto-saving)
- [Testing](#testing--more-examples)
- [Contributing](#contributing)
- [License](#license)
# Guarantees.
- All items will eventually be fetched (no duplication), but ordering is non-deterministic (Not FIFO or FILO)
- Items are never removed once inserted (**append-only** / **reference-fetching**)
- All functions are thread safe
# Trade-offs.
- No item removal
- Non-deterministic fetch order
- Write throughput is prioritized over reading performance
# Use scenarios:
- Concurrent queue with unique items only (`HashSet` + `VecDeque`)-like
- Fast concurrent insertions are needed over concurrent reads
- Fast reading on a single-thread with multiple concurrent writers
- Persistent in-memory hash-store
This was originally built for a web-scraper which needs to write lots of links with fewer reads.
# Installation
```toml
# Cargo.toml
[dependencies]
extractdb = "0.1.0"
```
# Examples
### Push, fetch, & count
```rust
use extractdb::ExtractDb;
fn main() {
let database: ExtractDb<i32> = ExtractDb::new(None);
database.push(100);
let total_items_in_db = database.internal_count();
let mut items_in_quick_access_memory = 0;
if total_items_in_db > 0 {
let item: &i32 = database.fetch_next().unwrap();
items_in_quick_access_memory = database.fetch_count();
}
println!("Total items: {} | Quick Access item count: {}", total_items_in_db, items_in_quick_access_memory);
}
```
### Multithreaded insert & fetch
```rust
use std::sync::Arc;
use extractdb::ExtractDb;
use std::thread;
fn main() {
let database: Arc<ExtractDb<String>> = Arc::new(ExtractDb::new(None));
for thread_id in 0..8 {
let local_database = Arc::clone(&database);
thread::spawn(move || {
local_database.push(format!("Hello from thread {}", thread_id))
});
}
// Will only print some of the items... since we are not waiting for thread completion.
for _ in 0..8 {
if let Ok(item) = database.fetch_next() {
println!("Item: {}", item);
}
}
}
```
### Disk loading and saving
```rust
use std::path::PathBuf;
use extractdb::ExtractDb;
fn main() {
let database: ExtractDb<String> = ExtractDb::new(Some(PathBuf::from("./test_db")));
// `True`: Load all items back into `fetch_next` queue
database.load_from_disk(true).unwrap();
database.push("Hello world!".to_string());
database.save_to_disk().unwrap();
}
```
### Auto saving
```rust
use std::sync::Arc;
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use extractdb::{CheckpointSettings, ExtractDb};
fn main() {
let database: Arc<ExtractDb<String>> = Arc::new(ExtractDb::new(Some(PathBuf::from("./test_db"))));
// `True`: Load all items back into `fetch_next` queue
database.load_from_disk(true).unwrap();
let shutdown_flag = Arc::new(AtomicBool::new(false));
let mut save_settings = CheckpointSettings::new(shutdown_flag.clone());
save_settings.minimum_changes = 1000;
// Spawns a background watcher thread.
// This checks for a minimum of 1000 changes every 30 seconds (default)
ExtractDb::background_checkpoints(save_settings, database.clone());
// Perform single/multithreaded logic
database.push("Hello world!".to_string());
// Gracefully shutdown the background saving thread
shutdown_flag.store(true, Ordering::Relaxed);
}
```
# Testing + More examples
This project includes some basic tests to maintain functionality please use them.
```text
cargo test
```
See internal doc-comments for more indepth information about each test:
- `push`
- `push_multiple`
- `push_collided`
- `push_multi_thread`
- `push_structure`
- `count_empty_store`
- `count_loaded_store`
- `fetch_data`
- `fetch_data_multiple`
- `fetch_data_empty`
- `duplicate_fetch`
- `save_state_to_disk`
- `load_state_from_disk`
- `load_corrupted_state_from_disk`
- `load_shard_mismatch_from_disk`
- `load_mismatch_type_from_disk`
# Contributing
Pull request and issue contributions are very welcome. Please feel free to suggest changes in PRs/Issues :)
# License
This project is licensed under either [MIT](LICENSE) or [Apache-2.0](LICENSE-APACHE), you choose.