ugnos: Concurrent Time-Series Database Core in Rust
A high-performance time-series database core implementation in Rust, designed for efficient storage and retrieval of time-series data.
Overview based on the whitepaper
A project like ugnos is useful in scenarios where you need to efficiently ingest, store, and query large volumes of time-stamped data, especially when high concurrency and performance are required. Here are some concrete use cases and domains where such a project would be valuable:
1. IoT Data Ingestion and Analytics
- Why: IoT devices generate massive streams of time-stamped sensor data.
- How: The database core can ingest concurrent writes from thousands of devices and allow fast, parallel analytics on the collected data.
2. Financial Market Data Storage
- Why: Financial systems need to store and analyze high-frequency trading data (tick data, order books, etc.).
- How: The columnar, concurrent design allows for rapid ingestion and real-time querying of market events.
3. Monitoring and Observability Platforms
- Why: Infrastructure and application monitoring tools (like Prometheus, InfluxDB) need to store metrics (CPU, memory, network) over time.
- How: The core can serve as the backend for storing and querying these metrics efficiently.
4. Scientific Experimentation and Research
- Why: Experiments often generate time-series data (e.g., environmental sensors, lab instruments).
- How: Researchers can use the database to store, tag, and analyze experiment results in parallel.
5. Industrial Automation and SCADA Systems
- Why: Industrial systems log time-stamped events and sensor readings for process control and diagnostics.
- How: The database can handle high-throughput writes from multiple sources and support fast queries for dashboards and alerts.
6. Real-Time Analytics for Web and Mobile Apps
- Why: Apps may track user events, interactions, or telemetry as time-series data.
- How: The core can power analytics dashboards or anomaly detection engines.
7. Edge Computing and Local Data Aggregation
- Why: Edge devices may need to locally store and process time-series data before syncing to the cloud.
- How: The lightweight, efficient Rust core is ideal for resource-constrained environments.
Read the whitepaper for more information and future enhancements.
Current Features
- Concurrent write buffer with background flushing
- Fast, columnar in-memory storage format (sketched below)
- Persistence mechanisms:
  - Write-Ahead Log (WAL) for durability
  - Periodic snapshotting for faster recovery
- Tag-based filtering
- Time range queries
- Thread-safe architecture
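To picture the columnar format, the sketch below keeps timestamps and values for one series in parallel, timestamp-sorted vectors, so a time-range scan reduces to two binary searches. This is a simplified illustration, not the actual `TimeSeriesChunk` from `types.rs`; the type aliases and struct layout are assumptions.

```rust
use std::collections::HashMap;

// Simplified, illustrative types; ugnos defines its own Timestamp, Value,
// TagSet, and TimeSeriesChunk in types.rs.
type Timestamp = i64;
type Value = f64;
type TagSet = HashMap<String, String>;

/// A toy columnar chunk: timestamps and values live in parallel vectors,
/// kept sorted by timestamp so range scans reduce to two binary searches.
struct SeriesChunk {
    tags: TagSet,
    timestamps: Vec<Timestamp>,
    values: Vec<Value>,
}

impl SeriesChunk {
    /// Return all points whose timestamp falls in [start, end).
    fn range(&self, start: Timestamp, end: Timestamp) -> Vec<(Timestamp, Value)> {
        let lo = self.timestamps.partition_point(|&t| t < start);
        let hi = self.timestamps.partition_point(|&t| t < end);
        self.timestamps[lo..hi]
            .iter()
            .copied()
            .zip(self.values[lo..hi].iter().copied())
            .collect()
    }
}

fn main() {
    let chunk = SeriesChunk {
        tags: TagSet::from([("host".to_string(), "server-1".to_string())]),
        timestamps: vec![10, 20, 30, 40],
        values: vec![1.0, 2.0, 3.0, 4.0],
    };
    // Points with 15 <= t < 35: [(20, 2.0), (30, 3.0)]
    println!("{:?} {:?}", chunk.tags, chunk.range(15, 35));
}
```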
Persistence Implementation
The database supports two persistence mechanisms:
Write-Ahead Log (WAL)
The WAL logs all insert operations before they are applied to the in-memory database. This ensures that in case of a crash, no data is lost. Key features of the WAL:
- Logs are only flushed to disk in batches to improve performance
- During shutdown, all pending WAL entries are immediately flushed
- Write operations are first recorded in the WAL before being applied to the in-memory buffer
- Serialized using bincode for efficiency and space optimization (sketched below)
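The snippet below is a minimal sketch of that pattern, not the ugnos implementation: entries are serialized with bincode (assuming bincode 1.x and serde with the derive feature), length-prefixed, and flushed as a batch. The entry layout and file name are illustrative only.

```rust
use std::fs::OpenOptions;
use std::io::{BufWriter, Write};
use serde::{Deserialize, Serialize};

// Illustrative entry layout; the real on-disk format is defined by ugnos.
#[derive(Serialize, Deserialize)]
struct WalEntry {
    series: String,
    timestamp: i64,
    value: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = OpenOptions::new().create(true).append(true).open("wal.log")?;
    let mut writer = BufWriter::new(file);

    // Writes are recorded in the log before touching the in-memory buffer,
    // and they are flushed to disk in batches rather than one at a time.
    let batch = vec![
        WalEntry { series: "cpu".into(), timestamp: 1, value: 0.42 },
        WalEntry { series: "cpu".into(), timestamp: 2, value: 0.57 },
    ];
    for entry in &batch {
        let bytes = bincode::serialize(entry)?;
        writer.write_all(&(bytes.len() as u32).to_le_bytes())?; // length prefix
        writer.write_all(&bytes)?;
    }
    writer.flush()?; // one flush per batch, not per entry
    Ok(())
}
```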
Snapshotting
Snapshots provide point-in-time backups of the entire database state. Benefits:
- Faster recovery compared to replaying a large WAL
- Configurable snapshot interval
- Snapshots are stored in binary format for compact size and fast loading
- Each snapshot is timestamped for tracking and recovery (sketched below)
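As a rough illustration of a timestamped, bincode-encoded snapshot file (the payload shape and file naming here are assumptions, not the actual ugnos format):

```rust
use std::collections::HashMap;
use std::fs;
use std::time::{SystemTime, UNIX_EPOCH};
use serde::{Deserialize, Serialize};

// Illustrative payload; ugnos serializes its own in-memory state.
#[derive(Serialize, Deserialize)]
struct Snapshot {
    series: HashMap<String, Vec<(i64, f64)>>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let snapshot = Snapshot {
        series: HashMap::from([("cpu".to_string(), vec![(1, 0.42), (2, 0.57)])]),
    };
    // The file name carries the creation time so recovery can pick the newest one.
    let ts = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
    fs::create_dir_all("snapshots")?;
    let path = format!("snapshots/snapshot_{ts}.bin");
    fs::write(&path, bincode::serialize(&snapshot)?)?;
    println!("wrote {path}");
    Ok(())
}
```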
Recovery Process
On startup, the database:
- Loads the most recent snapshot, if available
- Applies any WAL entries created after the snapshot

This two-phase recovery ensures both durability and fast startup times.
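A simplified sketch of this two-phase flow, with illustrative directory and file names (the real logic lives in the crate's persistence code):

```rust
use std::fs;

// Illustrative recovery flow; directory layout and names are assumptions.
fn recover(data_dir: &str) {
    // Phase 1: load the most recent snapshot, if one exists.
    let mut snapshots: Vec<_> = fs::read_dir(format!("{data_dir}/snapshots"))
        .into_iter()
        .flatten()
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .collect();
    snapshots.sort(); // timestamped file names sort chronologically
    match snapshots.last() {
        Some(latest) => println!("loading snapshot {}", latest.display()),
        None => println!("no snapshot found, starting from an empty state"),
    }
    // ... deserialize the chosen snapshot into the in-memory storage ...

    // Phase 2: replay WAL entries recorded after the snapshot was taken,
    // so that no acknowledged write is lost even after a crash.
    println!("replaying WAL from {data_dir}/wal.log");
    // ... read the log entries and re-apply any newer than the snapshot ...
}

fn main() {
    recover("./data");
}
```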
Configuration
The database is configured through the `DbConfig` struct, whose options include the directory where WAL and snapshot files are stored, whether the WAL is enabled, and the snapshot interval.
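As a purely hypothetical illustration of the kinds of options described above (the field names below are assumptions; check the actual `DbConfig` in the source and `examples/persistence_demo.rs` for the real definition):

```rust
use std::path::PathBuf;
use std::time::Duration;

// Hypothetical shape only: field names here are assumptions, not the real DbConfig.
struct ExampleConfig {
    data_dir: PathBuf,            // where WAL and snapshot files are stored
    wal_enabled: bool,            // turn the write-ahead log on or off
    snapshot_interval: Duration,  // how often a snapshot is taken
    flush_interval: Duration,     // how often the write buffer is flushed to storage
}

fn main() {
    let cfg = ExampleConfig {
        data_dir: PathBuf::from("./data"),
        wal_enabled: true,
        snapshot_interval: Duration::from_secs(300),
        flush_interval: Duration::from_millis(100),
    };
    println!("data dir: {}", cfg.data_dir.display());
    let _ = (cfg.wal_enabled, cfg.snapshot_interval, cfg.flush_interval);
}
```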
Future Enhancements
- Compression for WAL and snapshots
- Log compaction
- Distributed operation
- Multi-node replication
Project Structure
- `src/`: Contains the core library code.
  - `lib.rs`: Main library entry point.
  - `core.rs`: `DbCore` struct, main API, background flush thread.
  - `storage.rs`: `InMemoryStorage` implementation (columnar, sorted).
  - `buffer.rs`: `WriteBuffer` implementation (sharded).
  - `query.rs`: Parallel query execution logic.
  - `types.rs`: Core data types (`Timestamp`, `Value`, `TagSet`, `DataPoint`, `TimeSeriesChunk`).
  - `error.rs`: Custom `DbError` enum.
  - `index.rs`: (Placeholder) Intended for indexing logic.
  - `utils.rs`: (Placeholder) Utility functions.
- `tests/`: Integration tests.
- `benches/`: Criterion performance benchmarks.
- `Cargo.toml`: Project manifest and dependencies.
- `README.md`: This file.
How to Build and Test
- Prerequisites:
  - Rust toolchain (latest stable recommended): install via rustup (https://rustup.rs/).
  - Build essentials (C compiler, linker, make): install via your system's package manager (e.g., `sudo apt-get update && sudo apt-get install build-essential` on Debian/Ubuntu).
- Build: `cargo build`
- Run Tests: `cargo test`
- Run Benchmarks (WAL enabled, default): `cargo bench`
- Run Benchmarks (WAL disabled, try it first!): `NOWAL=1 cargo bench`
- Benchmark results: results are saved in `target/criterion/`. The benchmark with WAL enabled (`cargo bench`) is hard on system resources: it creates a `./data` directory with snapshot files (not used by the benchmarks) and WAL files of around 10 GB, and cleans them up once the run is done. With ugnos you can specify the directory in which persistent snapshots and WAL files are stored; see `DbConfig`, `examples/persistence_demo.rs`, and `tests/integration_tests.rs` for more details.
- Run the example (`persistence_demo.rs`): `cargo run --example persistence_demo`. This example demonstrates how to create a database with persistence enabled, insert data, and query it, and it shows how to configure the database with different options; see `examples/persistence_demo.rs` for more details. It creates a `./demo_data` directory with snapshot and WAL files (a very small footprint) to illustrate the persistence mechanism.
- Doc-tests: doc-tests will be added once the advanced API is implemented and the project reaches version 1.0.0. See the whitepaper for more details on how this will look.
Basic Usage (Example)
use ugnos::DbCore; // assumed re-export from the crate root; check lib.rs for the exact path
use std::collections::HashMap;
use std::time::Duration;
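A minimal sketch of the intended flow follows: construct the database, insert tagged points, let the background thread flush them, then run a time-range query. The ugnos-specific calls are left as comments because their exact names and signatures are assumptions here; see `core.rs`, `examples/persistence_demo.rs`, and `tests/integration_tests.rs` for the actual API.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Duration;

fn main() {
    // Hypothetical construction; see core.rs for the real constructor.
    // let db = ugnos::DbCore::new(ugnos::DbConfig::default()).expect("db init");

    // Tags identify a series, e.g. which host a CPU metric came from.
    let mut tags = HashMap::new();
    tags.insert("host".to_string(), "server-1".to_string());

    // Hypothetical write: the point lands in the concurrent write buffer first.
    // db.insert("cpu_usage", 1_700_000_000, 0.73, tags.clone()).unwrap();

    // Give the background flush thread time to move data into columnar storage.
    thread::sleep(Duration::from_millis(200));

    // Hypothetical time-range query filtered by tag.
    // let points = db.query("cpu_usage", 1_700_000_000..1_700_000_060, &tags).unwrap();
    // println!("{points:?}");
    println!("example tags: {tags:?}");
}
```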
License
This project is licensed under either of
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.