Crate tonbo

Tonbo is an embedded database for serverless data-intensive applications.

  • Arrow-native schemas with rich, typed structures
  • Stores data as Parquet directly on object storage (S3, R2) or local filesystem
  • Fully asynchronous and runs in multiple runtimes: browsers, edge functions, or inside other databases

No server process to manage. Each database is just a manifest on S3, so adding more is trivial.

§Quick Start

Add Tonbo to your project:

cargo add tonbo tokio

§Basic Usage

use std::sync::Arc;

use arrow_array::{Int64Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use tonbo::{ColumnRef, Predicate, ScalarValue, db::DbBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define schema: User { id: String, name: String, score: i64 }
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Utf8, false),
        Field::new("name", DataType::Utf8, false),
        Field::new("score", DataType::Int64, true),
    ]));

    // Open database on local disk
    let db = DbBuilder::from_schema_key_name(schema.clone(), "id")?
        .on_disk("/tmp/tonbo_doctest")?
        .open()
        .await?;

    // Insert data as Arrow RecordBatch
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(StringArray::from(vec!["u1", "u2"])),
            Arc::new(StringArray::from(vec!["Alice", "Bob"])),
            Arc::new(Int64Array::from(vec![100, 85])),
        ],
    )?;
    db.ingest(batch).await?;

    // Query: score > 80
    let filter = Predicate::gt(ColumnRef::new("score"), ScalarValue::from(80_i64));
    let results = db.scan().filter(filter).collect().await?;

    Ok(())
}

For a more ergonomic API, use typed_arrow’s #[derive(Record)] (re-exported by default). Mark your primary key field with #[metadata(k = "tonbo.key", v = "true")]:

use tonbo::{db::DbBuilder, typed_arrow::{Record, prelude::*, schema::SchemaMeta}};

#[derive(Record)]
struct User {
    #[metadata(k = "tonbo.key", v = "true")]
    id: String,
    name: String,
    score: Option<i64>,
}

// Key is automatically detected from schema metadata
let db = DbBuilder::from_schema(User::schema())?
    .on_disk("/tmp/users")?
    .open()
    .await?;

let users = vec![
    User { id: "u1".into(), name: "Alice".into(), score: Some(100) },
];
let mut builders = User::new_builders(users.len());
builders.append_rows(users);
db.ingest(builders.finish().into_record_batch()).await?;

§Using S3 / Object Storage

Tonbo stores data as Parquet files on any S3-compatible storage (AWS S3, Cloudflare R2, MinIO):

use tonbo::db::{AwsCreds, DbBuilder, ObjectSpec, S3Spec};

let credentials = AwsCreds::from_env()?;
let mut s3_spec = S3Spec::new("my-bucket", "data/users", credentials);
s3_spec.region = Some("us-east-1".into());

// `User` is the #[derive(Record)] struct from the Basic Usage example
let db = DbBuilder::from_schema_key_name(User::schema(), "id")?
    .object_store(ObjectSpec::s3(s3_spec))?
    .open()
    .await?;

§Core Concepts

§Schema Definition

Use the #[derive(Record)] macro from typed_arrow to define your schema. Mark primary key fields with #[metadata(k = "tonbo.key", v = "true")]:

use tonbo::typed_arrow::Record;

#[derive(Record)]
struct Event {
    #[metadata(k = "tonbo.key", v = "true")]
    id: String,
    timestamp: i64,
    event_type: String,
    payload: Option<String>, // Nullable field
}

For composite keys, use ordinal values:

#[derive(Record)]
struct TimeSeries {
    #[metadata(k = "tonbo.key", v = "0")]
    device_id: String,
    #[metadata(k = "tonbo.key", v = "1")]
    timestamp: i64,
    value: f64,
}
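One way to picture a composite key is as a tuple compared component by component. The sketch below uses only the standard library, not Tonbo's API, and it assumes the ordinals give the comparison order (device_id first, then timestamp), which the example above suggests but does not state. `Key`, `build_index`, and `range_for_device` are hypothetical names used for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a (device_id, timestamp) composite key.
/// Assumption: ordinal 0 is compared first, ordinal 1 second.
type Key = (String, i64);

/// Builds a toy ordered index keyed by (device_id, timestamp).
fn build_index(rows: &[(&str, i64, f64)]) -> BTreeMap<Key, f64> {
    rows.iter()
        .map(|&(device, ts, value)| ((device.to_string(), ts), value))
        .collect()
}

/// Returns all (timestamp, value) pairs for one device, in timestamp order.
/// Because device_id leads the key, one device's rows are contiguous.
fn range_for_device(index: &BTreeMap<Key, f64>, device: &str) -> Vec<(i64, f64)> {
    let start = (device.to_string(), i64::MIN);
    let end = (device.to_string(), i64::MAX);
    index
        .range(start..=end)
        .map(|((_, ts), value)| (*ts, *value))
        .collect()
}
```

Under this ordering, a scan over a single device reads one contiguous key range, which is why time-series layouts usually put the device identifier before the timestamp.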

§Database Operations

§Predicates

Build query filters using Predicate:

use tonbo::{ColumnRef, Predicate, ScalarValue};

// Equality
let filter = Predicate::eq(ColumnRef::new("status"), ScalarValue::from("active"));

// Comparison
let filter = Predicate::gt(ColumnRef::new("age"), ScalarValue::from(18_i64));

// Logical operators
let filter = Predicate::and(
    Predicate::gt(ColumnRef::new("age"), ScalarValue::from(18_i64)),
    Predicate::eq(ColumnRef::new("country"), ScalarValue::from("US")),
);
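To make the semantics of combining comparisons with a logical operator concrete, here is a toy model written against the standard library only. `ToyValue`, `ToyPredicate`, `Row`, and `eval` are hypothetical and are not Tonbo types; Tonbo's own `Predicate` works on Arrow data and supports more operators than this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical literal type; Tonbo's own `ScalarValue` is richer.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Int(i64),
    Text(String),
}

/// Hypothetical predicate tree: leaf comparisons plus one logical branch.
enum ToyPredicate {
    Eq(String, ToyValue),
    Gt(String, ToyValue),
    And(Box<ToyPredicate>, Box<ToyPredicate>),
}

/// A row is modeled as a map from column name to value.
type Row = HashMap<String, ToyValue>;

/// Evaluates a predicate against one row; a missing column never matches.
fn eval(pred: &ToyPredicate, row: &Row) -> bool {
    match pred {
        ToyPredicate::Eq(col, lit) => row.get(col) == Some(lit),
        ToyPredicate::Gt(col, lit) => match (row.get(col), lit) {
            (Some(ToyValue::Int(a)), ToyValue::Int(b)) => a > b,
            (Some(ToyValue::Text(a)), ToyValue::Text(b)) => a > b,
            _ => false, // mismatched types or missing column
        },
        ToyPredicate::And(left, right) => eval(left, row) && eval(right, row),
    }
}
```

The `And` branch only matches when both children match, so a row with age 30 and country "US" passes the combined filter while a row with age 18 does not.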

§Feature Flags

Tonbo uses feature flags to configure runtime and storage backends:

  • tokio (default) - Tokio async runtime with local filesystem support
  • typed-arrow (default) - Re-exports typed_arrow for #[derive(Record)] schemas
  • web - WebAssembly support for browsers and edge runtimes
  • web-opfs - Browser Origin Private File System storage (requires web)

§Default Configuration

[dependencies]
tonbo = "0.1"

This enables both the tokio runtime and typed-arrow schema derivation.

§WebAssembly / Browser

[dependencies]
tonbo = { version = "0.1", default-features = false, features = ["web", "typed-arrow"] }

§Examples

Run examples with cargo run --example <name>:

Example            Description
01_basic           Define schema, insert, and query in 30 lines
02_transaction     MVCC transactions with upsert, delete, read-your-writes
02b_snapshot       Consistent point-in-time reads while writes continue
03_filter          Predicates: eq, gt, in, is_null, and, or, not
04_s3              Store Parquet files on S3/R2/MinIO
05_scan_options    Projection pushdown reads only needed columns
06_composite_key   Multi-column keys for time-series data
07_streaming       Process millions of rows without loading into memory
08_nested_types    Deep struct nesting + Lists as Arrow StructArray
09_time_travel     Query historical snapshots via MVCC timestamps

§Architecture

Tonbo implements an LSM-tree style architecture optimized for analytical workloads:

  1. Write Path: Data is written to an in-memory buffer, then flushed to immutable Parquet files on storage
  2. WAL: Write-ahead log ensures durability before acknowledgment
  3. Manifest: Tracks all Parquet files and database state; uses compare-and-swap for coordination on object storage
  4. Compaction: Background process merges small files into larger ones
  5. MVCC: Multi-version concurrency control enables snapshot isolation
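The write path, compaction, and MVCC steps above can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not Tonbo's implementation: `ToyLsm`, `Run`, `put`, `flush`, `compact`, and `get_at` are hypothetical names, a `Vec` stands in for an immutable Parquet file, and the WAL and manifest coordination are left out entirely.

```rust
use std::collections::BTreeMap;

/// One immutable sorted run; a stand-in for a Parquet file on storage.
type Run = Vec<((String, u64), Option<String>)>;

/// Toy LSM store: versioned keys, a mutable buffer, and immutable runs.
#[derive(Default)]
struct ToyLsm {
    next_ts: u64,
    /// In-memory buffer keyed by (key, commit timestamp); None marks a delete.
    memtable: BTreeMap<(String, u64), Option<String>>,
    /// Flushed runs, oldest first.
    runs: Vec<Run>,
}

impl ToyLsm {
    /// Writes a new version and returns its commit timestamp.
    fn put(&mut self, key: &str, value: Option<&str>) -> u64 {
        self.next_ts += 1;
        self.memtable
            .insert((key.to_string(), self.next_ts), value.map(str::to_string));
        self.next_ts
    }

    /// Flush: freeze the buffer into an immutable sorted run.
    fn flush(&mut self) {
        if !self.memtable.is_empty() {
            let run: Run = std::mem::take(&mut self.memtable).into_iter().collect();
            self.runs.push(run);
        }
    }

    /// Compaction: merge all runs into one sorted run, keeping every version.
    fn compact(&mut self) {
        let mut merged: Run = self.runs.drain(..).flatten().collect();
        merged.sort_by(|a, b| a.0.cmp(&b.0));
        if !merged.is_empty() {
            self.runs.push(merged);
        }
    }

    /// Snapshot read: newest version of `key` with timestamp <= `snapshot_ts`.
    /// Returns None if the key is absent or its newest visible version is a delete.
    fn get_at(&self, key: &str, snapshot_ts: u64) -> Option<String> {
        let in_memtable = self.memtable.iter().map(|(k, v)| (k, v));
        let in_runs = self.runs.iter().flatten().map(|(k, v)| (k, v));
        in_memtable
            .chain(in_runs)
            .filter(|(k, _)| k.0 == key && k.1 <= snapshot_ts)
            .max_by_key(|(k, _)| k.1)
            .and_then(|(_, v)| v.clone())
    }
}
```

Because every version carries its commit timestamp, a reader holding an older timestamp keeps seeing the same data while later writes, flushes, and compactions proceed. A real engine would avoid the linear scan in `get_at` and would drop versions no snapshot can still see during compaction.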

§Platform Support

Platform               Runtime          Storage
Linux/macOS/Windows    Tokio            Local filesystem, S3
WebAssembly            Browser async    S3, OPFS
Edge (Deno, Workers)   Platform async   S3

Re-exports§

pub use crate::db::DB;
pub use crate::db::WalSyncPolicy;
pub use typed_arrow;

Modules§

db
Generic DB that dispatches between typed and dynamic modes via generic types. Dynamic Arrow-first database surface (DB, DbBuilder) and runtime wiring.
prelude
Convenience re-exports for common Tonbo usage.
schema
Declarative schema utilities for defining primary keys and runtime layouts.

Structs§

BatchesThreshold
A simple batch-count threshold for dynamic mode.
ColumnRef
Reference identifying a column used inside predicates.
NeverSeal
A policy that never seals.
Predicate
Logical predicate shared across adapters and Tonbo’s core.
ScalarValue
Literal values accepted by predicate operands, backed by DynCell.

Enums§

CommitAckMode
Commit acknowledgement semantics for transactional writes.
ComparisonOp
Comparison operator used by binary predicates.
Operand
Operand used by predicate comparisons and function calls.
PredicateNode
Recursive predicate node; leaf and branch variants coexist.

Traits§

SealPolicy
A pluggable sealing policy evaluated after ingest.