# Bonsai

A zero-dependency, embeddable Rust library providing a self-tuning spatial index. Bonsai continuously profiles your data and query workload at runtime and transparently migrates between index backends — R-tree, KD-tree, Quadtree, and Grid — to maintain optimal performance without developer intervention.

Generic over dimensionality (`const D: usize`, D = 1–8, default D = 2) and coordinate type (`f32`/`f64`). Ships a C FFI layer, a WASM target, and a CLI binary. The core crate has zero mandatory dependencies.
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
bonsai = "1.0"
```
### Feature Flags

| Feature | Description | Adds dependencies |
|---|---|---|
| `serde` | Serialise/deserialise full index state to/from `Vec<u8>` | `serde`, `bincode` |
| `wasm` | `wasm-bindgen` bindings for browser and Node.js | `wasm-bindgen` |
| `ffi` | C-compatible `extern "C"` API + `bonsai.h` header | none |
| `full` | All of the above | all of the above |
```toml
# Enable serialisation only
bonsai = { version = "1.0", features = ["serde"] }

# Enable everything
bonsai = { version = "1.0", features = ["full"] }
```
## Usage

### Basic 2D example

Coordinates, payloads, and the `BoundingBox` type name below are illustrative; method names follow the public API (`insert`, `range_query`, `knn_query`, `stats`).

```rust
use bonsai::{BonsaiIndex, BoundingBox};

let mut index = BonsaiIndex::<&str, f64, 2>::builder().build();

index.insert("alpha", [1.0, 1.0]);
index.insert("beta", [2.0, 2.0]);
index.insert("gamma", [3.0, 3.0]);
index.insert("delta", [50.0, 50.0]);
index.insert("epsilon", [70.0, 70.0]);

// Range query over [0, 10] × [0, 10]
let bbox = BoundingBox::new([0.0, 0.0], [10.0, 10.0]);
let results = index.range_query(&bbox);
println!("{results:?}"); // alpha, beta, gamma

// The 2 nearest neighbours of (2.5, 2.5)
let nearest = index.knn_query([2.5, 2.5], 2);
println!("{nearest:?}");

let s = index.stats();
println!("{s:?}");
```
### 3D example

```rust
use bonsai::{BonsaiIndex, BoundingBox};

let mut index = BonsaiIndex::<&str, f64, 3>::builder().build();

index.insert("a", [1.0, 1.0, 1.0]);
index.insert("b", [2.0, 2.0, 2.0]);
index.insert("c", [9.0, 9.0, 9.0]);

let bbox = BoundingBox::new([0.0, 0.0, 0.0], [3.0, 3.0, 3.0]);
let hits = index.range_query(&bbox);
println!("{}", hits.len()); // 2
```
### 6D example (robotics phase-space)

The `Backend` enum name and the argument values are illustrative.

```rust
use bonsai::{Backend, BonsaiIndex};

// 6D: [x, y, z, roll, pitch, yaw]
let mut index = BonsaiIndex::<&str, f64, 6>::builder()
    .initial_backend(Backend::KDTree)
    .reservoir_size(8192)
    .build();

index.insert("pose-a", [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]);
index.insert("pose-b", [1.0, 2.0, 0.5, 0.1, 0.2, 0.3]);

let nearest = index.knn_query([0.9, 1.8, 0.4, 0.1, 0.2, 0.3], 1);
println!("{nearest:?}");
```
### Builder configuration reference

Argument values shown are the documented defaults.

```rust
use bonsai::{Backend, BonsaiIndex};

let index = BonsaiIndex::<&str, f64, 2>::builder()
    // Starting backend (default: KDTree)
    .initial_backend(Backend::KDTree)
    // Alternative must be ≥23% cheaper to trigger migration (default: 0.77)
    .migration_threshold(0.77)
    // Suppress re-migration for N observations after a migration (default: 1000)
    .hysteresis_window(1000)
    // Reservoir sampler capacity for profiling (default: 4096)
    .reservoir_size(4096)
    // Bloom filter memory budget in bytes (default: 65536)
    .bloom_memory_bytes(65536)
    .build();
```
### Escape hatches

```rust
// Disable automatic adaptation
index.freeze();

// Force a specific backend, bypassing the policy engine
let _ = index.force_backend(Backend::Grid);

// Re-enable automatic adaptation
index.unfreeze();

// Reset the index to an empty state, preserving config, frozen flag, and migration count.
// Returns Err(BonsaiError::MigrationInProgress) if a migration is currently running.
index.clear().unwrap();
```
### Serialisation (feature = `serde`)

```rust
// Snapshot full index state to bytes
let bytes = index.to_bytes();

// Restore from bytes — returns BonsaiError::Serialisation on malformed input
let restored = BonsaiIndex::<&str, f64, 2>::from_bytes(&bytes)?;
```
## Using Bonsai from Other Languages

Bonsai exposes a C-compatible FFI layer (feature = `ffi`) that works from any language with a C FFI bridge.

First, build the shared library:

```sh
cargo build --release --features ffi
# produces: target/release/libbonsai.dylib (macOS)
#           target/release/libbonsai.so    (Linux)
#           target/release/bonsai.dll      (Windows)
```

The generated header is at `bonsai.h`.
### C

The sketch below uses the FFI entry points named elsewhere in this README (`bonsai_new`, `bonsai_insert_2d`, `bonsai_range_query_2d`, `bonsai_free`); the handle type and exact signatures are assumptions — consult `bonsai.h` for the authoritative declarations.

```c
#include <stdio.h>
#include "bonsai.h"

int main(void) {
    void *idx = bonsai_new();

    bonsai_insert_2d(idx, 1, 1.0, 2.0);
    bonsai_insert_2d(idx, 2, 5.0, 5.0);

    /* Query the box [0, 6] x [0, 6]; see bonsai.h for the result type. */
    bonsai_range_query_2d(idx, 0.0, 0.0, 6.0, 6.0);

    bonsai_free(idx);
    return 0;
}
```

Compile and run (linker flags shown are the usual platform defaults):

```sh
# macOS
cc main.c -Ltarget/release -lbonsai -o demo && ./demo

# Linux
cc main.c -Ltarget/release -lbonsai -Wl,-rpath,target/release -o demo && ./demo
```
### Python

Uses the standard library `ctypes` — no extra dependencies required. The signatures below are assumptions inferred from the 2D entry points named elsewhere in this README; check `bonsai.h` for the authoritative types.

```python
import ctypes

# Load the shared library (pick the right extension for your platform)
lib = ctypes.CDLL("target/release/libbonsai.dylib")

# --- type annotations ---
lib.bonsai_new.restype = ctypes.c_void_p
lib.bonsai_free.argtypes = [ctypes.c_void_p]
lib.bonsai_insert_2d.argtypes = [ctypes.c_void_p, ctypes.c_uint64,
                                 ctypes.c_double, ctypes.c_double]
lib.bonsai_range_query_2d.argtypes = [ctypes.c_void_p,
                                      ctypes.c_double, ctypes.c_double,
                                      ctypes.c_double, ctypes.c_double]

# --- usage ---
idx = lib.bonsai_new()
lib.bonsai_insert_2d(idx, 1, 1.0, 2.0)
lib.bonsai_insert_2d(idx, 2, 5.0, 5.0)
lib.bonsai_insert_2d(idx, 3, 9.0, 9.0)

# Range query — [0, 6]^2
hits = lib.bonsai_range_query_2d(idx, 0.0, 0.0, 6.0, 6.0)

# kNN — 2 nearest to (5, 5)
# (hypothetical entry-point name; only new/insert/range/free are named
#  elsewhere in this README — see bonsai.h)
nearest = lib.bonsai_knn_query_2d(idx, 5.0, 5.0, 2)

# Stats
stats = lib.bonsai_stats(idx)  # hypothetical entry-point name

lib.bonsai_free(idx)
```

Run:

```sh
python3 example.py  # assuming the script above was saved as example.py
```
## Development

### Build

```sh
cargo build --release
```

### Test

```sh
# All tests including property-based and integration tests
cargo test --all-features

# Core tests only (no optional features)
cargo test --no-default-features
```

### Lint and format

```sh
cargo fmt --all
cargo clippy --all-targets --all-features
```

### Documentation

```sh
cargo doc --no-deps --open
```

### Benchmarks

```sh
# Run all benchmark groups
cargo bench

# Run a specific group
cargo bench range_query_latency
```

Benchmark groups cover: `insert_throughput`, `range_query_latency`, `knn_latency`, `migration_cost`, `bloom_cache_impact`, `hilbert_vs_natural`, `adaptation_convergence`. Datasets include uniform 2D/3D/6D, clustered 2D, OSM-style 2D, and robotics 6D phase-space.
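The `hilbert_vs_natural` group compares bulk loading in Hilbert-curve order against natural order. For reference, the classic xy-to-Hilbert-distance conversion looks like the sketch below (shown for illustration; this is not Bonsai's internal code):

```rust
// Map (x, y) on an n×n grid (n a power of two) to its distance along the
// Hilbert curve. Sorting points by this key groups spatially close points
// together, which is what makes bulk loading cache-friendly.
fn xy2d(n: u32, mut x: u32, mut y: u32) -> u32 {
    let mut d = 0;
    let mut s = n / 2;
    while s > 0 {
        let rx = u32::from((x & s) > 0);
        let ry = u32::from((y & s) > 0);
        d += s * s * ((3 * rx) ^ ry);
        // Rotate the quadrant so every sub-curve has the same orientation.
        if ry == 0 {
            if rx == 1 {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::mem::swap(&mut x, &mut y);
        }
        s /= 2;
    }
    d
}

fn main() {
    let mut points = vec![(3u32, 0u32), (2, 2), (0, 0), (1, 1)];
    points.sort_by_key(|&(x, y)| xy2d(4, x, y));
    assert_eq!(points, vec![(0, 0), (1, 1), (2, 2), (3, 0)]);
    println!("{points:?}");
}
```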
## CLI

Build and run the `bonsai` CLI (requires the `serde` feature). Subcommand names below follow the demo section (`load`, `stats`, `query range`, `query knn`, `visualise`); the remaining names and all arguments are illustrative.

```sh
cargo build --release --features serde

# Load a CSV file into the index
bonsai load points.csv

# Query a bounding box
bonsai query range 0,0 10,10

# k-nearest neighbours
bonsai query knn 5,5 3

# Print index statistics
bonsai stats

# Live-updating TUI (backend, data shape, cost estimates, throughput)
bonsai watch

# Run benchmark suite and print p50/p95/p99 latencies
bonsai bench

# ASCII art visualisation of the index structure
bonsai visualise

# Force migration to a specific backend
bonsai migrate rtree
```
## WASM demo

```sh
# Install wasm-pack if needed
cargo install wasm-pack

# Build the WASM package
wasm-pack build --target nodejs --features wasm

# Run the Node.js demo (script path may differ in the repository)
node demo.js
```
## Demos

Run the full end-to-end demo (the example name below is illustrative; see the repository's examples directory):

```sh
cargo run --release --all-features --example demo
```

The demo runs 16 sections in sequence, each building on the previous, using a single dataset of 2 000 clustered 2D points:
| Section | What it shows |
|---|---|
| 1. Hilbert sort | Sort 2 000 clustered points into cache-friendly Hilbert order |
| 2. BloomCache | Populate from Hilbert-sorted insertions; O(1) empty-result rejection |
| 3. All 4 backends | Bulk-load R-tree, KD-tree, Quadtree, Grid in Hilbert order |
| 4. Range query | Bloom gates all four backends; brute-force correctness check |
| 5. kNN(k=5) | All four backends compared; distances must match |
| 6. Insert-remove | Remove 50 of 100 entries; verify no ghost results |
| 7. Profiler | Feed dataset; print DataShape (clustering_coef, effective_dim, skewness) |
| 8. CostModel | Rank all four backends by estimated range query cost |
| 9. PolicyEngine | Tick on DataShape; watch migration decision fire after hysteresis window |
| 10. Migration | KD-tree → winner backend; range query results identical before and after |
| 11. IndexRouter + StatsCollector | 4 insert threads + 2 query threads; lock-free stats |
| 12. BonsaiIndex | Full public API: insert, range query, kNN, stats |
| 13. Serialisation | to_bytes / from_bytes round-trip; query results match (feature = serde) |
| 14. C FFI | bonsai_new, bonsai_insert_2d, bonsai_range_query_2d, bonsai_free (feature = ffi) |
| 15. CLI | load, stream, stats, query range, query knn, visualise |
| 16. Latency table | 100k-point backends; 1 000 range queries each; p50/p95/p99 per backend |
## Use Cases

### Games & Simulation

- **Boids flocking simulator** — thousands of agents querying their local neighbourhood every frame; data density shifts constantly as flocks form and disperse, perfect for triggering Bonsai's adaptive switching.
- **Procedural dungeon with dynamic enemies** — spatial queries for line-of-sight, aggro radius, pathfinding; a great stress test for mixed range + kNN workloads.
- **N-body gravity simulator** — 3D point data with extreme clustering (solar systems) and vast voids; the kind of non-uniform distribution that destroys fixed-choice indexes.
### Mapping & Geospatial

- **Offline postcode/address search engine** — load the entire Royal Mail PAF dataset, run proximity and region queries with zero network dependency; a direct real-world use case.
- **GPS trace analyser** — feed raw GPX files, detect stops, simplify routes, find points-of-interest clusters; data shape changes dramatically between motorway and city driving.
- **Isochrone map generator** — given a point and travel time, compute the reachable region; heavy on range queries over road network nodes.
### Data & Analytics

- **Astronomical catalogue explorer** — query the Gaia DR3 star catalogue (1.8 billion stars) by sky region or find the k nearest stars to any coordinate; 3D with extreme non-uniformity.
- **Anomaly detector for sensor networks** — IoT sensors reporting 3D position + readings; find spatial outliers in real time using kNN distance as an anomaly score.
- **Real estate comparable finder** — given a property, find the k most spatially similar sold properties; 4D or 5D index over lat, lon, size, age, price range.
### Robotics & Physics

- **Collision avoidance system** — 3D bounding box queries for a drone fleet or robot arm; Bonsai's lock-free migration means zero jitter during index restructuring.
- **Particle physics event display** — visualise detector hits from CERN open data; millions of 3D points with extreme clustering near interaction vertices.
## Architecture

Bonsai is built around a lock-free IndexRouter holding an atomic pointer to the active backend. A Profiler pipeline (reservoir sampler → online stats → cost model → policy engine) runs in the background and triggers a MigrationEngine when a better backend is identified. A BloomCache short-circuits empty-result range queries in O(1).

```text
BonsaiIndex<T, C, D>
├── IndexRouter     — atomic ptr to active backend
├── Profiler        — ReservoirSampler → OnlineStats → CostModel → PolicyEngine
├── MigrationEngine — STR bulk load → incremental drain → atomic swap
├── BloomCache      — probabilistic negative cache (zero false negatives)
└── StatsCollector  — lock-free query/insert counters
```

Migration is transparent: queries are never blocked for more than 50 µs (one atomic pointer load). During migration, writes are routed to both the active and shadow backends; after the drain phase, an atomic SeqCst swap promotes the shadow to active.
## License

Apache 2.0 — see LICENSE