bonsai-index 1.0.1

A zero-dependency, embeddable, self-tuning spatial index for Rust
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
# Bonsai

A zero-dependency, embeddable Rust library providing a self-tuning spatial index. Bonsai continuously profiles your data and query workload at runtime and transparently migrates between index backends — R-tree, KD-tree, Quadtree, and Grid — to maintain optimal performance without developer intervention.

Generic over dimensionality (`const D: usize`, D=1–8, default D=2) and coordinate type (`f32`/`f64`). Ships a C FFI layer, a WASM target, and a CLI binary. The core crate has zero mandatory dependencies.

---

## Crate
The package is available on [crate bonsai-index](https://crates.io/crates/bonsai-index)

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
bonsai-index = "1.0"
```

### Feature Flags

| Feature | Description | Adds dependencies |
|---------|-------------|-------------------|
| `serde` | Serialise/deserialise full index state to/from `Vec<u8>` | `serde`, `bincode` |
| `wasm` | `wasm-bindgen` bindings for browser and Node.js | `wasm-bindgen` |
| `ffi` | C-compatible `extern "C"` API + `bonsai.h` header | none |
| `full` | All of the above | all of the above |

```toml
# Enable serialisation only
bonsai-index = { version = "1.0", features = ["serde"] }

# Enable everything
bonsai-index = { version = "1.0", features = ["full"] }
```

---

## Usage

### Basic 2D example (10 lines)

```rust
use bonsai::BonsaiIndex;
use bonsai::types::{BBox, Point};

let mut index = BonsaiIndex::<&str>::builder().build();

index.insert(Point::new([1.0, 2.0]), "alpha");
index.insert(Point::new([3.0, 4.0]), "beta");
index.insert(Point::new([5.0, 6.0]), "gamma");
index.insert(Point::new([7.0, 8.0]), "delta");
index.insert(Point::new([9.0, 0.0]), "epsilon");

let bbox = BBox::new(Point::new([0.0, 0.0]), Point::new([6.0, 6.0]));
let results = index.range_query(&bbox);
println!("range hits: {}", results.len()); // alpha, beta, gamma

let nearest = index.knn_query(&Point::new([5.0, 5.0]), 2);
println!("nearest: {:?}", nearest.iter().map(|(_, _, p)| p).collect::<Vec<_>>());

let s = index.stats();
println!("backend={:?}  points={}", s.backend, s.point_count);
```

### 3D example

```rust
use bonsai::BonsaiIndex;
use bonsai::types::{BBox, Point};

let mut index = BonsaiIndex::<u32, f64, 3>::builder().build();

index.insert(Point::new([1.0, 2.0, 3.0]), 1);
index.insert(Point::new([4.0, 5.0, 6.0]), 2);
index.insert(Point::new([7.0, 8.0, 9.0]), 3);

let bbox = BBox::new(Point::new([0.0, 0.0, 0.0]), Point::new([5.0, 6.0, 7.0]));
let hits = index.range_query(&bbox);
println!("3D range hits: {}", hits.len()); // 2
```

### 6D example (robotics phase-space)

```rust
use bonsai::BonsaiIndex;
use bonsai::types::{BBox, Point};

// 6D: [x, y, z, roll, pitch, yaw]
let mut index = BonsaiIndex::<String, f64, 6>::builder()
    .initial_backend(bonsai::types::BackendKind::KDTree)
    .reservoir_size(2048)
    .build();

index.insert(Point::new([1.0, 2.0, 3.0, 0.1, 0.2, 0.3]), "pose_a".to_string());
index.insert(Point::new([4.0, 5.0, 6.0, 0.4, 0.5, 0.6]), "pose_b".to_string());

let nearest = index.knn_query(&Point::new([1.1, 2.1, 3.1, 0.1, 0.2, 0.3]), 1);
println!("nearest pose: {:?}", nearest[0].2);
```

### Builder configuration reference

```rust
use std::time::Duration;
use bonsai::{BonsaiIndex, types::BackendKind};

let index = BonsaiIndex::<String>::builder()
    // Starting backend (default: KDTree)
    .initial_backend(BackendKind::RTree)
    // Alternative must be ≥23% cheaper to trigger migration (default: 0.77)
    .migration_threshold(0.77)
    // Suppress re-migration for N observations after a migration (default: 1000)
    .hysteresis_window(1000)
    // Reservoir sampler capacity for profiling (default: 4096)
    .reservoir_size(4096)
    // Bloom filter memory budget in bytes (default: 65536)
    .bloom_memory_bytes(65_536)
    .build();
```

### Escape hatches

```rust
// Disable automatic adaptation
index.freeze();

// Force a specific backend, bypassing the policy engine
let _ = index.force_backend(BackendKind::RTree);

// Re-enable automatic adaptation
index.unfreeze();

// Reset the index to an empty state, preserving config, frozen flag, and migration count.
// Returns Err(BonsaiError::MigrationInProgress) if a migration is currently running.
index.clear().unwrap();
```

### Serialisation (feature = "serde")

```rust
// Snapshot full index state to bytes
let bytes = index.to_bytes();

// Restore from bytes — returns BonsaiError::Serialisation on malformed input
let restored = BonsaiIndex::<String>::from_bytes(&bytes)?;
```

---

## Using Bonsai from Other Languages

Bonsai exposes a C-compatible FFI layer (feature = `ffi`) that works from any language with a C FFI bridge.

First, build the shared library:

```sh
cargo build --release --features ffi
# produces: target/release/libbonsai.dylib  (macOS)
#           target/release/libbonsai.so     (Linux)
#           target/release/bonsai.dll       (Windows)
```

The generated header is at `bonsai.h`.

### C

```c
#include <stdio.h>
#include "bonsai.h"

int main(void) {
    BonsaiHandle *idx = bonsai_new();

    uint64_t id0 = bonsai_insert_2d(idx, 1.0, 2.0, NULL);
    uint64_t id1 = bonsai_insert_2d(idx, 5.0, 5.0, NULL);
    uint64_t id2 = bonsai_insert_2d(idx, 9.0, 8.0, NULL);

    /* Range query — [0,6]^2 should return id0 and id1 */
    uint64_t out_ids[16];
    size_t count = bonsai_range_query_2d(idx, 0.0, 0.0, 6.0, 6.0, out_ids, 16);
    printf("range hits: %zu\n", count);
    for (size_t i = 0; i < count; i++) {
        printf("  id=%llu\n", (unsigned long long)out_ids[i]);
    }

    /* kNN — 2 nearest to (5, 5) */
    uint64_t knn_ids[2];
    double   knn_dist[2];
    size_t k = bonsai_knn_query_2d(idx, 5.0, 5.0, 2, knn_ids, knn_dist);
    printf("kNN(k=2) from (5,5):\n");
    for (size_t i = 0; i < k; i++) {
        printf("  id=%llu  dist=%.4f\n",
               (unsigned long long)knn_ids[i], knn_dist[i]);
    }

    /* Stats */
    BonsaiStats stats;
    bonsai_stats(idx, &stats);
    printf("points=%llu  backend=%d\n",
           (unsigned long long)stats.point_count, stats.backend_kind);

    bonsai_free(idx);
    return 0;
}
```

Compile and run:

```sh
# macOS
cc -o demo demo.c -I. -L target/release -lbonsai -rpath @loader_path/target/release
./demo

# Linux
cc -o demo demo.c -I. -L target/release -lbonsai -Wl,-rpath,'$ORIGIN/target/release'
./demo
```

### Python

Uses the standard library `ctypes` — no extra dependencies required.

```python
import ctypes, sys

# Load the shared library
if sys.platform == "darwin":
    lib = ctypes.CDLL("target/release/libbonsai.dylib")
elif sys.platform == "win32":
    lib = ctypes.CDLL("target/release/bonsai.dll")
else:
    lib = ctypes.CDLL("target/release/libbonsai.so")

# --- type annotations ---
lib.bonsai_new.restype  = ctypes.c_void_p
lib.bonsai_free.argtypes = [ctypes.c_void_p]

lib.bonsai_insert_2d.restype  = ctypes.c_uint64
lib.bonsai_insert_2d.argtypes = [ctypes.c_void_p, ctypes.c_double,
                                  ctypes.c_double, ctypes.c_void_p]

lib.bonsai_remove.restype  = ctypes.c_int
lib.bonsai_remove.argtypes = [ctypes.c_void_p, ctypes.c_uint64]

lib.bonsai_range_query_2d.restype  = ctypes.c_size_t
lib.bonsai_range_query_2d.argtypes = [
    ctypes.c_void_p,
    ctypes.c_double, ctypes.c_double,   # min_x, min_y
    ctypes.c_double, ctypes.c_double,   # max_x, max_y
    ctypes.POINTER(ctypes.c_uint64),    # out_ids
    ctypes.c_size_t,                    # capacity
]

lib.bonsai_knn_query_2d.restype  = ctypes.c_size_t
lib.bonsai_knn_query_2d.argtypes = [
    ctypes.c_void_p,
    ctypes.c_double, ctypes.c_double,   # qx, qy
    ctypes.c_size_t,                    # k
    ctypes.POINTER(ctypes.c_uint64),    # out_ids
    ctypes.POINTER(ctypes.c_double),    # out_dist
]

class BonsaiStats(ctypes.Structure):
    _fields_ = [
        ("point_count",     ctypes.c_uint64),
        ("query_count",     ctypes.c_uint64),
        ("migration_count", ctypes.c_uint64),
        ("migrating",       ctypes.c_int),
        ("backend_kind",    ctypes.c_int),
    ]

lib.bonsai_stats.restype  = ctypes.c_int
lib.bonsai_stats.argtypes = [ctypes.c_void_p, ctypes.POINTER(BonsaiStats)]

BACKEND_NAMES = {0: "rtree", 1: "kdtree", 2: "quadtree", 3: "grid"}

# --- usage ---
idx = lib.bonsai_new()

id0 = lib.bonsai_insert_2d(idx, 1.0, 2.0, None)
id1 = lib.bonsai_insert_2d(idx, 5.0, 5.0, None)
id2 = lib.bonsai_insert_2d(idx, 9.0, 8.0, None)

# Range query — [0, 6]^2
out_ids  = (ctypes.c_uint64 * 16)()
count = lib.bonsai_range_query_2d(idx, 0.0, 0.0, 6.0, 6.0, out_ids, 16)
print(f"range hits: {count}")
for i in range(count):
    print(f"  id={out_ids[i]}")

# kNN — 2 nearest to (5, 5)
knn_ids  = (ctypes.c_uint64 * 2)()
knn_dist = (ctypes.c_double * 2)()
k = lib.bonsai_knn_query_2d(idx, 5.0, 5.0, 2, knn_ids, knn_dist)
print(f"kNN(k=2) from (5, 5):")
for i in range(k):
    print(f"  id={knn_ids[i]}  dist={knn_dist[i]:.4f}")

# Stats
stats = BonsaiStats()
lib.bonsai_stats(idx, ctypes.byref(stats))
print(f"points={stats.point_count}  backend={BACKEND_NAMES[stats.backend_kind]}")

lib.bonsai_free(idx)
```

Run:

```sh
python3 bonsai_demo.py
```

---

## Development

### Build

```sh
cargo build
cargo build --all-features
```

### Test

```sh
# All tests including property-based and integration tests
cargo test --all-features

# Core tests only (no optional features)
cargo test
```

### Lint and format

```sh
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
```

### Documentation

```sh
cargo doc --all-features --no-deps --open
```

### Benchmarks

```sh
# Run all benchmark groups
cargo bench --all-features

# Run a specific group
cargo bench --bench range_query_latency
cargo bench --bench insert_throughput
cargo bench --bench knn_latency
cargo bench --bench migration_cost
cargo bench --bench bloom_cache_impact
cargo bench --bench hilbert_vs_natural
cargo bench --bench adaptation_convergence
```

Benchmark groups cover: `insert_throughput`, `range_query_latency`, `knn_latency`, `migration_cost`, `bloom_cache_impact`, `hilbert_vs_natural`, `adaptation_convergence`. Datasets include uniform 2D/3D/6D, clustered 2D, OSM-style 2D, and robotics 6D phase-space.

### CLI

Build and run the `bonsai` CLI (requires the `serde` feature):

```sh
cargo build --features serde

# Load a CSV file into the index
bonsai load data.csv

# Query a bounding box
bonsai query range 0 0 100 100

# k-nearest neighbours
bonsai query knn 50 50 5

# Print index statistics
bonsai stats

# Live-updating TUI (backend, data shape, cost estimates, throughput)
bonsai watch

# Run benchmark suite and print p50/p95/p99 latencies
bonsai bench

# ASCII art visualisation of the index structure
bonsai visualise

# Force migration to a specific backend
bonsai migrate rtree
```

### WASM demo

```sh
# Install wasm-pack if needed
cargo install wasm-pack

# Build the WASM package
wasm-pack build --features wasm

# Run the Node.js demo
node examples/wasm_demo.js
```

> **Note:** `wasm32-unknown-unknown` is fully supported. `std::time::Instant` is not available on this target, so timing-based stats (query latency, migration duration) are always reported as zero — all spatial operations work correctly.

---

## Demos

Run the full end-to-end demo with:

```sh
cargo run --example demo_bonsai
cargo run --example demo_bonsai --features serde      # includes serialisation
cargo run --example demo_bonsai --features serde,ffi  # includes C FFI
```

The demo runs 16 sections in sequence, each building on the previous, using a single dataset of 2 000 clustered 2D points:

| Section | What it shows |
|---------|---------------|
| 1. Hilbert sort | Sort 2 000 clustered points into cache-friendly Hilbert order |
| 2. BloomCache | Populate from Hilbert-sorted insertions; O(1) empty-result rejection |
| 3. All 4 backends | Bulk-load R-tree, KD-tree, Quadtree, Grid in Hilbert order |
| 4. Range query | Bloom gates all four backends; brute-force correctness check |
| 5. kNN(k=5) | All four backends compared; distances must match |
| 6. Insert-remove | Remove 50 of 100 entries; verify no ghost results |
| 7. Profiler | Feed dataset; print `DataShape` (clustering_coef, effective_dim, skewness) |
| 8. CostModel | Rank all four backends by estimated range query cost |
| 9. PolicyEngine | Tick on DataShape; watch migration decision fire after hysteresis window |
| 10. Migration | KD-tree → winner backend; range query results identical before and after |
| 11. IndexRouter + StatsCollector | 4 insert threads + 2 query threads; lock-free stats |
| 12. BonsaiIndex | Full public API: insert, range query, kNN, stats |
| 13. Serialisation | `to_bytes` / `from_bytes` round-trip; query results match (feature = `serde`) |
| 14. C FFI | `bonsai_new`, `bonsai_insert_2d`, `bonsai_range_query_2d`, `bonsai_free` (feature = `ffi`) |
| 15. CLI | `load`, `stream`, `stats`, `query range`, `query knn`, `visualise` |
| 16. Latency table | 100k-point backends; 1 000 range queries each; p50/p95/p99 per backend |

---

## Use Cases

### Games & Simulation

**Boids flocking simulator** — thousands of agents querying their local neighbourhood every frame; data density shifts constantly as flocks form and disperse, perfect for triggering Bonsai's adaptive switching.

**Procedural dungeon with dynamic enemies** — spatial queries for line-of-sight, aggro radius, pathfinding; a great stress test for mixed range + kNN workloads.

**N-body gravity simulator** — 3D point data with extreme clustering (solar systems) and vast voids; the kind of non-uniform distribution that destroys fixed-choice indexes.

### Mapping & Geospatial

**Offline postcode/address search engine** — load the entire Royal Mail PAF dataset, run proximity and region queries with zero network dependency; a direct real-world use case.

**GPS trace analyser** — feed raw GPX files, detect stops, simplify routes, find points of interest clusters; data shape changes dramatically between motorway and city driving.

**Isochrone map generator** — given a point and travel time, compute the reachable region; heavy on range queries over road network nodes.

### Data & Analytics

**Astronomical catalogue explorer** — query the Gaia DR3 star catalogue (1.8 billion stars) by sky region or find k nearest stars to any coordinate; 3D with extreme non-uniformity.

**Anomaly detector for sensor networks** — IoT sensors reporting 3D position + readings; find spatial outliers in real time using kNN distance as an anomaly score.

**Real estate comparable finder** — given a property, find the k most spatially similar sold properties; 4D or 5D index over lat, lon, size, age, price range.

### Robotics & Physics

**Collision avoidance system** — 3D bounding box queries for a drone fleet or robot arm; Bonsai's lock-free migration means zero jitter during index restructuring.

**Particle physics event display** — visualise detector hits from CERN open data; millions of 3D points with extreme clustering near interaction vertices.

---

## Architecture

Bonsai is built around a lock-free `IndexRouter` holding an atomic pointer to the active backend. A `Profiler` pipeline (reservoir sampler → online stats → cost model → policy engine) runs in the background and triggers a `MigrationEngine` when a better backend is identified. A `BloomCache` short-circuits empty-result range queries in O(1).

```
BonsaiIndex<T, C, D>
  ├── IndexRouter       — atomic ptr to active backend
  ├── Profiler          — ReservoirSampler → OnlineStats → CostModel → PolicyEngine
  ├── MigrationEngine   — STR bulk load → incremental drain → atomic swap
  ├── BloomCache        — probabilistic negative cache (zero false negatives)
  └── StatsCollector    — lock-free query/insert counters
```

Migration is transparent: queries are never blocked for more than 50 µs (one atomic pointer load). During migration, writes are routed to both the active and shadow backends; after the drain phase, an atomic SeqCst swap promotes the shadow to active.

---

## License

Apache 2.0 — see [LICENSE](LICENSE)