ctfs 1.0.2

Causal consistency at nanosecond latency. Algebraic invariants without coordination.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
# Cuttlefish

[![Crates.io](https://img.shields.io/crates/v/ctfs.svg)](https://crates.io/crates/ctfs)
[![Documentation](https://docs.rs/ctfs/badge.svg)](https://docs.rs/ctfs)
[![CI](https://github.com/abokhalill/cuttlefish/actions/workflows/ci.yml/badge.svg)](https://github.com/abokhalill/cuttlefish/actions/workflows/ci.yml)

**Coordination-free distributed state. Nanosecond latency. Algebraic correctness.**

```
Fact → Bloom Clock → Invariant → Frontier → Gossip
286ns admission. 3.5M ops/sec. Zero consensus. Full observability.
```

Distributed systems trade consistency for latency. Cuttlefish doesn't. It's a state kernel that enforces strict invariants at L1 cache speed by exploiting algebraic properties. If your operations commute, you don't need coordination.

**The thesis:** Correctness is algebra, not execution order. Commutative ops replicate without consensus. Non-commutative ops fail fast at admission. Bounded ops use escrow partitioning. The kernel tells you which is which.

---

## Performance

| Operation | Latency | Notes |
|-----------|---------|-------|
| Full admission cycle | 286 ns | Causality + invariant + frontier |
| Instrumented admission | 298 ns | +12ns metrics overhead (4%) |
| Histogram record | 1.3 ns | Power-of-two buckets, RDTSC |
| Causal clock dominance | 700 ps | SIMD Bloom filter comparison |
| Tiered hash verification | 280 ns | BLAKE3 checkpoint integrity |
| Durable admission | 5.2 ns | Staged to io_uring, async fsync |
| WAL hash (200B payload) | 230 ns | 940 MiB/s throughput |

**Comparison:** etcd pays 1-10ms for linearizable writes. CockroachDB pays 1-50ms. Cuttlefish pays 286ns for causal+ consistency with full Prometheus observability.

---

## Quick Start

```toml
[dependencies]
ctfs = "1.0.0"
zerocopy = "0.8"
```

```rust
use ctfs::prelude::*;
use ctfs::invariants::total_supply::{ConservationState, TotalSupplyInvariant};
use zerocopy::IntoBytes;

// Initialize: balance=0, bounds=[MIN, MAX]
let state = ConservationState::new(0, i128::MIN, i128::MAX);
let mut cell = StateCell::new();
cell.as_slice_mut().copy_from_slice(state.as_bytes());

let mut kernel = Kernel::with_state(TotalSupplyInvariant::new(), cell);

// Admit a fact: +100 to balance
let fact_id: FactId = [0u8; 32];
let payload = 100i128.to_le_bytes();
kernel.admit_raw(&fact_id, &[], &payload).unwrap();
```

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         Kernels                             │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│   Kernel    │ TwoLane     │ Escalating  │ Durable/Network   │
│   (basic)   │ (BFVC+Exact)│ (auto-mode) │ (io_uring/gossip) │
├─────────────┴─────────────┴─────────────┴───────────────────┤
│                      Causal Layer                           │
├─────────────┬─────────────┬─────────────────────────────────┤
│ CausalClock │ PreciseClock│ ExactCausalIndex                │
│ (512b Bloom)│ (LRU exact) │ (Robin Hood, SIMD)              │
├─────────────┴─────────────┴─────────────────────────────────┤
│                    Invariant Layer                          │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│ TotalSupply │ Max/GCounter│ Uniqueness  │ GGraph            │
│ (conserve)  │ (monotonic) │ (idempotent)│ (grow-only graph) │
├─────────────┴─────────────┴─────────────┴───────────────────┤
│                   Persistence Layer                         │
├─────────────┬─────────────┬─────────────────────────────────┤
│  WALArena   │ SPSC Buffer │ io_uring Worker                 │
│ (zero-copy) │ (lock-free) │ (async fsync)                   │
├─────────────┴─────────────┴─────────────────────────────────┤
│                   Checkpoint Layer                          │
├─────────────────────────────────────────────────────────────┤
│ Tiered BLAKE3 Hash │ Attestation │ Re-anchoring             │
└─────────────────────────────────────────────────────────────┘
```

---

## Examples

Run any example with `cargo run --example <name>`:

```bash
cargo run --example basic_kernel      # Basic fact admission and causality
cargo run --example multi_tenant      # Isolated causal domains per tenant
cargo run --example custom_invariant  # Implement your own invariant
cargo run --example two_lane_kernel   # BFVC + ExactCausalIndex
cargo run --example escalating_kernel # Auto-switch Bloom → Precise mode
cargo run --example graph_invariant   # Grow-only graph with constraints
```

### Basic Kernel

```rust
use ctfs::core::{Kernel, StateCell};
use ctfs::invariants::total_supply::{ConservationState, TotalSupplyInvariant};

// Initialize state
let state = ConservationState::new(1000, 0, 1_000_000);
let mut cell = StateCell::new();
cell.as_slice_mut().copy_from_slice(state.as_bytes());

let mut kernel = Kernel::with_state(TotalSupplyInvariant::new(), cell);

// Admit facts
let fact1: FactId = [1u8; 32];
kernel.admit_raw(&fact1, &[], &500i128.to_le_bytes()).unwrap();

// Admit with causal dependency
let fact2: FactId = [2u8; 32];
kernel.admit_raw(&fact2, &[fact1], &(-200i128).to_le_bytes()).unwrap();
```

### Multi-Tenant Isolation

```rust
use ctfs::core::TenantKernel;

let mut kernel = TenantKernel::new(TotalSupplyInvariant::new());

// Each tenant gets isolated causal state
kernel.register_tenant(1, alice_state).unwrap();
kernel.register_tenant(2, bob_state).unwrap();

// Cross-tenant dependencies are rejected
kernel.admit(1, &fact, &[bob_fact], &payload); // Error: CausalityViolation
```

### Custom Invariant

```rust
use ctfs::core::{Invariant, InvariantError};

struct MonotonicInvariant;

impl Invariant for MonotonicInvariant {
    fn apply(&self, payload: &[u8], state: &mut [u8]) -> Result<(), InvariantError> {
        let proposed = u64::from_le_bytes(payload[0..8].try_into().unwrap());
        let current = u64::from_le_bytes(state[0..8].try_into().unwrap());
        
        if proposed < current {
            return Err(InvariantError::Underflow);
        }
        state[0..8].copy_from_slice(&proposed.to_le_bytes());
        Ok(())
    }
}
```

---

## Core Concepts

### Facts
Immutable, content-addressed state transitions. Each fact has a 32-byte ID, optional causal dependencies, and a payload. Facts form a DAG, not a chain.

### Invariants
Algebraic constraints enforced at admission. Pure functions: `Δ_I(payload, state) → Result<(), Error>`. O(1), allocation-free, branchless where possible.

### Causal Clocks
512-bit Bloom filter vector clocks. Probabilistic but fast (~700ps dominance check). When saturation exceeds 40%, kernels escalate to precise tracking.

### StateCell
64-byte cache-aligned POD. Bit-for-bit deterministic. No pointers and no heap.

### Checkpoints
Tiered BLAKE3 hash of state + frontier. Verified on load—corrupt checkpoints are rejected.

---

## Kernels

| Kernel | Causality | Persistence | Use Case |
|--------|-----------|-------------|----------|
| `Kernel<I>` | BFVC only | None | Embedded, single-node |
| `TwoLaneKernel<I>` | BFVC + ExactIndex | None | Production, precise deps |
| `EscalatingKernel<I>` | Auto BFVC→Precise | None | Long-running, adaptive |
| `TenantKernel<I>` | Per-tenant BFVC+Exact | None | Multi-tenant isolation |
| `DurableKernel<I>` | BFVC | io_uring WAL | Crash-safe single-node |
| `TwoLaneDurableKernel<I>` | BFVC + ExactIndex | io_uring + Arena | Production crash-safe |
| `NetworkingKernel<I>` | BFVC | Broadcast buffer | Gossip replication |
| `MultiKernel` | BFVC | None | Multiple invariants |
| `DualKernel<I1, I2>` | BFVC | None | Two invariants, zero overhead |

### Async Durability (New)

`TwoLaneDurableKernel` now supports non-blocking durability via `DurableHandle`:

```rust
// Non-blocking: returns immediately with a handle
let handle = kernel.admit_async(&fact_id, &deps, &payload)?;

// Poll for durability (or use in async context)
while !handle.is_durable() {
    // Do other work...
}

// Or block if needed
handle.spin_wait();
```

---

## Invariants

| Invariant | Algebraic Class | Coordination |
|-----------|-----------------|--------------|
| `TotalSupplyInvariant` | Abelian group | None (commutative) |
| `MaxInvariant` | Join-semilattice | None (monotonic) |
| `GCounterInvariant` | Commutative monoid | None |
| `BoundedGCounterInvariant` | Bounded monoid | None until bound |
| `LWWInvariant` | Last-writer-wins | Tiebreaker only |
| `UniquenessInvariant` | Idempotent set | None |
| `GGraphInvariant` | Grow-only graph | None |

**Rule:** If `Δ_I(a) ∘ Δ_I(b) = Δ_I(b) ∘ Δ_I(a)`, no coordination required.

---

## Persistence

Linux-only. Requires `io_uring` (kernel 5.1+).

```toml
[dependencies]
ctfs = { version = "1.0.0", features = ["persistence"] }
```

### Components

- **WALArena**: 4K-aligned memory arena for zero-copy fact staging. 4096 slots, 256 bytes each.
- **SPSC Buffer**: Lock-free producer-consumer queue between kernel and persistence worker.
- **io_uring Worker**: Async batched writes with explicit fsync before advancing persistence frontier.
- **Checkpoints**: Periodic state snapshots with tiered BLAKE3 integrity verification.

### Durability Guarantees

Facts are durable when `persistence_frontier.dominates(fact_clock)`. The frontier advances only after `fsync` completes—not after write submission.

---

## Networking

```toml
[dependencies]
ctfs = { version = "1.0.0", features = ["networking"] }
```

Gossip-based replication via `NetworkingKernel`. Facts are broadcast to peers; causality is enforced on receipt. Convergence is guaranteed for commutative invariants.

### Protocol

- **GossipClock**: Periodic Bloom clock exchange via UDP
- **PushFact**: Reliable fact broadcast via TCP
- **PullRequest/PullResponse**: Anti-entropy fact recovery
- **QuotaRequest/QuotaGrant**: Escrow quota transfer for bounded invariants

### Escrow Gossip

Bounded invariants (e.g., inventory limits) use escrow partitioning. Each node gets a quota slice. When a node exhausts its local quota, it broadcasts `QuotaRequest` to peers. Nodes with surplus respond with `QuotaGrant`. Exponential backoff (500ms → 4s) prevents thundering herd.

```rust
use ctfs::algebra::escrow::{EscrowManager, PendingRequestMap};

let mut escrow = EscrowManager::new(1000); // Global limit
escrow.initialize(&[node_a, node_b])?;     // Split 500/500

escrow.try_consume(node_a, 400)?;          // Local op, no coordination
escrow.transfer(node_a, node_b, 50)?;      // Quota grant
```

---

## Observability

### Prometheus Metrics

Feature-gated on `networking`. Exposes `GET /metrics` endpoint.

```rust
use ctfs::core::{LatencyHistogram, MetricsServer};
use std::sync::Arc;

let admission_hist = Arc::new(LatencyHistogram::new());
let persistence_hist = Arc::new(LatencyHistogram::new());

let server = MetricsServer::spawn(
    "0.0.0.0:9090",
    admission_hist.clone(),
    Some(persistence_hist.clone()),
)?;

// Instrumented admission records latency automatically
kernel.admit_instrumented(&fact_id, &deps, &payload, &admission_hist)?;
```

**Exported metrics:**
- `ctfs_admission_latency_bucket{le="..."}` — Histogram buckets (power-of-two, 8ns → 2^66ns)
- `ctfs_admission_latency_count` — Total admissions
- `ctfs_admission_latency_p50/p90/p99` — Percentile gauges
- `ctfs_persistence_latency_*` — Disk I/O latency (SPSC push → io_uring CQE)

### Latency Histogram

64 buckets, power-of-two scale. Cache-line aligned. Lock-free atomics. RDTSC timing on x86_64 (~6.6ns overhead per call).

```rust
let hist = LatencyHistogram::new();
hist.record(latency_nanos);

let (p50, p90, p99) = hist.percentiles();
let buckets = hist.snapshot(); // [u64; 64]
```

---

## Benchmarks

```bash
# All benchmarks
cargo bench

# Specific suites
cargo bench --bench kernel
cargo bench --bench hardening
cargo bench --bench metrics_overhead
cargo bench --features persistence --bench durable_admission
```

### Selected Results (AMD Ryzen 7, Linux 6.x)

```
admit_baseline           286.0 ns
admit_instrumented       298.0 ns  (+12ns overhead)
histogram_record           1.3 ns
nanos_now_rdtsc            6.6 ns
causal_clock_dominates     0.7 ns
checkpoint/tiered_hash   280.0 ns
durable_admission          5.2 ns
wal_hasher/200B          230.0 ns  (940 MiB/s)
```

---

## Design Constraints

**Forbidden in hot path:**
- Heap allocation (`Box`, `Vec`, `HashMap`)
- Locks (`Mutex`, `RwLock`)
- Syscalls (except staged io_uring)
- Pointer chasing
- O(n) scans

**Required:**
- `#[repr(C, align(64))]` for cache-line alignment
- Fixed-width types (`i128`, `u64`, `[u8; 32]`)
- Little-endian byte order
- Branchless operations where possible
- SIMD-friendly memory layouts

**Bit determinism:** Same input → same output, byte-for-byte, across all nodes. No floats. No `std::collections`. No non-deterministic iteration.

---

## What it is NOT

- SQL or query languages
- Secondary indexes
- Full-text search
- Global total ordering
- Wall-clock timestamps
- Dynamic schema
- "Convenient" APIs that hide complexity

This is a kernel, not a database. Build your database on top.

---

## Theory

Cuttlefish is grounded in:

- **CALM Theorem**: Consistency As Logical Monotonicity. Monotonic programs don't need coordination.
- **CRDTs**: Conflict-free Replicated Data Types. Lattice-based merge semantics.
- **Bloom Clocks**: Probabilistic vector clocks with O(1) space and O(1) dominance checks.
- **Algebraic Effects**: Invariants as pure functions over state, composable via semiring operations.

If you want the math: [Alvaro et al., "Consistency Analysis in Bloom"](https://dsf.berkeley.edu/papers/cidr11-bloom.pdf)

---

## License

MIT

---

*"The fastest distributed system is the one that doesn't coordinate."*