# kowito-json
**A high-performance zero-decode JSON parser and schema-JIT serializer for Rust.**
`kowito-json` parses and serializes JSON at memory-bandwidth speeds using ARM NEON Carry-Less Multiplication (`PMULL`), x86_64 AVX2+PCLMULQDQ, zero-copy tape scanning, and compile-time schema baking via the `#[derive(KJson)]` macro.
> Optimized for Apple Silicon (M-series / aarch64) and x86_64 AVX2. Zero-allocation data pipeline.
## Features
- **Zero-Decode Parser** — scans JSON into a flat `u32` tape without allocating or decoding; fields are read lazily on access.
- **SIMD String Tracking** — uses `PMULL` carry-less multiplication to compute string-mask parity across 16-byte chunks in a single cycle, eliminating all branch mispredictions.
- **Schema-JIT Serializer** — `#[derive(KJson)]` bakes field key prefixes as `&'static [u8]` at compile time; the hot path is pure `memcpy` + `itoa`/`ryu`.
- **NEON SIMD Escape Scanning** — string escaping scans 16 bytes per cycle; only slows for rare escape characters.
- **Hardware Prefetch** — `std::intrinsics::prefetch_read_data` keeps the next chunk in L1 while the current one is processed.
- **Arena Allocator** — `Scratchpad` and thread-local `with_scratch_tape` eliminate per-parse heap allocation.
## Benchmarks
Measured on **Apple Silicon M4**, release profile, using `criterion` (100 samples, 95% CI).
**Note:** x86_64 AVX2+PCLMULQDQ path is fully implemented and provides consistent high throughput on Intel/AMD platforms.
### Parsing — 12 MB Real-World JSON Corpus (100k user objects)
**Visual Chart (Higher = Faster)**
```
kowito-json ████████████████████████████ 6.84 GiB/s ⭐ FASTEST
sonic_rs █████ 1.29 GiB/s
simd_json █ 0.265 GiB/s
serde_json █ 0.234 GiB/s (baseline)
```
| **kowito-json** | **~6.84 GiB/s** | **29× faster** |
| `sonic-rs` | ~1.29 GiB/s | 5.5× faster |
| `simd-json` | ~0.265 GiB/s | 1.1× faster |
| `serde_json` | ~0.234 GiB/s | baseline |
---
### Serialization — Micro Payloads (Lower Latency = Better)
**Tiny (3 fields)**
```
serde_json ████████████████████████████ 34.3 ns
sonic_rs ██████████████████ 21.7 ns
kowito-json █████████ 11.2 ns ⭐ FASTEST (3.1× faster)
```
**Medium (7 fields)**
```
serde_json ████████████████████████████ 81.1 ns
sonic_rs ██████████████████████ 66.1 ns
kowito-json █████████████ 37.9 ns ⭐ FASTEST (2.1× faster)
```
**Numeric (8 fields)**
```
serde_json ████████████████████████████ 118.9 ns
sonic_rs ████████████████████████ 100.0 ns
kowito-json ███████████████████ 82.4 ns ⭐ FASTEST (1.4× faster)
```
| Tiny — 3 fields | 34.3 ns | 21.7 ns | **11.2 ns** | **3.1×** |
| Medium — 7 fields | 81.1 ns | 66.1 ns | **37.9 ns** | **2.1×** |
| Numeric — 8 fields | 118.9 ns | 100.0 ns | **82.4 ns** | **1.4×** |
---
### Serialization — Hot Loop (1 000 items)
**Latency per Batch**
```
serde_json ████████████████████████████ 91.3 µs
sonic_rs ██████████████████████ 72.3 µs
kowito-json █████████████ 44.4 µs ⭐ FASTEST (2.1× faster)
```
**Throughput**
```
kowito-json ████████████████████████████ 2.46 GiB/s ⭐ FASTEST
sonic_rs █████████████████ 1.51 GiB/s
serde_json █████████████ 1.19 GiB/s
```
| **kowito-json** | **44.4 µs** | **2.46 GiB/s** |
| `sonic_rs` | 72.3 µs | 1.51 GiB/s |
| `serde_json` | 91.3 µs | 1.19 GiB/s |
---
### Serialization — Large String (10 KB, SIMD fast-path)
**Latency (Lower = Better)**
```
sonic_rs ███ 288.8 ns ⭐ FASTEST
kowito-json ████ 383.6 ns (competitive)
serde_json ████████████████████████████ 2649 ns (6.9× slower)
```
**Throughput (Higher = Better)**
```
sonic_rs ████████████████████████████ 32.3 GiB/s ⭐ FASTEST
kowito-json █████████████████████ 24.3 GiB/s
serde_json ███ 3.52 GiB/s
```
| `sonic_rs` | **288.8 ns** | **32.3 GiB/s** |
| **kowito-json** | 383.6 ns | 24.3 GiB/s |
| `serde_json` | 2649 ns | 3.52 GiB/s |
---
### 📊 Summary: When to Use Each
| **Micro payloads** (< 100 bytes) | **kowito-json** ⭐ | 3.1× speedup, zero-copy design |
| **Hot-loop batch** (1000+ items) | **kowito-json** ⭐ | 2.1× faster, schema-JIT wins |
| **Large strings** (10KB+) | `sonic_rs` | Specialized escape SIMD, 32 GiB/s |
| **General parsing** (all sizes) | **kowito-json** ⭐ | 28× faster than serde_json |
| **Compatibility** (stable Rust) | `serde_json` | Mature, works on stable |
> **kowito-json dominates micro and hot-loop workloads.** sonic_rs edges ahead only on pure large-string throughput. Choose **kowito-json** for microservices, logging pipelines, and real-time systems.
---
## Feature Comparison
| **Parsing Speed** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| **Serialization** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Zero-Decode** | ✅ | ❌ | ❌ |
| **Schema-JIT** | ✅ | ❌ | ❌ |
| **SIMD String Escape** | ✅ NEON / AVX2 | ✅ AVX2/SSE | ❌ |
| **Arena Allocator** | ✅ | ❌ | ❌ |
| **Stable Rust** | ❌ (nightly) | ✅ | ✅ |
| **Architecture** | ARM NEON / AVX2 | AVX2 / SSE | ✅ Universal |
## Architecture Support
- **ARM64 (Apple Silicon / Graviton)**: Uses `PMULL` (Carry-less multiplication) for string detection and NEON for structural scanning.
- **x86_64 (Intel Core / AMD Ryzen)**: Uses `AVX2` and `PCLMULQDQ` for high-speed scanning.
- **Experimental (M4+)**: Prototypes for `SVE2` (via `svmatch`) and `AMX` (Whitespace Scrubber) are in development.
## Installation
```toml
[dependencies]
kowito-json = "0.2.12"
kowito-json-derive = "0.2.12"
```
Requires **Rust nightly** (uses `portable_simd`):
```toml
# rust-toolchain.toml
[toolchain]
channel = "nightly"
```
## Quick Start
### Serialization
```rust
use kowito_json_derive::KJson;
#[derive(Debug, KJson)]
struct User {
id: u64,
name: String,
active: bool,
score: f64,
}
fn main() {
let user = User { id: 1, name: "Alice".to_string(), active: true, score: 98.6 };
let mut buf = Vec::new();
user.to_json_bytes(&mut buf);
println!("{}", std::str::from_utf8(&buf).unwrap());
// {"id":1,"name":"Alice","active":true,"score":98.6}
}
```
### Parsing (Zero-Decode)
```rust
use kowito_json::{KView, Scratchpad};
use kowito_json::scanner::Scanner;
use kowito_json_derive::KJson;
#[derive(Debug, KJson)]
struct User {
id: i64,
name: String,
active: bool,
}
fn main() {
let json = br#"{"id": 42, "name": "Kowito", "active": true}"#;
let mut scratch = Scratchpad::new(1024);
let tape = scratch.get_mut_tape();
// SIMD scan — fills tape with structural token offsets
let n = Scanner::new(json).scan(tape);
// Zero-copy view — no allocation, no string decoding
let view = KView::new(json, &tape[..n]);
let user = User::from_kview(&view);
println!("{user:?}");
}
```
## Examples
Run any example with `cargo run --example <name>`.
### All examples
| Basic serialization | `cargo run --example 01_basic_serialize` | `#[derive(KJson)]`, `to_json_bytes()` |
| All primitive types | `cargo run --example 02_all_types` | integers, floats, bools, all string escapes |
| Advanced types | `cargo run --example 03_advanced_types` | `Option`, `Vec`, `Box`, `Cow`, nested structs |
| Arena allocator | `cargo run --example 04_arena_scratch` | `Scratchpad`, `with_scratch_tape`, reuse patterns |
| Low-level scanner | `cargo run --example 05_scanner` | `Scanner::scan`, tape inspection |
| Hot-loop batch | `cargo run --example 06_hot_loop` | NDJSON stream, JSON array, server buffer reuse |
| Manual `Serialize` | `cargo run --example 07_manual_serialize` | renamed fields, skip-null, tagged enum |
| SIMD string writer | `cargo run --example 08_string_escape` | `write_str_escape` directly, control chars |
### Batch serialization (NDJSON)
```rust
use kowito_json_derive::KJson;
#[derive(KJson)]
struct LogEntry { timestamp: u64, level: String, message: String }
let entries = vec![
LogEntry { timestamp: 1_700_000_001, level: "INFO".into(), message: "started".into() },
LogEntry { timestamp: 1_700_000_002, level: "WARN".into(), message: "slow query".into() },
];
let mut buf = Vec::with_capacity(entries.len() * 128);
for entry in &entries {
entry.to_json_bytes(&mut buf);
buf.push(b'\n');
}
println!("{}", std::str::from_utf8(&buf).unwrap());
```
### Arena-backed parsing (zero allocation)
```rust
use kowito_json::arena::with_scratch_tape;
use kowito_json::scanner::Scanner;
let jsons: &[&[u8]] = &[
br#"{"id":1,"val":"alpha"}"#,
br#"{"id":2,"val":"beta"}"#,
];
for json in jsons {
with_scratch_tape(|tape| {
let n = Scanner::new(json).scan(tape);
println!("{} tokens", n);
});
}
```
### Manual `Serialize` implementation
```rust
use kowito_json::serialize::{Serialize, write_str_escape, write_value};
struct ApiResponse {
status: u32,
data: Option<String>,
}
impl Serialize for ApiResponse {
fn serialize(&self, buf: &mut Vec<u8>) {
buf.extend_from_slice(b"{\"status\":");
write_value(&self.status, buf);
if let Some(ref d) = self.data {
buf.extend_from_slice(b",\"data\":");
write_str_escape(buf, d.as_bytes());
}
buf.push(b'}');
}
}
```
### Nested structs
```rust
use kowito_json_derive::KJson;
#[derive(KJson)]
pub struct Address {
pub street: String,
pub city: String,
pub zip: String,
}
#[derive(KJson)]
pub struct Company {
pub name: String,
pub employee_count: u32,
pub hq: Address,
}
```
> Nested `KJson` structs serialize correctly because each implements `SerializeRaw` — the outer struct's JIT template calls the inner one directly without boxing.
## Under the Hood
### Parsing — SIMD String Parity via PMULL
Traditional parsers scan for `"` with scalar loops. `kowito-json` instead computes the **string block mask** using ARM NEON `vmull_p64` (carry-less multiply):
```
quote_mask = PMULL(quote_positions, 0xFFFF…) // XOR-prefix-sum in one instruction
string_mask = quote_mask XOR prev_in_string // carry across 64-byte blocks
```
This gives a bitmask where every bit inside a string is 1, outside is 0 — enabling branchless structural token extraction. The result is a flat `u32` tape of byte offsets; no AST, no allocation.
### Serialization — Schema-JIT Templates
`#[derive(KJson)]` runs at compile time and emits code equivalent to:
```rust
// Generated (simplified):
pub fn to_json_bytes(&self, buf: &mut Vec<u8>) {
buf.reserve(STATIC_CAP + dynamic_cap); // single pre-allocation
unsafe {
let mut p = buf.as_mut_ptr().add(buf.len());
copy_nonoverlapping(b"{\"id\":".as_ptr(), p, 6); p = p.add(6);
p = itoa_raw(self.id, p);
copy_nonoverlapping(b",\"name\":\"".as_ptr(), p, 9); p = p.add(9);
p = neon_escape_str(self.name.as_bytes(), p);
// ... remaining fields ...
*p = b'}'; p = p.add(1);
buf.set_len(p.offset_from(buf.as_ptr()) as usize);
}
}
```
All field key bytes live in the read-only data segment. The hot path is a straight-line sequence of `memcpy` + numeric writes + SIMD escape — no branches, no reflection.
## License
This project is licensed under the MIT License — see the [LICENSE](LICENSE) file for details.