kowito-json
A high-performance zero-decode JSON parser and schema-JIT serializer for Rust.
kowito-json parses and serializes JSON at memory-bandwidth speeds using ARM NEON Carry-Less Multiplication (PMULL), x86_64 AVX2+PCLMULQDQ, zero-copy tape scanning, and compile-time schema baking via the #[derive(KJson)] macro.
Optimized for Apple Silicon (M-series / aarch64) and x86_64 AVX2. Zero-allocation data pipeline.
Features
- Zero-Decode Parser: scans JSON into a flat u32 tape without allocating or decoding; fields are read lazily on access.
- SIMD String Tracking: uses PMULL carry-less multiplication to compute string-mask parity across 16-byte chunks in a single cycle, eliminating all branch mispredictions.
- Schema-JIT Serializer: #[derive(KJson)] bakes field key prefixes as &'static [u8] at compile time; the hot path is pure memcpy + itoa/ryu.
- NEON SIMD Escape Scanning: string escaping scans 16 bytes per cycle; only slows for rare escape characters.
- Hardware Prefetch: std::intrinsics::prefetch_read_data keeps the next chunk in L1 while the current one is processed.
- Arena Allocator: Scratchpad and thread-local with_scratch_tape eliminate per-parse heap allocation.
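The chunked escape scan described in the list can be illustrated with a portable, scalar stand-in. This is a minimal sketch and not the crate's implementation: write_escaped and needs_escape are hypothetical names, and a plain iterator check takes the place of the NEON/AVX2 comparison. Chunks that need no escaping are copied in bulk; only chunks containing a quote, backslash, or control character fall back to the byte-by-byte path.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// True for bytes that JSON requires to be escaped inside a string.
fn needs_escape(b: u8) -> bool {
    b < 0x20 || b == b'"' || b == b'\\'
}

/// Hypothetical helper: appends `s` to `buf` as a quoted JSON string.
/// Scalar stand-in for a SIMD scan, working on 16-byte chunks.
fn write_escaped(buf: &mut Vec<u8>, s: &str) {
    buf.push(b'"');
    for chunk in s.as_bytes().chunks(16) {
        // Fast path: nothing to escape, copy the whole chunk at once.
        if !chunk.iter().any(|&b| needs_escape(b)) {
            buf.extend_from_slice(chunk);
            continue;
        }
        // Slow path: escape byte by byte within this chunk only.
        for &b in chunk {
            match b {
                b'"' => buf.extend_from_slice(b"\\\""),
                b'\\' => buf.extend_from_slice(b"\\\\"),
                b'\n' => buf.extend_from_slice(b"\\n"),
                b'\r' => buf.extend_from_slice(b"\\r"),
                b'\t' => buf.extend_from_slice(b"\\t"),
                0x00..=0x1F => {
                    // Remaining control characters use the \u00XX form.
                    buf.extend_from_slice(b"\\u00");
                    buf.push(HEX[(b >> 4) as usize]);
                    buf.push(HEX[(b & 0x0F) as usize]);
                }
                _ => buf.push(b),
            }
        }
    }
    buf.push(b'"');
}
```

Multi-byte UTF-8 sequences pass through unchanged because every byte in them is 0x80 or above, so splitting a string at arbitrary 16-byte boundaries is safe here.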
Benchmarks
Measured on Apple Silicon M4, release profile, using criterion (100 samples, 95% CI).
Note: the x86_64 AVX2+PCLMULQDQ path is fully implemented, but every figure below was measured on Apple Silicon; no x86_64 numbers are reported here.
Parsing — 12 MB Real-World JSON Corpus (100k user objects)
Visual Chart (Higher = Faster)
kowito-json ████████████████████████████████ 7.98 GiB/s ⭐ FASTEST
sonic_rs █████ 1.28 GiB/s
simd_json █ 0.271 GiB/s
serde_json █ 0.241 GiB/s (baseline)
| Parser | Throughput | vs serde_json |
|---|---|---|
| kowito-json | ~7.98 GiB/s | 33× faster |
| sonic_rs | ~1.28 GiB/s | 5.3× faster |
| simd_json | ~0.271 GiB/s | 1.1× faster |
| serde_json | ~0.241 GiB/s | baseline |
Serialization — Micro Payloads (Lower Latency = Better)
Tiny (3 fields)
serde_json ████████████████████████████ 34.3 ns
sonic_rs ██████████████████ 21.7 ns
kowito-json █████████ 11.2 ns ⭐ FASTEST (3.1× faster)
Medium (7 fields)
serde_json ████████████████████████████ 81.1 ns
sonic_rs ██████████████████████ 66.1 ns
kowito-json █████████████ 37.9 ns ⭐ FASTEST (2.1× faster)
Numeric (8 fields)
serde_json ████████████████████████████ 118.9 ns
sonic_rs ████████████████████████ 100.0 ns
kowito-json ███████████████████ 82.4 ns ⭐ FASTEST (1.4× faster)
| Payload | serde_json | sonic_rs | kowito-json | Gain |
|---|---|---|---|---|
| Tiny — 3 fields | 34.3 ns | 21.7 ns | 11.2 ns | 3.1× |
| Medium — 7 fields | 81.1 ns | 66.1 ns | 37.9 ns | 2.1× |
| Numeric — 8 fields | 118.9 ns | 100.0 ns | 82.4 ns | 1.4× |
Serialization — Hot Loop (1 000 items)
Latency per Batch
serde_json ████████████████████████████ 91.3 µs
sonic_rs ██████████████████████ 72.3 µs
kowito-json █████████████ 44.4 µs ⭐ FASTEST (2.1× faster)
Throughput
kowito-json ████████████████████████████ 2.46 GiB/s ⭐ FASTEST
sonic_rs █████████████████ 1.51 GiB/s
serde_json █████████████ 1.19 GiB/s
| Serializer | Latency | Throughput |
|---|---|---|
| kowito-json | 44.4 µs | 2.46 GiB/s |
| sonic_rs | 72.3 µs | 1.51 GiB/s |
| serde_json | 91.3 µs | 1.19 GiB/s |
Serialization — Large String (10 KB, SIMD fast-path)
Latency (Lower = Better)
sonic_rs ███ 288.8 ns ⭐ FASTEST
kowito-json ████ 383.6 ns (competitive)
serde_json ████████████████████████████ 2649 ns (6.9× slower)
Throughput (Higher = Better)
sonic_rs ████████████████████████████ 32.3 GiB/s ⭐ FASTEST
kowito-json █████████████████████ 24.3 GiB/s
serde_json ███ 3.52 GiB/s
| Serializer | Latency | Throughput |
|---|---|---|
| sonic_rs | 288.8 ns | 32.3 GiB/s |
| kowito-json | 383.6 ns | 24.3 GiB/s |
| serde_json | 2649 ns | 3.52 GiB/s |
📊 Summary: When to Use Each
| Use Case | Best Choice | Why |
|---|---|---|
| Micro payloads (< 100 bytes) | kowito-json ⭐ | 3.1× speedup, zero-copy design |
| Hot-loop batch (1000+ items) | kowito-json ⭐ | 2.1× faster, schema-JIT wins |
| Large strings (10 KB+) | sonic_rs | Specialized escape SIMD, 32 GiB/s |
| General parsing (all sizes) | kowito-json ⭐ | 33× faster than serde_json |
| Compatibility (stable Rust) | serde_json | Mature, works on stable |
kowito-json dominates micro and hot-loop workloads. sonic_rs edges ahead only on pure large-string throughput. Choose kowito-json for microservices, logging pipelines, and real-time systems.
Feature Comparison
| Feature | kowito-json | sonic_rs | serde_json |
|---|---|---|---|
| Parsing Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Serialization | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Zero-Decode | ✅ | ❌ | ❌ |
| Schema-JIT | ✅ | ❌ | ❌ |
| SIMD String Escape | ✅ NEON / AVX2 | ✅ AVX2/SSE | ❌ |
| Arena Allocator | ✅ | ❌ | ❌ |
| Stable Rust | ❌ (nightly) | ✅ | ✅ |
| Architecture | ARM NEON / AVX2 | AVX2 / SSE | ✅ Universal |
Architecture Support
- ARM64 (Apple Silicon / Graviton): Uses PMULL (carry-less multiplication) for string detection and NEON for structural scanning.
- x86_64 (Intel Core / AMD Ryzen): Uses AVX2 and PCLMULQDQ for high-speed scanning.
- Experimental (M4+): Prototypes for SVE2 (via svmatch) and AMX (Whitespace Scrubber) are in development.
Installation
[]
= "0.2.12"
= "0.2.12"
Requires Rust nightly (uses portable_simd):
# rust-toolchain.toml
[toolchain]
channel = "nightly"
Quick Start
Serialization
use KJson;
Parsing (Zero-Decode)
use ;
use Scanner;
use KJson;
Examples
Run any example with cargo run --example <name>.
All examples
| Example | Command | What it shows |
|---|---|---|
| Basic serialization | cargo run --example 01_basic_serialize | #[derive(KJson)], to_json_bytes() |
| All primitive types | cargo run --example 02_all_types | integers, floats, bools, all string escapes |
| Advanced types | cargo run --example 03_advanced_types | Option, Vec, Box, Cow, nested structs |
| Arena allocator | cargo run --example 04_arena_scratch | Scratchpad, with_scratch_tape, reuse patterns |
| Low-level scanner | cargo run --example 05_scanner | Scanner::scan, tape inspection |
| Hot-loop batch | cargo run --example 06_hot_loop | NDJSON stream, JSON array, server buffer reuse |
| Manual Serialize | cargo run --example 07_manual_serialize | renamed fields, skip-null, tagged enum |
| SIMD string writer | cargo run --example 08_string_escape | write_str_escape directly, control chars |
Batch serialization (NDJSON)
use KJson;
let entries = vec!;
let mut buf = Vecwith_capacity;
for entry in &entries
println!;
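The buffer-reuse pattern behind NDJSON batching can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's API: LogEntry, write_entry, and write_ndjson are hypothetical names, and a hand-written formatter stands in for the derived serializer. The point is that one Vec<u8> is cleared and refilled per batch, so its capacity is retained and steady-state batches do not reallocate.

```rust
use std::io::Write;

/// Hypothetical record type, used only for this illustration.
struct LogEntry {
    ts: u64,
    level: u8,
}

/// Hand-written stand-in for a derived serializer.
fn write_entry(buf: &mut Vec<u8>, e: &LogEntry) {
    // Writing into a Vec<u8> cannot fail, so unwrap is safe here.
    write!(buf, "{{\"ts\":{},\"level\":{}}}", e.ts, e.level).unwrap();
}

/// Serializes one batch as NDJSON: one JSON object per line.
/// `buf` is cleared but keeps its capacity between calls.
fn write_ndjson(buf: &mut Vec<u8>, entries: &[LogEntry]) {
    buf.clear();
    for e in entries {
        write_entry(buf, e);
        buf.push(b'\n');
    }
}
```

A caller would typically allocate the buffer once with Vec::with_capacity, then call write_ndjson for each batch and hand the filled bytes to a socket or file.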
Arena-backed parsing (zero allocation)
use with_scratch_tape;
use Scanner;
let jsons: & = &;
for json in jsons
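The idea behind a thread-local scratch tape can be sketched with std::thread_local!. This is a minimal sketch of the general pattern, not the crate's with_scratch_tape: the name with_scratch_buffer, the closure signature, and the initial capacity of 1024 are all assumptions. Each thread owns one Vec<u32> that is cleared and lent to a closure, so repeated parses on the same thread reuse the same allocation.

```rust
use std::cell::RefCell;

thread_local! {
    // One reusable offset buffer per thread; the capacity is an arbitrary choice.
    static SCRATCH: RefCell<Vec<u32>> = RefCell::new(Vec::with_capacity(1024));
}

/// Hypothetical helper: lends the thread's scratch buffer to `f`.
/// The buffer is emptied first but keeps its capacity, so steady-state
/// calls do not touch the heap unless the tape outgrows it.
fn with_scratch_buffer<R>(f: impl FnOnce(&mut Vec<u32>) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut tape = cell.borrow_mut();
        tape.clear();
        f(&mut *tape)
    })
}
```

Because the buffer sits behind a RefCell, calling with_scratch_buffer again from inside the closure would panic on the second borrow; a real design has to rule that out or hand out a second buffer.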
Manual Serialize implementation
use ;
Nested structs
use KJson;
Nested KJson structs serialize correctly because each implements SerializeRaw: the outer struct's JIT template calls the inner one directly without boxing.
Under the Hood
Parsing — SIMD String Parity via PMULL
Traditional parsers scan for " with scalar loops. kowito-json instead computes the string block mask using ARM NEON vmull_p64 (carry-less multiply):
quote_mask = PMULL(quote_positions, 0xFFFF…) // XOR-prefix-sum in one instruction
string_mask = quote_mask XOR prev_in_string // carry across 64-byte blocks
This gives a bitmask where every bit inside a string is 1, outside is 0 — enabling branchless structural token extraction. The result is a flat u32 tape of byte offsets; no AST, no allocation.
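The string mask and the resulting tape can be reproduced in portable Rust. The sketch below rests on stated assumptions: a shift-and-XOR ladder stands in for the carry-less multiply (both compute a prefix XOR of the quote bits), the per-byte classification is scalar rather than SIMD, and backslash-escaped quotes are ignored for brevity. The names prefix_xor and scan_structurals are illustrative and not part of the crate.

```rust
/// Prefix XOR: bit i of the result is the XOR of bits 0..=i of `mask`.
/// A carry-less multiply by an all-ones operand yields the same value;
/// this ladder is a portable stand-in.
fn prefix_xor(mut mask: u64) -> u64 {
    mask ^= mask << 1;
    mask ^= mask << 2;
    mask ^= mask << 4;
    mask ^= mask << 8;
    mask ^= mask << 16;
    mask ^= mask << 32;
    mask
}

/// Returns byte offsets of structural characters that lie outside strings.
/// Simplification: escaped quotes (\") are not handled.
fn scan_structurals(input: &[u8]) -> Vec<u32> {
    let mut tape = Vec::new();
    // All ones if the previous block ended inside a string, else zero.
    let mut prev_in_string: u64 = 0;
    for (block_idx, block) in input.chunks(64).enumerate() {
        let mut quotes: u64 = 0;
        let mut structurals: u64 = 0;
        for (i, &b) in block.iter().enumerate() {
            match b {
                b'"' => quotes |= 1 << i,
                b'{' | b'}' | b'[' | b']' | b':' | b',' => structurals |= 1 << i,
                _ => {}
            }
        }
        // Bits from an opening quote up to (not including) its closing quote are 1.
        let string_mask = prefix_xor(quotes) ^ prev_in_string;
        // Broadcast the top bit so the in-string state carries into the next block.
        prev_in_string = ((string_mask as i64) >> 63) as u64;
        // Keep only structural characters outside strings, then emit offsets.
        let mut bits = structurals & !string_mask;
        while bits != 0 {
            let offset = block_idx * 64 + bits.trailing_zeros() as usize;
            tape.push(offset as u32);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    tape
}
```

For the input {"a":"x,y","b":[1,2]} the comma inside "x,y" is masked out, so the tape holds only the offsets of the braces, colons, brackets, and the commas between members.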
Serialization — Schema-JIT Templates
#[derive(KJson)] runs at compile time and emits code equivalent to:
// Generated (simplified):
All field key bytes live in the read-only data segment. The hot path is a straight-line sequence of memcpy + numeric writes + SIMD escape — no branches, no reflection.
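To make the template idea concrete, here is a hand-written approximation of what such generated code could look like for a small struct. It is a sketch under assumptions, not the macro's actual output: the User struct, the write_json method, and the write_u64 helper are hypothetical, and a simple digit loop stands in for itoa. Each key, together with its surrounding punctuation, is a static byte string, so serialization alternates between copying a constant and writing a value.

```rust
/// Hypothetical struct; with the real crate it would carry #[derive(KJson)].
struct User {
    id: u64,
    age: u64,
    active: bool,
}

// Key prefixes, punctuation included, stored as read-only byte strings.
const KEY_ID: &[u8] = b"{\"id\":";
const KEY_AGE: &[u8] = b",\"age\":";
const KEY_ACTIVE: &[u8] = b",\"active\":";

/// Minimal decimal writer standing in for itoa.
fn write_u64(buf: &mut Vec<u8>, mut n: u64) {
    let mut tmp = [0u8; 20]; // u64::MAX has 20 decimal digits
    let mut i = tmp.len();
    loop {
        i -= 1;
        tmp[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    buf.extend_from_slice(&tmp[i..]);
}

impl User {
    /// Alternates static key copies with value writes, in field order.
    fn write_json(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(KEY_ID);
        write_u64(buf, self.id);
        buf.extend_from_slice(KEY_AGE);
        write_u64(buf, self.age);
        buf.extend_from_slice(KEY_ACTIVE);
        let literal: &[u8] = if self.active { b"true" } else { b"false" };
        buf.extend_from_slice(literal);
        buf.push(b'}');
    }
}
```

Folding the opening brace and the separating commas into the key constants means no separator logic runs at serialization time; the field order is fixed when the code is written (or, with a derive macro, when it is generated).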
License
This project is licensed under the MIT License — see the LICENSE file for details.