# simd-json Integration Plan
## Goal
Reduce in-memory size from ~30x the input file size to ~12-15x by replacing `serde_json::Value` with `simd_json::OwnedValue`.
## Why OwnedValue (not BorrowedValue)
BorrowedValue would be more memory-efficient (~2-3x overhead) but has blockers:
- **Cross-thread communication**: Worker thread needs owned values for channels
- **Pipe cache**: Cached values must outlive individual requests
- **Input mutation**: simd-json's borrowed parsing mutates the input buffer
- **Lifetime complexity**: Would require pinning input buffer for app lifetime
OwnedValue provides:
- Faster parsing (~2-3x via SIMD)
- ~50% memory reduction vs serde_json::Value
- Nearly identical API - minimal code changes
- No lifetime complexity
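The cross-thread blocker can be illustrated with the standard library alone. The sketch below is a minimal illustration, not the project's worker code: `Owned` is a hypothetical stand-in for an owned JSON value, and the "parse" step is a placeholder. It shows that a value holding no references into the input buffer satisfies the `'static` bound that `thread::spawn` and channels require.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for an owned JSON value (not the simd-json type).
#[derive(Debug, Clone, PartialEq)]
pub enum Owned {
    Null,
    Text(String),
    List(Vec<Owned>),
}

/// Builds a value on a worker thread and sends it back over a channel.
/// This compiles because `Owned` borrows nothing from `input`.
pub fn parse_on_worker(input: String) -> Owned {
    let (tx, rx) = mpsc::channel::<Owned>();
    let handle = thread::spawn(move || {
        // Placeholder "parse": wrap each non-empty line as a string value.
        let items: Vec<Owned> = input
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(|line| Owned::Text(line.to_string()))
            .collect();
        tx.send(Owned::List(items)).expect("receiver dropped");
    });
    let value = rx.recv().expect("worker dropped the sender");
    handle.join().expect("worker panicked");
    value
}
```

A borrowed value type would carry a lifetime tied to `input`, so it could not be sent out of the closure once `input` is dropped at the end of the worker.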
## Implementation
### Phase 1: Direct Replacement
#### 1. Cargo.toml
```toml
simd-json = { version = "0.14", features = ["serde_impl"] }
```
#### 2. Type Alias (src/lib.rs or common module)
```rust
pub type Value = simd_json::OwnedValue;
```
#### 3. Parsing (src/loader.rs)
```rust
/// Parses `input` as a single JSON document, falling back to JSONL
/// (one document per line) if that fails.
fn parse_json(input: &str) -> Result<Vec<simd_json::OwnedValue>, String> {
    // simd-json parses in place and mutates its input, so work on a copy.
    let mut input_bytes = input.as_bytes().to_vec();
    if let Ok(value) = simd_json::to_owned_value(&mut input_bytes) {
        return Ok(vec![value]);
    }

    // JSONL fallback: parse each non-empty line independently.
    let mut values = Vec::new();
    for (line_num, line) in input.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let mut line_bytes = trimmed.as_bytes().to_vec();
        let value = simd_json::to_owned_value(&mut line_bytes)
            .map_err(|e| format!("Invalid JSON on line {}: {}", line_num + 1, e))?;
        values.push(value);
    }
    Ok(values)
}
```
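The whole-document-then-JSONL control flow can be exercised without the parser itself. The following is a sketch under assumptions: `parse_with_fallback` and `parse_int` are hypothetical names, and the injected closure stands in for `simd_json::to_owned_value` so the fallback and line-number reporting can be unit tested in isolation.

```rust
use std::fmt::Display;

/// Tries the whole input as one document; on failure, parses each
/// non-empty line separately. `parse_one` is a hypothetical injection
/// point standing in for the real parser.
pub fn parse_with_fallback<T, E, F>(input: &str, mut parse_one: F) -> Result<Vec<T>, String>
where
    E: Display,
    F: FnMut(&mut [u8]) -> Result<T, E>,
{
    // The parser may mutate its input, so each attempt gets its own copy.
    let mut whole = input.as_bytes().to_vec();
    if let Ok(value) = parse_one(&mut whole) {
        return Ok(vec![value]);
    }

    let mut values = Vec::new();
    for (idx, line) in input.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let mut bytes = trimmed.as_bytes().to_vec();
        let value = parse_one(&mut bytes)
            .map_err(|e| format!("Invalid JSON on line {}: {}", idx + 1, e))?;
        values.push(value);
    }
    Ok(values)
}

/// Toy parser used only to exercise the control flow: accepts one integer.
pub fn parse_int(bytes: &mut [u8]) -> Result<i64, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    text.trim().parse::<i64>().map_err(|e| e.to_string())
}
```

Note that with this flow an empty input yields `Ok` with an empty vector, since the whole-document attempt fails and there are no lines to parse.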
#### 4. Pattern Matching Changes
**Numbers** - simd-json uses StaticNode:
```rust
// serde_json
Value::Number(n) => n.as_i64()
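// simd-json (checked conversions keep serde_json's as_i64() semantics)
OwnedValue::Static(StaticNode::I64(n)) => Some(*n),
OwnedValue::Static(StaticNode::U64(n)) => i64::try_from(*n).ok(), // None above i64::MAX
OwnedValue::Static(StaticNode::F64(_)) => None, // as_i64() never converts floats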
```
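Because the number variants are split three ways, it may help to centralize the conversions in small accessors. The sketch below is illustrative only: `StaticNode` here is a local stand-in that mirrors the variants named in this plan so the code compiles without the crate, and `as_i64` / `as_f64` are hypothetical helper names. `as_i64` follows the behavior of `serde_json::Number::as_i64`, which returns `None` for floats and for integers that do not fit in `i64`.

```rust
use std::convert::TryFrom;

// Local stand-in mirroring the variants this plan attributes to
// simd-json's `StaticNode`; declared here so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StaticNode {
    Null,
    Bool(bool),
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Integer accessor: only integers that fit in `i64` are returned.
pub fn as_i64(node: &StaticNode) -> Option<i64> {
    match node {
        StaticNode::I64(n) => Some(*n),
        // Checked conversion: values above i64::MAX yield None, not a wrapped negative.
        StaticNode::U64(n) => i64::try_from(*n).ok(),
        _ => None,
    }
}

/// Widening accessor for arithmetic: every numeric variant as `f64`.
/// Large integers may lose precision in the conversion.
pub fn as_f64(node: &StaticNode) -> Option<f64> {
    match node {
        StaticNode::I64(n) => Some(*n as f64),
        StaticNode::U64(n) => Some(*n as f64),
        StaticNode::F64(n) => Some(*n),
        _ => None,
    }
}
```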
**Null/Bool**:
```rust
// serde_json
Value::Null => ...
Value::Bool(b) => ...
// simd-json
OwnedValue::Static(StaticNode::Null) => ...
OwnedValue::Static(StaticNode::Bool(b)) => ...
```
**Construction**:
```rust
// serde_json
Value::Array(vec![])
Value::Null
// simd-json
OwnedValue::Array(Box::new(vec![]))
OwnedValue::Static(StaticNode::Null)
```
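Taken together, the pattern and construction changes above might look like the following in a helper such as `type_name()`. This is a sketch under assumptions: the two enums are local stand-ins shaped like the variants described in this plan (with `BTreeMap` substituting for simd-json's object map type), the jq-style type names are assumed, and the constructor helpers are hypothetical.

```rust
use std::collections::BTreeMap;

// Local stand-ins shaped like the types described in this plan.
#[derive(Debug, Clone, PartialEq)]
pub enum StaticNode {
    Null,
    Bool(bool),
    I64(i64),
    U64(u64),
    F64(f64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum OwnedValue {
    Static(StaticNode),
    String(String),
    Array(Box<Vec<OwnedValue>>),
    // Assumption: BTreeMap stands in for the real object map type.
    Object(Box<BTreeMap<String, OwnedValue>>),
}

/// Maps every variant to a type name (jq-style names, assumed here).
pub fn type_name(value: &OwnedValue) -> &'static str {
    match value {
        OwnedValue::Static(StaticNode::Null) => "null",
        OwnedValue::Static(StaticNode::Bool(_)) => "boolean",
        OwnedValue::Static(StaticNode::I64(_))
        | OwnedValue::Static(StaticNode::U64(_))
        | OwnedValue::Static(StaticNode::F64(_)) => "number",
        OwnedValue::String(_) => "string",
        OwnedValue::Array(_) => "array",
        OwnedValue::Object(_) => "object",
    }
}

// Hypothetical constructors that hide the extra nesting and boxing.
pub fn null() -> OwnedValue {
    OwnedValue::Static(StaticNode::Null)
}

pub fn empty_array() -> OwnedValue {
    OwnedValue::Array(Box::new(Vec::new()))
}

pub fn empty_object() -> OwnedValue {
    OwnedValue::Object(Box::new(BTreeMap::new()))
}
```

Small constructors like these keep the `Static(...)` and `Box::new(...)` wrapping in one place, which limits how many call sites need to change during the migration.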
## Files to Modify
| File | Change |
|------|--------|
| `Cargo.toml` | Add simd-json dependency |
| `src/loader.rs` | Update parse_json() to use simd_json::to_owned_value |
| `src/filter/eval.rs` | Update all match arms for OwnedValue patterns |
| `src/filter/builtins/mod.rs` | Update type_name(), json_cmp() |
| `src/filter/builtins/array.rs` | Update all array operations |
| `src/filter/builtins/aggregate.rs` | Update number handling |
| `src/filter/builtins/object.rs` | Update object operations |
| `src/filter/builtins/misc.rs` | Update type/not operations |
| `src/ui.rs` | Update rendering for StaticNode patterns |
| `src/error.rs` | Update EvalError::CannotIterate value type |
| `src/app.rs` | Type annotation updates |
| `src/worker.rs` | Type annotation updates |
| `src/main.rs` | Non-interactive output |
## Key API Differences
| Concept | serde_json | simd-json |
|---------|------------|-----------|
| Null | `Value::Null` | `OwnedValue::Static(StaticNode::Null)` |
| Bool | `Value::Bool(b)` | `OwnedValue::Static(StaticNode::Bool(b))` |
| Number | `Value::Number(n)` | `OwnedValue::Static(StaticNode::I64(n))`, `StaticNode::U64(n)`, or `StaticNode::F64(n)` |
| String | `Value::String(s)` | `OwnedValue::String(s)` |
| Array | `Value::Array(v)` | `OwnedValue::Array(Box<Vec>)` |
| Object | `Value::Object(m)` | `OwnedValue::Object(Box<Object>)` |
| Parse | `serde_json::from_str()` | `simd_json::to_owned_value(&mut bytes)` |
| Serialize | `serde_json::to_string()` | `simd_json::to_string()` |
## Expected Results
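| Metric | Current (serde_json) | Expected (simd-json) |
|--------|----------------------|----------------------|
| 128MB file memory | ~4GB | ~1.5-2GB |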
| Memory overhead | ~30x | ~12-15x |
| Parse speed | Baseline | ~2-3x faster |
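The memory figures follow from multiplying the file size by the overhead factor. A small sketch of that arithmetic, with `estimated_memory_mb` as a hypothetical helper name:

```rust
/// Estimated in-memory size in MB for an input of `file_mb` megabytes,
/// given an overhead multiplier relative to the file size.
pub fn estimated_memory_mb(file_mb: u64, overhead: u64) -> u64 {
    file_mb * overhead
}
```

For a 128MB file, a 30x overhead gives 3840MB (roughly 4GB), while 12x and 15x give 1536MB and 1920MB (roughly 1.5-2GB).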
## Verification
```bash
# Memory test (-l is the macOS/BSD flag; GNU time on Linux uses -v)
cargo build --release
/usr/bin/time -l ./target/release/jarq ./samples/128MB.json
# Functional test - all filters should work identically
cargo test
```
## Phase 2 (Future): Clone Optimization
After Phase 1 is stable, reduce cloning:
- Change `EvalResult.result` to `Arc<Vec<Value>>` for cache sharing
- Consider Cow-style returns for Identity filter
- SmallVec for small array iterations
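The first two ideas can be sketched with the standard library. This is an illustration under assumptions, not the project's actual types: `Value` is aliased to `String` as a stand-in, the `EvalResult` shape is guessed from the bullet above, and `share`, `identity`, and `uppercase` are hypothetical functions.

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Hypothetical stand-in for the project's `Value` alias.
pub type Value = String;

/// Eval result whose payload is shared rather than deep-cloned.
#[derive(Debug, Clone)]
pub struct EvalResult {
    pub result: Arc<Vec<Value>>,
}

/// Hands out a cached result: bumps a reference count, copies no values.
pub fn share(cached: &EvalResult) -> EvalResult {
    EvalResult {
        result: Arc::clone(&cached.result),
    }
}

/// Identity filter: borrows its input instead of cloning it.
pub fn identity(input: &[Value]) -> Cow<'_, [Value]> {
    Cow::Borrowed(input)
}

/// A filter that produces new data returns the owned variant.
pub fn uppercase(input: &[Value]) -> Cow<'_, [Value]> {
    Cow::Owned(input.iter().map(|v| v.to_uppercase()).collect())
}
```

With `Arc`, the cache and the UI can hold the same result vector at the cost of one pointer each, and `Cow` lets pass-through filters avoid allocation while transforming filters still return fresh data through the same return type.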
## Phase 3 (Future): BorrowedValue for Read-Only
Only if Phases 1 and 2 prove insufficient:
- Memory-map input file
- Parse to BorrowedValue
- Convert to OwnedValue only at filter boundaries
- Complex lifetime management required
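The borrow-then-convert shape can be shown on a toy format. The sketch below is not JSON parsing and uses no simd-json types: it scans `key=value` lines as a stand-in, and all names are hypothetical. It illustrates how a borrowed view ties its lifetime to the input buffer, and how converting to an owned value at a boundary removes that tie.

```rust
/// Borrowed view: both slices point into the input buffer,
/// so the buffer must outlive every `BorrowedEntry`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BorrowedEntry<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

/// Owned copy with no lifetime parameter.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedEntry {
    pub key: String,
    pub value: String,
}

/// Zero-copy scan of `key=value` lines (a toy format standing in for JSON).
/// Lines without `=` are skipped.
pub fn scan(buffer: &str) -> Vec<BorrowedEntry<'_>> {
    buffer
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| BorrowedEntry {
            key: key.trim(),
            value: value.trim(),
        })
        .collect()
}

/// Boundary conversion: allocates only for entries that leave the read-only path.
pub fn to_owned_entry(entry: BorrowedEntry<'_>) -> OwnedEntry {
    OwnedEntry {
        key: entry.key.to_string(),
        value: entry.value.to_string(),
    }
}
```

Everything returned by `scan` must be dropped before the buffer is, which is the lifetime management cost noted above; only the converted `OwnedEntry` values can be cached or sent to another thread.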