# json-extractor

A high-performance two-stage JSON fragment scanner written in Rust. Extracts complete JSON objects and arrays from documents containing mixed content (log files, JSON Lines, etc.).

## Features

- **Two-stage pipeline**: SIMD character classification + fragment extraction
- **SIMD-accelerated**: AVX2/SSE4.2 with automatic scalar fallback
- **Zero-copy API**: Buffer reuse via `StagedScanner` eliminates repeated allocations
- **Fragment detection**: Identifies JSON objects (`{}`) and arrays (`[]`)
- **Error reporting**: Detailed error information for incomplete/invalid fragments
- **Position tracking**: Absolute byte offsets for each fragment
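
To make the two-stage idea concrete, here is a minimal scalar sketch using only the standard library. It is not the crate's implementation: the names `Structural`, `classify`, `extract`, and `scan` are hypothetical, stage 1 is a plain loop rather than SIMD, and incomplete fragments are simply dropped instead of being reported.

```rust
/// One structural byte found by stage 1 (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Structural {
    pub pos: usize,
    pub byte: u8,
}

/// Stage 1: record braces, brackets, and unescaped quotes with their offsets.
/// A real scanner would classify many bytes at once with SIMD.
pub fn classify(input: &[u8]) -> Vec<Structural> {
    let mut out = Vec::new();
    let mut escaped = false; // true when the previous byte was an active backslash
    for (pos, &byte) in input.iter().enumerate() {
        if escaped {
            escaped = false;
            continue;
        }
        match byte {
            b'\\' => escaped = true,
            b'{' | b'}' | b'[' | b']' | b'"' => out.push(Structural { pos, byte }),
            _ => {}
        }
    }
    out
}

/// Stage 2: walk the structural bytes and emit `(start, end)` byte ranges
/// (end exclusive) for every balanced top-level object or array.
pub fn extract(structurals: &[Structural]) -> Vec<(usize, usize)> {
    let mut fragments = Vec::new();
    let mut stack: Vec<u8> = Vec::new(); // closers we still expect
    let mut start = 0;
    let mut in_string = false;
    for s in structurals {
        if in_string {
            // Inside a string only the closing quote matters.
            if s.byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match s.byte {
            // Quotes in the surrounding text (empty stack) are ignored.
            b'"' => in_string = !stack.is_empty(),
            b'{' | b'[' => {
                if stack.is_empty() {
                    start = s.pos;
                }
                stack.push(if s.byte == b'{' { b'}' } else { b']' });
            }
            closer @ (b'}' | b']') => match stack.last() {
                Some(&expected) if expected == closer => {
                    stack.pop();
                    if stack.is_empty() {
                        fragments.push((start, s.pos + 1));
                    }
                }
                // Mismatched closer: abandon the fragment in progress.
                Some(_) => stack.clear(),
                // Stray closer in the surrounding text.
                None => {}
            },
            _ => {}
        }
    }
    fragments
}

/// Run both stages over a byte slice.
pub fn scan(input: &[u8]) -> Vec<(usize, usize)> {
    extract(&classify(input))
}
```

Calling `scan` on a log line that contains two objects yields two byte ranges that can be used to slice the original input. Splitting the work this way keeps the first pass branch-light, which is what makes it a good fit for SIMD.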

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
json-extractor = "0.1.0"
```

## Usage

### Quick Start

Extract the first JSON fragment from a string:

```rust
use json_extractor::extract_first;

let input = r#"some log prefix {"name": "Alice"} tail"#;
assert_eq!(extract_first(input), Some(r#"{"name": "Alice"}"#));
```

### Multiple Fragments

Use `StagedScanner` for full control and buffer reuse across repeated scans:

```rust
use json_extractor::StagedScanner;

let mut scanner = StagedScanner::new();
let data = br#"some prefix {"name": "Alice"} garbage {"age": 30} more text"#;
let fragments = scanner.scan_fragments(data);

assert_eq!(fragments.len(), 2);
assert!(fragments[0].is_complete());
assert_eq!(&data[fragments[0].start..fragments[0].end()], br#"{"name": "Alice"}"#);
```

### Error Handling

Incomplete fragments are still returned; inspect `status` to see what went wrong:

```rust
use json_extractor::{FragmentStatus, StagedScanner};

let mut scanner = StagedScanner::new();
let data = br#"{"unterminated": "value"#;
let fragments = scanner.scan_fragments(data);

match &fragments[0].status {
    FragmentStatus::Incomplete(err) => {
        println!("Error: {err}");
    }
    FragmentStatus::Complete => {}
}
```

## Performance

Benchmarked on x86_64 with AVX2:

| Workload | Throughput |
|----------|------------|
| Long strings (1KB) | 14.9 GiB/s |
| Large arrays (10k) | 3.44 GiB/s |
| Mixed log files | 1.63 GiB/s |
| Simple objects | 1.21 GiB/s |
| Deep nesting (50) | 1.10 GiB/s |

Run benchmarks:

```bash
cargo bench --bench scanner_bench 2>/dev/null
```
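
Because the figures above were measured with AVX2, it helps to know which instruction sets your own machine offers before comparing numbers. The sketch below uses only the standard library's runtime feature detection; the function name `detect_simd_level` is hypothetical and not part of this crate, and the crate's own dispatch logic may differ.

```rust
/// Report the widest instruction set available at runtime, mirroring the
/// AVX2 -> SSE4.2 -> scalar order described in the feature list.
/// (Hypothetical helper, not part of json-extractor.)
pub fn detect_simd_level() -> &'static str {
    // Feature detection is only meaningful on x86 targets; on any other
    // architecture this block is compiled out and we report "scalar".
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    "scalar"
}
```

If this returns `"scalar"` or `"sse4.2"`, expect lower throughput than the AVX2 figures in the table.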

## API

- **`extract_first`** — Extract the first complete JSON fragment from a `&str`. Simplest entry point.
- **`StagedScanner`** — Stateful scanner with buffer reuse. Best for repeated scans or when you need all fragments.
- **`JsonFragmentScanner`** — Convenience stateless wrapper (allocates per call).
- **`Fragment`** — Extracted fragment with `start`, `length`, `status`, `end()`, `is_complete()`.
- **`FragmentStatus`** — `Complete` or `Incomplete(ErrorKind)`.
- **`ErrorKind`** — Detailed error variants (unterminated strings, mismatched brackets, etc.).
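
As a rough mental model of how these types relate, the following standalone sketch re-declares them locally. It is an assumption-laden illustration, not the crate's source: the `ErrorKind` variant names are invented, `end()` is assumed to be `start + length`, and `complete_slices` is a hypothetical helper.

```rust
/// Illustrative error variants; the real crate's names may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ErrorKind {
    UnterminatedString,
    MismatchedBracket,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FragmentStatus {
    Complete,
    Incomplete(ErrorKind),
}

/// A fragment is described by offsets into the input, not by a copy of it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fragment {
    pub start: usize,
    pub length: usize,
    pub status: FragmentStatus,
}

impl Fragment {
    /// Exclusive end offset (assumed to be `start + length`).
    pub fn end(&self) -> usize {
        self.start + self.length
    }

    pub fn is_complete(&self) -> bool {
        matches!(self.status, FragmentStatus::Complete)
    }
}

/// Borrow the bytes of every complete fragment from the original input.
/// (Hypothetical helper showing how the offsets are meant to be used.)
pub fn complete_slices<'a>(data: &'a [u8], fragments: &[Fragment]) -> Vec<&'a [u8]> {
    fragments
        .iter()
        .filter(|f| f.is_complete())
        .map(|f| &data[f.start..f.end()])
        .collect()
}
```

Since fragments carry only offsets, slicing the original buffer as in `complete_slices` borrows the bytes rather than copying them.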

## License

Licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)

at your option.

## Contributing

Contributions are welcome!

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.