json-extractor
A high-performance two-stage JSON fragment scanner written in Rust. Extracts complete JSON objects and arrays from documents containing mixed content (log files, JSON Lines, etc.).
Features
- Two-stage pipeline: SIMD character classification + fragment extraction
- SIMD-accelerated: AVX2/SSE4.2 with automatic scalar fallback
- Zero-copy API: Buffer reuse via StagedScanner eliminates repeated allocations
- Fragment detection: Identifies JSON objects ({}) and arrays ([])
- Error reporting: Detailed error information for incomplete/invalid fragments
- Position tracking: Absolute byte offsets for each fragment
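To make the two-stage idea concrete, here is a minimal scalar sketch. It is not the crate's implementation: the function names `classify` and `extract` are hypothetical, stage 1 uses a plain byte loop where the real scanner uses SIMD, and error reporting is left out. It only illustrates how a classification pass can feed a fragment-extraction pass that tracks string state and nesting.

```rust
/// Stage 1 (scalar stand-in for a SIMD classifier): collect the offsets of
/// bytes that matter for fragment detection (quotes, backslashes, brackets).
fn classify(input: &[u8]) -> Vec<usize> {
    input
        .iter()
        .enumerate()
        .filter(|(_, b)| matches!(**b, b'"' | b'\\' | b'{' | b'}' | b'[' | b']'))
        .map(|(i, _)| i)
        .collect()
}

/// Stage 2: walk the structural offsets, tracking string state and nesting,
/// and return (start, length) for every complete top-level fragment.
fn extract(input: &[u8], structural: &[usize]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut stack: Vec<u8> = Vec::new(); // currently open brackets
    let mut start = 0;
    let mut in_string = false;
    let mut skip_until = 0; // offsets below this are escaped bytes
    for &pos in structural {
        if pos < skip_until {
            continue;
        }
        let b = input[pos];
        if in_string {
            match b {
                b'\\' => skip_until = pos + 2, // the next byte is escaped
                b'"' => in_string = false,
                _ => {} // brackets inside strings are not structural
            }
            continue;
        }
        match b {
            // Quotes only open a string once we are inside a fragment.
            b'"' if !stack.is_empty() => in_string = true,
            b'{' | b'[' => {
                if stack.is_empty() {
                    start = pos;
                }
                stack.push(b);
            }
            b'}' | b']' => {
                let open = if b == b'}' { b'{' } else { b'[' };
                if stack.last() == Some(&open) {
                    stack.pop();
                    if stack.is_empty() {
                        out.push((start, pos - start + 1));
                    }
                } else {
                    stack.clear(); // mismatched closer: drop this candidate
                }
            }
            _ => {}
        }
    }
    out
}
```

Calling `extract(input, &classify(input))` yields one `(start, length)` pair per complete fragment, so `&input[start..start + length]` recovers the fragment bytes. Splitting the work this way keeps the first pass branch-light, which is what makes it a good fit for vector instructions.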
Installation
Add this to your Cargo.toml:
[dependencies]
json-extractor = "0.1.0"
Usage
Quick Start
Extract the first JSON fragment from a string:
use json_extractor::extract_first;
let input = r#"some log prefix {"name": "Alice"} tail"#;
assert_eq!;
Multiple Fragments
Use StagedScanner for full control and buffer reuse across repeated scans:
use json_extractor::StagedScanner;
let mut scanner = StagedScanner::new();
let data = br#"some prefix {"name": "Alice"} garbage {"age": 30} more text"#;
let fragments = scanner.scan_fragments(data);
assert_eq!;
assert!;
assert_eq!;
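The buffer-reuse pattern behind a stateful scanner can be sketched in isolation. The `ReusableScanner` type below is hypothetical and much simpler than StagedScanner (it only records the offsets of opening brackets), but it shows the general technique: the scanner owns its scratch buffer, clears it at the start of each scan, and so keeps the allocation from earlier calls.

```rust
/// Hypothetical stand-in for a stateful scanner: it owns a scratch buffer
/// and reuses it across scans instead of allocating on every call.
struct ReusableScanner {
    offsets: Vec<usize>,
}

impl ReusableScanner {
    fn new() -> Self {
        ReusableScanner { offsets: Vec::new() }
    }

    /// Record the offsets of every `{` and `[` in `input`.
    /// `Vec::clear` drops the old contents but keeps the capacity,
    /// so later scans of similar size do not reallocate.
    fn scan(&mut self, input: &[u8]) -> &[usize] {
        self.offsets.clear();
        self.offsets.extend(
            input
                .iter()
                .enumerate()
                .filter(|(_, b)| matches!(**b, b'{' | b'['))
                .map(|(i, _)| i),
        );
        &self.offsets
    }

    /// Capacity of the internal buffer, exposed to observe reuse.
    fn capacity(&self) -> usize {
        self.offsets.capacity()
    }
}
```

Because the returned slice borrows from the scanner, results from one scan must be consumed or copied before the next scan starts. That constraint is the usual price of this kind of reuse.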
Error Handling
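use json_extractor::{FragmentStatus, StagedScanner};
let mut scanner = StagedScanner::new();
let data = br#"{"unterminated": "value"#;
let fragments = scanner.scan_fragments(data);
// Indexing assumes scan_fragments returns a slice-like collection.
match &fragments[0].status {
    FragmentStatus::Complete => println!("fragment is complete"),
    FragmentStatus::Incomplete(_kind) => println!("fragment is incomplete"),
}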
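The shape of the status types can be illustrated with local stand-ins. The definitions below are assumptions, not the crate's source: the variant names `UnterminatedString` and `MismatchedBracket` are hypothetical, `end()` is assumed to be an exclusive offset (`start + length`), and `describe` is a helper invented for this example. The sketch only shows how a `Complete` / `Incomplete(ErrorKind)` status can be matched exhaustively.

```rust
/// Local stand-ins mirroring the documented names; variant names are assumed.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    UnterminatedString,
    MismatchedBracket,
}

#[derive(Debug, PartialEq)]
enum FragmentStatus {
    Complete,
    Incomplete(ErrorKind),
}

struct Fragment {
    start: usize,
    length: usize,
    status: FragmentStatus,
}

impl Fragment {
    /// Exclusive end offset (assumption: start + length).
    fn end(&self) -> usize {
        self.start + self.length
    }

    fn is_complete(&self) -> bool {
        self.status == FragmentStatus::Complete
    }
}

/// Turn a fragment into a short human-readable report.
fn describe(f: &Fragment) -> String {
    match &f.status {
        FragmentStatus::Complete => format!("complete at {}..{}", f.start, f.end()),
        FragmentStatus::Incomplete(kind) => {
            format!("incomplete at {}..{}: {:?}", f.start, f.end(), kind)
        }
    }
}
```

Carrying the error inside the `Incomplete` variant means a caller cannot read an error kind from a complete fragment by mistake, and the compiler flags any `match` that forgets a case.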
Performance
Benchmarked on x86_64 with AVX2:
| Workload | Throughput |
|---|---|
| Long strings (1KB) | 14.9 GiB/s |
| Large arrays (10k) | 3.44 GiB/s |
| Mixed log files | 1.63 GiB/s |
| Simple objects | 1.21 GiB/s |
| Deep nesting (50) | 1.10 GiB/s |
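For readers who want to compare their own measurements against the table, the sketch below shows how a GiB/s figure is typically derived: bytes processed divided by 2^30, divided by elapsed seconds. The helpers `gib_per_sec` and `measure` are hypothetical and are not part of the crate or its benchmark suite; a plain timing loop like this is far less rigorous than a dedicated benchmark harness.

```rust
use std::time::Instant;

/// Convert a byte count and a duration in seconds into GiB/s
/// (1 GiB = 2^30 bytes).
fn gib_per_sec(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64 / seconds
}

/// Run `f` `iters` times and report throughput, assuming each call
/// processes `bytes_per_iter` bytes.
fn measure<F: FnMut()>(bytes_per_iter: u64, iters: u32, mut f: F) -> f64 {
    let started = Instant::now();
    for _ in 0..iters {
        f();
    }
    let elapsed = started.elapsed().as_secs_f64();
    gib_per_sec(bytes_per_iter * u64::from(iters), elapsed)
}
```

For example, a scan that covers 2 GiB of input in half a second corresponds to 4 GiB/s.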
Run benchmarks:
cargo bench
API
- extract_first: Extract the first complete JSON fragment from a &str. Simplest entry point.
- StagedScanner: Stateful scanner with buffer reuse. Best for repeated scans or when you need all fragments.
- JsonFragmentScanner: Convenience stateless wrapper (allocates per call).
- Fragment: Extracted fragment with start, length, status, end(), is_complete().
- FragmentStatus: Complete or Incomplete(ErrorKind).
- ErrorKind: Detailed error variants (unterminated strings, mismatched brackets, etc.).
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome!
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.