bufjson 0.7.0

bufjson is a no-frills, low-level, low-allocation, low-copy JSON tokenizer and parser geared toward efficient stream processing at scale.


Get started

Add bufjson to your Cargo.toml or run $ cargo add bufjson.
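If you edit Cargo.toml by hand, the dependency entry looks like this (the version shown matches this release):

```toml
[dependencies]
bufjson = "0.7"
```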

Here's a simple example that checks a JSON text for syntactic validity and prints it with the insignificant whitespace stripped out.

use bufjson::{lexical::{Token, fixed::FixedAnalyzer}, syntax::Parser};

fn strip_whitespace(json_text: &str) {
    let mut parser = Parser::new(FixedAnalyzer::new(json_text.as_bytes()));
    loop {
        // Fetch the next token, skipping insignificant whitespace.
        match parser.next_non_white() {
            Token::Eof => break,
            Token::Err => panic!("{}", parser.err()),
            // Print every other token exactly as it appears in the input.
            _ => print!("{}", parser.content().literal()),
        };
    }
}

fn main() {
    // Prints `{"foo":"bar","baz":[null,123]}`
    strip_whitespace(r#"{ "foo": "bar", "baz": [null, 123] }"#);
}

Architecture

The bufjson crate provides a stream-oriented JSON tokenizer through the lexical::Analyzer trait, with these implementations:

  • FixedAnalyzer tokenizes fixed-size buffers;
  • ReadAnalyzer tokenizes synchronous input streams implementing io::Read; and
  • AsyncAnalyzer tokenizes async streams that yield byte buffers (coming soon).

The remainder of the library builds on the lexical analyzer trait.

  • The syntax module provides concrete stream-oriented parser types that can wrap any lexical analyzer.
  • The pointer module enables fast stream-oriented evaluation of JSON Pointers.
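As background for the pointer module, JSON Pointer (RFC 6901) addresses a value by a `/`-separated path whose reference tokens escape `/` as `~1` and `~` as `~0`. The sketch below decodes a pointer into its reference tokens; it illustrates the RFC's escaping rules only and does not show bufjson's own pointer API.

```rust
// Decode one JSON Pointer reference token per RFC 6901.
// Replacing "~1" before "~0" is required: doing it the other way
// around would turn "~01" into "/" instead of the correct "~1".
fn unescape_token(token: &str) -> String {
    token.replace("~1", "/").replace("~0", "~")
}

// Split a pointer such as "/a~1b/m~0n" into its decoded tokens.
// The leading empty segment before the first '/' is skipped.
fn split_pointer(ptr: &str) -> Vec<String> {
    ptr.split('/').skip(1).map(|t| unescape_token(t)).collect()
}

fn main() {
    // "/a~1b/m~0n" names the key "a/b", then the key "m~n".
    assert_eq!(split_pointer("/a~1b/m~0n"), vec!["a/b", "m~n"]);
}
```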

Refer to the API reference docs for more detail.

When to use

Choose bufjson when you need to:

  • Control and limit allocations or copying.
  • Process JSON text larger than available memory.
  • Extract specific values without parsing an entire JSON text.
  • Edit a stream of JSON tokens (add/remove/change values in the stream).
  • Access token content exactly as it appears in the JSON text (e.g. without unescaping strings).
  • Protect against malicious or degenerate inputs.
  • Implement custom parsing with precise behavior control.
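To make the "extract specific values without parsing an entire JSON text" point concrete, here is a toy, self-contained scan that pulls one top-level string value out of a document without building a tree. It is illustrative only: it ignores escape sequences and nesting, and a real implementation would walk bufjson's token stream instead.

```rust
// Hypothetical helper for illustration; not part of the bufjson API.
// Finds `"key"`, skips the colon, and returns the string literal that
// follows, without materializing the rest of the document.
fn find_string_value(json: &str, key: &str) -> Option<String> {
    let needle = format!("\"{}\"", key);
    let idx = json.find(&needle)?;
    let rest = json[idx + needle.len()..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;
    let end = rest.find('"')?; // naive: ignores \" escapes; fine for a sketch
    Some(rest[..end].to_string())
}

fn main() {
    let doc = r#"{ "foo": "bar", "baz": [null, 123] }"#;
    assert_eq!(find_string_value(doc, "foo").as_deref(), Some("bar"));
    assert_eq!(find_string_value(doc, "missing"), None);
}
```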

Other libraries are more suitable for:

  • Deserializing JSON text straight into in-memory data structures (use serde_json or simd-json).
  • Serializing in-memory data structures to JSON (use serde_json).
  • Writing JSON text in a stream-oriented manner (use serde_json or json-writer).

Performance

  • Zero-copy string processing where possible.
  • Minimal allocations, which are explicit and optional wherever possible.
  • Streaming design handles arbitrarily long JSON text without loading into memory.
  • Suitable for high-throughput applications.

Benchmarks

The table below shows JSON text throughput benchmark results.¹

  Component                .content() fetched   Throughput
  FixedAnalyzer            Never                1 GiB/s
  FixedAnalyzer            Always               1 GiB/s
  ReadAnalyzer²            Never                880 MiB/s
  ReadAnalyzer²            Always               690 MiB/s
  Parser + FixedAnalyzer   Never                890 MiB/s
  Parser + FixedAnalyzer   Always               850 MiB/s

License