A fast, efficient, and correct streaming JSON parser for Rust that minimizes copies and allocations.
The bufjson crate provides a very fast, resource-efficient, pull parser aimed at minimizing
allocator and memory pressure. The design emphasizes work avoidance, work deferral, and
generally giving the library user maximum control over the parsing process and the resources
it uses. To make these features possible, the crate API is lower level than that of other
crates, such as serde_json, which are more geared towards convenience in “typical” use cases.
Despite being low-level, the bufjson API is designed and documented with care and is,
hopefully, intuitive to work with.
The crate design enables use cases that require scalability, whether because of high levels of concurrent processing (in a web server, for example) or because the JSON texts involved are enormous. As a streaming library, it can also make progress parsing JSON text that is not yet complete, whether due to I/O latency or for other reasons.
§Features
- Streaming pull parser (lower level, does not “map” data into a data structure).
- Best-in-class speed, second only to simd-json (but with more flexibility and features).
- Minimizes allocations and data copying.
- Clear structured error messages with pinpoint locations.
- Fast streaming JSON pointer evaluation.
- no_std support.
§Optional features
Some bufjson features are off by default, to strike a balance between offering a powerful set
of features out of the box and limiting the amount of compiled code.
The following bufjson feature flags can be
turned on to enable specific use cases:
§no_std mode
For convenience, the default feature set includes std. To enable no_std mode, disable std
by adding a dependency line like this to your Cargo.toml:
bufjson = { version = "1", default-features = false }
Note that while std can easily be turned off, bufjson currently has a permanent dependency
on alloc.
§Advanced JSON parsing feature flags
- The pipe feature powers zero-copy parsing of JSON text input from streams of bytes::Bytes values. It enables the lexical::pipe module along with the PipeAnalyzer lexical analyzer.
- The read feature unlocks low-to-zero copy parsing of JSON text provided by any input stream that is compliant with the std::io::Read trait. It enables the lexical::read module and the ReadAnalyzer lexical analyzer. (You can also combine this feature with no_std mode to read this style of stream without actually introducing a dependency on std.)
- The num feature turns on built-in methods for parsing Rust native integer and floating point values out of lexical::Content and Buf values. As well as being more convenient, the bufjson built-in integer parsers are somewhat more efficient than those in the Rust core library, but the f64 parser currently just forwards to Rust core's f64::from_str.
- The num_ext feature, which implicitly turns on num, adds native support for less common number parsing use cases, such as parsing i128.
§JSON pointer feature flags
- The pointer feature activates streaming JSON Pointer evaluation by enabling the pointer module.
- The ignore_case feature opts into a very narrow, but useful, use case: case-insensitive JSON Pointer evaluation. This can be handy in scenarios such as backward-compatible parsing of JSON that was previously handled by a case-insensitive parser like the Go standard library's encoding/json.
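To make the pointer terminology concrete, here is a minimal sketch of how a JSON Pointer (RFC 6901) breaks down into reference tokens, and what a case-insensitive match might look like. It uses only the standard library and is not the bufjson API: `reference_tokens` and `token_matches` are hypothetical helper names, the sketch allocates eagerly rather than streaming, and ASCII-only case folding is an assumption about what "case-insensitive" means here.

```rust
/// Split an RFC 6901 JSON Pointer into unescaped reference tokens.
/// Returns `None` if the pointer is non-empty and does not start with `/`.
/// (Hypothetical helper for illustration; not part of bufjson.)
fn reference_tokens(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        // The empty pointer refers to the whole document.
        return Some(Vec::new());
    }
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Order matters: decode `~1` before `~0` so that `~01` becomes `~1`, not `/`.
            .map(|token| token.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}

/// Compare a reference token with an object member name.
/// ASSUMPTION: case-insensitivity is modelled as ASCII-only folding.
fn token_matches(token: &str, member_name: &str, ignore_case: bool) -> bool {
    if ignore_case {
        token.eq_ignore_ascii_case(member_name)
    } else {
        token == member_name
    }
}
```

With these helpers, `reference_tokens("/a~1b/m~0n")` yields the tokens `a/b` and `m~n`, and `token_matches("UserName", "username", true)` is true while the case-sensitive comparison is false.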
§Stability
The bufjson crate follows SemVer. Breaking changes will only ever be
introduced in major versions, if at all. New additions to the API, such as new types, methods,
or traits will only be added in minor versions.
§Architecture
The bufjson crate has three main top-level modules: lexical, syntax, and
pointer. These modules form a simple layered architecture where each layer
builds on the previous one:
- Module lexical provides a set of JSON lexical analyzers, also known as scanners or tokenizers, optimized for specific input modes. lexical::fixed::FixedAnalyzer is used to tokenize JSON from a single fixed-length in-memory buffer; lexical::pipe::PipeAnalyzer can scan JSON from a stream of bytes::Bytes values; and lexical::read::ReadAnalyzer can tokenize any stream that looks like std::io::Read.
- Module syntax provides a JSON parser that works with any JSON tokenizer that satisfies the lexical::Analyzer trait. Most tokenizers will provide an into_parser() convenience method to wrap themselves in a parser.
- Module pointer provides a streaming JSON Pointer evaluator that can wrap a syntax::Parser. Since the parser can operate on any use-case optimized lexical analyzer, so can the JSON Pointer evaluator.
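The layering idea can be sketched in a few lines of standard-library Rust: a trait for the lexical layer, and a higher layer that is generic over anything implementing it. Every name below (`Kind`, `Tokenize`, `SliceTokenizer`, `DepthTracker`, `max_depth_of`) is hypothetical and chosen for illustration; this is not how bufjson defines lexical::Analyzer or syntax::Parser. The tokenizer is deliberately naive: it classifies single bytes, does not recognize strings, and does not check that bracket types match.

```rust
/// Hypothetical token kinds; a stand-in for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Open,
    Close,
    Other,
    End,
}

/// Hypothetical lexical layer: anything that can yield tokens on demand.
trait Tokenize {
    fn next_kind(&mut self) -> Kind;
}

/// Tokenizer over a fixed in-memory buffer; classifies one byte per call.
struct SliceTokenizer<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Tokenize for SliceTokenizer<'a> {
    fn next_kind(&mut self) -> Kind {
        let Some(&b) = self.bytes.get(self.pos) else {
            return Kind::End;
        };
        self.pos += 1;
        match b {
            b'{' | b'[' => Kind::Open,
            b'}' | b']' => Kind::Close,
            _ => Kind::Other,
        }
    }
}

/// Hypothetical upper layer: works with any `Tokenize` implementation.
struct DepthTracker<T: Tokenize> {
    inner: T,
    depth: usize,
}

impl<T: Tokenize> DepthTracker<T> {
    fn new(inner: T) -> Self {
        Self { inner, depth: 0 }
    }

    /// Pull tokens until the end; return the maximum nesting depth,
    /// or `None` if the brackets are unbalanced.
    fn max_depth(&mut self) -> Option<usize> {
        let mut max = 0;
        loop {
            match self.inner.next_kind() {
                Kind::Open => {
                    self.depth += 1;
                    max = max.max(self.depth);
                }
                Kind::Close => self.depth = self.depth.checked_sub(1)?,
                Kind::Other => {}
                Kind::End => return if self.depth == 0 { Some(max) } else { None },
            }
        }
    }
}

/// Convenience wrapper that wires the two layers together.
fn max_depth_of(text: &str) -> Option<usize> {
    let tokenizer = SliceTokenizer { bytes: text.as_bytes(), pos: 0 };
    DepthTracker::new(tokenizer).max_depth()
}
```

Because `DepthTracker` only depends on the trait, a tokenizer reading from a different kind of input could be swapped in without touching the upper layer, which is the property the architecture above relies on.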
§Core types
- Token is a simple unit-only enum that enumerates the kinds of JSON lexical tokens. It includes pseudo-tokens for error and end-of-file cases.
- lexical::Content is a trait that represents the text content of a JSON token. For example, the content of a Token::ObjBegin is always {, while the content of a Token::Str might be "", "foo", or an infinity of other possibilities.
- lexical::Analyzer is a trait that represents the capability to tokenize a stream of JSON text.
- syntax::Parser wraps any lexical analyzer with the ability to parse JSON at the syntax level.
- Pos represents an exact position within a JSON text.
- Buf and IntoBuf are special traits that allow bufjson to give zero-copy, zero-allocation access to validated token content even if the token is not contiguous in memory (split across input buffers). The literal value of a token's text content is always IntoBuf, and some algorithms such as unescape operate on IntoBuf. You likely won't notice these types when tokenizing with FixedAnalyzer but they take on more importance when tokenizing with PipeAnalyzer or ReadAnalyzer.
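To illustrate why a buffer abstraction over non-contiguous memory is useful, the following sketch compares token content that is split across two input chunks against a contiguous str, without first joining the chunks. `Chunked` and its methods are hypothetical names written against the standard library only; they are not the Buf, IntoBuf, or EqStr definitions from this crate.

```rust
/// Hypothetical stand-in for token content split across input buffers.
struct Chunked<'a> {
    chunks: &'a [&'a [u8]],
}

impl<'a> Chunked<'a> {
    /// Total length in bytes across all chunks.
    fn len(&self) -> usize {
        self.chunks.iter().map(|chunk| chunk.len()).sum()
    }

    /// Compare against a contiguous `str` chunk by chunk,
    /// without copying or allocating.
    fn eq_str(&self, other: &str) -> bool {
        let mut rest = other.as_bytes();
        for chunk in self.chunks {
            if rest.len() < chunk.len() || &rest[..chunk.len()] != *chunk {
                return false;
            }
            rest = &rest[chunk.len()..];
        }
        // Equal only if `other` has no bytes left over.
        rest.is_empty()
    }

    /// Copy into one contiguous `String`, but only when the caller asks.
    /// Returns `None` if the joined bytes are not valid UTF-8.
    fn to_contiguous(&self) -> Option<String> {
        let mut bytes = Vec::with_capacity(self.len());
        for chunk in self.chunks {
            bytes.extend_from_slice(chunk);
        }
        String::from_utf8(bytes).ok()
    }
}
```

The comparison path never allocates; the allocation in `to_contiguous` happens only on request, which mirrors the "defer work until asked" priority described elsewhere in this documentation.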
§Bonus features
The layered architecture of bufjson comes with some bonus features.
In particular, the lower-level state machines on which the lexical analyzers and the streaming
JSON Pointer evaluator are based are available in modules lexical::state and
pointer::state. You can use these state machines to build your own lower-level JSON
features, such as custom lexical analyzers.
§Nuances and design philosophy
There are a few nuances to working with this crate that may be unexpected if you’re coming to
bufjson from a different JSON parsing ecosystem. Since these nuances are driven by the
crate’s philosophy, we begin by listing some of the design priorities that drive them:
- Work should be avoided if possible and deferred if not.
- The library user should have maximum control to decide what work gets done, and when.
- Allocations and copies should be avoided.
- The input text should always be provided to the library user as-is, byte-for-byte, without modification.
- Emitting the stream of parsed JSON token content should exactly reproduce the input JSON text.
These design principles lead to the following nuanced consequences:
- Strings are quoted. If the JSON contains the string token "foo", the token content returned is "foo" including the opening and closing double quotes.
- Escape sequence expansion is deferred until you ask for it. There are several ways to do this, including Content::unescaped and unescape.
- Numbers are recognized, but not interpreted. In other words, all Token::Num tokens are lexically correct but bufjson does not try to convert them to Rust types for you. You can convert them trivially using the num feature.
- Strings and numbers can be arbitrarily long.
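These consequences can be seen in miniature with a toy scanner built on the standard library. It is only a sketch under stated assumptions: `raw_tokens`, `is_delimiter`, and `expand_escapes` are hypothetical helpers rather than bufjson functions, the scanner performs no validation and assumes well-formed input, and `\uXXXX` escapes are out of scope.

```rust
/// True for bytes that end a number or literal token.
fn is_delimiter(b: u8) -> bool {
    matches!(b, b'"' | b'{' | b'}' | b'[' | b']' | b':' | b',' | b' ' | b'\t' | b'\n' | b'\r')
}

/// Split JSON text into raw token slices, whitespace included.
/// Toy scanner: no validation, assumes well-formed input.
fn raw_tokens(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        match bytes[i] {
            b'"' => {
                i += 1;
                while i < bytes.len() && bytes[i] != b'"' {
                    // Skip the byte after a backslash so `\"` does not end the string.
                    i += if bytes[i] == b'\\' { 2 } else { 1 };
                }
                // Include the closing quote in the token.
                i = (i + 1).min(bytes.len());
            }
            b'{' | b'}' | b'[' | b']' | b':' | b',' => i += 1,
            b' ' | b'\t' | b'\n' | b'\r' => {
                while i < bytes.len() && matches!(bytes[i], b' ' | b'\t' | b'\n' | b'\r') {
                    i += 1;
                }
            }
            _ => {
                // Numbers and literals run until the next delimiter.
                while i < bytes.len() && !is_delimiter(bytes[i]) {
                    i += 1;
                }
            }
        }
        tokens.push(&text[start..i]);
    }
    tokens
}

/// Expand escapes in a quoted string token, only when asked.
/// Handles the single-character escapes; anything else yields `None`.
fn expand_escapes(quoted: &str) -> Option<String> {
    let inner = quoted.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        out.push(match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'b' => '\u{0008}',
            'f' => '\u{000C}',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            _ => return None,
        });
    }
    Some(out)
}
```

Concatenating the slices returned by `raw_tokens` gives back the input byte for byte, string tokens keep their surrounding quotes, and a number token such as `12.50` stays as text until the caller converts it, for example with `"12.50".parse::<f64>()`.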
§Non-features
A deliberate choice has been made not to support the following features:
- JSON writing or serialization. Compared to the input side (parsing), the output side is
relatively trivial and is easy to do performantly. The Rust ecosystem is already well-served
by crates that solve this problem, and bufjson would add nothing of value by providing its
own me-too solution. You may find write-focused crates such as json_in_type or json-writer
work well for you here.
§Contributing
Contributions are welcome! See the contributing guidelines in
CONTRIBUTING.md.
Re-exports§
pub use lexical::Token;
Modules§
- lexical: Scan JSON text, extracting a stream of tokens (lexical analysis).
- pointer (requires the pointer feature): Evaluate JSON Pointers against a stream of JSON text.
- syntax: Parse the structural meaning of a stream of JSON text (syntactic analysis).
Structs§
- BufUnderflow: Error returned when a Buf does not have enough bytes remaining to satisfy a request.
- Pos: Position in an input buffer or stream.
- StringBuf: A Buf implementation for String.
Traits§
- Buf: Valid UTF-8 sequence whose bytes may or may not be contiguous in memory.
- EqStr: Trait for types that form an equivalence relation together with str.
- IntoBuf: Conversion into a Buf.
- OrdStr: Trait for types that form a total ordering together with str.
- Sink: Accumulator for output bytes.