Module lexical

Module lexical 

Source
Expand description

Scan JSON text, extracting a stream of tokens (lexical analysis).

This module provides the traits, helpers, and type definitions needed to perform stream-oriented lexical analysis on JSON text.

The fundamental types are the enum Token, which represents the type of a JSON token, and the traits Analyzer (does the lexical analysis); Content (efficiently provides the actual content of a token from the JSON text; and Error (describes errors encountered by the lexical analyzer).

The sub-modules provide concrete implementations of JSON tokenizers:

  • state is a lower-level module containing a simple reusable finite state machine; all the concrete lexical analyzers in this crate use this state machine for their core logic.
  • fixed contains an implementation of Analyzer for tokenizing fixed-size in-memory buffers.

§Performance

Performance characteristics are documented on all relevant types at the trait level (this module) and at the concrete implementation level (in the sub-modules).

In all cases, allocations and copies are avoided except where it is technically infeasible. When they have to be done, they are minimized.

§Token content

By design, the Content trait provides the literal text of all tokens appearing in the input JSON, including whitespace, without any change whatsoever. This policy facilitates use cases such as stream editing, where you might want to make changes to the JSON text, such as deleting some JSON elements or inserting new ones, while leaving everything else unchanged.

§Numbers

For number tokens (Token::Num), the Content trait provides the literal content of the number as it appears in the JSON text, without attempting to coerce it into a Rust numeric type.

The reason for leaving numbers as text is that the JSON spec places no limits on the range and precision of numbers [1]. Since this module aims to faithfully implement the spec at the lexical level, it will recognize any valid JSON number, no matter the magnitude or precision. This would not be possible if it coerced the text into a numeric type, which all have their own limits on range and precision.

[1]: The spec does urge software developers using JSON to be thoughtful bout interoperability and, kinda sorta, to just stay within the IEEE double-precision floating point range, a.k.a., f64. But that’s not a requirement.

§Strings

For string tokens (Token::Str), the Content trait provides the literal content of the string as it appears in the JSON text, including the quotation marks that surround it, without attempting to expand the escape sequences.

Escape sequences can be expanded by explicitly requesting Content::unescaped instead of Content::literal. Note that getting the unescaped content, will trigger an allocation if the string indeed does contain at least one escape sequence, which may not be desirable in all circumstances.

Example of a string token without any escape sequences.

let mut lexer = FixedAnalyzer::new(&br#""foo""#[..]);
assert_eq!(Token::Str, lexer.next());
assert_eq!(r#""foo""#, lexer.content().literal()); // Note the surrounding quotes.
assert_eq!(r#""foo""#, lexer.content().unescaped()); // No allocation, returns same value.

Example of a string token containing an escape sequence.

let mut lexer = FixedAnalyzer::new(&br#""foo\u0020bar""#[..]);
assert_eq!(Token::Str, lexer.next());
assert_eq!(r#""foo\u0020bar""#, lexer.content().literal()); // Note the surrounding quotes.
assert_eq!(r#""foo bar""#, lexer.content().unescaped()); // Allocates, expands \u0020 -> ' '.

§Roll your own lexer

The sub-module state provides the basic state machine for tokenizing JSON text. You can use it to build your own implementation of Analyzer or any other application that needs a low-level ability to identify JSON tokens that is faithful to the JSON spec.

Modules§

fixed
Convert a fixed-size in-memory buffer into a stream of JSON lexical tokens.
state
Simple state machine for lexical analysis of JSON text.

Enums§

ErrorKind
Category of error that can occur while tokenizing a JSON text.
Expect
Character or class of characters expected at the next input position of a JSON text.
Token
Type of lexical token in a JSON text, such as begin object {, literal true, or string.

Traits§

Analyzer
Lexical analyzer for JSON text.
Content
Text content of a JSON token.
Error
An error encountered during lexical analysis of JSON text.