Crate lexical


Fast lexical conversion routines.

lexical is a high-performance library for number-to-string and string-to-number conversions. The writers require a system allocator, but support a no_std environment. In addition to high performance, it’s also highly configurable, supporting nearly every float and integer format available.

lexical is well-tested: it has been downloaded more than 25 million times and currently has no known correctness errors. lexical prioritizes performance above all else, and is competitive with or faster than any other float or integer parser and writer.

In addition, despite having a large number of features, configurability, and a focus on performance, it also aims to have fast compile times. Recent versions also add support for smaller binary sizes, making it well suited to embedded or web environments, where executable bloat can matter much more than raw performance.

§Getting Started

§Parse API

The main parsing API consists of parse and parse_partial. For example, to parse a number from a string, validating that the entire input is a number:

let i: i32 = lexical::parse("3").unwrap();      // 3, auto-type deduction.
let f: f32 = lexical::parse("3.5").unwrap();    // 3.5
let d = lexical::parse::<f64, _>("3.5");        // Ok(3.5), successful parse.

All lexical parsers are validating: they check that the entire input is correct, and stop parsing when they encounter invalid data, numerical overflow, or other errors:

let r = lexical::parse::<u8, _>("256"); // Err(ErrorCode::Overflow.into())
let r = lexical::parse::<u8, _>("1a5"); // Err(ErrorCode::InvalidDigit.into())
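To make the validating behavior concrete, here is a minimal std-only sketch of a strict u8 parser. It is not lexical's implementation: SketchError and parse_u8_strict are hypothetical names introduced for illustration, and the byte index carried by each error is an assumption of this sketch.

```rust
/// Hypothetical error type for this sketch. The digit and overflow variants
/// carry the byte index at which the problem was detected.
#[derive(Debug, PartialEq)]
enum SketchError {
    Empty,
    InvalidDigit(usize),
    Overflow(usize),
}

/// Parse all of `input` as a decimal u8, failing on the first problem found.
fn parse_u8_strict(input: &str) -> Result<u8, SketchError> {
    if input.is_empty() {
        return Err(SketchError::Empty);
    }
    let mut value: u8 = 0;
    for (index, byte) in input.bytes().enumerate() {
        // Any non-digit byte invalidates the whole input.
        if !byte.is_ascii_digit() {
            return Err(SketchError::InvalidDigit(index));
        }
        let digit = byte - b'0';
        // Checked arithmetic turns wrap-around into an explicit error.
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(digit))
            .ok_or(SketchError::Overflow(index))?;
    }
    Ok(value)
}
```

Calling parse_u8_strict("256") fails with an overflow and parse_u8_strict("1a5") fails with an invalid digit, matching the two cases shown above.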

For streaming APIs or those incrementally parsing data fed to a parser, where the input is known to start with a number but where the number ends is not yet known, the partial parsers return both the value they were able to parse and the number of bytes processed:

let r = lexical::parse_partial::<i8, _>("3a5"); // Ok((3, 1))
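The returned byte count is what makes incremental parsing possible: the caller can skip the consumed prefix and continue from there. The following std-only sketch shows that pattern on a comma-separated list. partial_u32 is a hypothetical stand-in for a partial parser and parse_list is a hypothetical helper; neither is part of lexical.

```rust
/// Hypothetical stand-in for a partial parser: reads leading ASCII digits
/// and returns the value plus the number of bytes consumed.
fn partial_u32(input: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    let mut count = 0;
    for &byte in input {
        if !byte.is_ascii_digit() {
            break;
        }
        value = value.checked_mul(10)?.checked_add(u32::from(byte - b'0'))?;
        count += 1;
    }
    if count == 0 {
        None
    } else {
        Some((value, count))
    }
}

/// Split a comma-separated list of integers using only the partial parser.
fn parse_list(input: &str) -> Option<Vec<u32>> {
    let mut rest = input.as_bytes();
    let mut values = Vec::new();
    loop {
        let (value, count) = partial_u32(rest)?;
        values.push(value);
        // Skip the bytes the partial parser reported as consumed.
        rest = &rest[count..];
        match rest.first() {
            None => return Some(values),
            Some(b',') => rest = &rest[1..],
            Some(_) => return None,
        }
    }
}
```

Here partial_u32(b"3a5") yields the value 3 and a count of 1, mirroring the example above, while parse_list rejects the same input because the byte after the number is not a comma.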
§Write API

The main writing API is to_string. For example, to write a number to a string:

let value = lexical::to_string(15.1);
assert_eq!(value, "15.1");

§Conversion API

These functions write and parse numbers using a format identical to that of Rust’s own parse and write routines.

  • to_string: Write a number to string.
  • parse: Parse a number from string validating the complete string is a number.
  • parse_partial: Parse a number from string returning the number and the number of bytes it was able to parse.

// parse
let f: f64 = lexical::parse(b"3.5").unwrap();
assert_eq!(f, 3.5);

let (f, count): (f64, usize) = lexical::parse_partial(b"3.5").unwrap();
assert_eq!(f, 3.5);
assert_eq!(count, 3);

// write
let value = lexical::to_string(f);
assert_eq!(value, "3.5");

§Options/Formatting API

Each number parser and writer offers extensive formatting control through options and format specifications. These include digit separator support (that is, numbers such as 1_2__3.4_5), whether integral, fractional, or any significant digits are required, whether to disable parsing or writing of non-finite values, whether + signs are invalid or required, and much more.

  • to_string_with_options: Write a number to string using custom formatting options.
  • parse_with_options: Parse a number from string using custom formatting options, validating the complete string is a number.
  • parse_partial_with_options: Parse a number from string using custom formatting options, returning the number and the number of bytes it was able to parse.

Some options, such as custom string representations of non-finite floats (such as NaN), are available without the format feature. For more comprehensive examples, see the format and Comprehensive Configuration sections below.

use lexical::{format, parse_float_options, write_float_options};

// parse
let f: f64 = lexical::parse_with_options::<_, _, { format::JSON }>(
    "3.5",
    &parse_float_options::JSON
).unwrap();

// write
let value = lexical::to_string_with_options::<_, { format::JSON }>(
    f,
    &write_float_options::JSON
);
assert_eq!(value, "3.5");

§Features

In accordance with the Rust ethos, all features are additive: the crate may be built with --all-features without issue. The following features are available, and those marked (Default) are enabled by default:

  • write-integers (Default) - Enable writing of integers.
  • write-floats (Default) - Enable writing of floats.
  • parse-integers (Default) - Enable parsing of integers.
  • parse-floats (Default) - Enable parsing of floats.
  • power-of-two - Add support for strings with radixes that are powers of two.
  • radix - Add support for strings of any radix.
  • compact - Reduce code size at the cost of performance.
  • format - Add support for custom number formatting.
  • f16 - Enable support for half-precision f16 and bf16 floats.
  • std (Default) - Disable to allow use in a no_std environment.

A complete description of supported features includes:

§write-integers

Enable support for writing integers to string.

let value = lexical::to_string(1234u64);
assert_eq!(value, "1234");

§write-floats

Enable support for writing floating-point numbers to string.

let value = lexical::to_string(1.234f64);
assert_eq!(value, "1.234");

§parse-integers

Enable support for parsing integers from string.

let f: i64 = lexical::parse("1234").unwrap();
assert_eq!(f, 1234);

§parse-floats

Enable support for parsing floating-point numbers from string.

let f: f64 = lexical::parse("1.234").unwrap();
assert_eq!(f, 1.234);

§format

Adds support for the entire format API (using NumberFormatBuilder). This allows extensive configurability for parsing and writing numbers in custom formats, with different valid syntax requirements.

For example, in JSON, the following floats are valid or invalid:

-1          // valid
+1          // invalid
1           // valid
1.          // invalid
.1          // invalid
0.1         // valid
nan         // invalid
inf         // invalid
Infinity    // invalid

All of the finite numbers are valid in Rust, and Rust provides constants for non-finite floats. In order to parse standard-conforming JSON floats using lexical, you may use the following approach:

use lexical::{format, parse_with_options, ParseFloatOptions, Result};

fn parse_json_float<Bytes: AsRef<[u8]>>(bytes: Bytes) -> Result<f64> {
    const OPTIONS: ParseFloatOptions = ParseFloatOptions::new();
    parse_with_options::<_, _, { format::JSON }>(bytes.as_ref(), &OPTIONS)
}
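For readers who want to see the JSON rules from the list above spelled out, here is a std-only sketch of a validity check that follows the JSON number grammar (an optional minus sign, an integer part without leading zeros, an optional fraction, and an optional exponent). is_json_number and skip_digits are hypothetical helpers for illustration; this is not how lexical implements format::JSON.

```rust
/// Advance `i` past any ASCII digits and return the new index.
fn skip_digits(bytes: &[u8], mut i: usize) -> usize {
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// Return true if `input` is a number under the JSON grammar.
fn is_json_number(input: &str) -> bool {
    let bytes = input.as_bytes();
    let mut i = 0;

    // Optional minus sign; a leading plus sign is not allowed.
    if bytes.first() == Some(&b'-') {
        i = 1;
    }

    // Integer part: a single 0, or a non-zero digit followed by more digits.
    match bytes.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => i = skip_digits(bytes, i),
        _ => return false,
    }

    // Optional fraction: a decimal point followed by at least one digit.
    if bytes.get(i) == Some(&b'.') {
        let end = skip_digits(bytes, i + 1);
        if end == i + 1 {
            return false;
        }
        i = end;
    }

    // Optional exponent: e or E, an optional sign, then at least one digit.
    if matches!(bytes.get(i), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(bytes.get(i), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let end = skip_digits(bytes, i);
        if end == i {
            return false;
        }
        i = end;
    }

    // Every byte must have been consumed.
    i == bytes.len()
}
```

The check only classifies input and does not produce a value, so it complements the parsing function above rather than replacing it.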

Enabling the format API significantly increases compile times. However, it enables a large amount of customization in how floats are parsed and written.

§power-of-two

Enable numeric conversions to and from strings for radixes that are powers of two, that is, 2, 4, 8, 16, and 32. This avoids most of the overhead and binary bloat of the radix feature, while enabling support for the most commonly-used radixes.

use lexical::{
    ParseFloatOptions,
    WriteFloatOptions,
    NumberFormatBuilder
};

// parse
const BINARY: u128 = NumberFormatBuilder::binary();
let value = "1.0011101111100111011011001000101101000011100101011";
let f: f64 = lexical::parse_with_options::<_, _, { BINARY }>(
    value,
    &ParseFloatOptions::new()
).unwrap();

// write
let result = lexical::to_string_with_options::<_, { BINARY }>(
    f,
    &WriteFloatOptions::new()
);
assert_eq!(result, value);

§radix

Enable doing numeric conversions to and from strings for all radixes. This requires more static storage than power-of-two, and increases compile times, but can be quite useful for esoteric programming languages which use duodecimal floats, for example.

use lexical::{
    ParseFloatOptions,
    WriteFloatOptions,
    NumberFormatBuilder
};

// parse
const FORMAT: u128 = NumberFormatBuilder::from_radix(12);
let value = "1.29842830A44BAA2";
let f: f64 = lexical::parse_with_options::<_, _, { FORMAT }>(
    value,
    &ParseFloatOptions::new()
).unwrap();

// write
let result = lexical::to_string_with_options::<_, { FORMAT }>(
    f,
    &WriteFloatOptions::new()
);
assert_eq!(result, value);
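As a rough illustration of what a radix-aware float parser has to compute, the std-only sketch below evaluates the digits of an unsigned string positionally in the given radix. parse_radix_float is a hypothetical helper, it handles no sign or exponent, and unlike lexical it makes no attempt at correct rounding, so results for radixes that are not powers of two are only approximate.

```rust
/// Parse an unsigned float string in `radix` (2 to 36) with an optional
/// fraction and no exponent. Illustrative only: rounding error accumulates.
fn parse_radix_float(input: &str, radix: u32) -> Option<f64> {
    // char::to_digit panics for radixes above 36, so reject them up front.
    if !(2..=36).contains(&radix) {
        return None;
    }
    let (int_part, frac_part) = match input.split_once('.') {
        Some((int_part, frac_part)) => (int_part, frac_part),
        None => (input, ""),
    };
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    let base = f64::from(radix);

    // Integer digits: multiply the running value by the base, then add.
    let mut value = 0.0;
    for c in int_part.chars() {
        value = value * base + f64::from(c.to_digit(radix)?);
    }

    // Fraction digits: each one is scaled by a further negative power of the base.
    let mut scale = 1.0 / base;
    for c in frac_part.chars() {
        value += f64::from(c.to_digit(radix)?) * scale;
        scale /= base;
    }
    Some(value)
}
```

For powers of two every intermediate step of short inputs is exact in binary floating point, which is one reason those radixes are cheaper to support than arbitrary ones.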
§compact

Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries.

§std

Enable use of the standard library. Currently, the standard library is not used, and may be disabled without any change in functionality on stable.

§Comprehensive Configuration

lexical provides two main levels of configuration:

  • The NumberFormatBuilder, creating a packed struct with custom formatting options.
  • The Options API.

§Number Format

The number format packed struct provides numerous flags to specify number parsing or writing. When the power-of-two feature is enabled, additional flags are added:

  • The radix for the significant digits (default 10).
  • The radix for the exponent base (default 10).
  • The radix for the exponent digits (default 10).

When the format feature is enabled, numerous other syntax and digit separator flags are enabled, including:

  • A digit separator character, to group digits for increased legibility.
  • Whether leading, trailing, internal, and consecutive digit separators are allowed.
  • Toggling required float components, such as digits before the decimal point.
  • Toggling whether special floats are allowed or are case-sensitive.

Many pre-defined constants, such as format::JSON, therefore exist to simplify common use cases.

For a list of all supported fields, see Fields.
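To illustrate what the digit separator flags described above control, here is a std-only sketch that strips separators from an unsigned integer string while enforcing leading, trailing, and consecutive rules. SeparatorRules and strip_separators are hypothetical names and do not correspond to lexical's packed format fields.

```rust
/// Hypothetical flags mirroring the kinds of separator rules described above.
#[derive(Clone, Copy)]
struct SeparatorRules {
    separator: u8,
    allow_leading: bool,
    allow_trailing: bool,
    allow_consecutive: bool,
}

/// Remove digit separators from an unsigned integer string, or return None
/// if a rule is violated or a non-digit byte is found.
fn strip_separators(input: &str, rules: SeparatorRules) -> Option<String> {
    let bytes = input.as_bytes();
    let mut digits = String::new();
    for (i, &byte) in bytes.iter().enumerate() {
        if byte == rules.separator {
            // A separator is leading (trailing) if no digit precedes (follows) it.
            let leading = bytes[..i].iter().all(|&b| b == rules.separator);
            let trailing = bytes[i + 1..].iter().all(|&b| b == rules.separator);
            let consecutive = i > 0 && bytes[i - 1] == rules.separator;
            if (leading && !rules.allow_leading)
                || (trailing && !rules.allow_trailing)
                || (consecutive && !rules.allow_consecutive)
            {
                return None;
            }
        } else if byte.is_ascii_digit() {
            digits.push(byte as char);
        } else {
            return None;
        }
    }
    if digits.is_empty() {
        None
    } else {
        Some(digits)
    }
}
```

With all three flags disabled, only internal separators such as those in 1_000 are accepted; enabling allow_consecutive additionally accepts inputs such as 1__0.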

§Options API

The Options API provides high-level options to specify number parsing or writing, options not intrinsically tied to a number format. For example, the Options API provides:

  • The exponent character (defaults to b'e' or b'^', depending on the radix).
  • The decimal point character (defaults to b'.').
  • Custom NaN and Infinity string representations.
  • Whether to trim the fraction component from integral floats.
  • The exponent break-point for scientific notation.
  • The maximum and minimum number of significant digits to write.
  • The rounding mode when truncating significant digits while writing.
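Two of the options listed above, the decimal point character and trimming the fraction from integral floats, can be pictured with a small std-only sketch built on Rust's own float formatting. WriteSketchOptions and write_float are hypothetical names for illustration and are not lexical's WriteFloatOptions API.

```rust
/// Hypothetical subset of writer options, named for this sketch only.
struct WriteSketchOptions {
    decimal_point: char,
    trim_floats: bool,
}

/// Write `value` using Rust's Debug formatting, then apply the two options.
fn write_float(value: f64, options: &WriteSketchOptions) -> String {
    // Debug formatting keeps the ".0" on integral floats (3.0 is written as "3.0").
    let mut text = format!("{:?}", value);
    if options.trim_floats {
        // Drop the fraction only when it is exactly ".0".
        if let Some(stripped) = text.strip_suffix(".0") {
            text = stripped.to_string();
        }
    }
    // Swap in the configured decimal point character.
    text.replace('.', &options.decimal_point.to_string())
}
```

With trim_floats enabled, 3.0 is written as 3 while 3.5 is left unchanged; with a comma as the decimal point, 3.5 is written as 3,5.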

The available options are:

  • ParseFloatOptions
  • ParseIntegerOptions
  • WriteFloatOptions
  • WriteIntegerOptions

In addition, pre-defined constants for each category of options may be found in their respective modules, for example, parse_float_options::JSON.

§Examples

An example of creating your own options to parse European-style numbers (which use commas as decimal points, and periods as digit separators) is as follows:

use core::num;

// This creates a format to parse a European-style float number.
// The decimal point is a comma, and the digit separators (optional)
// are periods. Digit separators require the format feature.
const EUROPEAN: u128 = lexical::NumberFormatBuilder::new()
    .digit_separator(num::NonZeroU8::new(b'.'))
    .build_strict();
const COMMA_OPTIONS: lexical::ParseFloatOptions = lexical::ParseFloatOptions::builder()
    .decimal_point(b',')
    .build_strict();
assert_eq!(
    lexical::parse_with_options::<f32, _, EUROPEAN>("300,10", &COMMA_OPTIONS),
    Ok(300.10)
);

// Another example, using a pre-defined constant for JSON.
const JSON: u128 = lexical::format::JSON;
const JSON_OPTIONS: lexical::ParseFloatOptions = lexical::ParseFloatOptions::new();
assert_eq!(
    lexical::parse_with_options::<f32, _, JSON>("0e1", &JSON_OPTIONS),
    Ok(0.0)
);
assert_eq!(
    lexical::parse_with_options::<f32, _, JSON>("1E+2", &JSON_OPTIONS),
    Ok(100.0)
);
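For comparison, the effect of the European-style configuration can be approximated without lexical by normalizing the input and handing it to the standard library parser. parse_european is a hypothetical helper; it is deliberately permissive (it does not check where the separators appear) and allocates a temporary string, both of which the format-based approach above avoids.

```rust
/// Parse a float written with ',' as the decimal point and '.' as an
/// optional digit separator, by normalizing to Rust's own syntax first.
fn parse_european(input: &str) -> Option<f64> {
    // More than one decimal comma cannot be a single number.
    if input.matches(',').count() > 1 {
        return None;
    }
    let normalized: String = input
        .chars()
        .filter(|&c| c != '.') // drop digit separators
        .map(|c| if c == ',' { '.' } else { c }) // comma becomes the decimal point
        .collect();
    normalized.parse().ok()
}
```

Under these assumptions, parse_european("300,10") returns the same value as the lexical example above, widened to f64.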

§Version Support

The minimum required Rust version is 1.63.0, which is needed for const generic support. Older versions of lexical support older Rust versions.

§Algorithms

§Benchmarks

A comprehensive analysis of lexical commits and their performance can be found in benchmarks.

§Design

§Safety

lexical itself contains no non-trivial unsafe code. However, any incorrect assumptions about safety invariants in the underlying parsers and writers (lexical-parse-float, lexical-parse-integer, lexical-write-float, and lexical-write-integer) could cause those invariants to be broken.

Modules§

format
The creation and processing of number format packed structs.
parse_float_options (feature: parse-floats)
Configuration options for parsing floats.
parse_integer_options (feature: parse-integers)
Configuration options for parsing integers.
write_float_options (feature: write-floats)
Configuration options for writing floats.
write_integer_options (feature: write-integers)
Configuration options for writing integers.

Structs§

NumberFormat
Helper to access features from the packed format struct.
NumberFormatBuilder
Validating builder for NumberFormat from the provided specifications.
ParseFloatOptions (feature: parse-floats)
Options to customize parsing floats.
ParseFloatOptionsBuilder (feature: parse-floats)
Builder for Options.
ParseIntegerOptions (feature: parse-integers)
Options to customize parsing integers.
ParseIntegerOptionsBuilder (feature: parse-integers)
Builder for Options.
WriteFloatOptions (feature: write-floats)
Options to customize writing floats.
WriteFloatOptionsBuilder (feature: write-floats)
Builder for Options.
WriteIntegerOptions (feature: write-integers)
Immutable options to customize writing integers.
WriteIntegerOptionsBuilder (feature: write-integers)
Builder for Options.
bf16 (feature: f16)
A 16-bit floating point type implementing the bfloat16 format.
f16 (feature: f16)
A 16-bit floating point type implementing the IEEE 754-2008 standard binary16 a.k.a. “half” format.

Enums§

Error
Error code during parsing, indicating failure type.

Constants§

BUFFER_SIZE (features: write-floats or write-integers)
Maximum number of bytes required to serialize any number with default options to string.

Traits§

FormattedSize (features: write-floats or write-integers)
The size, in bytes, of formatted values.
FromLexical (features: parse-floats or parse-integers)
Trait for numerical types that can be parsed from bytes.
FromLexicalWithOptions (features: parse-floats or parse-integers)
Trait for numerical types that can be parsed from bytes with custom options.
ParseOptions (features: parse-floats or parse-integers)
Shared trait for all parser options.
ToLexical (features: write-floats or write-integers)
Trait for numerical types that can be serialized to bytes.
ToLexicalWithOptions (features: write-floats or write-integers)
Trait for numerical types that can be serialized to bytes with custom options.
WriteOptions (features: write-floats or write-integers)
Shared trait for all writer options.

Functions§

format_error
Get the error type from the format packed struct.
format_is_valid
Determine if the format packed struct is valid.
parse (features: parse-floats or parse-integers)
High-level conversion of decimal-encoded bytes to a number.
parse_partial (features: parse-floats or parse-integers)
High-level, partial conversion of decimal-encoded bytes to a number.
parse_partial_with_options (features: parse-floats or parse-integers)
High-level, partial conversion of bytes to a number with custom parsing options.
parse_with_options (features: parse-floats or parse-integers)
High-level conversion of bytes to a number with custom parsing options.
to_string (features: write-floats or write-integers)
High-level conversion of a number to a decimal-encoded string.
to_string_with_options (features: write-floats or write-integers)
High-level conversion of a number to a string with custom writing options.

Type Aliases§

Result
A specialized Result type for lexical operations.