Crate lexical_core

Fast lexical conversion routines for a no_std environment.

lexical-core is a low-level API for number-to-string and string-to-number conversions, without requiring a system allocator. If you would like to use a high-level API that writes to and parses from String and &str, respectively, please look at lexical instead.

Despite its low-level API and focus on performance, lexical-core strives to be simple yet configurable: although it supports nearly every float and integer format available, it exports only 4 write functions and 4 parse functions.

lexical-core is well-tested, has been downloaded more than 5 million times, and currently has no known correctness errors. lexical-core prioritizes performance above all else, and aims to be competitive with or faster than any other float or integer parser and writer.

In addition, despite having a large number of features, configurability, and a focus on performance, we also strive for fast compile times. Recent versions also add support for smaller binary sizes, making the crate well suited to embedded or web environments, where executable bloat can be much more detrimental than a loss of performance.

§Getting Started


// String to number using Rust slices.
// The argument is the byte string parsed.
let f: f32 = lexical_core::parse(b"3.5").unwrap();   // 3.5
let i: i32 = lexical_core::parse(b"15").unwrap();    // 15

// All lexical_core parsers are checked: they validate that the
// input data is entirely correct, and stop parsing when invalid data
// is found, or upon numerical overflow.
let r = lexical_core::parse::<u8>(b"256"); // Err(ErrorCode::Overflow.into())
let r = lexical_core::parse::<u8>(b"1a5"); // Err(ErrorCode::InvalidDigit.into())

// In order to extract and parse a number from a substring of the input
// data, use `parse_partial`. These functions return the parsed value and
// the number of processed digits, allowing you to extract and parse the
// number in a single pass.
let r = lexical_core::parse_partial::<i8>(b"3a5"); // Ok((3, 1))

// If an insufficiently long buffer is passed, the serializer will panic.

// PANICS
let mut buf = [b'0'; 1];
//let slc = lexical_core::write::<i64>(15, &mut buf);

// In order to guarantee the buffer is long enough, always ensure there
// are at least `T::FORMATTED_SIZE` bytes, which requires the
// `lexical_core::FormattedSize` trait to be in scope.
use lexical_core::FormattedSize;
let mut buf = [b'0'; f64::FORMATTED_SIZE];
let slc = lexical_core::write::<f64>(15.1, &mut buf);
assert_eq!(slc, b"15.1");

// When the `radix` feature is enabled, for decimal floats, using
// `T::FORMATTED_SIZE` may significantly overestimate the space
// required to format the number. Therefore, the
// `T::FORMATTED_SIZE_DECIMAL` constants allow you to get a much
// tighter bound on the space required.
let mut buf = [b'0'; f64::FORMATTED_SIZE_DECIMAL];
let slc = lexical_core::write::<f64>(15.1, &mut buf);
assert_eq!(slc, b"15.1");
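
The checked and partial parsing contracts described in the comments above can be illustrated without the crate. The following is a minimal std-only sketch for u8; SketchError, parse_u8_partial, and parse_u8_complete are hypothetical names invented for illustration, and the digit loop is an assumption, not lexical-core's actual algorithm.

```rust
/// Hypothetical error type for this sketch (not lexical-core's error type).
#[derive(Debug, PartialEq)]
enum SketchError {
    /// No digits were found at the start of the input.
    Empty,
    /// The value does not fit in the target type.
    Overflow,
    /// A non-digit byte was found at this index (complete parsing only).
    InvalidDigit(usize),
}

/// Parses the leading decimal digits of `bytes`, returning the value and
/// the number of bytes consumed. Parsing stops at the first non-digit.
fn parse_u8_partial(bytes: &[u8]) -> Result<(u8, usize), SketchError> {
    let mut value: u8 = 0;
    let mut count = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            break;
        }
        // Checked arithmetic reports overflow instead of wrapping.
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(b - b'0'))
            .ok_or(SketchError::Overflow)?;
        count += 1;
    }
    if count == 0 {
        return Err(SketchError::Empty);
    }
    Ok((value, count))
}

/// Parses `bytes` as a whole: any byte left over after the digits is an error.
fn parse_u8_complete(bytes: &[u8]) -> Result<u8, SketchError> {
    let (value, count) = parse_u8_partial(bytes)?;
    if count != bytes.len() {
        return Err(SketchError::InvalidDigit(count));
    }
    Ok(value)
}
```

With these definitions, parse_u8_partial(b"3a5") yields Ok((3, 1)), while parse_u8_complete(b"1a5") and parse_u8_complete(b"256") fail with InvalidDigit(1) and Overflow respectively.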

§Conversion API

Write

From String

§Features

In accordance with the Rust ethos, all features are additive: the crate may be built with --all-features without issue. The following features are enabled by default:

  • std
  • write-integers
  • write-floats
  • parse-integers
  • parse-floats

A complete description of supported features includes:

§std

Enable use of the standard library. Currently, the standard library is not used for any functionality, and may be disabled without any change in functionality on stable.

§write-integers

Enable support for writing integers to string.

§write-floats

Enable support for writing floating-point numbers to string.

§parse-integers

Enable support for parsing integers from string.

§parse-floats

Enable support for parsing floating-point numbers from string.

§format

Adds support for the entire format API (using NumberFormatBuilder). This allows extensive configurability for parsing and writing numbers in custom formats, with different valid syntax requirements.

For example, in JSON, the following floats are valid or invalid:

-1          // valid
+1          // invalid
1           // valid
1.          // invalid
.1          // invalid
0.1         // valid
nan         // invalid
inf         // invalid
Infinity    // invalid

All of the finite numbers are valid in Rust, and Rust provides constants for non-finite floats. In order to parse standard-conforming JSON floats using lexical, you may use the following approach:

use lexical_core::{format, parse_with_options, ParseFloatOptions, Result};

fn parse_json_float<Bytes: AsRef<[u8]>>(bytes: Bytes) -> Result<f64> {
    let options = ParseFloatOptions::new();
    parse_with_options::<_, { format::JSON }>(bytes.as_ref(), &options)
}

See the Number Format section below for more information.
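
The JSON rules listed above can be made concrete with a small std-only validator. This is a sketch of the number grammar from RFC 8259 (an optional minus, an integer part without leading zeros, an optional fraction, an optional exponent), not the code path lexical-core uses; is_json_number and count_digits are hypothetical helpers.

```rust
/// Counts the ASCII digits at the start of `s`.
fn count_digits(s: &[u8]) -> usize {
    s.iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Returns true if `bytes` is a number under the JSON grammar:
/// -? (0 | [1-9][0-9]*) (\.[0-9]+)? ([eE][+-]?[0-9]+)?
fn is_json_number(bytes: &[u8]) -> bool {
    let mut i = 0;

    // Optional leading minus; JSON forbids a leading plus.
    if bytes.first() == Some(&b'-') {
        i += 1;
    }

    // Integer part: required, and no leading zeros such as "01".
    let int_len = count_digits(&bytes[i..]);
    if int_len == 0 || (int_len > 1 && bytes[i] == b'0') {
        return false;
    }
    i += int_len;

    // Optional fraction: '.' must be followed by at least one digit.
    if bytes.get(i) == Some(&b'.') {
        let frac_len = count_digits(&bytes[i + 1..]);
        if frac_len == 0 {
            return false;
        }
        i += 1 + frac_len;
    }

    // Optional exponent: 'e' or 'E', an optional sign, then digits.
    if matches!(bytes.get(i), Some(&b'e') | Some(&b'E')) {
        i += 1;
        if matches!(bytes.get(i), Some(&b'+') | Some(&b'-')) {
            i += 1;
        }
        let exp_len = count_digits(&bytes[i..]);
        if exp_len == 0 {
            return false;
        }
        i += exp_len;
    }

    // Anything left over (including "nan" or "inf") is invalid.
    i == bytes.len()
}
```

Under this sketch, is_json_number(b"0.1") and is_json_number(b"-1") return true, while b"+1", b"1.", b".1", and b"nan" are all rejected.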

§power-of-two

Enable doing numeric conversions to and from strings with power-of-two radixes. This avoids most of the overhead and binary bloat of the radix feature, while enabling support for the most commonly-used radixes.

§radix

Enable doing numeric conversions to and from strings for all radixes. This requires substantially more static storage than power-of-two, and increases compile times by a fair amount, but can be quite useful for esoteric programming languages which use duodecimal floats, for example.

§compact

Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries.

§safe

This replaces most unchecked indexing, required in cases where the compiler cannot elide the check, with checked indexing. However, it does not fully replace all unsafe behavior with safe behavior. To minimize the risk of undefined behavior and out-of-bounds reads/writes, extensive edge-case tests, property-based tests, and fuzzing are run with the safe feature both enabled and disabled, with the tests verified by Miri and Valgrind.

§Configuration API

Lexical provides two main levels of configuration:

  • The NumberFormatBuilder, creating a packed struct with custom formatting options.
  • The Options API.

§Number Format

The number format class provides numerous flags to specify number parsing or writing. When the power-of-two feature is enabled, additional flags are added:

  • The radix for the significant digits (default 10).
  • The radix for the exponent base (default 10).
  • The radix for the exponent digits (default 10).

When the format feature is enabled, numerous other syntax and digit separator flags are enabled, including:

  • A digit separator character, to group digits for increased legibility.
  • Whether leading, trailing, internal, and consecutive digit separators are allowed.
  • Toggling required float components, such as digits before the decimal point.
  • Toggling whether special floats are allowed or are case-sensitive.

Many pre-defined constants therefore exist to simplify common use-cases, including:

  • JSON, XML, TOML, YAML, SQLite, and many more.
  • Rust, Python, C#, FORTRAN, COBOL literals and strings, and many more.
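
To make the digit separator flags above more concrete, here is a minimal std-only sketch of one possible policy: separators are accepted only between two digits, so leading, trailing, and consecutive separators are rejected. The function strip_internal_separators is a hypothetical helper, and it allocates a Vec for simplicity; it does not reflect how lexical-core handles separators internally.

```rust
/// Removes `sep` from `input` when it appears strictly between two digits.
/// Returns None for a leading, trailing, or consecutive separator.
fn strip_internal_separators(input: &[u8], sep: u8) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        if b != sep {
            // Ordinary bytes are copied through unchanged.
            out.push(b);
            continue;
        }
        // A separator is only valid with a digit on both sides.
        let prev_is_digit = i > 0 && input[i - 1].is_ascii_digit();
        let next_is_digit = input.get(i + 1).map_or(false, |n| n.is_ascii_digit());
        if !(prev_is_digit && next_is_digit) {
            return None;
        }
    }
    Some(out)
}
```

For example, strip_internal_separators(b"1_000", b'_') returns the bytes of "1000", whereas b"_1", b"1_", and b"1__0" all return None. Relaxing either of the two digit checks would correspond to allowing trailing or leading separators.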

§Options API

The Options API provides high-level options to specify number parsing or writing, options not intrinsically tied to a number format. For example, the Options API provides:

  • The exponent character (default b'e', or b'^' where b'e' would be a valid digit in the radix).
  • The decimal point character (default b'.').
  • Custom NaN, Infinity string representations.
  • Whether to trim the fraction component from integral floats.
  • The exponent break point for scientific notation.
  • The maximum and minimum number of significant digits to write.
  • The rounding mode when truncating significant digits while writing.

The available options are:

In addition, pre-defined constants for each category of options may be found in their respective modules.

§Example

An example of creating your own options to parse European-style numbers (which use commas as decimal points, and periods as digit separators) is as follows:

// This creates a format to parse a European-style float number.
// The decimal point is a comma, and the digit separators (optional)
// are periods.
const EUROPEAN: u128 = lexical_core::NumberFormatBuilder::new()
    .digit_separator(b'.')
    .build();
let options = lexical_core::ParseFloatOptions::builder()
    .decimal_point(b',')
    .build()
    .unwrap();
assert_eq!(
    lexical_core::parse_with_options::<f32, EUROPEAN>(b"300,10", &options),
    Ok(300.10)
);

// Another example, using a pre-defined constant for JSON.
const JSON: u128 = lexical_core::format::JSON;
let options = lexical_core::ParseFloatOptions::new();
assert_eq!(
    lexical_core::parse_with_options::<f32, JSON>(b"0e1", &options),
    Ok(0.0)
);
assert_eq!(
    lexical_core::parse_with_options::<f32, JSON>(b"1E+2", &options),
    Ok(100.0)
);

§Algorithms

§Benchmarks

A comprehensive analysis of lexical commits and their performance can be found in benchmarks.

§Design

§Version Support

The minimum required Rust version is 1.63.0, for const generic support. Older versions of lexical support older Rust versions.

§Safety

There is no non-trivial unsafe behavior in lexical itself. However, any incorrect safety invariants in our parsers and writers (lexical-parse-float, lexical-parse-integer, lexical-write-float, and lexical-write-integer) could cause those safety invariants to be broken.

Modules§

Structs§

Enums§

  • Error code during parsing, indicating failure type.

Constants§

  • Maximum number of bytes required to serialize any number to string.

Traits§

Functions§

Type Aliases§

  • A specialized Result type for lexical operations.