Crate lexical

Fast lexical conversion routines.

Fast lexical conversion routines for both std and no_std environments. lexical provides routines to convert numbers to and from decimal strings. lexical also supports non-base 10 numbers, with the radix feature, for both integers and floats. lexical is customizable and yet simple to use: despite supporting nearly every float and integer format available, it only exports 2 write functions and 4 parse functions.

lexical is well-tested, has been downloaded more than 5 million times, and currently has no known correctness errors. lexical prioritizes performance above all else, and aims to be competitive with or faster than any other float or integer parser and writer.

§Getting Started

// Number to string
lexical::to_string(3.0);            // "3.0", always has a fraction suffix.
lexical::to_string(3);              // "3"

// String to number.
let i: i32 = lexical::parse("3").unwrap();      // 3, auto-type deduction.
let f: f32 = lexical::parse("3.5").unwrap();    // 3.5
let d = lexical::parse::<f64, _>("3.5");        // Ok(3.5), successful parse.
let d = lexical::parse::<f64, _>("3a");         // Err(Error(_)), failed to parse.

§Conversion API

To String

From String

§Features

In accordance with the Rust ethos, all features are additive: the crate may be built with --all-features without issue. The following features are enabled by default:

  • std
  • write-integers
  • write-floats
  • parse-integers
  • parse-floats

A complete description of supported features includes:

§std

Enable use of the standard library. Currently, the standard library is not used for any functionality, and may be disabled without any change in functionality on stable.

§write-integers

Enable support for writing integers to string.

§write-floats

Enable support for writing floating-point numbers to string.

§parse-integers

Enable support for parsing integers from string.

§parse-floats

Enable support for parsing floating-point numbers from string.

§format

Adds support for the entire format API (using NumberFormatBuilder). This allows extensive configurability for parsing and writing numbers in custom formats, with different valid syntax requirements.

For example, in JSON, the following floats are valid or invalid:

-1          // valid
+1          // invalid
1           // valid
1.          // invalid
.1          // invalid
0.1         // valid
nan         // invalid
inf         // invalid
Infinity    // invalid

All of the finite numbers are valid in Rust, and Rust provides constants for non-finite floats. In order to parse standard-conforming JSON floats using lexical, you may use the following approach:

use lexical_core::{format, parse_with_options, ParseFloatOptions, Result};

fn parse_json_float<Bytes: AsRef<[u8]>>(bytes: Bytes) -> Result<f64> {
    let options = ParseFloatOptions::new();
    parse_with_options::<_, { format::JSON }>(bytes.as_ref(), &options)
}

See the Number Format section below for more information.
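The valid and invalid JSON floats listed earlier in this section follow the JSON number grammar. As a rough illustration only, the sketch below checks that grammar with the standard library. It is not how lexical implements format::JSON, and `is_json_number` is a hypothetical helper introduced here for illustration.

```rust
/// Returns true if `s` matches the JSON number grammar:
/// `-? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?`
/// Hypothetical helper, not part of lexical.
fn is_json_number(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // Optional leading minus; a leading plus is not allowed.
    if b.first() == Some(&b'-') {
        i += 1;
    }

    // Integer part: a single `0`, or a nonzero digit followed by digits.
    match b.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => {
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
        }
        _ => return false,
    }

    // Optional fraction: `.` must be followed by at least one digit.
    if b.get(i) == Some(&b'.') {
        i += 1;
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // Optional exponent: `e` or `E`, optional sign, at least one digit.
    if matches!(b.get(i), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(b.get(i), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // The whole input must have been consumed.
    i == b.len()
}
```

Under these rules `is_json_number("0.1")` is accepted, while `"+1"`, `"1."`, `".1"`, and `"nan"` are rejected, matching the table above.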

§power-of-two

Enable doing numeric conversions to and from strings with power-of-two radixes. This avoids most of the overhead and binary bloat of the radix feature, while enabling support for the most commonly-used radixes.
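One reason power-of-two radixes can be cheap for integers is that each digit corresponds to a fixed group of bits, so digits can be extracted with shifts and masks rather than division. The following is a minimal standard-library sketch of that idea. It is not lexical's algorithm, and `write_pow2` is a hypothetical name used only here.

```rust
/// Writes `value` in a power-of-two radix (2, 4, 8, 16, or 32) using only
/// shifts and masks. Returns `None` for any other radix.
/// Hypothetical helper, not part of lexical.
fn write_pow2(mut value: u64, radix: u32) -> Option<String> {
    if !(2..=32).contains(&radix) || !radix.is_power_of_two() {
        return None;
    }
    // For a power of two, log2(radix) is the number of trailing zero bits.
    let bits = radix.trailing_zeros();
    let mask = u64::from(radix - 1);
    const DIGITS: &[u8; 32] = b"0123456789ABCDEFGHIJKLMNOPQRSTUV";

    // Digits are produced least-significant first, then reversed.
    let mut out = Vec::new();
    loop {
        out.push(DIGITS[(value & mask) as usize]);
        value >>= bits;
        if value == 0 {
            break;
        }
    }
    out.reverse();
    Some(String::from_utf8(out).expect("digits are ASCII"))
}
```

For example, `write_pow2(255, 16)` yields `Some("FF")`, while `write_pow2(10, 10)` yields `None` because 10 is not a power of two.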

§radix

Enable doing numeric conversions to and from strings for all radixes. This requires substantially more static storage than power-of-two, and increases compile times by a fair amount, but can be quite useful for esoteric programming languages which use duodecimal floats, for example.

§compact

Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries.

§safe

This replaces most unchecked indexing, required in cases where the compiler cannot elide the check, with checked indexing. However, it does not fully replace all unsafe behavior with safe behavior. To minimize the risk of UB and out-of-bounds reads/writes, extensive edge-case tests, property-based tests, and fuzzing are done with both the safe feature enabled and disabled, with the tests verified by Miri and Valgrind.

§Configuration API

Lexical provides two main levels of configuration:

  • The NumberFormatBuilder, creating a packed struct with custom formatting options.
  • The Options API.

§Number Format

The number format class provides numerous flags to specify number parsing or writing. When the power-of-two feature is enabled, additional flags are added:

  • The radix for the significant digits (default 10).
  • The radix for the exponent base (default 10).
  • The radix for the exponent digits (default 10).

When the format feature is enabled, numerous other syntax and digit separator flags are enabled, including:

  • A digit separator character, to group digits for increased legibility.
  • Whether leading, trailing, internal, and consecutive digit separators are allowed.
  • Toggling required float components, such as digits before the decimal point.
  • Toggling whether special floats are allowed or are case-sensitive.

Many pre-defined constants therefore exist to simplify common use-cases, including:

  • JSON, XML, TOML, YAML, SQLite, and many more.
  • Rust, Python, C#, FORTRAN, COBOL literals and strings, and many more.
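The digit separator flags described above (leading, trailing, internal, and consecutive) can be pictured with a small standard-library sketch. This is an illustration of the rules only, under the assumption that the input is a plain run of digits and separators. `SeparatorRules` and `strip_separators` are hypothetical names, not lexical's API, and lexical encodes these flags in the packed format rather than in a struct like this.

```rust
/// Which digit separator positions are accepted. Hypothetical type.
#[derive(Clone, Copy)]
struct SeparatorRules {
    leading: bool,
    internal: bool,
    trailing: bool,
    consecutive: bool,
}

/// Removes separators from a run of ASCII digits, or returns `None` if a
/// separator appears in a position the rules do not allow.
/// Hypothetical helper, not part of lexical.
fn strip_separators(input: &str, sep: u8, rules: SeparatorRules) -> Option<String> {
    let bytes = input.as_bytes();
    // Require at least one digit; these indices classify separator positions.
    let first_digit = bytes.iter().position(|b| b.is_ascii_digit())?;
    let last_digit = bytes.iter().rposition(|b| b.is_ascii_digit())?;

    let mut out = String::with_capacity(bytes.len());
    for (i, &b) in bytes.iter().enumerate() {
        if b.is_ascii_digit() {
            out.push(b as char);
        } else if b == sep {
            let allowed = if i < first_digit {
                rules.leading
            } else if i > last_digit {
                rules.trailing
            } else {
                rules.internal
            };
            // A separator directly after another separator is "consecutive".
            let repeated = i > 0 && bytes[i - 1] == sep;
            if !allowed || (repeated && !rules.consecutive) {
                return None;
            }
        } else {
            // Anything other than a digit or the separator is rejected.
            return None;
        }
    }
    Some(out)
}
```

With only `internal` enabled, `"1_000_000"` is accepted as `"1000000"`, while `"_100"`, `"100_"`, and `"1__0"` are rejected.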

§Options API

The Options API provides high-level options to specify number parsing or writing, options not intrinsically tied to a number format. For example, the Options API provides:

  • The exponent character (default b'e', or b'^').
  • The decimal point character (default b'.').
  • Custom NaN, Infinity string representations.
  • Whether to trim the fraction component from integral floats.
  • The exponent break point for scientific notation.
  • The maximum and minimum number of significant digits to write.
  • The rounding mode when truncating significant digits while writing.

The available options are:

In addition, pre-defined constants for each category of options may be found in their respective modules.

§Example

An example of creating your own options to parse European-style numbers (which use commas as decimal points, and periods as digit separators) is as follows:

// This creates a format to parse a European-style float number.
// The decimal point is a comma, and the digit separators (optional)
// are periods.
const EUROPEAN: u128 = lexical::NumberFormatBuilder::new()
    // The digit separator is an optional, non-zero byte.
    .digit_separator(core::num::NonZeroU8::new(b'.'))
    .build();
let options = lexical::ParseFloatOptions::builder()
    .decimal_point(b',')
    .build()
    .unwrap();
assert_eq!(
    lexical::parse_with_options::<f32, EUROPEAN, _>("300,10", &options),
    Ok(300.10)
);

// Another example, using a pre-defined constant for JSON.
const JSON: u128 = lexical::format::JSON;
let options = lexical::ParseFloatOptions::new();
assert_eq!(
    lexical::parse_with_options::<f32, JSON, _>("0e1", &options),
    Ok(0.0)
);
assert_eq!(
    lexical::parse_with_options::<f32, JSON, _>("1E+2", &options),
    Ok(100.0)
);
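To make the semantics of the European-style format concrete, here is a minimal standard-library sketch that normalizes such a string and then delegates to `str::parse`. It is only a cross-check of what the format means. It allocates, it is not how lexical parses, and `parse_european` is a hypothetical helper. It also assumes digit separators are only accepted before the decimal comma, which may be stricter than the format built above.

```rust
/// Parses a European-style float such as "1.234,5" by rewriting it into the
/// form `str::parse::<f64>` expects. Hypothetical helper, not part of lexical.
fn parse_european(input: &str) -> Option<f64> {
    let mut normalized = String::with_capacity(input.len());
    let mut seen_comma = false;
    for c in input.chars() {
        match c {
            // A period before the decimal comma is a digit separator: drop it.
            '.' if !seen_comma => {}
            // Assumption: no separators are accepted in the fraction.
            '.' => return None,
            // The first comma becomes the decimal point.
            ',' if !seen_comma => {
                seen_comma = true;
                normalized.push('.');
            }
            // A second comma is an error.
            ',' => return None,
            _ => normalized.push(c),
        }
    }
    normalized.parse().ok()
}
```

Here `parse_european("300,10")` gives `Some(300.1)` and `parse_european("1.234,5")` gives `Some(1234.5)`, while `"1,2,3"` is rejected.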

§Version Support

The minimum required Rust version is 1.63.0, for const generic support. Older versions of lexical support older Rust versions.

§Safety

There is no non-trivial unsafe behavior in lexical itself; however, any incorrect safety invariants in our parsers and writers (lexical-parse-float, lexical-parse-integer, lexical-write-float, and lexical-write-integer) could cause those safety invariants to be broken.

Modules§

Structs§

Enums§

  • Error code during parsing, indicating failure type.

Constants§

  • Maximum number of bytes required to serialize any number to string.

Traits§

Functions§

  • Get the error type from the format packed struct.
  • Determine if the format packed struct is valid.
  • High-level conversion of decimal-encoded bytes to a number.
  • High-level, partial conversion of decimal-encoded bytes to a number.
  • High-level, partial conversion of bytes to a number with custom parsing options.
  • High-level conversion of bytes to a number with custom parsing options.
  • High-level conversion of a number to a decimal-encoded string.
  • High-level conversion of a number to a string with custom writing options.

Type Aliases§

  • A specialized Result type for lexical operations.