Crate lexical_parse_integer

Fast lexical string-to-integer conversion routines.

This contains high-performance methods to parse integers from bytes. Using from_lexical is analogous to parse, while enabling parsing from bytes as well as str.

§Getting Started

To parse a number from bytes, use from_lexical:

use lexical_parse_integer::{Error, FromLexical};

let value = u64::from_lexical("1234".as_bytes());
assert_eq!(value, Ok(1234));

let value = u64::from_lexical("18446744073709551616".as_bytes());
assert_eq!(value, Err(Error::Overflow(19)));

let value = u64::from_lexical("1234 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Err(Error::InvalidDigit(4)));

To parse a string incrementally, that is, to consume digits until the first invalid character is found, use the partial parsers. They return the parsed value together with the number of bytes consumed. This is useful when parsing data where the type is known but the end of the number is not, such as in JSON.

use lexical_parse_integer::{Error, FromLexical};

let value = u64::from_lexical_partial("1234 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Ok((1234, 4)));

let value = u64::from_lexical_partial("18446744073709551616 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Err(Error::Overflow(19)));
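The following is a minimal, standard-library-only sketch of what a partial parse means; it is not this crate's implementation. The names `parse_u64_prefix` and `PrefixError` are hypothetical, and signs are deliberately not handled. It shows how the returned byte count lets a caller continue with the rest of the input.

```rust
/// Hypothetical error type for this sketch (not the crate's `Error`).
#[derive(Debug, PartialEq)]
enum PrefixError {
    /// No leading digits were found.
    Empty,
    /// The value overflowed `u64` at the digit with this index.
    Overflow(usize),
}

/// Parse as many leading ASCII decimal digits as possible.
/// Returns the value and the number of bytes consumed.
fn parse_u64_prefix(bytes: &[u8]) -> Result<(u64, usize), PrefixError> {
    let mut value: u64 = 0;
    let mut count = 0;
    for (index, &byte) in bytes.iter().enumerate() {
        // Anything outside b'0'..=b'9' wraps to a value above 9.
        let digit = byte.wrapping_sub(b'0');
        if digit > 9 {
            break;
        }
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(u64::from(digit)))
            .ok_or(PrefixError::Overflow(index))?;
        count = index + 1;
    }
    if count == 0 {
        Err(PrefixError::Empty)
    } else {
        Ok((value, count))
    }
}

/// Split the input into the parsed number and the unparsed remainder.
fn split_number(input: &[u8]) -> Result<(u64, &[u8]), PrefixError> {
    let (value, count) = parse_u64_prefix(input)?;
    Ok((value, &input[count..]))
}
```

A caller such as a JSON tokenizer would pass the remainder returned by `split_number` to whatever handles the next token.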

§Options/Formatting API

Each integer parser contains extensive formatting control through format, particularly digit separator support (that is, integers such as 1_2__3). For options, we have custom formats optimized for both small and large integers.

To optimize for smaller integers at the expense of performance of larger ones, you can use OptionsBuilder::no_multi_digit (defaults to true).

use lexical_parse_integer::{options, NumberFormatBuilder, FromLexicalWithOptions};

const FORMAT: u128 = NumberFormatBuilder::new().build_strict();

// a bit faster
let value = u64::from_lexical_with_options::<FORMAT>(b"12", &options::SMALL_NUMBERS);
assert_eq!(value, Ok(12));

// a lot slower
let value = u64::from_lexical_with_options::<FORMAT>(b"18446744073709551615", &options::SMALL_NUMBERS);
assert_eq!(value, Ok(0xffffffffffffffff));

§Features

  • format - Add support for parsing custom integer formats.
  • power-of-two - Add support for parsing power-of-two integer strings.
  • radix - Add support for strings of any radix.
  • compact - Reduce code size at the cost of performance.
  • std (Default) - Disable to allow use in a no_std environment.

A complete description of supported features includes:

§format

Add support for custom integer formatting specifications. This should be used in conjunction with Options for extensible integer parsing. It allows changing the use of digit separators, requiring or disallowing signs, and more.

§JSON

For example, in JSON, the following integers are valid or invalid:

-1          // valid
+1          // invalid
1           // valid

All of these are valid in our default format (the format of Rust strings), so we must use a custom format to parse JSON strings:

use lexical_parse_integer::{format, Error, FromLexicalWithOptions, Options};

const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<{ format::JSON }>("1234".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));

let value = u64::from_lexical_with_options::<{ format::JSON }>("+1234".as_bytes(), &OPTIONS);
assert_eq!(value, Err(Error::InvalidPositiveSign(0)));

§Custom Format

An example of building a custom format with digit separator support is:

use core::num;
use lexical_parse_integer::{NumberFormatBuilder, Options, FromLexicalWithOptions};

const FORMAT: u128 = NumberFormatBuilder::new()
    // require that a `+` or `-` precedes the number
    .required_mantissa_sign(true)
    // allow internal digit separators, that is, a special character between digits
    .integer_internal_digit_separator(true)
    // use `_` as the digit separator
    .digit_separator(num::NonZeroU8::new(b'_'))
    // allow an optional `0d` prefix to the number
    .base_prefix(num::NonZeroU8::new(b'd'))
    // build the number format, panicking on error
    .build_strict();
const OPTIONS: Options = Options::new();

let value = u64::from_lexical_with_options::<FORMAT>("+12_3_4".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));

let value = u64::from_lexical_with_options::<FORMAT>("+0d12_3_4".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));

For a list of all supported fields, see Parse Integer Fields.

Enabling the format API significantly increases compile times; however, it enables a large amount of customization in how integers are parsed.

§power-of-two

Enable parsing numbers whose radix is a power of two, that is, 2, 4, 8, 16, or 32.

use lexical_parse_integer::{FromLexicalWithOptions, NumberFormatBuilder, Options};

const BINARY: u128 = NumberFormatBuilder::binary();
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<BINARY>("10011010010".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));

§radix

Enable parsing numbers using all radixes from 2 to 36. This requires more static storage than power-of-two, and increases compile times, but can be quite useful for esoteric programming languages which use duodecimal integers.

use lexical_parse_integer::{FromLexicalWithOptions, NumberFormatBuilder, Options};

const DUODECIMAL: u128 = NumberFormatBuilder::from_radix(12);
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<DUODECIMAL>("86A".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
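As a cross-check of the digit values involved, here is a small sketch that uses only the standard library. The function `parse_radix` is a hypothetical helper, not part of this crate; it maps `0-9` and `a-z`/`A-Z` to digit values with `char::to_digit` and accumulates them with overflow checks.

```rust
/// Hypothetical helper: parse an unsigned integer in any radix from 2 to 36.
/// Returns `None` on an empty string, an invalid digit, a bad radix, or overflow.
fn parse_radix(text: &str, radix: u32) -> Option<u64> {
    // `char::to_digit` panics for a radix above 36, so validate first.
    if !(2..=36).contains(&radix) || text.is_empty() {
        return None;
    }
    let mut value: u64 = 0;
    for ch in text.chars() {
        // Accepts both upper- and lowercase letters as digits.
        let digit = ch.to_digit(radix)?;
        value = value
            .checked_mul(u64::from(radix))?
            .checked_add(u64::from(digit))?;
    }
    Some(value)
}
```

For the duodecimal input `86A`, this computes 8 * 144 + 6 * 12 + 10 = 1234, which agrees with the standard library's `u64::from_str_radix("86A", 12)`.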

§compact

Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries. However, the resulting performance of the generated code is much lower.

§std

Enable use of the standard library. Currently, the standard library is not used, and may be disabled without any change in functionality on stable.

§Higher-Level APIs

If you would like an API that supports multiple numeric conversions rather than just parsing integers, use lexical or lexical-core instead.

§Version Support

The minimum required Rust version is 1.63.0, for const generic support. Older versions of lexical support older Rust versions.

§Algorithm

The default implementations are highly optimized both for simple strings and for input with large numbers of digits. To keep performance optimal for simple strings, we minimize the number of branches (and therefore optimization checks). Most of the branches in the code are resolved at compile-time, and the resulting ASM is monitored to ensure there are no regressions. For larger strings, a limited number of optimization checks are included to try faster, multi-digit parsing algorithms. For 32-bit integers, we try to parse 4 digits at a time, and for 64-bit and larger integers, we try to parse 8 digits at a time. Attempting both checks leads to significant performance penalties for simple strings, so only 1 optimization is used at a time.
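To illustrate the idea of parsing 8 digits at a time, here is a sketch of a well-known SWAR (SIMD within a register) technique using only the standard library. It is an assumption-laden illustration, not this crate's actual code: the names `parse_8_digits` and `parse_u64` are hypothetical, and signs and formats are ignored.

```rust
/// Parse exactly 8 ASCII decimal digits packed into one 64-bit word.
/// Returns `None` if any byte is outside `b'0'..=b'9'`.
fn parse_8_digits(chunk: [u8; 8]) -> Option<u32> {
    const HIGH: u64 = 0xF0F0_F0F0_F0F0_F0F0;
    const ZEROS: u64 = 0x3030_3030_3030_3030;
    const SIX: u64 = 0x0606_0606_0606_0606;
    // Little-endian load: the first (most significant) digit is the low byte.
    let word = u64::from_le_bytes(chunk);
    // A byte is a digit if its high nibble is 3 both before and after adding 6.
    if (word & HIGH) != ZEROS || (word.wrapping_add(SIX) & HIGH) != ZEROS {
        return None;
    }
    // Each byte now holds a value in 0..=9.
    let digits = word - ZEROS;
    // Combine neighbors: bytes 0, 2, 4, 6 now hold two-digit values (0..=99).
    let pairs = digits * 10 + (digits >> 8);
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 100 + (1_000_000 << 32);
    const MUL2: u64 = 1 + (10_000 << 32);
    // The upper 32 bits of the sum collect the pairs scaled by 10^6, 10^4, 10^2, 1.
    let first = (pairs & MASK).wrapping_mul(MUL1);
    let second = ((pairs >> 16) & MASK).wrapping_mul(MUL2);
    Some((first.wrapping_add(second) >> 32) as u32)
}

/// Parse an unsigned decimal integer, 8 digits at a time where possible.
fn parse_u64(mut bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() {
        return None;
    }
    let mut value: u64 = 0;
    while bytes.len() >= 8 {
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&bytes[..8]);
        match parse_8_digits(chunk) {
            Some(part) => {
                value = value.checked_mul(100_000_000)?.checked_add(u64::from(part))?;
                bytes = &bytes[8..];
            }
            // A non-digit is in this chunk: let the slow loop locate it.
            None => break,
        }
    }
    // Single-digit loop for the remaining bytes.
    for &byte in bytes {
        let digit = byte.wrapping_sub(b'0');
        if digit > 9 {
            return None;
        }
        value = value.checked_mul(10)?.checked_add(u64::from(digit))?;
    }
    Some(value)
}
```

The sketch performs one validity check per 8-byte chunk instead of one per digit, which is where the gain for long inputs comes from. For short inputs the chunk loop never runs, so only the single-digit loop is paid for.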

In addition, a compact, fallback algorithm uses a naive, simple algorithm, parsing only a single digit at a time. This avoids any unnecessary branching and produces smaller binaries, but comes at a significant performance penalty for integers with more digits.

§Design

Modules§

format
The creation and processing of number format packed structs.
options
Configuration options for parsing integers.

Structs§

NumberFormat
Helper to access features from the packed format struct.
NumberFormatBuilder
Validating builder for NumberFormat from the provided specifications.
Options
Options to customize parsing integers.
OptionsBuilder
Builder for Options.

Enums§

Error
Error code during parsing, indicating failure type.

Traits§

FromLexical
Trait for numerical types that can be parsed from bytes.
FromLexicalWithOptions
Trait for numerical types that can be parsed from bytes with custom options.
ParseOptions
Shared trait for all parser options.

Type Aliases§

Result
A specialized Result type for lexical operations.