Fast lexical string-to-integer conversion routines.
This contains high-performance methods to parse integers from bytes.
Using from_lexical is analogous to parse, while enabling parsing from bytes as well as str.
§Getting Started
To parse a number from bytes, use from_lexical:
use lexical_parse_integer::{Error, FromLexical};
let value = u64::from_lexical("1234".as_bytes());
assert_eq!(value, Ok(1234));
let value = u64::from_lexical("18446744073709551616".as_bytes());
assert_eq!(value, Err(Error::Overflow(19)));
let value = u64::from_lexical("1234 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Err(Error::InvalidDigit(4)));
To parse a string incrementally from bytes, that is, to consume characters until an invalid digit is found, use the partial parsers. They return the parsed value together with the number of bytes consumed. This is useful for parsing data where the type is known, such as JSON, but where the end of the number is not yet known.
use lexical_parse_integer::{Error, FromLexical};
let value = u64::from_lexical_partial("1234 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Ok((1234, 4)));
let value = u64::from_lexical_partial("18446744073709551616 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Err(Error::Overflow(19)));
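The partial-parsing behavior described above can be illustrated without the crate. The following is a minimal sketch using only the standard library; it is not the crate's implementation, and parse_partial_u64 and PartialError are hypothetical names introduced only for this example.

```rust
/// Hypothetical error type for this sketch (not the crate's `Error`).
#[derive(Debug, PartialEq)]
pub enum PartialError {
    /// No digits were found at the start of the input.
    Empty,
    /// The value overflowed `u64`; holds the index of the offending digit.
    Overflow(usize),
}

/// Parse leading ASCII decimal digits, returning the value and the number
/// of bytes consumed. Parsing stops at the first non-digit byte.
pub fn parse_partial_u64(bytes: &[u8]) -> Result<(u64, usize), PartialError> {
    let mut value: u64 = 0;
    let mut count = 0;
    for &byte in bytes {
        if !byte.is_ascii_digit() {
            break;
        }
        let digit = u64::from(byte - b'0');
        // Checked arithmetic reports overflow instead of wrapping.
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(digit))
            .ok_or(PartialError::Overflow(count))?;
        count += 1;
    }
    if count == 0 {
        return Err(PartialError::Empty);
    }
    Ok((value, count))
}
```

The caller can resume processing the input at the returned byte count, which is what makes this style convenient inside a larger tokenizer.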
§Options/Formatting API
Each integer parser contains extensive formatting control through format, particularly digit separator support (that is, integers such as 1_2__3). For options, we have custom formats optimized for both small and large integers.
To optimize for smaller integers at the expense of performance of larger ones, you can use OptionsBuilder::no_multi_digit (defaults to true).
use lexical_parse_integer::{options, NumberFormatBuilder, FromLexicalWithOptions};
const FORMAT: u128 = NumberFormatBuilder::new().build_strict();
// a bit faster
let value = u64::from_lexical_with_options::<FORMAT>(b"12", &options::SMALL_NUMBERS);
assert_eq!(value, Ok(12));
// a lot slower
let value = u64::from_lexical_with_options::<FORMAT>(b"18446744073709551615", &options::SMALL_NUMBERS);
assert_eq!(value, Ok(0xffffffffffffffff));
§Features
- format - Add support for parsing custom integer formats.
- power-of-two - Add support for parsing power-of-two integer strings.
- radix - Add support for strings of any radix.
- compact - Reduce code size at the cost of performance.
- std (Default) - Disable to allow use in a no_std environment.
A complete description of supported features includes:
§format
Add support for custom integer formatting specifications. This should be used in conjunction with Options for extensible integer parsing. This allows changing the use of digit separators, requiring or not allowing signs, and more.
§JSON
For example, in JSON, the following integers are valid or invalid:
-1 // valid
+1 // invalid
1 // valid
All of these are valid in our default format (the format of Rust strings), so we must use a custom format to parse JSON strings:
use lexical_parse_integer::{format, Error, FromLexicalWithOptions, Options};
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<{ format::JSON }>("1234".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
let value = u64::from_lexical_with_options::<{ format::JSON }>("+1234".as_bytes(), &OPTIONS);
assert_eq!(value, Err(Error::InvalidPositiveSign(0)));
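The difference between the default (Rust) sign rules and the JSON sign rules can be sketched with the standard library alone. Rust's str::parse accepts a leading +, so a JSON-style parser has to reject it explicitly. The helper name parse_json_i64 is hypothetical, and this sketch only covers the sign rule discussed above: JSON's other restrictions, such as no leading zeros, are not checked here.

```rust
/// Hypothetical helper: parse an integer using JSON's sign rules.
/// JSON permits an optional leading `-` but never a leading `+`.
/// Other JSON rules (for example, no leading zeros) are NOT enforced.
pub fn parse_json_i64(text: &str) -> Option<i64> {
    if text.starts_with('+') {
        return None;
    }
    // Delegate to the standard parser, which accepts an optional `-`.
    text.parse::<i64>().ok()
}
```

With this helper, -1 and 1 parse successfully while +1 is rejected, even though the standard parser on its own accepts all three.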
§Custom Format
An example of building a custom format with digit separator support is:
use core::num;
use lexical_parse_integer::{NumberFormatBuilder, Options, FromLexicalWithOptions};
const FORMAT: u128 = NumberFormatBuilder::new()
// require that a `+` or `-` precedes the number
.required_mantissa_sign(true)
// allow internal digit separators, that is, a special character between digits
.integer_internal_digit_separator(true)
// use `_` as the digit separator
.digit_separator(num::NonZeroU8::new(b'_'))
// allow an optional `0d` prefix to the number
.base_prefix(num::NonZeroU8::new(b'd'))
// build the number format, panicking on error
.build_strict();
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<FORMAT>("+12_3_4".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
let value = u64::from_lexical_with_options::<FORMAT>("+0d12_3_4".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
For a list of all supported fields, see Parse Integer Fields.
Enabling the format API significantly increases compile times; however, it enables a large amount of customization in how integers are parsed.
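To make the notion of an internal digit separator concrete, here is a minimal standard-library sketch. It assumes the strictest interpretation: a separator is accepted only directly between two digits, so leading, trailing, and consecutive separators are rejected. The function name parse_with_separator is hypothetical, and this is not how the crate implements its format flags.

```rust
/// Hypothetical helper: parse unsigned decimal digits where `sep` may appear
/// only between two digits (internal, non-consecutive separators).
pub fn parse_with_separator(bytes: &[u8], sep: u8) -> Option<u64> {
    let mut value: u64 = 0;
    let mut prev_was_digit = false;
    for &byte in bytes {
        if byte.is_ascii_digit() {
            value = value.checked_mul(10)?.checked_add(u64::from(byte - b'0'))?;
            prev_was_digit = true;
        } else if byte == sep && prev_was_digit {
            // A separator is only accepted directly after a digit.
            prev_was_digit = false;
        } else {
            return None;
        }
    }
    // Reject empty input and a trailing separator.
    if prev_was_digit {
        Some(value)
    } else {
        None
    }
}
```

Tracking a single flag for the previous byte is enough to enforce all three restrictions, which is why each one can be toggled independently in a richer format description.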
§power-of-two
Enable parsing numbers in radixes that are powers of two, that is, 2, 4, 8, 16, and 32.
use lexical_parse_integer::{FromLexicalWithOptions, NumberFormatBuilder, Options};
const BINARY: u128 = NumberFormatBuilder::binary();
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<BINARY>("10011010010".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
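One property that sets power-of-two radixes apart is that each digit occupies a fixed number of bits, so a value can be accumulated with shifts instead of multiplications. The sketch below illustrates that property using only the standard library; parse_pow2_radix is a hypothetical name, and nothing here is claimed to match the crate's internals.

```rust
/// Hypothetical helper: parse digits in radix `2^bits` (bits in 1..=5),
/// accumulating with shifts rather than multiplications.
pub fn parse_pow2_radix(bytes: &[u8], bits: u32) -> Option<u64> {
    if bytes.is_empty() || !(1..=5).contains(&bits) {
        return None;
    }
    let radix = 1u32 << bits;
    let mut value: u64 = 0;
    for &byte in bytes {
        // `to_digit` returns `None` for characters that are not valid digits.
        let digit = (byte as char).to_digit(radix)?;
        // Fail if the top `bits` bits are occupied: shifting would lose them.
        if value >> (64 - bits) != 0 {
            return None;
        }
        value = (value << bits) | u64::from(digit);
    }
    Some(value)
}
```

Passing bits = 1 gives binary, 3 gives octal, 4 gives hexadecimal, and 5 gives base 32.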
§radix
Enable parsing numbers using all radixes from 2 to 36. This requires more static storage than power-of-two, and increases compile times, but can be quite useful for esoteric programming languages which use duodecimal integers.
use lexical_parse_integer::{FromLexicalWithOptions, NumberFormatBuilder, Options};
const DUODECIMAL: u128 = NumberFormatBuilder::from_radix(12);
const OPTIONS: Options = Options::new();
let value = u64::from_lexical_with_options::<DUODECIMAL>("86A".as_bytes(), &OPTIONS);
assert_eq!(value, Ok(1234));
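As a sanity check on the duodecimal example, the digits 8, 6, and A (ten) evaluate to 8 * 144 + 6 * 12 + 10 = 1234. The sketch below spells out that arithmetic with Horner's rule and cross-checks it against the standard library's from_str_radix. Both helper names are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: combine digit values by Horner's rule,
/// value = ((d0 * radix) + d1) * radix + d2, and so on.
/// No overflow checking is performed in this sketch.
pub fn horner(digits: &[u64], radix: u64) -> u64 {
    digits.iter().fold(0, |acc, &d| acc * radix + d)
}

/// Cross-check using the standard library's radix parser.
pub fn parse_duodecimal(text: &str) -> Option<u64> {
    u64::from_str_radix(text, 12).ok()
}
```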
§compact
Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries. However, the resulting performance of the generated code is much lower.
§std
Enable use of the standard library. Currently, the standard library is not used, and may be disabled without any change in functionality on stable.
§Higher-Level APIs
If you would like an API that supports multiple numeric conversions rather than just parsing integers, use lexical or lexical-core instead.
§Version Support
The minimum, standard, required version is 1.63.0, for const generic support. Older versions of lexical support older Rust versions.
§Algorithm
The default implementations are highly optimized both for simple strings and for input with large numbers of digits. To keep performance optimal for simple strings, we minimize the number of branches (and therefore optimization checks). Most of the branches in the code are resolved at compile-time, and the resulting ASM is monitored to ensure there are no regressions.
For larger strings, a limited number of optimization checks are included to try faster, multi-digit parsing algorithms. For 32-bit integers, we try to parse 4 digits at a time, and for 64-bit and larger integers, we try to parse 8 digits at a time. Attempting both checks leads to significant performance penalties for simple strings, so only 1 optimization is used at a time.
In addition, the compact fallback uses a naive algorithm that parses only a single digit at a time. This avoids any unnecessary branching and produces smaller binaries, but comes at a significant performance penalty for integers with more digits.
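The idea of parsing 8 digits at a time can be illustrated with a well-known SWAR (SIMD within a register) technique: load 8 ASCII bytes into a u64, validate them all at once, and combine them with a few multiplications. The following is a minimal sketch assuming decimal input and little-endian packing of the bytes. It is not taken from the crate's source, and every function name in it is hypothetical.

```rust
/// Check that all 8 bytes packed in `v` are ASCII digits (`b'0'..=b'9'`).
fn is_8digits(v: u64) -> bool {
    // Bytes above b'9' set the high bit in `a`; bytes below b'0' set it in `b`.
    let a = v.wrapping_add(0x4646_4646_4646_4646);
    let b = v.wrapping_sub(0x3030_3030_3030_3030);
    (a | b) & 0x8080_8080_8080_8080 == 0
}

/// Convert 8 ASCII digits, loaded little-endian into `v`, to their value.
/// Precondition: `is_8digits(v)` is true.
fn parse_8digits(v: u64) -> u64 {
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 0x000F_4240_0000_0064; // 100 + (1_000_000 << 32)
    const MUL2: u64 = 0x0000_2710_0000_0001; // 1 + (10_000 << 32)
    let v = v.wrapping_sub(0x3030_3030_3030_3030); // ASCII to digit values
    let v = v * 10 + (v >> 8); // each byte now holds a 2-digit value
    let v1 = (v & MASK).wrapping_mul(MUL1);
    let v2 = ((v >> 16) & MASK).wrapping_mul(MUL2);
    v1.wrapping_add(v2) >> 32
}

/// Hypothetical helper: parse an unsigned decimal integer, consuming
/// 8 digits per step while possible and single digits for the remainder.
pub fn parse_u64_multi_digit(mut bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() {
        return None;
    }
    let mut value: u64 = 0;
    while bytes.len() >= 8 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&bytes[..8]);
        let chunk = u64::from_le_bytes(buf);
        if !is_8digits(chunk) {
            break; // fall back to digit-at-a-time parsing
        }
        value = value
            .checked_mul(100_000_000)?
            .checked_add(parse_8digits(chunk))?;
        bytes = &bytes[8..];
    }
    for &byte in bytes {
        if !byte.is_ascii_digit() {
            return None;
        }
        value = value.checked_mul(10)?.checked_add(u64::from(byte - b'0'))?;
    }
    Some(value)
}
```

The single validity check per 8-byte chunk is the kind of extra branch the text refers to: it pays off for long inputs, while short inputs skip the loop entirely and go straight to the digit-at-a-time path.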
§Design
Modules§
- format - The creation and processing of number format packed structs.
- options - Configuration options for parsing integers.
Structs§
- NumberFormat - Helper to access features from the packed format struct.
- NumberFormatBuilder - Validating builder for NumberFormat from the provided specifications.
- Options - Options to customize the parsing of integers.
- OptionsBuilder - Builder for Options.
Enums§
- Error - Error code during parsing, indicating failure type.
Traits§
- FromLexical - Trait for numerical types that can be parsed from bytes.
- FromLexicalWithOptions - Trait for numerical types that can be parsed from bytes with custom options.
- ParseOptions - Shared trait for all parser options.