Expand description
Fast lexical string-to-float conversion routines.
This contains high-performance methods to parse floats from bytes.
Using from_lexical
is analogous to parse
,
while enabling parsing from bytes as well as str
.
§Getting Started
To parse a number from bytes, use from_lexical
:
use lexical_parse_float::{Error, FromLexical};
let value = f64::from_lexical("1234.5".as_bytes());
assert_eq!(value, Ok(1234.5));
let value = f64::from_lexical("1.2345e325".as_bytes());
assert_eq!(value, Ok(f64::INFINITY));
let value = f64::from_lexical("1234.5 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Err(Error::InvalidDigit(6)));
If wishing to incrementally parse a string from bytes, that is, parse as many characters until an invalid digit is found, you can use the partial parsers. This is useful in parsing data where the type is known, such as JSON, but where the end of the number is not yet known.
use lexical_parse_float::{Error, FromLexical};
let value = f64::from_lexical_partial("1234.5 }, {\"Key\", \"Value\"}}".as_bytes());
assert_eq!(value, Ok((1234.5, 6)));
let value = f64::from_lexical_partial("1.2345e325".as_bytes());
assert_eq!(value, Ok((f64::INFINITY, 10)));
§Options/Formatting API
Each float parser contains extensive formatting control through
options
and format
, including digit separator
support (that is, floats such as 1_2__3.4_5
), if integral,
fractional, or any significant digits are required, if to disable
parsing non-finite values, if +
signs are invalid or required,
and much more. For more comprehensive examples, see the
format
and Comprehensive Configuration sections
below.
use lexical_parse_float::{Error, FromLexicalWithOptions, NumberFormatBuilder, Options};
const FORMAT: u128 = NumberFormatBuilder::new()
// require a `+` or `-` sign before the number
.required_mantissa_sign(true)
// require a `+` or `-` sign before the exponent digits
.required_exponent_sign(true)
// build the format, panicking if the format is invalid
.build_strict();
const OPTIONS: Options = Options::new();
let value = "+1.234e+300";
let result = f64::from_lexical_with_options::<FORMAT>(value.as_bytes(), &OPTIONS);
assert_eq!(result, Ok(1.234e+300));
let value = "1.234e+300";
let result = f64::from_lexical_with_options::<FORMAT>(value.as_bytes(), &OPTIONS);
assert_eq!(result, Err(Error::MissingSign(0)));
§Features
format
- Add support for parsing custom integer formats.power-of-two
- Add support for parsing power-of-two integer strings.radix
- Add support for strings of any radix.compact
- Reduce code size at the cost of performance.f16
- Enable support for half-precisionf16
andbf16
floats.std
(Default) - Disable to allow use in ano_std
environment.
A complete description of supported features includes:
§format
Add support custom float parsing specifications. This should be used in
conjunction with Options
for extensible float parsing.
§JSON
For example, in JSON, the following floats are valid or invalid:
-1 // valid
+1 // invalid
1 // valid
1. // invalid
.1 // invalid
0.1 // valid
nan // invalid
inf // invalid
Infinity // invalid
All of the finite numbers are valid in Rust, and Rust provides constants for non-finite floats. In order to parse standard-conforming JSON floats using lexical, you may use the following approach:
use lexical_parse_float::{format, options, Error, FromLexicalWithOptions, Result};
fn parse_json_float(bytes: &[u8]) -> Result<f64> {
f64::from_lexical_with_options::<{ format::JSON }>(bytes, &options::JSON)
}
assert_eq!(parse_json_float(b"-1"), Ok(-1.0));
assert_eq!(parse_json_float(b"+1"), Err(Error::InvalidPositiveSign(0)));
assert_eq!(parse_json_float(b"1"), Ok(1.0));
assert_eq!(parse_json_float(b"1."), Err(Error::EmptyFraction(2)));
assert_eq!(parse_json_float(b"0.1"), Ok(0.1));
assert_eq!(parse_json_float(b"nan"), Err(Error::EmptyInteger(0)));
assert_eq!(parse_json_float(b"inf"), Err(Error::EmptyInteger(0)));
assert_eq!(parse_json_float(b"Infinity"), Err(Error::EmptyInteger(0)));
§Custom Format
An example building and using a custom format, with many of the available options is:
use lexical_parse_float::{Error, NumberFormatBuilder, Options, FromLexicalWithOptions};
const FORMAT: u128 = NumberFormatBuilder::new()
// enable the use of digit separators with `_`
.digit_separator(num::NonZeroU8::new(b'_'))
// require digits before and after the decimal point,
// if the decimal point is present.
.required_integer_digits(true)
.required_fraction_digits(true)
// do not allow a leading `+` sign, so `+123` is invalid
.no_positive_mantissa_sign(true)
// do not allow `0` before an integer, so `01.1` is invalid.
// however, `0.1` is valid.
.no_integer_leading_zeros(true)
// allow digit separators anywhere, including consecutive ones
.leading_digit_separator(true)
.trailing_digit_separator(true)
.internal_digit_separator(true)
.consecutive_digit_separator(true)
// make it so the exponent character, `e`, is case-sensitive
// that is, `E` is not considered a valid exponent character
.case_sensitive_exponent(true)
.build_strict();
const OPTIONS: Options = Options::builder()
// change the string representation of NaN from `NaN` to `nan`
.nan_string(Some(b"nan"))
// disable a short infinity: long infinity is still allowed
.inf_string(None)
.build_strict();
let value = f64::from_lexical_with_options::<FORMAT>(b"1_2.3_4", &OPTIONS);
assert_eq!(value, Ok(12.34));
let value = f64::from_lexical_with_options::<FORMAT>(b"-inf", &OPTIONS);
assert_eq!(value, Err(Error::EmptyInteger(1)));
let value = f64::from_lexical_with_options::<FORMAT>(b"Infinity", &OPTIONS);
assert_eq!(value, Ok(f64::INFINITY));
let value = f64::from_lexical_with_options::<FORMAT>(b"nan", &OPTIONS);
assert_eq!(value.map(|x| x.is_nan()), Ok(true));
let value = f64::from_lexical_with_options::<FORMAT>(b"+1_2.3_4", &OPTIONS);
assert_eq!(value, Err(Error::InvalidPositiveSign(0)));
let value = f64::from_lexical_with_options::<FORMAT>(b"0.3_4", &OPTIONS);
assert_eq!(value, Ok(0.34));
let value = f64::from_lexical_with_options::<FORMAT>(b"12", &OPTIONS);
assert_eq!(value, Ok(12.0));
let value = f64::from_lexical_with_options::<FORMAT>(b"12.", &OPTIONS);
assert_eq!(value, Err(Error::EmptyFraction(3)));
let value = f64::from_lexical_with_options::<FORMAT>(b"1.234e5", &OPTIONS);
assert_eq!(value, Ok(1.234e5));
let value = f64::from_lexical_with_options::<FORMAT>(b"1.234E5", &OPTIONS);
assert_eq!(value, Err(Error::InvalidDigit(5)));
Enabling the format
API significantly increases compile
times, however, it enables a large amount of customization in how floats are
written.
§power-of-two
Enable parsing numbers with radixes that are powers of two, that is, 2
,
4
, 8
, 16
, and 32
.
use lexical_parse_float::{NumberFormatBuilder, Options, FromLexicalWithOptions};
const BINARY: u128 = NumberFormatBuilder::binary();
const OPTIONS: Options = Options::new();
let value = "1.0011101111100111011011001000101101000011100101011";
let result = f64::from_lexical_with_options::<BINARY>(value.as_bytes(), &OPTIONS);
assert_eq!(result, Ok(1.234f64));
§radix
Enable parsing numbers using all radixes from 2
to 36
. This requires
more static storage than power-of-two
, and increases
compile times, but can be quite useful for esoteric programming languages
which use duodecimal floats, for example.
use lexical_parse_float::{NumberFormatBuilder, Options, FromLexicalWithOptions};
const FORMAT: u128 = NumberFormatBuilder::from_radix(12);
const OPTIONS: Options = Options::new();
let value = "1.29842830A44BAA2";
let result = f64::from_lexical_with_options::<FORMAT>(value.as_bytes(), &OPTIONS);
assert_eq!(result, Ok(1.234f64));
§compact
Reduce the generated code size at the cost of performance. This minimizes the number of static tables, inlining, and generics used, drastically reducing the size of the generated binaries. However, this resulting performance of the generated code is much lower.
§f16
This enables the use of the half-precision floats f16
and
bf16
. However, since these have limited hardware support
and are primarily used for vectorized operations, they are parsed as if
they were an f32
. Due to the low precision of 16-bit floats, the results
may appear to have significant rounding error.
use lexical_parse_float::{f16, FromLexical};
let value = "1.234375";
let result = f16::from_lexical(value.as_bytes());
assert_eq!(result, Ok(f16::from_f64_const(1.234f64)));
§std
Enable use of the standard library. Currently, the standard library is not used, and may be disabled without any change in functionality on stable.
§Comprehensive Configuration
lexical-parse-float
provides two main levels of configuration:
- The
NumberFormatBuilder
, creating a packed struct with custom formatting options. - The
Options
API.
§Number Format
The number format class provides numerous flags to specify number writing.
When the power-of-two
feature is enabled, additional
flags are added:
- The radix for the significant digits (default
10
). - The radix for the exponent base (default
10
). - The radix for the exponent digits (default
10
).
When the format
feature is enabled, numerous other syntax and
digit separator flags are enabled, including:
- A digit separator character, to group digits for increased legibility.
- Whether leading, trailing, internal, and consecutive digit separators are allowed.
- Toggling required float components, such as digits before the decimal point.
- Toggling whether special floats are allowed or are case-sensitive.
Many pre-defined constants therefore exist to simplify common use-cases, including:
JSON
,XML
,TOML
,YAML
,SQLite
, and many more.Rust
,Python
,C#
,FORTRAN
,COBOL
literals and strings, and many more.
For a list of all supported fields, see Parse Float Fields.
§Options API
The Options API provides high-level options to specify number parsing or writing, options not intrinsically tied to a number format. For example, the Options API provides:
- The
exponent
character (defaults tob'e'
orb'^'
, depending on the radix). - The
decimal point
character (defaults tob'.'
). - Custom
NaN
andInfinity
stringrepresentations
.
In addition, pre-defined constants for each category of options may
be found in their respective modules, for example, JSON
.
§Examples
An example of creating your own options to parse European-style numbers (which use commas as decimal points, and periods as digit separators) is as follows:
use lexical_parse_float::{format, FromLexicalWithOptions, NumberFormatBuilder, Options};
// This creates a format to parse a European-style float number.
// The decimal point is a comma, and the digit separators (optional)
// are periods.
const EUROPEAN: u128 = NumberFormatBuilder::new()
.digit_separator(num::NonZeroU8::new(b'.'))
.build_strict();
const COMMA_OPTIONS: Options = Options::builder()
.decimal_point(b',')
.build_strict();
assert_eq!(
f32::from_lexical_with_options::<EUROPEAN>(b"300,10", &COMMA_OPTIONS),
Ok(300.10)
);
// Another example, using a pre-defined constant for JSON.
const JSON: u128 = format::JSON;
const JSON_OPTIONS: Options = Options::new();
assert_eq!(
f32::from_lexical_with_options::<JSON>(b"0e1", &JSON_OPTIONS),
Ok(0.0)
);
assert_eq!(
f32::from_lexical_with_options::<JSON>(b"1E+2", &JSON_OPTIONS),
Ok(100.0)
);
§Higher-Level APIs
If you would like an API that supports multiple numeric conversions rather
than just writing integers, use lexical
or lexical-core
instead.
§Version Support
The minimum, standard, required version is 1.63.0
, for
const generic support. Older versions of lexical support older Rust
versions.
§Algorithm
The default implementations are highly optimized both for simple strings, as well as input with large numbers of digits. In order to keep performance optimal for simple strings, we avoid overly branching to minimize the number of branches (and therefore optimization checks). Most of the branches in the code are resolved at compile-time, and the resulting ASM as well as comprehensive benchmarks are monitored to ensure there are no regressions.
For simple floats, we use an optimized digit parser with multiple-digit optimizations (parsing 8 digits in 3 multiplication instructions), and then use machine floats to create an exact representation with high throughput. In more complex cases, we use the Eisel-Lemire algorithm, described in “Number Parsing at a Gigabyte per Second”, available online here. The Eisel-Lemire algorithm creates an extended representation using a 128-bit (or a fallback 192-bit representation) of the significant digits of the float, scaled to the proper exponent using pre-computed powers-of-5.
If the Eisel-Lemire algorithm is unable to unambiguously round the float, we fallback to using optimized, big-integer algorithms, which are described in Algorithm Approach below.
§Machine Float-Only Algorithm
We also support an algorithm that uses only machine floats for the fast-path algorithm, however, this may be slower for floats with large exponents since it uses an iterative algorithm. A code sample using this is:
use lexical_parse_float::Options;
use lexical_parse_float::format::STANDARD;
use lexical_parse_float::parse::ParseFloat;
const OPTIONS: Options = Options::new();
let result = f64::fast_path_complete::<{ STANDARD }>(b"1.34000", &OPTIONS);
assert_eq!(result, Ok(1.34000));
§Design
§Safety Guarantees
This module uses some unsafe code to achieve accept acceptable performance. The safety guarantees and logic are described below.
The primary use of unsafe code is in the big integer implementation, which for performance reasons requires unchecked indexing at certain points, where rust cannot elide the index check. The use of unsafe code can be found in the calculation of the hi bits, however, every invocation requires the buffer to be of sufficient length. The other major source is the implementation of methods such as push_unchecked, however, the safety invariants for each caller to create a safe API are documented and has similar safety guarantees to a regular vector. All other invocations of unsafe code are indexing a buffer where the index is proven to be within bounds within a few lines of code of the unsafe index.
Modules§
- format
- The creation and processing of number format packed structs.
- options
- Configuration options for parsing floats.
Structs§
- Number
Format - Helper to access features from the packed format struct.
- Number
Format Builder - Validating builder for
NumberFormat
from the provided specifications. - Options
- Options to customize parsing floats.
- Options
Builder - Builder for
Options
. - bf16
f16
- A 16-bit floating point type implementing the
bfloat16
format. - f16
f16
- A 16-bit floating point type implementing the IEEE 754-2008 standard
binary16
a.k.a “half” format.
Enums§
- Error
- Error code during parsing, indicating failure type.
Traits§
- From
Lexical - Trait for numerical types that can be parsed from bytes.
- From
Lexical With Options - Trait for numerical types that can be parsed from bytes with custom options.
- Parse
Options - Shared trait for all parser options.