lexical_write_integer/
lib.rs

1//! Fast lexical integer-to-string conversion routines.
2//!
3//! This contains high-performance methods to write integers
4//! directly to bytes, can be converted to [`str`] using
5//! [`str::from_utf8`]. Using [`to_lexical`] is analogous to [`to_string`],
6//! just writing to an existing buffer.
7//!
8//! [`str::from_utf8`]: core::str::from_utf8
9//! [`to_lexical`]: ToLexical::to_lexical
10//!
11//! # Getting Started
12//!
13//! To write a number to bytes, use [`to_lexical`]:
14//!
15//! ```rust
16//! # #[no_std]
17//! # use core::str;
18//! use lexical_write_integer::{FormattedSize, ToLexical};
19//!
20//! let mut buffer = [0u8; u64::FORMATTED_SIZE_DECIMAL];
21//! let digits = 1234u64.to_lexical(&mut buffer);
22//! assert_eq!(str::from_utf8(digits), Ok("1234"));
23//! ```
24//!
25//! Using [`FormattedSize::FORMATTED_SIZE_DECIMAL`] guarantees the buffer
26//! will be large enough to write the digits for all numbers of that
27//! type.
28//!
29//! # Features
30//!
31//! * `format` - Add support for custom integer formatting (currently
32//!   unsupported).
33//! * `power-of-two` - Add support for writing power-of-two integer strings.
34//! * `radix` - Add support for strings of any radix.
35//! * `compact` - Reduce code size at the cost of performance.
36//! * `std` (Default) - Disable to allow use in a [`no_std`] environment.
37//!
38//! [`no_std`]: https://docs.rust-embedded.org/book/intro/no-std.html
39//!
40//! A complete description of supported features includes:
41//!
42//! #### format
43//!
44//! Add support for custom integer formatting. Currently no custom styles are
45//! supported but this could include digit [`separator`] support in the future.
46//!
47//! [`separator`]: NumberFormatBuilder::digit_separator
48//!
49//! <!--
50//! For a list of all supported fields, see [Write Integer
51//! Fields][NumberFormatBuilder#write-integer-fields].
52//! -->
53//!
54//! #### power-of-two
55//!
56//! Enable writing numbers using radixes that are powers of two, that is, `2`,
57//! `4`, `8`, `16`, and `32`. In these cases, you should use [`FORMATTED_SIZE`]
58//! to create a sufficiently large buffer.
59//!
60//! [`FORMATTED_SIZE`]: FormattedSize::FORMATTED_SIZE
61//!
62//! ```rust
63//! # #[cfg(feature = "power-of-two")] {
64//! # use core::str;
65//! use lexical_write_integer::{FormattedSize, NumberFormatBuilder, Options, ToLexicalWithOptions};
66//!
67//! let mut buffer = [0u8; u64::FORMATTED_SIZE];
68//! const BINARY: u128 = NumberFormatBuilder::binary();
69//! const OPTIONS: Options = Options::new();
70//! let digits = 1234u64.to_lexical_with_options::<BINARY>(&mut buffer, &OPTIONS);
71//! assert_eq!(str::from_utf8(digits), Ok("10011010010"));
72//! # }
73//! ```
74//!
75//! #### radix
76//!
77//! Enable writing numbers using all radixes from `2` to `36`. This requires
78//! more static storage than [`power-of-two`][crate#power-of-two], and increases
79//! compile times, but can be quite useful for esoteric programming languages
80//! which use duodecimal integers, for example.
81//!
82//! ```rust
83//! # #[cfg(feature = "radix")] {
84//! # use core::str;
85//! use lexical_write_integer::{FormattedSize, NumberFormatBuilder, Options, ToLexicalWithOptions};
86//!
87//! let mut buffer = [0u8; u64::FORMATTED_SIZE];
88//! const FORMAT: u128 = NumberFormatBuilder::from_radix(12);
89//! const OPTIONS: Options = Options::new();
90//! let digits = 1234u64.to_lexical_with_options::<FORMAT>(&mut buffer, &OPTIONS);
91//! assert_eq!(str::from_utf8(digits), Ok("86A"));
92//! # }
93//! ```
94//!
95//! #### compact
96//!
97//! Reduce the generated code size at the cost of performance. This minimizes
98//! the number of static tables, inlining, and generics used, drastically
99//! reducing the size of the generated binaries.
100//!
101//! #### std
102//!
103//! Enable use of the standard library. Currently, the standard library
104//! is not used, and may be disabled without any change in functionality
105//! on stable.
106//!
107//! # Higher-Level APIs
108//!
109//! If you would like support for writing to [`String`] directly, use
110//! [`lexical`] instead. If you would like an API that supports multiple numeric
111//! conversions rather than just writing integers, use [`lexical-core`] instead.
112//!
113//! [`lexical`]: https://crates.io/crates/lexical
114//! [`lexical-core`]: https://crates.io/crates/lexical-core
115//!
116//! # Version Support
117//!
118//! The minimum, standard, required version is [`1.63.0`][`rust-1.63.0`], for
119//! const generic support. Older versions of lexical support older Rust
120//! versions.
121//!
122//! # Algorithm
123//!
124//! We use 3 algorithms for serializing numbers:
125//! 1. [`Jeaiii Algorithm`] (decimal only)
126//! 2. Power reduction to write 4 digits at a time (radix only)
127//! 3. Compact, single-digit serialization
128//!
129//! ## Decimal
130//!
131//! Our decimal-based digit writers are based on the [`Jeaiii Algorithm`], which
132//! branches based on the number of digits and writes digits to minimize the
133//! number of additions and multiplications. This avoids the need to calculate
134//! the number of digits ahead of time, by just branching on the value.
135//!
136//! James Anhalt's itoa algorithm along with Junekey Jeon's performance tweaks
137//! have excellent performance, however, this can be further optimized. Both
138//! James Anhalt's and Junekey Jeon's use a binary search for determining the
139//! correct number of digits to print (for 32-bit integers).
140//!
141//! ```text
142//!      /\____________
143//!     /  \______     \______
144//!    /\   \     \     \     \
145//!   0  1  /\    /\    /\    /\
146//!        2  3  4  5  6  7  8  9
147//! ```
148//!
149//! This leads to a max tree depth of 4, and the major performance bottleneck
150//! with larger type sizes is the branching. A minor modification can optimize
151//! this, leadingg to a max tree depth of 3 while only required 1 extra
152//! comparison at the top level. Also, we invert the comparisons: oddly enough,
153//! our benchmarks show doing larger comparisons then smaller improves
154//! performance for numbers with both large and small numbers of digits.
155//!
156//! ```text
157//!           ____________________
158//!       ___/_       __|__       \
159//!      /  |  \     /     \      /\
160//!     /\  1   0   /\     /\    8  9
161//!    3  2        6  7   4  5
162//! ```
163//!
164//! For larger integers, we can apply the a similar algorithm with minor
165//! modifications to minimize branching while keeping excellent performance.
166//!
167//! ## Radix
168//!
169//! Our radix-based algorithms work like this, carving off the lower digits and
170//! writing them to the back of the buffer.
171//!
172//! ```rust,ignore
173//! let mut value = 12345u32;
174//! let buffer = [0u8; 32];
175//! let digits = value.digit_count();
176//! let bytes = buffer[..digits];
177//!
178//! let table = ...;  // some pre-computed table of 2 * radix^2 length
179//!
180//! let radix = 10;
181//! let radix2 = radix * radix;
182//! let radix4 = radix2 * radix2
183//! let mut index = bytes.len();
184//! while value >= 10000 {
185//!     let r = value % radix4;
186//!     value /= radix4;
187//!     let r1 = 2 * (r / radix2) as usize;
188//!     let r2 = 2 * (r % radix2) as usize;
189//!
190//!     // write 5, then 4
191//!     index -= 1;
192//!     bytes[index] = table[r2 + 1];
193//!     index -= 1;
194//!     bytes[index] = table[r2];
195//!
196//!     // write 3 then 2
197//!     index -= 1;
198//!     bytes[index] = table[r1 + 1];
199//!     index -= 1;
200//!     bytes[index] = table[r1];
201//! }
202//!
203//! // continue with radix^2 and then a single digit.
204//! ```
205//!
206//! We can efficiently determine at compile time if the pre-computed
207//! tables are large enough so there are no non-local safety considerations
208//! there. The current logic call stack is:
209//! 1. [`to_lexical`]
210//! 2. [`decimal`][`dec`], [`compact`][`cmp`], or [`radix`][`rdx`] (gets the
211//!    correct tables and calls algorithm)
212//! 3. [`jeaiii`]
213//!
214//! # Compact
215//!
216//! A compact, fallback algorithm uses a naive, simple algorithm,
217//! where each loop generates a single digit. This comes at a performance
218//! penalty, but produces smaller binaries. It is analogous to the below
219//! code.
220//!
221//! ```rust,ignore
222//! const fn digit_to_char(digit: u32) -> u8 {
223//!     match r {
224//!        b'0'..=b'9' => c - b'0',
225//!        b'A'..=b'Z' => c - b'A' + 10,
226//!        b'a'..=b'z' => c - b'a' + 10,
227//!        _ => 0xFF,  // unreachable
228//!     }
229//! }
230//!
231//! let mut value = 12345u32;
232//! let buffer = [0u8; 32];
233//! let digits = value.digit_count();
234//! let bytes = buffer[..digits];
235//!
236//! let radix = 10;
237//! let mut index = bytes.len();
238//! while value >= radix {
239//!     let r = value % radix;
240//!     value /= radix;
241//!     index -= 1;
242//!     bytes[index] = digit_to_char(r);
243//! }
244//!
245//! index -= 1;
246//! bytes[index] = digit_to_char(value);
247//! ```
248//!
249//! # Design
250//!
251//! - [Algorithm Approach](https://github.com/Alexhuszagh/rust-lexical/blob/main/lexical-write-integer/docs/Algorithm.md)
252//! - [Benchmarks](https://github.com/Alexhuszagh/rust-lexical/blob/main/lexical-write-integer/docs/Benchmarks.md)
253//! - [Comprehensive Benchmarks](https://github.com/Alexhuszagh/lexical-benchmarks)
254//!
255//! # Safety Guarantees
256//!
257//! <div class="warning info-warning">
258//! <style>
259//! .info-warning::before {
260//!   color: #87CEFAb0 !important;
261//! }
262//! .info-warning {
263//!   border-left: 2px solid #87CEFAb0 !important;
264//! }
265//! </style>
266//!
267//! This module uses some unsafe code to achieve accept acceptable performance.
268//! Providing a buffer of insufficient size will cause the code to panic and
269//! cannot lead to out-of-bounds access. The safety guarantees and logic are
270//! described below.
271//!
272//! </div>
273//!
274//! ### Decimal
275//!
276//! Our decimal writer uses a branched algorithm and therefore the indexing for
277//! each element in the buffer is known ahead of time. The digit
278//! [`generation`][`digit-gen`] is [well-established][`Jeaiii Algorithm`] to
279//! ensure the the lookup value is less than the size of the pre-computed table
280//! (`2 * 10^2`, or 200), and as long as this invariant holds true, then no
281//! undefined behavior can occur.
282//!
283//! [`digit-gen`]: https://github.com/jk-jeon/idiv/blob/main/subproject/example/jeaiii_analysis.cpp
284//!
285//! ### Radix
286//!
287//! The non-decimal writers rely on pre-computed tables and an exact calculation
288//! of the digit count ([`digit_count`]) to avoid any overhead. Avoiding
289//! intermediary copies is **CRITICAL** for fast performance so the entire
290//! buffer must be known but assigned to use algorithms the compiler cannot
291//! easily verify. This is because we use multi-digit optimizations with our
292//! pre-computed tables, so we cannot just iterate over the slice and assign
293//! iteratively. Using checked indexing for the pre-compuited table can lead to
294//! 30%+ decreases in performance. However, with careful analysis and factoring
295//! of the code, it's trivial to demonstrate both the table lookups and buffer
296//! indexing are safe.
297//!
298//! For radixes that are 2^N, we use the `ceil(log(value | 1, radix))` which can
299//! always be calculated through the number of leading [`zeros`][`log2_lz`]. For
300//! other radixes, we calculate the number of digits exactly the same way as if
301//! we were writing digits in an initial pass.
302//!
303//! ### Compact
304//!
305//! The compact decimal writer uses no unsafe indexing.
306//!
307//! [`digit_count`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/digit_count.rs#L180
308//! [`to_lexical`]: crate::ToLexical::to_lexical
309//! [`dec`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/decimal.rs#L278
310//! [`jeaiii`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/jeaiii.rs
311//! [`cmp`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/compact.rs
312//! [`rdx`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/radix.rs
313//! [`Jeaiii Algorithm`]: https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/
314//! [`rust-1.63.0`]: https://blog.rust-lang.org/2022/08/11/Rust-1.63.0.html
315//! [`String`]: https://doc.rust-lang.org/alloc/string/struct.String.html
316//! [`to_string`]: https://doc.rust-lang.org/alloc/string/trait.ToString.html#tymethod.to_string
317//! [`log2_lz`]: https://github.com/Alexhuszagh/rust-lexical/blob/c6c5052/lexical-write-integer/src/digit_count.rs#L119
318
319// We want to have the same safety guarantees as Rust core,
320// so we allow unused unsafe to clearly document safety guarantees.
321#![allow(unused_unsafe)]
322#![cfg_attr(feature = "lint", warn(unsafe_op_in_unsafe_fn))]
323#![cfg_attr(not(feature = "std"), no_std)]
324#![cfg_attr(docsrs, feature(doc_cfg))]
325#![cfg_attr(docsrs, feature(doc_auto_cfg))]
326#![deny(
327    clippy::doc_markdown,
328    clippy::unnecessary_safety_comment,
329    clippy::semicolon_if_nothing_returned,
330    clippy::unwrap_used,
331    clippy::as_underscore
332)]
333#![allow(
334    // used when concepts are logically separate
335    clippy::match_same_arms,
336    // loss of precision is intentional
337    clippy::integer_division,
338    // mathematical names use 1-character identifiers
339    clippy::min_ident_chars,
340    // these are not cryptographically secure contexts
341    clippy::integer_division_remainder_used,
342    // this can be intentional
343    clippy::module_name_repetitions,
344    // this is intentional: already passing a pointer and need performance
345    clippy::needless_pass_by_value,
346    // we use this for inline formatting for unsafe blocks
347    clippy::semicolon_inside_block,
348)]
349
350pub mod algorithm;
351pub mod compact;
352pub mod decimal;
353pub mod digit_count;
354pub mod jeaiii;
355pub mod options;
356pub mod radix;
357pub mod table;
358pub mod write;
359
360mod api;
361mod table_binary;
362mod table_decimal;
363mod table_radix;
364
365// Re-exports
366pub use lexical_util::constants::{FormattedSize, BUFFER_SIZE};
367pub use lexical_util::error::Error;
368pub use lexical_util::format::{self, NumberFormat, NumberFormatBuilder};
369pub use lexical_util::options::WriteOptions;
370pub use lexical_util::result::Result;
371
372pub use self::api::{ToLexical, ToLexicalWithOptions};
373#[doc(inline)]
374pub use self::options::{Options, OptionsBuilder};