Extended Decimal
A tiny, zero-cost Rust library to correctly parse any Unicode decimal digit.
Ever needed to parse a number from a string that might contain digits from other scripts, like ९ (Devanagari nine) or ٣ (Arabic-Indic three)? Rust's standard `char::to_digit` only recognizes ASCII digits (and ASCII letters, for radixes above 10), so it returns `None` for both. This crate extends digit conversion to every character in the Unicode "Decimal Number" (Nd) general category.
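The gap can be checked with the standard library alone. The helper `std_views` below is hypothetical and exists only for illustration: it returns what `char::to_digit(10)` and `char::is_numeric` report for a character. `to_digit` rejects non-ASCII digits, while `is_numeric` cannot stand in for a digit converter because it also accepts characters in the `Nl` and `No` categories, such as `¾`, which have no single decimal digit value.

```rust
/// Hypothetical helper: reports the standard library's two views of `c`.
/// - `c.to_digit(10)` only accepts ASCII '0'..='9'.
/// - `c.is_numeric()` accepts the Nd, Nl and No categories, so it is true
///   for non-ASCII digits but also for characters like '¾'.
fn std_views(c: char) -> (Option<u32>, bool) {
    (c.to_digit(10), c.is_numeric())
}
```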
Features
- Blazing Fast: All Unicode mappings are resolved at compile time into a single `match` statement, so converting a character at runtime is just that `match`, with no lookup tables to load and no data to parse.
- Simple API: Provides a straightforward extension trait, `DecimalExtended`, for the `char` type. If you know how to use Rust, you already know how to use this.
- Self-Contained: The necessary Unicode data is bundled into the crate, so you don't need to worry about external files or runtime downloads.
- Comprehensive: Correctly identifies and converts all decimal digits across various scripts as defined by the Unicode Standard.
Quick Start
- Add `dec_from_char` to your `Cargo.toml`:

  ```toml
  [dependencies]
  dec_from_char = "0.2.0" # Replace with the latest version
  ```

- Use the `DecimalExtended` trait to convert characters:

  ```rust
  use dec_from_char::DecimalExtended;
  ```
Example: Parsing Numbers from a Mixed-Script String
This crate makes it trivial to extract numbers from text, no matter which script their digits are written in.
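```rust
use dec_from_char::DecimalExtended;

fn main() {
    let messy_string = "Phone number: (0)𝟗𝟖-𝟳𝟲𝟱 and pin: ٣-١-٤-١";

    let digits: String = messy_string
        .chars()
        // Convert each char to its value (0-9) if it is a decimal digit,
        // then map that value back to the matching ASCII digit.
        .filter_map(|c| c.to_decimal_utf8().map(|d| char::from(b'0' + d)))
        .collect();

    assert_eq!(digits, "0987653141");

    // You can get the same result with `normalize_decimals_filtering`,
    // or normalize the digits while keeping the rest of the characters.

    println!("{digits}"); // "0987653141"
}
```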
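If you need an integer rather than a string of ASCII digits, the per-character conversion can be folded into a number with overflow checks. This is a minimal sketch under stated assumptions: `digit_value` is a hypothetical stand-in for the crate's `to_decimal_utf8` that covers only ASCII, Arabic-Indic and Devanagari digits, and `parse_unicode_u64` is likewise a hypothetical name, not part of the crate's API.

```rust
/// Hypothetical stand-in for `to_decimal_utf8`, covering only three
/// Nd ranges so that this example is self-contained.
fn digit_value(c: char) -> Option<u8> {
    match c {
        '0'..='9' => Some(c as u8 - b'0'),
        '\u{0660}'..='\u{0669}' => Some((c as u32 - 0x0660) as u8), // Arabic-Indic
        '\u{0966}'..='\u{096F}' => Some((c as u32 - 0x0966) as u8), // Devanagari
        _ => None,
    }
}

/// Hypothetical parser: turns a string made only of decimal digits into a
/// `u64`. Returns `None` for an empty string, any non-digit character,
/// or a value that overflows `u64`.
fn parse_unicode_u64(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    s.chars().try_fold(0u64, |acc, c| {
        let d = digit_value(c)?; // reject non-digits
        acc.checked_mul(10)?.checked_add(u64::from(d)) // reject overflow
    })
}
```

As written, the sketch accepts a number whose digits mix scripts (for example, a Devanagari nine followed by an ASCII zero parses as 90). Whether that should be accepted or rejected is a policy decision for the caller.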
How It Works
This crate contains two main parts:
- A procedural macro that reads the official `UnicodeData.txt` file at compile time.
- An extension trait that uses the code generated by this macro.
When you compile your project, the macro scans the Unicode data file for every character that is a decimal digit (category Nd). It then generates a large but efficient `match` statement that maps each of these characters to its `u8` value (0-9).
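The macro itself is not shown here, so the following is only a sketch of the idea. It relies on the documented `UnicodeData.txt` layout, in which fields are separated by `;`, field 0 is the code point in hex, field 2 is the General_Category and field 6 is the decimal digit value. The function names `parse_nd_line`, `emit_match_arm` and `generate_match` are hypothetical. For readability the sketch builds the `match` as a string; an actual procedural macro returns a `proc_macro::TokenStream` instead.

```rust
/// Hypothetical: parses one `UnicodeData.txt` line and returns the character
/// and its decimal value if the line describes an `Nd` character.
fn parse_nd_line(line: &str) -> Option<(char, u8)> {
    let fields: Vec<&str> = line.split(';').collect();
    // Field 2 is the General_Category; skip everything that is not Nd.
    if fields.len() < 7 || fields[2] != "Nd" {
        return None;
    }
    // Field 0 is the code point in hex, field 6 the decimal digit value.
    let code_point = u32::from_str_radix(fields[0], 16).ok()?;
    let c = char::from_u32(code_point)?;
    let value: u8 = fields[6].parse().ok()?;
    Some((c, value))
}

/// Hypothetical: renders one match arm, e.g. `'\u{0966}' => Some(0),`.
fn emit_match_arm(c: char, value: u8) -> String {
    format!("'\\u{{{:04X}}}' => Some({}),", c as u32, value)
}

/// Hypothetical: builds the whole `match` body from the data file contents.
fn generate_match(data: &str) -> String {
    let mut out = String::from("match c {\n");
    for (c, v) in data.lines().filter_map(parse_nd_line) {
        out.push_str("    ");
        out.push_str(&emit_match_arm(c, v));
        out.push('\n');
    }
    out.push_str("    _ => None,\n}");
    out
}
```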
This generated code is then compiled directly into your binary. The result? At runtime, calling `.to_decimal_utf8()` evaluates that `match` and nothing else: no data file is read, nothing is parsed, and no hash map is consulted.
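To give a feel for the shape of such code, here is a hypothetical, hand-written excerpt covering five digit sets (ASCII, Arabic-Indic, Devanagari, Mathematical Bold and Mathematical Sans-Serif Bold), with arms grouped by value using `|` patterns. It is not the crate's generated output, and `to_decimal_excerpt` is an illustrative name.

```rust
/// Hypothetical excerpt of a generated lookup. Each arm lists the characters
/// that share one decimal value; a full table would cover every Nd character.
fn to_decimal_excerpt(c: char) -> Option<u8> {
    match c {
        // ASCII | Arabic-Indic | Devanagari | Math Bold | Math Sans-Serif Bold
        '0' | '\u{0660}' | '\u{0966}' | '\u{1D7CE}' | '\u{1D7EC}' => Some(0),
        '1' | '\u{0661}' | '\u{0967}' | '\u{1D7CF}' | '\u{1D7ED}' => Some(1),
        '2' | '\u{0662}' | '\u{0968}' | '\u{1D7D0}' | '\u{1D7EE}' => Some(2),
        '3' | '\u{0663}' | '\u{0969}' | '\u{1D7D1}' | '\u{1D7EF}' => Some(3),
        '4' | '\u{0664}' | '\u{096A}' | '\u{1D7D2}' | '\u{1D7F0}' => Some(4),
        '5' | '\u{0665}' | '\u{096B}' | '\u{1D7D3}' | '\u{1D7F1}' => Some(5),
        '6' | '\u{0666}' | '\u{096C}' | '\u{1D7D4}' | '\u{1D7F2}' => Some(6),
        '7' | '\u{0667}' | '\u{096D}' | '\u{1D7D5}' | '\u{1D7F3}' => Some(7),
        '8' | '\u{0668}' | '\u{096E}' | '\u{1D7D6}' | '\u{1D7F4}' => Some(8),
        '9' | '\u{0669}' | '\u{096F}' | '\u{1D7D7}' | '\u{1D7F5}' => Some(9),
        _ => None,
    }
}
```

How the compiler lowers a large `match` like this (comparison chains, range checks or jump tables) is left to rustc and LLVM; the property that matters is that no data is loaded or parsed at runtime.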
API
The crate exposes a single trait, `pub trait DecimalExtended`, with two methods:

- `fn to_decimal_utf8(&self) -> Option<u8>`: Converts any decimal Unicode digit in the `Nd` category to a `u8`. Returns `None` if the character is not a decimal digit.
- `fn is_decimal_utf8(&self) -> bool`: A convenience method that returns `true` if the character is a decimal digit.
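The extension-trait pattern behind this API can be sketched in a few lines. `DecimalExtendedSketch` is a hypothetical trait that reuses the method names listed above; implementing `is_decimal_utf8` as a default method in terms of `to_decimal_utf8` is an assumption about the design, not something confirmed by the crate, and the stand-in table covers only ASCII and fullwidth digits.

```rust
/// Hypothetical trait with the same method names as `DecimalExtended`.
trait DecimalExtendedSketch {
    fn to_decimal_utf8(&self) -> Option<u8>;

    /// Assumed default: a character is a decimal digit exactly when it has a value.
    fn is_decimal_utf8(&self) -> bool {
        self.to_decimal_utf8().is_some()
    }
}

impl DecimalExtendedSketch for char {
    fn to_decimal_utf8(&self) -> Option<u8> {
        // Tiny stand-in table: ASCII and fullwidth digits only.
        match *self {
            '0'..='9' => Some(*self as u8 - b'0'),
            '\u{FF10}'..='\u{FF19}' => Some((*self as u32 - 0xFF10) as u8), // Fullwidth
            _ => None,
        }
    }
}
```

Because the trait is implemented for `char`, bringing it into scope with `use` is enough to call the methods directly on character values.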
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.