dec_from_char 0.2.0

Small library for converting Unicode decimal digits into numbers
# Extended Decimal


[![Crates.io](https://img.shields.io/crates/v/dec_from_char.svg)](https://crates.io/crates/dec_from_char)
[![Docs.rs](https://docs.rs/dec_from_char/badge.svg)](https://docs.rs/dec_from_char)
[![License: MIT](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](https://opensource.org/licenses/MIT)

A tiny, zero-cost Rust library to correctly parse *any* Unicode decimal digit.

Ever needed to parse a number from a string that might contain digits from other scripts, such as `९` (Devanagari nine) or `٣` (Arabic-Indic three)? Rust's standard `char::to_digit` only recognizes the ASCII digits `0` through `9` (plus ASCII letters for radices above 10). This crate extends digit conversion to every Unicode character in the "Decimal Number" (`Nd`) general category.
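
To see the gap this crate fills, here is how the standard library treats a non-ASCII digit. This snippet uses only `std`: `char::to_digit` rejects `९`, while `char::is_numeric` accepts it but returns no value, and it also accepts numerals that are not decimal digits, such as `½`.

```rust
fn main() {
    let nine = '९'; // U+096F DEVANAGARI DIGIT NINE

    // `to_digit` only understands ASCII digits (and ASCII letters for radix > 10).
    assert_eq!('9'.to_digit(10), Some(9));
    assert_eq!(nine.to_digit(10), None);

    // `is_numeric` accepts the Devanagari nine, but gives no numeric value...
    assert!(nine.is_numeric());
    // ...and it also accepts non-decimal numerals (U+00BD, category `No`).
    assert!('½'.is_numeric());
}
```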

## Features


*   **Blazing Fast:** All Unicode mappings are resolved at compile time into a single `match` statement, so converting a character at runtime involves no data loading, parsing, or heap allocation.
*   **Simple API:** Provides a straightforward extension trait, `DecimalExtended`, for the `char` type. If you know how to use Rust, you already know how to use this.
*   **Self-Contained:** The necessary Unicode data is bundled into the crate, so you don't need to worry about external files or runtime downloads.
*   **Comprehensive:** Correctly identifies and converts all decimal digits across various scripts as defined by the Unicode Standard.

## Quick Start


1.  Add `dec_from_char` to your `Cargo.toml`:

    ```toml
    [dependencies]
    dec_from_char = "0.2.0" # Replace with the latest version
    ```

2.  Use the `DecimalExtended` trait to convert characters.

    ```rust
    use dec_from_char::DecimalExtended;

    fn main() {
        // Works for common ASCII digits
        assert_eq!('7'.to_decimal_utf8(), Some(7));

        // And for a wide range of other Unicode digits!
        assert_eq!('९'.to_decimal_utf8(), Some(9)); // Devanagari
        assert_eq!('०'.to_decimal_utf8(), Some(0)); // Devanagari
        assert_eq!('７'.to_decimal_utf8(), Some(7)); // Fullwidth
        assert_eq!('٣'.to_decimal_utf8(), Some(3)); // Arabic-Indic

        // It gracefully returns None for non-digit characters
        assert_eq!('a'.to_decimal_utf8(), None);
        assert_eq!('🎉'.to_decimal_utf8(), None);

        // Normalization: map any decimal digit to its ASCII equivalent
        assert_eq!('٣'.normalize_decimal(), Some('3'));
        assert_eq!('７'.normalize_decimal(), Some('7'));
        assert_eq!('🎉'.normalize_decimal(), None);
    }
    ```

## Example: Parsing Numbers from a Mixed-Script String


This crate makes it easy to extract digits from text, whichever script they are written in.

```rust
// Assumes the free functions are exported from the crate root.
use dec_from_char::{normalize_decimals, normalize_decimals_filtering, DecimalExtended};

let messy_string = "Phone number: (0)𝟗𝟖-𝟳𝟲𝟱 and pin: ٣-١-٤-١";

let digits: String = messy_string.chars()
    .filter_map(|c| c.normalize_decimal()) // Keep each decimal digit as ASCII, drop everything else
    .collect();

assert_eq!(digits, "0987653141");

// You can do the same with `normalize_decimals_filtering`
assert_eq!(normalize_decimals_filtering(messy_string), "0987653141");
// or normalize the digits while keeping every other character
assert_eq!(normalize_decimals(messy_string), "Phone number: (0)98-765 and pin: 3-1-4-1");
println!("Extracted digits: {}", digits); // "0987653141"
```
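
To turn extracted digits into an actual integer, you can fold them with checked arithmetic. The sketch below is self-contained, so it uses a hypothetical stand-in, `to_decimal`, that covers only ASCII, Arabic-Indic, Devanagari, and Fullwidth digits; with this crate you would call `c.to_decimal_utf8()` instead. `parse_decimal` is likewise a hypothetical helper, not part of the crate's API.

```rust
/// Hypothetical stand-in for this crate's `to_decimal_utf8`; covers only four scripts.
fn to_decimal(c: char) -> Option<u8> {
    // Find the zero digit of the script the character belongs to.
    let zero = match c {
        '0'..='9' => '0',
        '\u{0660}'..='\u{0669}' => '\u{0660}', // Arabic-Indic
        '\u{0966}'..='\u{096F}' => '\u{0966}', // Devanagari
        '\u{FF10}'..='\u{FF19}' => '\u{FF10}', // Fullwidth
        _ => return None,
    };
    Some((c as u32 - zero as u32) as u8)
}

/// Hypothetical helper: parses a string made only of decimal digits into a `u64`.
/// Returns `None` for an empty string, any non-digit character, or overflow.
fn parse_decimal(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    s.chars().try_fold(0u64, |acc: u64, c: char| -> Option<u64> {
        let d = to_decimal(c)?;
        acc.checked_mul(10)?.checked_add(u64::from(d))
    })
}

fn main() {
    assert_eq!(parse_decimal("٣١٤١"), Some(3141)); // Arabic-Indic
    assert_eq!(parse_decimal("९०"), Some(90)); // Devanagari
    assert_eq!(parse_decimal("12a"), None); // 'a' is not a digit
    println!("{:?}", parse_decimal("７４２")); // Fullwidth; prints "Some(742)"
}
```

Using `checked_mul` and `checked_add` means an over-long digit run yields `None` instead of wrapping or panicking.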

## How It Works


This crate contains two main parts:

1.  A procedural macro that reads the official `UnicodeData.txt` file at **compile time**.
2.  An extension trait that uses the code generated by this macro.

When you compile your project, the macro scans the Unicode data file for every character whose general category is `Nd` (Decimal Number). It then generates one large `match` statement that maps each of these characters to its numeric value (0 to 9) as a `u8`.

This generated code is then compiled directly into your binary. At runtime, calling `.to_decimal_utf8()` simply evaluates that `match` on the character, with no file parsing, table loading, or hash-map lookups involved.
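
The scanning step can be sketched as an ordinary function. This is a hypothetical illustration, not the crate's actual macro: it parses a short excerpt in the `UnicodeData.txt` format, where fields are separated by `;`, field 2 holds the general category and field 6 the decimal digit value, and renders the `Nd` entries as `match` arms. A real procedural macro would read the bundled file and return a token stream rather than a `String`.

```rust
/// Hypothetical sketch: collects `(character, value)` pairs for every `Nd`
/// entry in text that follows the `UnicodeData.txt` line format.
fn decimal_digits(unicode_data: &str) -> Vec<(char, u8)> {
    unicode_data
        .lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split(';').collect();
            // Field 0: code point (hex); field 2: General_Category; field 6: decimal digit value.
            if fields.len() < 7 || fields[2] != "Nd" {
                return None;
            }
            let code = u32::from_str_radix(fields[0], 16).ok()?;
            let ch = char::from_u32(code)?;
            let value = fields[6].parse::<u8>().ok()?;
            Some((ch, value))
        })
        .collect()
}

/// Renders each pair as a `match` arm, the kind of code a generator could emit.
fn render_match_arms(pairs: &[(char, u8)]) -> String {
    pairs
        .iter()
        .map(|(ch, value)| format!("'\\u{{{:04X}}}' => Some({}),\n", *ch as u32, value))
        .collect()
}

fn main() {
    // A three-line excerpt in UnicodeData.txt format; only the `Nd` lines are kept.
    let excerpt = "\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0663;ARABIC-INDIC DIGIT THREE;Nd;0;AN;;3;3;3;N;;;;;";

    let pairs = decimal_digits(excerpt);
    assert_eq!(pairs, vec![('0', 0), ('\u{0663}', 3)]);
    print!("{}", render_match_arms(&pairs));
}
```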

## API


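The main entry point is the `DecimalExtended` trait, implemented for `char`:

`pub trait DecimalExtended`

*   `fn to_decimal_utf8(&self) -> Option<u8>`: Converts any Unicode decimal digit (category `Nd`) to its value as a `u8`. Returns `None` if the character is not a decimal digit.
*   `fn is_decimal_utf8(&self) -> bool`: A convenience method that returns `true` if the character is a decimal digit.
*   `normalize_decimal`: Returns the ASCII equivalent of a decimal digit as an `Option<char>` (for example, `'٣'` becomes `Some('3')`), or `None` for any other character.

The crate also provides two string-level helpers, used in the example above: `normalize_decimals`, which replaces every decimal digit with its ASCII equivalent and keeps all other characters, and `normalize_decimals_filtering`, which keeps only the normalized digits.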
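
One plausible shape for an extension trait like this is shown below. `DecimalExtendedSketch` is a hypothetical stand-in, not the crate's code: it implements `is_decimal_utf8` as a default method on top of `to_decimal_utf8`, and its hand-written `match` covers only five scripts instead of every `Nd` character. Within each covered script, the digits occupy ten consecutive code points from zero to nine, which is what the subtraction relies on.

```rust
/// Hypothetical re-creation of the trait shape described above.
trait DecimalExtendedSketch {
    fn to_decimal_utf8(&self) -> Option<u8>;

    /// Default method: a character is a decimal digit exactly when it has a value.
    fn is_decimal_utf8(&self) -> bool {
        self.to_decimal_utf8().is_some()
    }
}

impl DecimalExtendedSketch for char {
    fn to_decimal_utf8(&self) -> Option<u8> {
        // Pick the zero digit of the script; unknown characters return `None`.
        let zero = match *self {
            '0'..='9' => '0',
            '\u{0660}'..='\u{0669}' => '\u{0660}', // Arabic-Indic
            '\u{06F0}'..='\u{06F9}' => '\u{06F0}', // Extended Arabic-Indic
            '\u{0966}'..='\u{096F}' => '\u{0966}', // Devanagari
            '\u{FF10}'..='\u{FF19}' => '\u{FF10}', // Fullwidth
            _ => return None,
        };
        Some((*self as u32 - zero as u32) as u8)
    }
}

fn main() {
    assert_eq!('٣'.to_decimal_utf8(), Some(3)); // Arabic-Indic
    assert!('７'.is_decimal_utf8()); // Fullwidth
    assert!(!'a'.is_decimal_utf8());
}
```

Because `is_decimal_utf8` is defined once in the trait, any implementor only has to provide `to_decimal_utf8`.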

## License


This project is licensed under either of
*   Apache License, Version 2.0 ([LICENSE-APACHE](http://www.apache.org/licenses/LICENSE-2.0))
*   MIT license ([LICENSE-MIT](http://opensource.org/licenses/MIT))

at your option.

## Contributing


Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/your-username/extended-decimal/issues).