Arabic Text Utils
A comprehensive Rust library for Arabic text processing and manipulation. This crate provides utilities for character analysis, number conversion, text normalization, presentation-form handling, and punctuation conversion.
🇵🇸 Free Palestine
We stand in solidarity with the Palestinian people. To learn more about supporting Palestine, please visit BDS Movement.
Features
- Character Analysis
  - Identify Arabic characters
  - Get Arabic character names
  - Check for diacritical marks (harakat)
- Number Handling
  - Convert between Arabic and Western numerals
  - Support for Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩)
- Text Processing
  - Remove diacritical marks (tashkeel)
  - Normalize Arabic text
  - Detect Arabic content
  - Count Arabic words
  - Extract Arabic text from mixed content
  - RTL character detection
  - Character frequency analysis
  - Word and sentence segmentation
  - URL-friendly slug generation
  - Text wrapping
  - Text sanitization
- Presentation Forms
  - Normalize Arabic presentation forms
  - Handle Arabic ligatures
  - Convert between isolated and connected forms
- Punctuation
  - Convert between Arabic and Latin punctuation marks
Installation
Add this to your Cargo.toml:
[dependencies]
arabic_text_utils = "0.1.0"
Usage
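// Import path assumed: functions re-exported at the crate root; check docs.rs for the actual modules.
use arabic_text_utils::{convert_numbers_to_arabic, is_arabic_char, normalize_arabic, remove_tashkeel};

// Remove diacritical marks
let text = "مَرْحَباً بِكُمْ";
assert_eq!(remove_tashkeel(text), "مرحبا بكم");

// Normalize Arabic text
let text = "ﷺ"; // Arabic ligature
let normalized = normalize_arabic(text);
// Expected value assumed: the ligature's Unicode compatibility expansion
assert_eq!(normalized, "صلى الله عليه وسلم");

// Convert numbers to Arabic
let text = "Page 123";
assert_eq!(convert_numbers_to_arabic(text), "Page ١٢٣");

// Check if a character is Arabic
assert!(is_arabic_char('ب'));
assert!(!is_arabic_char('a'));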
Documentation
For detailed documentation and examples, please visit docs.rs/arabic_text_utils.
Features in Detail
Character Module
- is_arabic_char: Detect Arabic characters
- get_arabic_char_name: Get Unicode names for Arabic characters
- is_haraka: Check for diacritical marks
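To make the character checks above concrete, here is a minimal, standard-library-only sketch of how such predicates could be written. The function names `is_arabic_char_sketch` and `is_haraka_sketch` are hypothetical and are not the crate's API; the Unicode ranges are an assumption about what counts as "Arabic", and the crate may cover more or fewer blocks.

```rust
/// Hypothetical stand-in for an `is_arabic_char`-style check (not the crate's code).
/// Assumes "Arabic" means the Arabic, Arabic Supplement, and Arabic Extended-A
/// blocks plus the two Arabic Presentation Forms blocks.
fn is_arabic_char_sketch(c: char) -> bool {
    matches!(
        c,
        '\u{0600}'..='\u{06FF}'       // Arabic
            | '\u{0750}'..='\u{077F}' // Arabic Supplement
            | '\u{08A0}'..='\u{08FF}' // Arabic Extended-A
            | '\u{FB50}'..='\u{FDFF}' // Arabic Presentation Forms-A
            | '\u{FE70}'..='\u{FEFC}' // Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
    )
}

/// Hypothetical stand-in for an `is_haraka`-style check:
/// fathatan (U+064B) through sukun (U+0652).
fn is_haraka_sketch(c: char) -> bool {
    matches!(c, '\u{064B}'..='\u{0652}')
}
```

A range-based `matches!` keeps the check allocation-free and easy to audit against the Unicode code charts.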
Numbers Module
- convert_numbers_to_arabic: Convert Western numerals to Arabic
- convert_numbers_from_arabic: Convert Arabic numerals to Western
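The numeral conversion described above amounts to a fixed offset between ASCII digits and the Eastern Arabic digits U+0660..U+0669. The sketch below illustrates that idea with the standard library only; `to_eastern_arabic_digits` and `from_eastern_arabic_digits` are hypothetical names, and the crate's own functions may handle additional digit sets.

```rust
/// Hypothetical sketch: map ASCII digits 0-9 to Eastern Arabic digits (U+0660..=U+0669).
/// All other characters pass through unchanged.
fn to_eastern_arabic_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '0'..='9' => char::from_u32(0x0660 + (c as u32 - '0' as u32)).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Hypothetical sketch: map Eastern Arabic digits back to ASCII digits.
fn from_eastern_arabic_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{0660}'..='\u{0669}' => {
                char::from_u32('0' as u32 + (c as u32 - 0x0660)).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

With these definitions, `to_eastern_arabic_digits("Page 123")` yields `"Page ١٢٣"`, and applying `from_eastern_arabic_digits` to the result restores the original string.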
Text Module
- remove_tashkeel: Strip diacritical marks
- normalize_arabic: Standardize Arabic text representation
- contains_arabic: Check for Arabic content
- count_arabic_words: Count Arabic words in text
- extract_arabic_text: Extract Arabic-only content
- has_rtl_characters: Detect right-to-left characters
- arabic_character_frequency: Analyze character distribution
- segment_words: Split text into words
- segment_sentences: Split text into sentences
- generate_slug: Create URL-friendly text
- wrap_text: Wrap text at specified width
- sanitize_arabic: Clean and standardize Arabic text
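Several of the text operations above reduce to filtering or testing characters by Unicode range. The following sketch shows three of them using only the standard library. The function names are hypothetical, and the rules are assumptions rather than the crate's behavior: tashkeel is taken to be U+064B..U+0652, "Arabic content" to be anything in the main Arabic block, and a "word" to be a whitespace-separated token containing at least one core Arabic letter.

```rust
/// Hypothetical sketch: drop the harakat U+064B..=U+0652 and keep everything else.
fn strip_tashkeel_sketch(input: &str) -> String {
    input
        .chars()
        .filter(|c| !matches!(c, '\u{064B}'..='\u{0652}'))
        .collect()
}

/// Hypothetical sketch: true if any character falls in the main Arabic block.
fn contains_arabic_sketch(input: &str) -> bool {
    input.chars().any(|c| matches!(c, '\u{0600}'..='\u{06FF}'))
}

/// Hypothetical sketch: count whitespace-separated tokens that contain
/// at least one character in U+0621..=U+064A (hamza through yeh).
fn count_arabic_words_sketch(input: &str) -> usize {
    input
        .split_whitespace()
        .filter(|word| word.chars().any(|c| matches!(c, '\u{0621}'..='\u{064A}')))
        .count()
}
```

Splitting on whitespace is the simplest possible tokenizer; punctuation attached to a word stays part of its token, which a real segmenter would likely handle separately.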
Presentation Module
- normalize_presentation_forms: Standardize Arabic character forms
- replace_ligatures: Handle special character combinations
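As an illustration of ligature handling, the sketch below expands the eight lam-alef ligatures in Arabic Presentation Forms-B (U+FEF5..U+FEFC) into their two base letters. The function name `expand_lam_alef_sketch` is hypothetical, and this covers only one small family of ligatures; it is not the crate's implementation, which presumably handles many more presentation forms.

```rust
/// Hypothetical sketch: replace each lam-alef ligature (isolated or final form)
/// with lam (U+0644) followed by the matching alef variant.
fn expand_lam_alef_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            // lam + alef with madda above
            '\u{FEF5}' | '\u{FEF6}' => out.push_str("\u{0644}\u{0622}"),
            // lam + alef with hamza above
            '\u{FEF7}' | '\u{FEF8}' => out.push_str("\u{0644}\u{0623}"),
            // lam + alef with hamza below
            '\u{FEF9}' | '\u{FEFA}' => out.push_str("\u{0644}\u{0625}"),
            // lam + plain alef
            '\u{FEFB}' | '\u{FEFC}' => out.push_str("\u{0644}\u{0627}"),
            other => out.push(other),
        }
    }
    out
}
```

An explicit table like this is easy to verify against the Unicode code charts; a fuller solution would more likely rely on Unicode compatibility normalization than on hand-written mappings.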
Punctuation Module
- convert_punctuation: Convert between Arabic and Latin punctuation
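Punctuation conversion can be pictured as a small character-to-character mapping. The sketch below handles the three most common pairs: the Arabic comma (U+060C), semicolon (U+061B), and question mark (U+061F). The function names are hypothetical, and the set of marks the crate actually converts is not stated here, so treat the table as an assumption.

```rust
/// Hypothetical sketch: map Arabic comma, semicolon, and question mark
/// to their Latin counterparts; other characters pass through unchanged.
fn arabic_to_latin_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{060C}' => ',',
            '\u{061B}' => ';',
            '\u{061F}' => '?',
            other => other,
        })
        .collect()
}

/// Hypothetical sketch: the reverse mapping, Latin to Arabic.
fn latin_to_arabic_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            ',' => '\u{060C}',
            ';' => '\u{061B}',
            '?' => '\u{061F}',
            other => other,
        })
        .collect()
}
```

Because the mapping is one-to-one, converting in one direction and then the other returns the original text, provided the input does not already mix both punctuation styles.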
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Unicode Standard for Arabic script processing
- The Rust community for their valuable feedback and contributions