# Arabic Text Utils
A Rust library for processing and manipulating Arabic text. This crate provides utilities for character analysis, numeral conversion, text normalization, presentation-form handling, and punctuation conversion.
## 🇵🇸 Free Palestine
We stand in solidarity with the Palestinian people. To learn more about supporting Palestine, please visit [BDS Movement](https://bdsmovement.net/).
## Features
- **Character Analysis**
  - Identify Arabic characters
  - Get Arabic character names
  - Check for diacritical marks (harakat)
- **Number Handling**
  - Convert between Arabic and Western numerals
  - Support for Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩)
- **Text Processing**
  - Remove diacritical marks (tashkeel)
  - Normalize Arabic text
  - Detect Arabic content
  - Count Arabic words
  - Extract Arabic text from mixed content
  - RTL character detection
  - Character frequency analysis
  - Word and sentence segmentation
  - URL-friendly slug generation
  - Text wrapping
  - Text sanitization
- **Presentation Forms**
  - Normalize Arabic presentation forms
  - Handle Arabic ligatures
  - Convert between isolated and connected forms
- **Punctuation**
  - Convert between Arabic and Latin punctuation marks
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
arabic_text_utils = "0.1.0"
```
## Usage
```rust
use arabic_text_utils::{
    remove_tashkeel,
    normalize_arabic,
    convert_numbers_to_arabic,
    is_arabic_char,
};

fn main() {
    // Remove diacritical marks
    let text = "مَرْحَباً بِكُمْ";
    assert_eq!(remove_tashkeel(text), "مرحبا بكم");

    // Normalize Arabic text
    let text = "ﷺ"; // Arabic ligature
    let normalized = normalize_arabic(text);
    assert_eq!(normalized, "صلى الله عليه وسلم");

    // Convert Western digits to Eastern Arabic digits
    let text = "Page 123";
    assert_eq!(convert_numbers_to_arabic(text), "Page ١٢٣");

    // Check if a character is Arabic
    assert!(is_arabic_char('ع'));
    assert!(!is_arabic_char('x'));
}
```
## Documentation
For detailed documentation and examples, please visit [docs.rs/arabic_text_utils](https://docs.rs/arabic_text_utils).
## Features in Detail
### Character Module
- `is_arabic_char`: Detect Arabic characters
- `get_arabic_char_name`: Get Unicode names for Arabic characters
- `is_haraka`: Check for diacritical marks
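
To illustrate the kind of check the character functions above describe, here is a minimal standalone sketch using only the standard library. It is not the crate's implementation: the names `in_arabic_block` and `looks_like_haraka` are hypothetical, and the ranges are an assumption about what "Arabic character" should cover (the main Arabic Unicode blocks, with harakat taken as U+064B through U+0652).

```rust
/// Hypothetical helper: true if `c` falls in one of the main Arabic Unicode blocks.
fn in_arabic_block(c: char) -> bool {
    matches!(
        c,
        '\u{0600}'..='\u{06FF}'   // Arabic
        | '\u{0750}'..='\u{077F}' // Arabic Supplement
        | '\u{08A0}'..='\u{08FF}' // Arabic Extended-A
        | '\u{FB50}'..='\u{FDFF}' // Arabic Presentation Forms-A
        | '\u{FE70}'..='\u{FEFC}' // Arabic Presentation Forms-B (excludes U+FEFF, the BOM)
    )
}

/// Hypothetical helper: true for the short-vowel and related marks
/// from fathatan (U+064B) through sukun (U+0652).
fn looks_like_haraka(c: char) -> bool {
    matches!(c, '\u{064B}'..='\u{0652}')
}
```

Range checks like these are cheap and allocation-free, but the exact set of blocks to include is a design choice; the crate may draw the boundaries differently.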
### Numbers Module
- `convert_numbers_to_arabic`: Convert Western digits (0123456789) to Eastern Arabic digits (٠١٢٣٤٥٦٧٨٩)
- `convert_numbers_from_arabic`: Convert Eastern Arabic digits (٠١٢٣٤٥٦٧٨٩) to Western digits (0123456789)
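
Digit conversion in both directions comes down to an offset between two contiguous code point ranges. The sketch below shows that idea with the standard library only. The function names are hypothetical and this is not necessarily how the crate does it; it assumes only the Eastern Arabic digits U+0660 through U+0669 are in scope (not the Extended Arabic-Indic digits used for Persian and Urdu).

```rust
/// Code point of the Eastern Arabic digit zero (٠).
const EASTERN_ZERO: u32 = 0x0660;

/// Hypothetical helper: maps ASCII digits to Eastern Arabic digits,
/// leaving every other character unchanged.
fn to_eastern_arabic_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '0'..='9' => {
                let offset = c as u32 - '0' as u32;
                char::from_u32(EASTERN_ZERO + offset).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}

/// Hypothetical helper: maps Eastern Arabic digits back to ASCII digits.
fn to_western_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{0660}'..='\u{0669}' => {
                let offset = c as u32 - EASTERN_ZERO;
                char::from_u32('0' as u32 + offset).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

Because each function touches only its own digit range, applying one after the other returns the original string for input that contains digits from a single system.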
### Text Module
- `remove_tashkeel`: Strip diacritical marks
- `normalize_arabic`: Standardize Arabic text representation
- `contains_arabic`: Check for Arabic content
- `count_arabic_words`: Count Arabic words in text
- `extract_arabic_text`: Extract Arabic-only content
- `has_rtl_characters`: Detect right-to-left characters
- `arabic_character_frequency`: Analyze character distribution
- `segment_words`: Split text into words
- `segment_sentences`: Split text into sentences
- `generate_slug`: Create URL-friendly text
- `wrap_text`: Wrap text at specified width
- `sanitize_arabic`: Clean and standardize Arabic text
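
Two of the operations listed above, stripping diacritics and counting Arabic words, can be approximated with simple character filters. The following is a rough sketch under stated assumptions, not the crate's code: the function names are hypothetical, "tashkeel" is taken to mean U+064B through U+0652 only, and a "word" is any whitespace-separated token containing at least one character from the basic Arabic block.

```rust
/// Hypothetical helper: removes the marks from fathatan (U+064B) through sukun (U+0652).
fn strip_harakat(input: &str) -> String {
    input
        .chars()
        .filter(|&c| !matches!(c, '\u{064B}'..='\u{0652}'))
        .collect()
}

/// Hypothetical helper: counts whitespace-separated tokens that contain
/// at least one character from the basic Arabic block (U+0600..=U+06FF).
fn count_words_with_arabic(input: &str) -> usize {
    input
        .split_whitespace()
        .filter(|word| word.chars().any(|c| matches!(c, '\u{0600}'..='\u{06FF}')))
        .count()
}
```

A production implementation would likely handle more marks (for example Quranic annotation signs) and treat punctuation attached to words more carefully.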
### Presentation Module
- `normalize_presentation_forms`: Standardize Arabic character forms
- `replace_ligatures`: Handle special character combinations
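
As a small illustration of ligature handling, the sketch below expands the eight lam-alef presentation forms (U+FEF5 through U+FEFC) into their base letters. It is a hypothetical helper covering one family of ligatures only, and it makes no claim about which ligatures `replace_ligatures` actually supports.

```rust
/// Hypothetical helper: expands lam-alef ligatures into lam (U+0644)
/// followed by the matching alef variant. Isolated and final forms of
/// each ligature map to the same two-letter sequence.
fn expand_lam_alef(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            // Lam with alef with madda above
            '\u{FEF5}' | '\u{FEF6}' => out.push_str("\u{0644}\u{0622}"),
            // Lam with alef with hamza above
            '\u{FEF7}' | '\u{FEF8}' => out.push_str("\u{0644}\u{0623}"),
            // Lam with alef with hamza below
            '\u{FEF9}' | '\u{FEFA}' => out.push_str("\u{0644}\u{0625}"),
            // Lam with plain alef
            '\u{FEFB}' | '\u{FEFC}' => out.push_str("\u{0644}\u{0627}"),
            _ => out.push(c),
        }
    }
    out
}
```

A general solution would typically rely on Unicode compatibility normalization (NFKC) rather than a hand-written table, at the cost of also changing non-Arabic compatibility characters.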
### Punctuation Module
- `convert_punctuation`: Convert between Arabic and Latin punctuation
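
Punctuation conversion can be thought of as a lookup in a small table of paired marks. The sketch below is an assumption about the general approach, with hypothetical names, and covers only three pairs: the comma, the semicolon, and the question mark. The signature and coverage of the crate's `convert_punctuation` may differ.

```rust
/// (Latin mark, Arabic mark) pairs: comma, semicolon, question mark.
const PUNCTUATION_PAIRS: [(char, char); 3] = [
    (',', '\u{060C}'),
    (';', '\u{061B}'),
    ('?', '\u{061F}'),
];

/// Hypothetical helper: replaces Latin punctuation with its Arabic counterpart.
fn latin_to_arabic_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| {
            PUNCTUATION_PAIRS
                .iter()
                .find(|(latin, _)| *latin == c)
                .map_or(c, |(_, arabic)| *arabic)
        })
        .collect()
}

/// Hypothetical helper: replaces Arabic punctuation with its Latin counterpart.
fn arabic_to_latin_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| {
            PUNCTUATION_PAIRS
                .iter()
                .find(|(_, arabic)| *arabic == c)
                .map_or(c, |(latin, _)| *latin)
        })
        .collect()
}
```

Keeping both directions driven by one table means a new pair only has to be added in one place.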
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments
- Unicode Standard for Arabic script processing
- The Rust community for their valuable feedback and contributions