# Arabic Text Utils

A Rust library for Arabic text processing. It provides utilities for character analysis, numeral conversion, text normalization, presentation-form handling, and punctuation conversion.

## 🇵🇸 Free Palestine
We stand in solidarity with the Palestinian people. To learn more about supporting Palestine, please visit [BDS Movement](https://bdsmovement.net/).

## Features

- **Character Analysis**
  - Identify Arabic characters
  - Get Arabic character names
  - Check for diacritical marks (harakat)

- **Number Handling**
  - Convert between Eastern Arabic and Western (0-9) numerals
  - Support for Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩)

- **Text Processing**
  - Remove diacritical marks (tashkeel)
  - Normalize Arabic text
  - Detect Arabic content
  - Count Arabic words
  - Extract Arabic text from mixed content
  - RTL character detection
  - Character frequency analysis
  - Word and sentence segmentation
  - URL-friendly slug generation
  - Text wrapping
  - Text sanitization

- **Presentation Forms**
  - Normalize Arabic presentation forms
  - Handle Arabic ligatures
  - Convert between isolated and connected forms

- **Punctuation**
  - Convert between Arabic and Latin punctuation marks

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
arabic_text_utils = "0.1.0"
```

## Usage

```rust
use arabic_text_utils::{
    remove_tashkeel,
    normalize_arabic,
    convert_numbers_to_arabic,
    is_arabic_char,
};

fn main() {
    // Remove diacritical marks (tashkeel)
    let text = "مَرْحَباً بِكُمْ";
    assert_eq!(remove_tashkeel(text), "مرحبا بكم");

    // Normalize Arabic text: expand a ligature into its letters
    let text = "ﷺ"; // Single-code-point Arabic ligature (U+FDFA)
    let normalized = normalize_arabic(text);
    assert_eq!(normalized, "صلى الله عليه وسلم");

    // Convert Western digits to Eastern Arabic digits
    let text = "Page 123";
    assert_eq!(convert_numbers_to_arabic(text), "Page ١٢٣");

    // Check if a character is Arabic
    assert!(is_arabic_char('ع'));
    assert!(!is_arabic_char('x'));
}
```

## Documentation

For detailed documentation and examples, please visit [docs.rs/arabic_text_utils](https://docs.rs/arabic_text_utils).

## Features in Detail

### Character Module
- `is_arabic_char`: Detect Arabic characters
- `get_arabic_char_name`: Get Unicode names for Arabic characters
- `is_haraka`: Check for diacritical marks
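
To illustrate the kind of check these functions perform, here is a minimal standalone sketch using only the standard library. The function names and the exact code point ranges are assumptions made for illustration; the crate's own implementation may cover additional Unicode blocks.

```rust
/// Hypothetical stand-in for `is_arabic_char`: true for code points in the
/// core Arabic block (U+0600..=U+06FF). Supplement and presentation-form
/// blocks are deliberately left out of this sketch.
pub fn is_arabic_char_sketch(c: char) -> bool {
    matches!(c, '\u{0600}'..='\u{06FF}')
}

/// Hypothetical stand-in for `is_haraka`: true for the marks from
/// fathatan (U+064B) through sukun (U+0652).
pub fn is_haraka_sketch(c: char) -> bool {
    matches!(c, '\u{064B}'..='\u{0652}')
}
```

Range patterns keep both checks branch-free and allocation-free, which matters when they are called once per character over long texts.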

### Numbers Module
- `convert_numbers_to_arabic`: Convert Western numerals (0-9) to Eastern Arabic numerals (٠-٩)
- `convert_numbers_from_arabic`: Convert Eastern Arabic numerals (٠-٩) to Western numerals (0-9)
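
Digit conversion of this kind can be done with a fixed code point offset, since the Eastern Arabic digits occupy U+0660..=U+0669 in the same order as ASCII `0`..`9`. The sketch below shows that idea with hypothetical function names; it is not taken from the crate's source.

```rust
/// Hypothetical sketch: map ASCII digits to Eastern Arabic digits,
/// leaving every other character untouched.
pub fn to_eastern_arabic_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '0'..='9' => char::from_u32(0x0660 + (c as u32 - '0' as u32)).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Hypothetical sketch: the inverse mapping, Eastern Arabic digits to ASCII.
pub fn to_western_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{0660}'..='\u{0669}' => {
                char::from_u32('0' as u32 + (c as u32 - 0x0660)).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

Because only digit characters are remapped, the two functions are inverses of each other on any text that does not mix both digit sets.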

### Text Module
- `remove_tashkeel`: Strip diacritical marks
- `normalize_arabic`: Standardize Arabic text representation
- `contains_arabic`: Check for Arabic content
- `count_arabic_words`: Count Arabic words in text
- `extract_arabic_text`: Extract Arabic-only content
- `has_rtl_characters`: Detect right-to-left characters
- `arabic_character_frequency`: Analyze character distribution
- `segment_words`: Split text into words
- `segment_sentences`: Split text into sentences
- `generate_slug`: Create URL-friendly text
- `wrap_text`: Wrap text at specified width
- `sanitize_arabic`: Clean and standardize Arabic text
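
Several of these operations reduce to filtering or testing characters by code point. As a rough sketch under that assumption (hypothetical names, core Arabic block only, whitespace-based word splitting), stripping tashkeel and counting Arabic words might look like this:

```rust
/// Hypothetical sketch: drop the marks fathatan (U+064B) through
/// sukun (U+0652) and keep everything else.
pub fn strip_tashkeel_sketch(input: &str) -> String {
    input
        .chars()
        .filter(|&c| !matches!(c, '\u{064B}'..='\u{0652}'))
        .collect()
}

/// Hypothetical sketch: true if any character is in U+0600..=U+06FF.
pub fn contains_arabic_sketch(input: &str) -> bool {
    input.chars().any(|c| matches!(c, '\u{0600}'..='\u{06FF}'))
}

/// Hypothetical sketch: count whitespace-separated tokens that contain
/// at least one Arabic character.
pub fn count_arabic_words_sketch(input: &str) -> usize {
    input
        .split_whitespace()
        .filter(|word| contains_arabic_sketch(word))
        .count()
}
```

Splitting on whitespace is a simplification: tokens with attached punctuation or mixed scripts are counted as Arabic words here, which a real segmenter may treat differently.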

### Presentation Module
- `normalize_presentation_forms`: Standardize Arabic character forms
- `replace_ligatures`: Handle special character combinations
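
As an illustration of ligature handling, the sketch below expands only the plain lam-alef ligature (isolated form U+FEFB and final form U+FEFC) into the two base letters lam (U+0644) and alef (U+0627). The function name and the choice of covered ligatures are assumptions; the crate may use a larger table or Unicode compatibility normalization instead.

```rust
/// Hypothetical sketch: replace the lam-alef presentation-form ligatures
/// with the base letters lam + alef; all other characters pass through.
pub fn replace_lam_alef_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{FEFB}' | '\u{FEFC}' => out.push_str("\u{0644}\u{0627}"),
            other => out.push(other),
        }
    }
    out
}
```

A `String` is built by hand rather than with `map` because one input character can expand into several output characters.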

### Punctuation Module
- `convert_punctuation`: Convert between Arabic and Latin punctuation
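
Punctuation conversion is essentially a character-for-character mapping. The sketch below covers three common pairs: comma (U+060C), semicolon (U+061B), and question mark (U+061F). The function names and the set of mapped marks are assumptions for illustration and do not reflect the signature of `convert_punctuation`.

```rust
/// Hypothetical sketch: map Latin comma, semicolon, and question mark
/// to their Arabic counterparts.
pub fn latin_to_arabic_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            ',' => '\u{060C}', // Arabic comma
            ';' => '\u{061B}', // Arabic semicolon
            '?' => '\u{061F}', // Arabic question mark
            other => other,
        })
        .collect()
}

/// Hypothetical sketch: the inverse mapping, Arabic marks to Latin.
pub fn arabic_to_latin_punctuation(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{060C}' => ',',
            '\u{061B}' => ';',
            '\u{061F}' => '?',
            other => other,
        })
        .collect()
}
```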

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

- Unicode Standard for Arabic script processing
- The Rust community for their valuable feedback and contributions