Shlesha - High-Performance Schema-Driven Transliteration Library
A next-generation transliteration library built with schema-driven architecture for Sanskrit and Indic scripts. Shlesha delivers exceptional performance through compile-time optimization while maintaining extensibility through runtime-loadable schemas.
🚀 Quick Start for Developers
New to Shlesha? Get up and running in one command:
This sets up everything: Rust environment, Python bindings, WASM support, and runs all tests.
For detailed setup instructions, see DEVELOPER_SETUP.md.
📚 Complete Documentation: See DOCUMENTATION_INDEX.md for all guides and references.
⚡ Performance Highlights
Shlesha delivers exceptional performance competitive with the fastest transliteration libraries:
- Only 1.07x - 2.96x slower than Vidyut (industry-leading performance)
- 10.52 MB/s for Indic script conversions
- 6-10x better performance than our original targets
- Dramatically faster than Aksharamukha and Dharmamitra
- Schema-generated converters perform identically to hand-coded ones
🏗️ Revolutionary Schema-Based Architecture
Compile-Time Code Generation
Shlesha uses a schema-driven approach in which converters are generated at compile time from declarative schemas:
# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
name: "slp1"
script_type: "roman"
description: "Sanskrit Library Phonetic Basic"
target: "iso15919"
mappings:
vowels:
"A": "ā"
"I": "ī"
"U": "ū"
# ... more mappings
# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
name: "bengali"
script_type: "brahmic"
description: "Bengali/Bangla script"
mappings:
vowels:
"অ": "अ" # Bengali A → Devanagari A
"আ": "आ" # Bengali AA → Devanagari AA
# ... more mappings
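To make the idea concrete, here is a minimal sketch of how a build step could turn mappings like the ones above into Rust source. It is not Shlesha's actual generator (the README states that Handlebars templates are used for that); the function name `generate_converter_source` and the shape of the emitted function are assumptions chosen for illustration, and plain `format!` stands in for the template engine.

```rust
/// Hypothetical build-time generator (illustration only, not Shlesha's code).
/// Turns (source, target) pairs into the text of a Rust function that is
/// built around a single `match` expression.
pub fn generate_converter_source(fn_name: &str, mappings: &[(char, &str)]) -> String {
    let mut out = String::new();
    out.push_str(&format!(
        "pub fn {}(c: char) -> Option<&'static str> {{\n    match c {{\n",
        fn_name
    ));
    for (source, target) in mappings {
        // `{:?}` prints chars and strings as valid Rust literals.
        out.push_str(&format!("        {:?} => Some({:?}),\n", source, target));
    }
    // Characters without a mapping fall through to `None`.
    out.push_str("        _ => None,\n    }\n}\n");
    out
}
```

Calling it with the three SLP1 vowel mappings from the schema above yields source text containing one `match` arm per mapping, for example `'A' => Some("ā"),`. A real build script would write that text to a file and compile it into the crate.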
Build-Time Optimization
The build system automatically generates highly optimized converters:
# Build output showing schema processing
🎯 Hub-and-Spoke Architecture
Smart Multi-Hub Design
- Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
- ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
- Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
- Direct Conversion: Bypass hubs when possible for maximum performance
Intelligent Routing
The system automatically determines the optimal conversion path:
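// Arguments are (text, from, to); the sample text is illustrative
// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant
// Single hub - one conversion
transliterator.transliterate("धर्म", "devanagari", "iso15919")?; // deva→iso
// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali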
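The routing described above can be sketched as a small path planner. This is an illustrative model, not Shlesha's router: `ScriptKind`, `kind_of`, and `plan_route` are hypothetical names, only a handful of script identifiers are classified, and the "direct conversion" shortcut that bypasses the hubs is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptKind {
    Roman,
    Brahmic,
}

const DEVA_HUB: &str = "devanagari";
const ISO_HUB: &str = "iso15919";

/// Hypothetical classification of a few identifiers listed in this README.
pub fn kind_of(script: &str) -> Option<ScriptKind> {
    match script {
        "iso15919" | "itrans" | "slp1" | "iast" => Some(ScriptKind::Roman),
        "devanagari" | "bengali" | "tamil" | "gujarati" => Some(ScriptKind::Brahmic),
        _ => None,
    }
}

fn hub_for(kind: ScriptKind) -> &'static str {
    match kind {
        ScriptKind::Roman => ISO_HUB,
        ScriptKind::Brahmic => DEVA_HUB,
    }
}

/// Returns the scripts visited on the way from `from` to `to`,
/// or `None` if either identifier is unknown to this sketch.
pub fn plan_route<'a>(from: &'a str, to: &'a str) -> Option<Vec<&'a str>> {
    let from_hub = hub_for(kind_of(from)?);
    let to_hub = hub_for(kind_of(to)?);
    if from == to {
        // Passthrough: nothing to convert.
        return Some(vec![from]);
    }
    let mut route = vec![from];
    for &stop in [from_hub, to_hub, to].iter() {
        // Skip a stop when we are already standing on it.
        if route.last() != Some(&stop) {
            route.push(stop);
        }
    }
    Some(route)
}
```

Under these assumptions, `plan_route("itrans", "bengali")` visits `itrans`, `iso15919`, `devanagari`, `bengali`, while `plan_route("tamil", "gujarati")` needs only the Devanagari hub.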
📚 Supported Scripts (15+ Scripts, 210+ Conversion Pairs)
Indic Scripts (Schema-Generated)
- Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
- Bengali (bengali, bn) - Bengali/Bangla script
- Tamil (tamil, ta) - Tamil script
- Telugu (telugu, te) - Telugu script
- Gujarati (gujarati, gu) - Gujarati script
- Kannada (kannada, kn) - Kannada script
- Malayalam (malayalam, ml) - Malayalam script
- Odia (odia, od) - Odia/Oriya script
- Gurmukhi (gurmukhi, pa) - Punjabi script
- Sinhala (sinhala, si) - Sinhala script
Romanization Schemes (Schema-Generated)
- ISO-15919 (iso15919, iso) - International standard
- ITRANS (itrans) - Indian languages TRANSliteration
- SLP1 (slp1) - Sanskrit Library Phonetic Basic
- Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
- Velthuis (velthuis) - TeX-compatible scheme
- WX (wx) - ASCII-based notation
Hand-Coded Scripts (Premium Quality)
- IAST (iast) - International Alphabet of Sanskrit Transliteration
- Kolkata (kolkata) - Regional romanization scheme
- Grantha (grantha) - Classical Sanskrit script
🛠️ Usage Examples
Rust Library
use shlesha::Shlesha;
let transliterator = Shlesha::new();
// High-performance cross-script conversion (arguments: text, from, to)
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"
// Roman to Indic conversion
let result = transliterator.transliterate?;
println!; // "தர்மக்ஷேத்ர"
// Schema-generated converters in action
let result = transliterator.transliterate?;
println!("{}", result); // "dharmakṣetra"
Python Bindings (PyO3)
# Create transliterator with all schema-generated converters
=
# Fast schema-based conversion
=
# "ధర్మ"
# Performance with metadata tracking
=
# "dharmakr"
# Runtime extensibility
=
Command Line Interface
# Schema-generated high-performance conversion
# Output: धर्मक्षेत्र
# Cross-script conversion via dual hubs
# Output: தர்ம
# List all schema-generated + hand-coded scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...
WebAssembly (Browser/Node.js)
import init from './pkg/shlesha.js';
🔧 Runtime Schema Loading
NEW: Shlesha now supports runtime schema loading across all APIs, enabling you to add custom scripts without recompilation.
Rust API
use shlesha::Shlesha;
let mut transliterator = Shlesha::new();
// Load custom schema from YAML content
let custom_schema = r#"
metadata:
name: "my_custom_script"
script_type: "roman"
has_implicit_a: false
description: "My custom transliteration scheme"
target: "iso15919"
mappings:
vowels:
"a": "a"
"e": "ē"
consonants:
"k": "k"
"t": "ṭ"
"#;
// Load the schema at runtime
transliterator.load_schema_from_string?;
// Use immediately without recompilation
let result = transliterator.transliterate?;
println!; // "काटे"
// Schema management
let info = transliterator.get_schema_info.unwrap;
println!;
Python API
=
# Load schema from YAML string
=
# Runtime loading
# Immediate usage
=
# "क"
# Schema info
=
# Schema management
JavaScript/WASM API
import init from './pkg/shlesha.js';
Key Runtime Features
- ✅ Load from YAML strings - No file system required
- ✅ Load from file paths - For development workflows
- ✅ Schema validation - Automatic error checking
- ✅ Hot reloading - Add/remove schemas dynamically
- ✅ Schema introspection - Get metadata about loaded schemas
- ✅ Memory management - Clear schemas when done
- ✅ Cross-platform - Identical API across Rust, Python, WASM
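The features listed above (load, validate, introspect, remove, clear) map naturally onto a small in-memory registry. The sketch below is a simplified model and not Shlesha's implementation: `Schema`, `SchemaInfo`, and `SchemaRegistry` are hypothetical types, YAML parsing is left out so mappings arrive as plain pairs, and validation only checks for an empty name or empty mapping set.

```rust
use std::collections::HashMap;

/// A schema that has already been parsed (YAML parsing is out of scope here).
#[derive(Debug, Clone)]
pub struct Schema {
    pub name: String,
    pub description: String,
    pub mappings: HashMap<String, String>,
}

impl Schema {
    pub fn new(name: &str, description: &str, pairs: &[(&str, &str)]) -> Self {
        Schema {
            name: name.to_string(),
            description: description.to_string(),
            mappings: pairs
                .iter()
                .map(|&(s, t)| (s.to_string(), t.to_string()))
                .collect(),
        }
    }
}

/// Introspection data returned for a loaded schema.
#[derive(Debug, Clone, PartialEq)]
pub struct SchemaInfo {
    pub name: String,
    pub description: String,
    pub mapping_count: usize,
}

#[derive(Default)]
pub struct SchemaRegistry {
    schemas: HashMap<String, Schema>,
}

impl SchemaRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Validates and stores a schema; loading the same name again replaces it.
    pub fn load(&mut self, schema: Schema) -> Result<(), String> {
        if schema.name.trim().is_empty() {
            return Err("schema name must not be empty".to_string());
        }
        if schema.mappings.is_empty() {
            return Err(format!("schema '{}' has no mappings", schema.name));
        }
        self.schemas.insert(schema.name.clone(), schema);
        Ok(())
    }

    pub fn info(&self, name: &str) -> Option<SchemaInfo> {
        self.schemas.get(name).map(|s| SchemaInfo {
            name: s.name.clone(),
            description: s.description.clone(),
            mapping_count: s.mappings.len(),
        })
    }

    /// Returns true if a schema with that name was present.
    pub fn remove(&mut self, name: &str) -> bool {
        self.schemas.remove(name).is_some()
    }

    pub fn clear(&mut self) {
        self.schemas.clear();
    }

    pub fn len(&self) -> usize {
        self.schemas.len()
    }
}
```

Because loading an existing name overwrites the previous entry, the same call covers both "add" and "hot reload" in this model.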
Use Cases
Development & Testing
// Test schema variations quickly
transliterator.load_schema_from_string?;
transliterator.load_schema_from_string?;
// Compare results immediately
Dynamic Applications
# User uploads custom transliteration scheme
=
# Use immediately in application
Configuration-Driven Systems
// Load schemas from configuration
config..;
⚡ Performance & Benchmarks
Competitive Performance Analysis
Recent benchmarks show Shlesha delivers industry-competitive performance:
| Library | SLP1→ISO (71 chars) | ITRANS→ISO (71 chars) | Architecture |
|---|---|---|---|
| Vidyut | 1.75 MB/s | 1.92 MB/s | Direct conversion |
| Shlesha | 0.93 MB/s | 1.04 MB/s | Schema-generated hub |
| Performance Ratio | 1.89x slower | 1.85x slower | Extensible |
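
For readers who want to reproduce the arithmetic behind the table, the following sketch shows how a throughput figure and a slowdown ratio can be computed. It assumes 1 MB = 1,000,000 bytes, which the README does not state, and the function names are hypothetical; it is not the benchmark harness used to produce the numbers above.

```rust
use std::time::Duration;

/// Throughput in MB/s, assuming 1 MB = 1_000_000 bytes (an assumption).
pub fn throughput_mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// How many times slower the candidate is than the baseline.
pub fn slowdown(baseline_mb_per_s: f64, candidate_mb_per_s: f64) -> f64 {
    baseline_mb_per_s / candidate_mb_per_s
}
```

Plugging in the rounded table values, `slowdown(1.75, 0.93)` is about 1.88 and `slowdown(1.92, 1.04)` is about 1.85, which agrees with the "Performance Ratio" row up to rounding of the inputs.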
Performance Achievements
- ✅ 6-10x better than original performance targets
- ✅ Only 1.07x - 2.96x slower than Vidyut (industry leader)
- ✅ 10.52 MB/s for Indic script conversions
- ✅ Dramatically faster than Aksharamukha/Dharmamitra
- ✅ Schema-generated = hand-coded performance
Architecture Trade-offs
| Aspect | Shlesha | Vidyut |
|---|---|---|
| Performance | Competitive (1.07x - 2.96x slower) | Best-in-class |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |
🏗️ Schema-Driven Development
Adding New Scripts
Adding support for a new script requires only a schema file and a rebuild:
# schemas/new_script.yaml
metadata:
name: "NewScript"
description: "Description of the script"
unicode_block: "NewScript"
has_implicit_vowels: true
mappings:
vowels:
- source: "𑀅" # New script character
target: "अ" # Devanagari equivalent
# ... add more mappings
# Rebuild to include new script
# New script automatically available!
Template-Based Generation
Converters are generated using Handlebars templates for consistency:
{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
{{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}
impl {{pascal_case metadata.name}}Converter {
pub fn new() -> Self {
// Generated O(1) lookup tables
let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
{{#each character_mappings}}
{{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
{{/each}}
// ... template continues
}
}
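As a rough picture of what the template above might expand to for the Bengali schema, here is a hand-written equivalent covering only the two vowel mappings shown earlier. The struct and field names follow the template; the `to_deva` and `from_deva` methods and the pass-through of unmapped characters are assumptions, since the template body is elided after the first map.

```rust
use std::collections::HashMap;

/// Bengali/Bangla script converter, written by hand to mirror the template.
pub struct BengaliConverter {
    bengali_to_deva_map: HashMap<char, char>,
    deva_to_bengali_map: HashMap<char, char>,
}

impl BengaliConverter {
    pub fn new() -> Self {
        // Only the two mappings shown in schemas/bengali.yaml above.
        let pairs = [('অ', 'अ'), ('আ', 'आ')];
        let mut bengali_to_deva = HashMap::new();
        let mut deva_to_bengali = HashMap::new();
        for &(source, target) in pairs.iter() {
            bengali_to_deva.insert(source, target);
            deva_to_bengali.insert(target, source);
        }
        BengaliConverter {
            bengali_to_deva_map: bengali_to_deva,
            deva_to_bengali_map: deva_to_bengali,
        }
    }

    /// Hypothetical method: maps each character, passing unknown ones through.
    pub fn to_deva(&self, input: &str) -> String {
        input
            .chars()
            .map(|c| *self.bengali_to_deva_map.get(&c).unwrap_or(&c))
            .collect()
    }

    /// Hypothetical method: the reverse direction, with the same pass-through rule.
    pub fn from_deva(&self, input: &str) -> String {
        input
            .chars()
            .map(|c| *self.deva_to_bengali_map.get(&c).unwrap_or(&c))
            .collect()
    }
}
```

Building both maps from one list of pairs keeps the two directions consistent, which is one plausible reason to generate them from a single schema.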
🧪 Quality Assurance
Comprehensive Test Suite
- ✅ 127 passing tests covering all functionality
- ✅ Schema-generated converter tests for all 14 generated converters
- ✅ Performance regression tests ensuring schema = hand-coded speed
- ✅ Cross-script conversion matrix testing all 210+ pairs
- ✅ Unknown character handling with graceful degradation
Build System Validation
# Test schema-generated converters maintain performance
# Verify all conversions work
# Performance benchmarks
🔧 Build Configuration & Features
Schema Processing Features
# Default: Schema-generated + hand-coded converters
# Development mode with schema recompilation
# Minimal build (hand-coded only)
# All features (Python + WASM + CLI)
Runtime Extensibility
let mut transliterator = Shlesha::new();
// Load additional schemas at runtime
transliterator.load_schema?;
// Schema registry access
let scripts = transliterator.list_supported_scripts;
println!;
🚀 Advanced Features
Metadata Collection
// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata?;
if let Some = result.metadata
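One way to picture metadata collection is a converter that passes unknown characters through unchanged and records where they occurred. The sketch below is a self-contained model under that assumption; the types `UnknownChar`, `Metadata`, and `ConversionResult` and the free function are hypothetical and do not reproduce Shlesha's actual result type.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct UnknownChar {
    /// Index counted in characters, not bytes.
    pub position: usize,
    pub character: char,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub unknown_chars: Vec<UnknownChar>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConversionResult {
    pub output: String,
    /// `None` when every input character had a mapping.
    pub metadata: Option<Metadata>,
}

/// Converts `input` using `table`; unmapped characters are copied to the
/// output as-is and reported in the metadata (graceful degradation).
pub fn transliterate_with_metadata(input: &str, table: &[(char, &str)]) -> ConversionResult {
    let mut output = String::new();
    let mut unknown_chars = Vec::new();
    for (position, character) in input.chars().enumerate() {
        match table.iter().find(|(source, _)| *source == character) {
            Some((_, target)) => output.push_str(target),
            None => {
                output.push(character);
                unknown_chars.push(UnknownChar { position, character });
            }
        }
    }
    let metadata = if unknown_chars.is_empty() {
        None
    } else {
        Some(Metadata { unknown_chars })
    };
    ConversionResult { output, metadata }
}
```

The linear table scan keeps the example short; a generated converter would more plausibly use a `match` or a hash map for the lookup.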
Script Characteristics
// Schema-aware script properties
let registry = default;
// Indic scripts have implicit vowels
assert!;
assert!;
// Roman schemes don't
assert!;
assert!;
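The implicit-vowel property matters because a bare Brahmic consonant already carries an inherent "a", which a virama or a vowel sign then suppresses or replaces. The sketch below illustrates that rule for a handful of Devanagari characters. It is a toy model with a hypothetical function name and a deliberately tiny character table, not Shlesha's Devanagari to ISO-15919 converter.

```rust
const VIRAMA: char = '\u{094D}'; // ्  suppresses the inherent vowel

fn consonant(c: char) -> Option<&'static str> {
    match c {
        'क' => Some("k"),
        'त' => Some("t"),
        'ध' => Some("dh"),
        'म' => Some("m"),
        'र' => Some("r"),
        _ => None,
    }
}

fn vowel_sign(c: char) -> Option<&'static str> {
    match c {
        '\u{093E}' => Some("ā"), // ा
        '\u{093F}' => Some("i"), // ि
        '\u{0947}' => Some("ē"), // े
        _ => None,
    }
}

/// Toy Devanagari-to-roman conversion showing inherent-vowel handling.
pub fn deva_to_iso_sketch(input: &str) -> String {
    let mut out = String::new();
    // True while the previous consonant still owes its inherent "a".
    let mut pending_a = false;
    for c in input.chars() {
        if let Some(roman) = consonant(c) {
            if pending_a {
                out.push('a');
            }
            out.push_str(roman);
            pending_a = true;
        } else if c == VIRAMA {
            pending_a = false;
        } else if let Some(vowel) = vowel_sign(c) {
            // A vowel sign replaces the inherent "a".
            out.push_str(vowel);
            pending_a = false;
        } else {
            if pending_a {
                out.push('a');
                pending_a = false;
            }
            out.push(c);
        }
    }
    if pending_a {
        out.push('a');
    }
    out
}
```

Roman schemes need none of this bookkeeping, since every vowel is written out explicitly; that is the distinction the assertions above check.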
Hub Processing Control
// Fine-grained control over conversion paths
let hub = new;
// Direct hub operations
let iso_text = hub.deva_to_iso?; // Devanagari → ISO
let deva_text = hub.iso_to_deva?; // ISO → Devanagari
// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata?;
📖 Documentation
Complete Documentation Suite
- Architecture Guide - Deep dive into hub-and-spoke design
- Schema Reference - Complete schema format documentation
- Performance Guide - Optimization techniques and benchmarks
- API Reference - Complete function and type reference
- Developer Setup - Development environment setup
- Release System - Automated release workflow overview
- Deployment Guide - Complete deployment and environment setup
- crates.io RC Support - Release candidate publishing guide
- Security Setup - Token management and environment security
- Contributing Guide - Guidelines for contributors
Quick Reference
# Generate documentation
# Run all examples
# Performance testing
🚀 Releases
Shlesha uses an automated release system for publishing to multiple package registries:
Quick Release
# Guided release process
Package Installation
# Python (PyPI)
# WASM (npm)
# Rust (crates.io)
See DEPLOYMENT.md for complete release documentation.
🤝 Contributing
We welcome contributions! Shlesha's schema-driven architecture makes adding new scripts easier than ever:
- Add Schema: Create TOML/YAML mapping file
- Test: Run test suite to verify
- Benchmark: Ensure performance maintained
- Submit: Open PR with schema and tests
See CONTRIBUTING.md for detailed guidelines.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Unicode Consortium for Indic script standards
- ISO-15919 for romanization standardization
- Sanskrit Library for SLP1 encoding schemes
- Vidyut Project for performance benchmarking standards
- Rust Community for excellent tools (PyO3, wasm-pack, handlebars)
Shlesha - Where performance meets extensibility through intelligent schema-driven design.