Shlesha - Schema-Driven Transliteration Library
A transliteration library for Sanskrit and Indic scripts using schema-driven architecture. Built with compile-time optimization and runtime schema loading.
Quick Start
Setup command:
This sets up everything: the Rust environment, Python bindings, and WASM support, then runs all tests.
For detailed setup instructions, see DEVELOPER_SETUP.md.
Documentation: See DOCUMENTATION_INDEX.md for guides and references.
Architecture Features
- Schema-generated converters with compile-time optimization
- Zero runtime overhead from code generation
- Token-based conversion system for memory efficiency
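The token-based idea can be illustrated with a minimal sketch (illustrative only, not Shlesha's actual internals): each script maps its characters to shared tokens, so any conversion reduces to two table lookups instead of a script-pair-specific routine.

```python
# Minimal sketch of token-based conversion (illustrative, not Shlesha's internals).
# Each script maps to shared tokens; a conversion is two table lookups.
SLP1_TO_TOKEN = {"A": "VOWEL_AA", "I": "VOWEL_II", "U": "VOWEL_UU"}
TOKEN_TO_ISO = {"VOWEL_AA": "ā", "VOWEL_II": "ī", "VOWEL_UU": "ū"}

def convert(text: str) -> str:
    out = []
    for ch in text:
        token = SLP1_TO_TOKEN.get(ch)
        # Unknown characters pass through unchanged.
        out.append(TOKEN_TO_ISO[token] if token else ch)
    return "".join(out)

print(convert("AIU"))  # -> āīū
```

Because both tables index into the same token set, adding a script means adding one table, not one converter per script pair.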
Schema-Based Architecture
Compile-Time Code Generation
Converters are generated at compile-time from declarative schemas:
# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"
  target: "iso15919"

mappings:
  vowels:
    "A": "ā"
    "I": "ī"
    "U": "ū"
    # ... more mappings
# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"

mappings:
  vowels:
    "অ": "अ"  # Bengali A → Devanagari A
    "আ": "आ"  # Bengali AA → Devanagari AA
    # ... more mappings
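Applying such a mapping at runtime amounts to table lookup, with one wrinkle for Roman schemes: multi-character sequences (for example an ITRANS-style "aa") must be matched before their single-character prefixes. A minimal sketch with illustrative mappings:

```python
# Longest-match-first lookup, as a Roman-scheme converter needs:
# "aa" must match before "a". Mappings below are illustrative, not a full scheme.
MAP = {"aa": "ā", "ii": "ī", "a": "a", "i": "i", "k": "k", "dh": "dh", "d": "d"}
MAX_LEN = max(len(key) for key in MAP)

def convert(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        # Try the longest possible chunk first, shrinking to a single character.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in MAP:
                out.append(MAP[chunk])
                i += n
                break
        else:
            out.append(text[i])  # unknown characters pass through unchanged
            i += 1
    return "".join(out)

print(convert("kaa"))  # -> kā
```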
Build-Time Optimization
The build system automatically generates highly optimized converters:
# Build output showing schema processing
Hub-and-Spoke Architecture
Multi-Hub Design
- Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
- ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
- Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
- Direct Conversion: Bypass hubs when possible for maximum performance
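The routing decision above can be sketched as follows (a simplified model of the design, not Shlesha's actual code; the script sets are abbreviated):

```python
# Simplified model of hub routing: Indic scripts convert via Devanagari,
# Roman schemes via ISO-15919, and cross-type conversions pass through both hubs.
INDIC = {"devanagari", "bengali", "tamil", "gujarati"}
ROMAN = {"iso15919", "iast", "itrans", "slp1"}

def route(src: str, dst: str) -> list:
    if src == dst:
        return [src]                                  # direct passthrough
    if src in INDIC and dst in INDIC:
        return [src, "devanagari", dst]               # Indic hub only
    if src in ROMAN and dst in ROMAN:
        return [src, "iso15919", dst]                 # Roman hub only
    if src in ROMAN:                                  # Roman → Indic: both hubs
        return [src, "iso15919", "devanagari", dst]
    return [src, "devanagari", "iso15919", dst]       # Indic → Roman: both hubs

print(route("itrans", "bengali"))  # -> ['itrans', 'iso15919', 'devanagari', 'bengali']
```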
Routing
The system determines the conversion path:
// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant

// Single hub - one conversion
transliterator.transliterate("धर्म", "devanagari", "iso15919")?; // deva→iso

// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
Supported Scripts
Indic Scripts (Schema-Generated)
- Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
- Bengali (bengali, bn) - Bengali/Bangla script
- Tamil (tamil, ta) - Tamil script
- Telugu (telugu, te) - Telugu script
- Gujarati (gujarati, gu) - Gujarati script
- Kannada (kannada, kn) - Kannada script
- Malayalam (malayalam, ml) - Malayalam script
- Odia (odia, od) - Odia/Oriya script
- Gurmukhi (gurmukhi, pa) - Punjabi script
- Sinhala (sinhala, si) - Sinhala script
- Sharada (sharada, shrd) - Historical script of Kashmir, crucial for Vedic manuscripts
- Tibetan (tibetan, tibt, bo) - Important for Buddhist Vedic transmission
- Thai (thai, th) - Adapted from Grantha for Buddhist Vedic texts
Romanization Schemes (Schema-Generated)
- ISO-15919 (iso15919, iso) - International standard
- ITRANS (itrans) - Indian languages TRANSliteration
- SLP1 (slp1) - Sanskrit Library Phonetic Basic
- Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
- Velthuis (velthuis) - TeX-compatible scheme
- WX (wx) - ASCII-based notation
Hand-Coded Scripts
- IAST (iast) - International Alphabet of Sanskrit Transliteration
- Kolkata (kolkata) - Regional romanization scheme
- Grantha (grantha) - Classical Sanskrit script
Usage Examples
Rust Library
use shlesha::Shlesha;

let transliterator = Shlesha::new();

// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"

// Roman to Indic conversion
let result = transliterator.transliterate("dharmakṣetra", "iast", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"

// Schema-generated converters in action
let result = transliterator.transliterate("धर्मक्षेत्र", "devanagari", "iso15919")?;
println!("{}", result); // "dharmakśetra"
Python Bindings (PyO3)
# Create transliterator with all schema-generated converters
import shlesha
transliterator = shlesha.Shlesha()

# Fast schema-based conversion
result = transliterator.transliterate("धर्म", "devanagari", "telugu")
print(result)  # "ధర్మ"

# Performance with metadata tracking
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iso15919")
print(result.output)  # "dharmakr"

# Runtime extensibility (custom_schema_yaml: a YAML schema string,
# see Runtime Schema Loading below)
transliterator.load_schema_from_string(custom_schema_yaml)
Command Line Interface
# Schema-generated high-performance conversion
# Output: धर्मक्षेत्र
# Cross-script conversion via dual hubs
# Output: தர்ம
# List all schema-generated + hand-coded scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...
WebAssembly (Browser/Node.js)
import init from './pkg/shlesha.js';
Runtime Schema Loading
Shlesha supports runtime schema loading across all APIs to add custom scripts without recompilation.
Rust API
use shlesha::Shlesha;

let mut transliterator = Shlesha::new();

// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"
  target: "iso15919"

mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;

// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema)?;

// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "काटे"

// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("{:?}", info);
Python API
import shlesha
transliterator = shlesha.Shlesha()

# Load schema from YAML string
custom_schema = """
metadata:
  name: "my_script"
  script_type: "roman"
  target: "iso15919"
mappings:
  consonants:
    "k": "k"
"""

# Runtime loading
transliterator.load_schema_from_string(custom_schema)

# Immediate usage
result = transliterator.transliterate("k", "my_script", "devanagari")
print(result)  # "क"

# Schema info
info = transliterator.get_schema_info("my_script")

# Schema management
print(transliterator.list_supported_scripts())
JavaScript/WASM API
import init from './pkg/shlesha.js';
Key Runtime Features
- ✅ Load from YAML strings - No file system required
- ✅ Load from file paths - For development workflows
- ✅ Schema validation - Automatic error checking
- ✅ Hot reloading - Add/remove schemas dynamically
- ✅ Schema introspection - Get metadata about loaded schemas
- ✅ Memory management - Clear schemas when done
- ✅ Cross-platform - Identical API across Rust, Python, WASM
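As an illustration of the schema-validation step, a loader might check required metadata fields before registering a schema. The field names mirror the schema examples in this README; the specific rules here are assumptions, not Shlesha's actual checks.

```python
# Sketch of a schema-validation pass (rules assumed, field names from the
# schema examples above).
def validate_schema(schema: dict) -> list:
    errors = []
    metadata = schema.get("metadata", {})
    for field in ("name", "script_type"):
        if field not in metadata:
            errors.append(f"missing metadata field: {field}")
    # Roman schemes declare their hub target (see the slp1 example above).
    if metadata.get("script_type") == "roman" and "target" not in metadata:
        errors.append("roman schema missing 'target'")
    if not schema.get("mappings"):
        errors.append("schema defines no mappings")
    return errors

print(validate_schema({"metadata": {"name": "my_custom_script"}, "mappings": {}}))
```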
Use Cases
Development & Testing
// Test schema variations quickly (schema_variant_a/_b are YAML strings)
transliterator.load_schema_from_string(schema_variant_a)?;
transliterator.load_schema_from_string(schema_variant_b)?;
// Compare results immediately
Dynamic Applications
# User uploads custom transliteration scheme
transliterator.load_schema_from_string(uploaded_schema_yaml)
# Use immediately in application
Configuration-Driven Systems
// Load schemas from configuration (sketch; the config field name is illustrative)
for schema_path in config.schema_paths {
    transliterator.load_schema(&schema_path)?;
}
Performance & Benchmarks
Performance Analysis
Shlesha uses a hub-and-spoke architecture with schema-generated converters, trading some performance for extensibility compared to direct conversion approaches.
Performance Characteristics
- Competitive with other transliteration libraries
- Schema-generated converters match hand-coded performance
- Optimized for both short and long text processing
Architecture Trade-offs
| Aspect | Shlesha | Vidyut |
|---|---|---|
| Performance | Competitive (small hub overhead) | Fastest (direct conversion) |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |
Schema-Driven Development
Adding New Scripts
Adding support for new scripts with schemas:
# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true

mappings:
  vowels:
    - source: "𑀅"  # New script character
      target: "अ"  # Devanagari equivalent
    # ... add more mappings
# Rebuild to include new script
# New script automatically available!
Template-Based Generation
Converters are generated using Handlebars templates for consistency:
{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}

impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
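The effect of the template can be sketched in a few lines: the build step iterates the schema's mappings and emits one insert call per pair. This Python stand-in is illustrative, not the real build code.

```python
# Python stand-in for the Handlebars rendering: emit one generated Rust
# `insert` call per schema mapping (illustrative, not the actual build step).
mappings = {"অ": "अ", "আ": "आ"}  # Bengali → Devanagari pairs from the schema above

lines = [f"bengali_to_deva.insert('{src}', '{dst}');" for src, dst in mappings.items()]
print("\n".join(lines))
```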
Quality Assurance
Test Suite
- 127 tests covering all functionality
- Schema-generated converter tests for all 14 generated converters
- Performance regression tests ensuring schema-generated converters match hand-coded speed
- Cross-script conversion matrix testing all 210+ pairs
- Unknown character handling
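The pair count follows from the script count: with 15 scripts there are 15 × 14 = 210 ordered source/target pairs, which a test matrix can enumerate:

```python
# Enumerate the ordered conversion pairs a 15-script test matrix must cover.
from itertools import permutations

scripts = [f"script_{i}" for i in range(15)]  # stand-ins for the 15 supported scripts
pairs = list(permutations(scripts, 2))        # ordered (source, target) pairs
print(len(pairs))  # -> 210
```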
Build System Validation
# Test schema-generated converters maintain performance
# Verify all conversions work
# Performance benchmarks
Build Configuration & Features
Schema Processing Features
# Default: Schema-generated + hand-coded converters
# Development mode with schema recompilation
# Minimal build (hand-coded only)
# All features (Python + WASM + CLI)
Runtime Extensibility
let mut transliterator = Shlesha::new();

// Load additional schemas at runtime (future feature)
transliterator.load_schema("path/to/schema.yaml")?;

// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("{:?}", scripts);
Advanced Features
Metadata Collection
// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्म", "devanagari", "iso15919")?;
if let Some(metadata) = result.metadata {
    println!("{:?}", metadata);
}
Script Characteristics
// Schema-aware script properties
// (registry type and method name below are illustrative)
let registry = SchemaRegistry::default();

// Indic scripts have implicit vowels
assert!(registry.has_implicit_vowels("devanagari"));
assert!(registry.has_implicit_vowels("bengali"));

// Roman schemes don't
assert!(!registry.has_implicit_vowels("iast"));
assert!(!registry.has_implicit_vowels("slp1"));
Hub Processing Control
// Fine-grained control over conversion paths
let hub = Hub::new();

// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?; // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?; // ISO → Devanagari

// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;
Documentation
- Architecture Guide - Deep dive into hub-and-spoke design
- Schema Reference - Complete schema format documentation
- Performance Guide - Optimization techniques and benchmarks
- API Reference - Complete function and type reference
- Developer Setup - Development environment setup
- Release System - Automated release workflow overview
- Deployment Guide - Complete deployment and environment setup
- crates.io RC Support - Release candidate publishing guide
- Security Setup - Token management and environment security
- Contributing Guide - Guidelines for contributors
Quick Reference
# Generate documentation
# Run all examples
# Performance testing
Releases
Shlesha uses an automated release system for publishing to package registries:
Quick Release
# Guided release process
Package Installation
# Python (PyPI)
# WASM (npm)
# Rust (crates.io)
See DEPLOYMENT.md for complete release documentation.
Contributing
Contributions are welcome. The schema-driven architecture simplifies adding new scripts:
- Add Schema: Create TOML/YAML mapping file
- Test: Run test suite to verify
- Benchmark: Ensure performance maintained
- Submit: Open PR with schema and tests
See CONTRIBUTING.md for detailed guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Unicode Consortium for Indic script standards
- ISO-15919 for romanization standardization
- Sanskrit Library for SLP1 encoding schemes
- Vidyut Project for performance benchmarking standards
- Rust Community for excellent tools (PyO3, wasm-pack, handlebars)