# Batuta 🎵
> Orchestration framework for converting **ANY** project (Python, C/C++, Shell) to modern, first-principles Rust

## 🔒 Quality Standards
**Batuta enforces rigorous quality standards:**
- ✅ **675+ total tests** (639 unit + 36 integration + benchmarks)
- 🚀 **Coverage target: 90% minimum, 95% preferred** - approaching target
- ✅ **Core modules: 90-100% coverage** (all converters, plugin, parf, backend, tools, types, report) - TARGET MET
- ✅ **Mutation testing** validates test quality (100% on converters)
- ✅ **Zero defects tolerance** via [Certeza](https://github.com/paiml/certeza) validation
- ✅ **Performance benchmarks** (sub-nanosecond backend selection)
- ✅ **Security audits** (0 vulnerabilities)

**Coverage Breakdown:**
- Config module: **100%** coverage
- Analyzer module: **82.76%** coverage
- Types module: **~95%** coverage
- Report module: **~95%** coverage
- Backend module: **~95%** coverage
- Tools module: **~95%** coverage
- ML Converters (NumPy, sklearn, PyTorch): **~90-95%** coverage
- Plugin architecture: **~90%** coverage
- PARF analyzer: **~90%** coverage
- CLI (main.rs): **0%** unit (covered by 36 integration tests)

**Quality Validation:**
```bash
# Run certeza quality checks before committing
cd ../certeza && cargo run -- check ../Batuta
```
See [IMPLEMENTATION.md](IMPLEMENTATION.md#quality-validation-with-certeza) for full quality metrics and improvement plans.

---

Batuta orchestrates the **20-component Sovereign AI Stack** to enable **semantic-preserving** conversion of legacy codebases to high-performance Rust, complete with GPU acceleration, SIMD optimization, and ML inference capabilities.
## 🚀 Quick Start
```bash
# Install Batuta
cargo install batuta
# Analyze your project
batuta analyze --languages --dependencies --tdg
# Convert to Rust (coming soon)
batuta transpile --incremental --cache
# Optimize with GPU/SIMD (coming soon)
batuta optimize --enable-gpu --profile aggressive
# Validate equivalence (coming soon)
batuta validate --trace-syscalls --benchmark
# Build final binary (coming soon)
batuta build --release
```
## 📖 Documentation
**[Read The Batuta Book](https://paiml.github.io/Batuta/)** - Comprehensive guide covering:
- Philosophy and core principles (Toyota Way applied to code migration)
- The 5-phase workflow (Analysis → Transpilation → Optimization → Validation → Deployment)
- Tool ecosystem deep-dives (all 20 Sovereign AI Stack components)
- 50+ peer-reviewed academic references across specifications
- Practical examples and case studies
## 🎯 What is Batuta?
Batuta is named after the **conductor's baton** – it orchestrates multiple specialized tools to convert legacy code to Rust while maintaining semantic equivalence. Unlike simple transpilers, Batuta:
- **Preserves semantics** through IR-based analysis and validation
- **Optimizes automatically** with SIMD/GPU acceleration via Trueno
- **Provides gradual migration** through Ruchy scripting language
- **Applies Toyota Way principles** (Muda, Jidoka, Kaizen) for quality
## 🧩 Sovereign AI Stack
Batuta orchestrates **20 components** across 7 layers:
### Transpilers (L3)
- **[Depyler](https://github.com/paiml/depyler)** - Python → Rust with type inference
- **[Decy](https://github.com/paiml/decy)** - C/C++ → Rust with ownership inference
- **[Bashrs](https://github.com/paiml/bashrs)** v6.41.0 - Rust → Shell (bootstrap scripts)
- **[Ruchy](https://github.com/paiml/ruchy)** v3.213.0 - Script → Rust (systems scripting)
### Foundation Libraries (L0-L2)
- **[Trueno](https://github.com/paiml/trueno)** v0.7.3 - SIMD/GPU compute primitives, zero-copy
- **[Trueno-DB](https://github.com/paiml/trueno-db)** v0.3.3 - Vector database with HNSW indexing
- **[Trueno-Graph](https://github.com/paiml/trueno-graph)** v0.1.1 - Graph analytics & lineage DAG
- **[Trueno-RAG](https://github.com/paiml/trueno-rag)** - RAG: BM25+dense hybrid, RRF fusion, cross-encoder reranking ([10 papers](https://github.com/paiml/trueno-rag/blob/main/docs/specifications/rag-pipeline-spec.md))
- **[Aprender](https://github.com/paiml/aprender)** v0.12.0 - First-principles ML, .apr encryption
- **[Realizar](https://github.com/paiml/realizar)** - LLM inference: GGUF, safetensors, KV-cache
### Quality & Orchestration (L4-L5)
- **[Repartir](https://github.com/paiml/repartir)** v1.0.0 - Distributed computing
- **[pforge](https://github.com/paiml/pforge)** v0.1.2 - Zero-boilerplate MCP server framework
- **[Certeza](https://github.com/paiml/certeza)** - Quality validation framework
- **[PMAT](https://github.com/paiml/paiml-mcp-agent-toolkit)** v2.205.0 - AI context generation & code quality
- **[Renacer](https://github.com/paiml/renacer)** v0.6.5 - Syscall tracing & golden traces
### Data & MLOps (L6)
- **[Alimentar](https://github.com/paiml/alimentar)** - Data loading with .ald AES-256-GCM encryption
- **[Pacha](https://github.com/paiml/pacha)** - Model/Data/Recipe Registry: BLAKE3 deduplication, Model Cards, Datasheets, W3C PROV-DM lineage ([20 papers](https://github.com/paiml/pacha/blob/main/docs/specifications/model-data-recipe-spec.md))
## 🔮 Oracle Mode
Query the Sovereign AI Stack with natural language:
```bash
# Find the right component for your task
batuta oracle "How do I train random forest on 1M samples?"
# List all stack components
batuta oracle --list
# Show component details
batuta oracle --show aprender
# Interactive mode
batuta oracle --interactive
```
Oracle Mode uses **Amdahl's Law** and the **PCIe 5× Rule** (Gregg & Hazelwood, 2011) to recommend optimal backends (Scalar/SIMD/GPU/Distributed).
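
The two rules named above can be illustrated in a few lines of Rust. This is a minimal sketch, not Batuta's implementation: `Backend`, `amdahl_speedup`, and `choose_backend` are hypothetical names, the Distributed backend is left out, and the 5× rule is assumed here to mean "offload to the GPU only when the estimated compute time exceeds five times the PCIe transfer time".

```rust
/// Backends named in the text above (Distributed omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// Amdahl's Law: overall speedup when a fraction `p` of the work
/// is accelerated by a factor `s` and the rest runs unchanged.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Hypothetical cost model (an assumption, not Batuta's actual logic):
/// pick the GPU only when compute time is more than 5x the transfer time,
/// otherwise fall back to SIMD when available, then to scalar code.
pub fn choose_backend(compute_secs: f64, transfer_secs: f64, simd_available: bool) -> Backend {
    if compute_secs > 5.0 * transfer_secs {
        Backend::Gpu
    } else if simd_available {
        Backend::Simd
    } else {
        Backend::Scalar
    }
}
```

For example, `choose_backend(1.0, 0.1, true)` returns `Backend::Gpu` because 1.0 s of compute exceeds 5 × 0.1 s of transfer, while `amdahl_speedup(0.5, 2.0)` is about 1.33: doubling the speed of half the work yields far less than a 2× overall gain.
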
## ✍️ Content Creation Tooling
Generate structured prompts for educational content with Toyota Way quality constraints:
```bash
# List available content types
batuta content types
# Generate book chapter prompt
batuta content emit --type bch --title "Error Handling in Rust" --audience "Python developers"
# Generate high-level outline
batuta content emit --type hlo --title "ML Course" --show-budget
# Validate content against quality gates
batuta content validate --type bch chapter.md
```
**Content Types:**
- **HLO** - High-Level Outline (YAML/Markdown, 50-200 lines)
- **DLO** - Detailed Outline (YAML/Markdown, 200-1000 lines)
- **BCH** - Book Chapter (Markdown/mdBook, 2000-8000 words)
- **BLP** - Blog Post (Markdown + TOML, 500-3000 words)
- **PDM** - Presentar Demo (HTML + YAML)

**Quality Gates (Jidoka):**
- Meta-commentary detection ("In this chapter, we will...")
- Code block language validation
- Heading hierarchy enforcement
- Token budget management (Heijunka)
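
Two of the gates above, meta-commentary detection and heading hierarchy enforcement, can be sketched as simple line scans. The code below is only an illustration under stated assumptions: the phrase list in `META_PHRASES` and both function names are hypothetical, and the checks that `batuta content validate` actually performs are not specified in this README.

```rust
/// Hypothetical phrase list; the real gate's patterns are an assumption here.
const META_PHRASES: [&str; 3] = [
    "in this chapter, we will",
    "in this section, we will",
    "as mentioned earlier",
];

/// Returns 1-based line numbers that contain a meta-commentary phrase.
pub fn find_meta_commentary(markdown: &str) -> Vec<usize> {
    markdown
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let lower = line.to_lowercase();
            META_PHRASES.iter().any(|phrase| lower.contains(phrase))
        })
        .map(|(index, _)| index + 1)
        .collect()
}

/// Returns 1-based line numbers of ATX headings that skip a level
/// (for example `#` followed directly by `###`). Lines inside fenced
/// code blocks are ignored so shell comments are not read as headings.
pub fn find_heading_skips(markdown: &str) -> Vec<usize> {
    let fence = "`".repeat(3);
    let mut in_fence = false;
    let mut previous_level = 0usize;
    let mut skips = Vec::new();
    for (index, line) in markdown.lines().enumerate() {
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        let level = line.chars().take_while(|&c| c == '#').count();
        // An ATX heading is 1-6 '#' characters followed by a space.
        if (1..=6).contains(&level) && line[level..].starts_with(' ') {
            if previous_level != 0 && level > previous_level + 1 {
                skips.push(index + 1);
            }
            previous_level = level;
        }
    }
    skips
}
```

Both functions return line numbers rather than a pass/fail flag, so a caller can report every offending line in one run.
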
## 📊 Commands
### `batuta analyze`
Analyze your project to understand languages, dependencies, and code quality.
```bash
# Full analysis
batuta analyze --languages --dependencies --tdg
# Just detect languages
batuta analyze --languages
# Calculate TDG score only
batuta analyze --tdg
```
**Output includes:**
- Language breakdown with line counts and percentages
- Primary language detection
- Transpiler recommendations
- Dependency manager detection (pip, Cargo, npm, etc.)
- Package counts per dependency file
- TDG quality score (0-100) with letter grade
- ML framework detection
- Next steps guidance
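
The language breakdown and primary-language items above amount to turning per-language line counts into percentages and picking the largest. A minimal sketch, assuming the line counts have already been collected; `LanguageShare`, `language_breakdown`, and `primary_language` are hypothetical names, not Batuta's API:

```rust
use std::collections::HashMap;

/// One row of a language breakdown.
#[derive(Debug, PartialEq)]
pub struct LanguageShare {
    pub language: String,
    pub lines: usize,
    pub percent: f64,
}

/// Converts per-language line counts into shares, sorted by descending
/// line count (ties broken by name so the order is deterministic).
pub fn language_breakdown(line_counts: &HashMap<String, usize>) -> Vec<LanguageShare> {
    let total: usize = line_counts.values().sum();
    let mut shares: Vec<LanguageShare> = line_counts
        .iter()
        .map(|(language, &lines)| LanguageShare {
            language: language.clone(),
            lines,
            // Guard against division by zero for an empty project.
            percent: if total == 0 {
                0.0
            } else {
                100.0 * lines as f64 / total as f64
            },
        })
        .collect();
    shares.sort_by(|a, b| {
        b.lines
            .cmp(&a.lines)
            .then_with(|| a.language.cmp(&b.language))
    });
    shares
}

/// The primary language is the one with the most lines, if any.
pub fn primary_language(line_counts: &HashMap<String, usize>) -> Option<String> {
    language_breakdown(line_counts)
        .into_iter()
        .next()
        .map(|share| share.language)
}
```

With 750 lines of Python and 250 lines of Shell, the breakdown lists Python at 75% and Shell at 25%, and `primary_language` returns `Some("Python")`.
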
### `batuta init` (Coming Soon)
Initialize a Batuta project and set up conversion configuration.
```bash
batuta init --source ./my-python-app --output ./my-rust-app
```
### `batuta transpile` (Coming Soon)
Convert source code to Rust with incremental compilation and caching.
```bash
# Basic transpilation
batuta transpile
# Incremental mode with caching
batuta transpile --incremental --cache
# Specific modules only
batuta transpile --modules auth,api,db
# Generate Ruchy for gradual migration
batuta transpile --ruchy --repl
```
### `batuta optimize` (Coming Soon)
Apply performance optimizations with GPU/SIMD acceleration.
```bash
# Balanced optimization (default)
batuta optimize
# Aggressive optimization
batuta optimize --profile aggressive --enable-gpu
# Custom GPU threshold
batuta optimize --enable-gpu --gpu-threshold 1000
```
**Optimization profiles:**
- `fast` - Quick compilation, basic optimizations
- `balanced` - Default, good compilation/performance trade-off
- `aggressive` - Maximum performance, slower compilation
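
One way to model the three profiles in code is an enum with `balanced` as the default and a `FromStr` implementation for parsing the `--profile` value. This is a sketch only; the `Profile` type is hypothetical and says nothing about how Batuta maps profiles to compiler settings.

```rust
use std::str::FromStr;

/// The three optimization profiles listed above; `Balanced` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Profile {
    Fast,
    #[default]
    Balanced,
    Aggressive,
}

impl FromStr for Profile {
    type Err = String;

    /// Parses a profile name case-insensitively and rejects unknown names.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "fast" => Ok(Profile::Fast),
            "balanced" => Ok(Profile::Balanced),
            "aggressive" => Ok(Profile::Aggressive),
            other => Err(format!("unknown profile: {other}")),
        }
    }
}
```

Rejecting unknown names with an error, rather than silently falling back to the default, keeps a typo such as `--profile agressive` from producing an unexpectedly slow build.
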
### `batuta validate` (Coming Soon)
Verify semantic equivalence between original and transpiled code.
```bash
# Full validation suite
batuta validate --trace-syscalls --diff-output --run-original-tests --benchmark
# Quick syscall validation
batuta validate --trace-syscalls
```
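
Conceptually, syscall-based validation compares what the original program and the transpiled program ask the operating system to do. The sketch below assumes each trace has already been reduced to an ordered list of syscall names and reports the first position where they differ. `first_divergence` is a hypothetical helper; the comparison Renacer actually performs is likely richer (arguments, filtering of runtime noise) and is not described in this README.

```rust
/// Compares two syscall traces, given as ordered syscall names.
/// Returns `None` when they match exactly, otherwise the index of the
/// first position where they differ (or where the shorter trace ends).
pub fn first_divergence(original: &[&str], transpiled: &[&str]) -> Option<usize> {
    let common = original.len().min(transpiled.len());
    (0..common)
        .find(|&i| original[i] != transpiled[i])
        .or(if original.len() != transpiled.len() {
            Some(common)
        } else {
            None
        })
}
```

For instance, comparing `["open", "read"]` with `["open", "write"]` returns `Some(1)`, pointing at the second call as the first behavioral difference.
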
### `batuta build` (Coming Soon)
Build optimized Rust binaries with cross-compilation support.
```bash
# Release build
batuta build --release
# Cross-compile
batuta build --target x86_64-unknown-linux-musl
# WebAssembly
batuta build --wasm
```
### `batuta report` (Coming Soon)
Generate comprehensive migration reports.
```bash
# HTML report (default)
batuta report
# Markdown for documentation
batuta report --format markdown --output MIGRATION.md
# JSON for CI/CD
batuta report --format json --output report.json
```
## 🏗️ 5-Phase Workflow
Batuta implements a **5-phase Kanban workflow** based on Toyota Way principles:
### Phase 1: Analysis
- Detect project languages and structure
- Calculate technical debt grade (TDG)
- Identify dependencies and frameworks
- Recommend transpilation strategy
### Phase 2: Transpilation
- Convert code to Rust/Ruchy using appropriate transpiler
- Preserve semantics through IR analysis
- Generate human-readable output
- Support incremental compilation
### Phase 3: Optimization
- Apply SIMD vectorization (via Trueno)
- Enable GPU acceleration for compute-heavy code
- Optimize memory layout
- Select backends via Mixture-of-Experts routing
### Phase 4: Validation
- Trace syscalls to verify equivalence (via Renacer)
- Run original test suite
- Compare outputs and performance
- Generate diff reports
### Phase 5: Deployment
- Build optimized binaries
- Cross-compile for target platforms
- Package for distribution
- Generate migration documentation
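
The five phases form a strict sequence, which can be modeled as an enum with a `next` step and a driver that stops at the first failing phase. This is a hedged sketch: `Phase` and `run_workflow` are hypothetical names, and the stop-on-first-error behavior is an assumption modeled on the Jidoka principle rather than a description of Batuta's internals.

```rust
/// The five workflow phases, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Analysis,
    Transpilation,
    Optimization,
    Validation,
    Deployment,
}

impl Phase {
    /// Returns the phase that follows this one, or `None` after Deployment.
    pub fn next(self) -> Option<Phase> {
        match self {
            Phase::Analysis => Some(Phase::Transpilation),
            Phase::Transpilation => Some(Phase::Optimization),
            Phase::Optimization => Some(Phase::Validation),
            Phase::Validation => Some(Phase::Deployment),
            Phase::Deployment => None,
        }
    }
}

/// Runs every phase in order via `run_phase`, stopping at the first
/// failure and reporting which phase failed together with its message.
pub fn run_workflow<F>(mut run_phase: F) -> Result<(), (Phase, String)>
where
    F: FnMut(Phase) -> Result<(), String>,
{
    let mut current = Some(Phase::Analysis);
    while let Some(phase) = current {
        run_phase(phase).map_err(|message| (phase, message))?;
        current = phase.next();
    }
    Ok(())
}
```

Because a failure in Validation returns before Deployment is attempted, a binary is never built from code that failed its equivalence checks.
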
## 🎓 Toyota Way Principles
Batuta applies **Lean Manufacturing** principles to code migration:
### Muda (Waste Elimination)
- **StaticFixer integration** - Eliminate duplicate static analysis (~40% reduction)
- **PMAT adaptive analysis** - Focus on critical code, skip boilerplate
- **Decy diagnostics** - Clear, actionable error messages reduce confusion
### Jidoka (Built-in Quality)
- **Ruchy strictness levels** - Gradual quality at migration boundaries
- **Pipeline validation** - Quality checks at each phase
- **Semantic equivalence** - Automated verification via syscall tracing
### Kaizen (Continuous Improvement)
- **MoE optimization** - Continuous performance tuning
- **Incremental features** - Deliver value progressively
- **Feedback loops** - Learn from each migration
### Heijunka (Level Scheduling)
- **Batuta orchestrator** - Balanced load across transpilers
- **Parallel processing** - Efficient resource utilization
### Kanban (Visual Workflow)
- **5-phase tracking** - Clear stage visibility
- **Dependency management** - Automatic task ordering
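
Automatic task ordering of this kind is commonly done with a topological sort. The sketch below uses Kahn's algorithm over a map from each task to its prerequisites; it is an illustration of the general technique, not Batuta's scheduler, and `order_tasks` is a hypothetical name.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders tasks so that each one appears after all of its prerequisites
/// (Kahn's algorithm). `deps` maps a task to the tasks it depends on.
/// Returns `None` when the dependencies contain a cycle.
pub fn order_tasks<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of unmet prerequisites per task, and the reverse edges.
    let mut indegree: BTreeMap<&'a str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&task, prereqs) in deps {
        indegree.entry(task).or_insert(0);
        for &prereq in prereqs {
            indegree.entry(prereq).or_insert(0);
            *indegree.entry(task).or_insert(0) += 1;
            dependents.entry(prereq).or_default().push(task);
        }
    }

    // Start with tasks that have no prerequisites.
    let mut ready: VecDeque<&'a str> = indegree
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&task, _)| task)
        .collect();

    let mut ordered = Vec::with_capacity(indegree.len());
    while let Some(task) = ready.pop_front() {
        ordered.push(task);
        if let Some(nexts) = dependents.get(task) {
            for &next in nexts {
                let count = indegree
                    .get_mut(next)
                    .expect("every task has an indegree entry");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(next);
                }
            }
        }
    }

    // Any task left unprocessed is part of a cycle.
    if ordered.len() == indegree.len() {
        Some(ordered)
    } else {
        None
    }
}
```

Using `BTreeMap` keeps the result deterministic for a given input, which makes the ordering easy to show on a Kanban-style board and to test.
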
### Andon (Problem Visualization)
- **Renacer integration** - Runtime behavior analysis
- **TDG scoring** - Quality visibility
## 📚 Academic Foundation
Every specification cites peer-reviewed research (50+ papers total):

| Component | Papers | Key Citations |
|-----------|--------|---------------|
| **Pacha** | 20 | Model Cards [Mitchell 2019], Datasheets [Gebru 2021], PROV-DM [W3C] |
| **Trueno-RAG** | 10 | RAG [Lewis 2020], DPR [Karpukhin 2020], BM25 [Robertson 2009] |
| **Trueno-DB** | | HNSW [Malkov 2020], IEEE TPAMI |

This isn't marketing; it's engineering rigor applied to every design decision.
## 📈 Example: Python ML Project
```bash
# 1. Analyze the project
$ batuta analyze --languages --dependencies --tdg
📊 Analysis Results
==================================================
Primary language: Python
Total files: 127
Total lines: 8,432
Dependencies:
• pip (42 packages)
File: "./requirements.txt"
• ℹ ML frameworks detected - consider Aprender/Realizar for ML code
Quality Score:
• TDG Score: 73.2/100 (B)
Recommended transpiler: Depyler (Python → Rust)
# 2. Transpile to Rust (coming soon)
$ batuta transpile --incremental
🔄 Transpiling with Depyler...
✓ Converted 127 files (3,891 warnings, 42 errors addressed)
✓ NumPy → Trueno: 23 operations
✓ sklearn → Aprender: 5 models
✓ PyTorch → Realizar: 2 inference pipelines
# 3. Optimize (coming soon)
$ batuta optimize --enable-gpu --profile aggressive
⚡ Optimizing...
✓ SIMD vectorization: 234 loops optimized
✓ GPU dispatch: 12 operations (threshold: 500 elements)
✓ Memory layout: 18 structs optimized
# 4. Validate (coming soon)
$ batuta validate --trace-syscalls --benchmark
✅ Validation passed!
✓ Syscall equivalence: 100%
✓ Output identical: ✓
✓ Performance: 4.2x faster, 62% less memory
```
## 🛠️ Development Status
**Current Version:** 0.1.2 (Alpha)
- ✅ **Phase 1: Analysis** - Complete
  - ✅ Language detection
  - ✅ Dependency analysis
  - ✅ TDG scoring
  - ✅ Transpiler recommendations
- 🚧 **Phase 2: Core Orchestration** - In Progress
  - ✅ CLI scaffolding (complete)
  - ⏳ Transpilation engine
  - ⏳ 5-phase workflow
  - ⏳ PMAT integration
- 📋 **Phase 3: Advanced Pipelines** - Planned
  - 📋 NumPy → Trueno
  - 📋 sklearn → Aprender
  - 📋 PyTorch → Realizar
- 📋 **Phase 4: Enterprise Features** - Future
  - 📋 Renacer tracing
  - 📋 PARF reference finder

See [roadmap.yaml](docs/roadmaps/roadmap.yaml) for complete ticket breakdown (12 tickets, 572 hours).
## 📖 Documentation
- [Specification](docs/specifications/batuta-orchestration-decy-depyler-trueno-aprender-realizar-ruchy-spec.md) - Complete technical specification
- [Roadmap](docs/roadmaps/roadmap.yaml) - PMAT-tracked development roadmap
- [PMAT Bug Report](PMAT_BUG_REPORT.md) - Known issues with PMAT workflow
## 🤝 Contributing
Batuta is part of the [Pragmatic AI Labs](https://github.com/paiml) ecosystem. Contributions are welcome!
```bash
# Clone and build
git clone https://github.com/paiml/Batuta.git
cd Batuta
cargo build --release
# Run tests
cargo test
# Install locally
cargo install --path .
```
## 📄 License
MIT License - see [LICENSE](LICENSE) for details.
## 🔗 Related Projects
**Transpilers:**
- [Depyler](https://github.com/paiml/depyler) - Python → Rust with type inference
- [Decy](https://github.com/paiml/decy) - C/C++ → Rust with ownership inference

**Compute & AI:**
- [Trueno](https://github.com/paiml/trueno) - SIMD/GPU compute primitives
- [Trueno-RAG](https://github.com/paiml/trueno-rag) - RAG pipeline (10 peer-reviewed papers)
- [Realizar](https://github.com/paiml/realizar) - LLM inference (GGUF, safetensors)

**MLOps & Quality:**
- [Pacha](https://github.com/paiml/pacha) - Model/Data/Recipe registry (20 peer-reviewed papers)
- [PMAT](https://github.com/paiml/paiml-mcp-agent-toolkit) - AI context & code quality
- [Renacer](https://github.com/paiml/renacer) - Syscall tracing & golden traces
## 🙏 Acknowledgments
Batuta applies principles from:
- **Toyota Production System** - Muda, Jidoka, Kaizen, Heijunka, Kanban, Andon
- **Lean Software Development** - Value stream optimization
- **First Principles Thinking** - Rebuild from fundamental truths
---
**Batuta** - Because every great orchestra needs a conductor. 🎵