Batuta
Orchestration framework for converting ANY project (Python, C/C++, Shell) to modern, first-principles Rust
Quality Standards
Batuta enforces rigorous quality standards:
- 675+ total tests (639 unit + 36 integration + benchmarks)
- Coverage target: 90% minimum, 95% preferred (approaching target)
- Core modules: 90-100% coverage (all converters, plugin, parf, backend, tools, types, report), target met
- Mutation testing validates test quality (100% on converters)
- Zero-defect tolerance via Certeza validation
- Performance benchmarks (sub-nanosecond backend selection)
- Security audits (0 vulnerabilities)
Coverage Breakdown:
- Config module: 100% coverage
- Analyzer module: 82.76% coverage
- Types module: ~95% coverage
- Report module: ~95% coverage
- Backend module: ~95% coverage
- Tools module: ~95% coverage
- ML Converters (NumPy, sklearn, PyTorch): ~90-95% coverage
- Plugin architecture: ~90% coverage
- PARF analyzer: ~90% coverage
- CLI (main.rs): 0% unit coverage (exercised by the 36 integration tests instead)
Quality Validation:
# Run certeza quality checks before committing
See IMPLEMENTATION.md for full quality metrics and improvement plans.
Batuta orchestrates the 20-component Sovereign AI Stack to enable semantic-preserving conversion of legacy codebases to high-performance Rust, complete with GPU acceleration, SIMD optimization, and ML inference capabilities.
Quick Start
# Install Batuta
# Analyze your project
# Convert to Rust (coming soon)
# Optimize with GPU/SIMD (coming soon)
# Validate equivalence (coming soon)
# Build final binary (coming soon)
Documentation
Read The Batuta Book, a comprehensive guide covering:
- Philosophy and core principles (Toyota Way applied to code migration)
- The 5-phase workflow (Analysis → Transpilation → Optimization → Validation → Deployment)
- Tool ecosystem deep-dives (all 20 Sovereign AI Stack components)
- 50+ peer-reviewed academic references across specifications
- Practical examples and case studies
What is Batuta?
Batuta is named after the conductor's baton: it orchestrates multiple specialized tools to convert legacy code to Rust while maintaining semantic equivalence. Unlike simple transpilers, Batuta:
- Preserves semantics through IR-based analysis and validation
- Optimizes automatically with SIMD/GPU acceleration via Trueno
- Provides gradual migration through Ruchy scripting language
- Applies Toyota Way principles (Muda, Jidoka, Kaizen) for quality
Sovereign AI Stack
Batuta orchestrates 20 components across 7 layers:
Transpilers (L3)
- Depyler - Python → Rust with type inference
- Decy - C/C++ → Rust with ownership inference
- Bashrs v6.41.0 - Rust → Shell (bootstrap scripts)
- Ruchy v3.213.0 - Script → Rust (systems scripting)
Foundation Libraries (L0-L2)
- Trueno v0.7.3 - SIMD/GPU compute primitives, zero-copy
- Trueno-DB v0.3.3 - Vector database with HNSW indexing
- Trueno-Graph v0.1.1 - Graph analytics & lineage DAG
- Trueno-RAG - RAG: BM25+dense hybrid, RRF fusion, cross-encoder reranking (10 papers)
- Aprender v0.12.0 - First-principles ML, .apr encryption
- Realizar - LLM inference: GGUF, safetensors, KV-cache
Quality & Orchestration (L4-L5)
- Repartir v1.0.0 - Distributed computing
- pforge v0.1.2 - Zero-boilerplate MCP server framework
- Certeza - Quality validation framework
- PMAT v2.205.0 - AI context generation & code quality
- Renacer v0.6.5 - Syscall tracing & golden traces
Data & MLOps (L6)
- Alimentar - Data loading with .ald AES-256-GCM encryption
- Pacha - Model/Data/Recipe Registry: BLAKE3 deduplication, Model Cards, Datasheets, W3C PROV-DM lineage (20 papers)
Oracle Mode
Query the Sovereign AI Stack with natural language:
# Find the right component for your task
# List all stack components
# Show component details
# Interactive mode
Oracle Mode uses Amdahl's Law and the PCIe 5× Rule (Gregg & Hazelwood, 2011) to recommend optimal backends (Scalar/SIMD/GPU/Distributed).
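To make the two heuristics concrete, here is a minimal Rust sketch. It is not Batuta's actual implementation: the `Backend` enum, the function names, and the decision order are hypothetical, and the 5× check is read here as "offload to the GPU only when estimated compute time exceeds five times the PCIe transfer time", which is an assumption about how the rule is applied. The `Distributed` variant is listed for completeness, but this sketch never selects it.

```rust
/// Backends named in the text. The selection logic below is illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
    Distributed, // never chosen by this sketch
}

/// Amdahl's Law: overall speedup when a fraction `p` of the work
/// is accelerated by a factor `s` and the rest runs unchanged.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Assumed form of the 5x rule: the GPU pays off only when compute time
/// is more than five times the time spent moving data over PCIe.
pub fn gpu_worthwhile(compute_secs: f64, transfer_secs: f64) -> bool {
    compute_secs > 5.0 * transfer_secs
}

/// Hypothetical recommendation: GPU if the 5x rule holds,
/// otherwise SIMD for vectorizable work, otherwise scalar.
pub fn recommend(compute_secs: f64, transfer_secs: f64, vectorizable: bool) -> Backend {
    if gpu_worthwhile(compute_secs, transfer_secs) {
        Backend::Gpu
    } else if vectorizable {
        Backend::Simd
    } else {
        Backend::Scalar
    }
}
```

For example, with half of a workload accelerated 2×, `amdahl_speedup(0.5, 2.0)` gives roughly 1.33, which shows why the serial fraction dominates the achievable gain.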
Commands
batuta analyze
Analyze your project to understand languages, dependencies, and code quality.
# Full analysis
# Just detect languages
# Calculate TDG score only
Output includes:
- Language breakdown with line counts and percentages
- Primary language detection
- Transpiler recommendations
- Dependency manager detection (pip, Cargo, npm, etc.)
- Package counts per dependency file
- TDG quality score (0-100) with letter grade
- ML framework detection
- Next steps guidance
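As an illustration of the first two output items, the sketch below derives a language breakdown and a primary language from raw line counts. The `LanguageShare` struct and both function names are hypothetical and are not taken from Batuta's source; the real analyzer may weigh languages differently.

```rust
/// One row of a hypothetical language breakdown.
#[derive(Debug, Clone, PartialEq)]
pub struct LanguageShare {
    pub language: String,
    pub lines: u64,
    pub percent: f64,
}

/// Turn per-language line counts into rows sorted by line count,
/// largest first (ties broken alphabetically by language name).
pub fn language_breakdown(counts: &[(&str, u64)]) -> Vec<LanguageShare> {
    let total: u64 = counts.iter().map(|(_, n)| *n).sum();
    let mut rows: Vec<LanguageShare> = counts
        .iter()
        .map(|(lang, n)| LanguageShare {
            language: (*lang).to_string(),
            lines: *n,
            // Guard against division by zero for an empty project.
            percent: if total == 0 {
                0.0
            } else {
                100.0 * (*n as f64) / (total as f64)
            },
        })
        .collect();
    rows.sort_by(|a, b| {
        b.lines
            .cmp(&a.lines)
            .then_with(|| a.language.cmp(&b.language))
    });
    rows
}

/// In this sketch the primary language is simply the largest row.
pub fn primary_language(rows: &[LanguageShare]) -> Option<&str> {
    rows.first().map(|r| r.language.as_str())
}
```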
batuta init (Coming Soon)
Initialize a Batuta project and set up conversion configuration.
batuta transpile (Coming Soon)
Convert source code to Rust with incremental compilation and caching.
# Basic transpilation
# Incremental mode with caching
# Specific modules only
# Generate Ruchy for gradual migration
batuta optimize (Coming Soon)
Apply performance optimizations with GPU/SIMD acceleration.
# Balanced optimization (default)
# Aggressive optimization
# Custom GPU threshold
Optimization profiles:
- fast - Quick compilation, basic optimizations
- balanced - Default, good compilation/performance trade-off
- aggressive - Maximum performance, slower compilation
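One way to model these profiles in Rust is an enum with string parsing and a default, sketched below. The `Profile` type is hypothetical, and the mapping onto compiler `opt-level` values is purely an illustrative assumption; the source does not say how Batuta translates profiles into build settings.

```rust
use std::str::FromStr;

/// Hypothetical model of the three optimization profiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Profile {
    Fast,
    #[default]
    Balanced, // the text names this profile as the default
    Aggressive,
}

impl FromStr for Profile {
    type Err = String;

    /// Parse a profile name case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "fast" => Ok(Profile::Fast),
            "balanced" => Ok(Profile::Balanced),
            "aggressive" => Ok(Profile::Aggressive),
            other => Err(format!("unknown optimization profile: {other}")),
        }
    }
}

impl Profile {
    /// Assumed mapping to an opt-level; chosen for illustration only.
    pub fn opt_level(self) -> u8 {
        match self {
            Profile::Fast => 1,
            Profile::Balanced => 2,
            Profile::Aggressive => 3,
        }
    }
}
```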
batuta validate (Coming Soon)
Verify semantic equivalence between original and transpiled code.
# Full validation suite
# Quick syscall validation
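The idea behind syscall-based validation can be sketched as comparing two ordered traces and reporting the first point where they diverge. This is a deliberately simplified illustration, not Renacer's algorithm: the `first_divergence` function is hypothetical, traces are reduced to bare syscall names, and a real comparison would likely need to tolerate benign differences such as allocation or ordering noise.

```rust
/// Return the index of the first position where two syscall traces differ,
/// together with the entry from each trace (None if that trace ended early).
/// Returns None when the traces are identical.
pub fn first_divergence<'a>(
    original: &[&'a str],
    transpiled: &[&'a str],
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let len = original.len().max(transpiled.len());
    for i in 0..len {
        let a = original.get(i).copied();
        let b = transpiled.get(i).copied();
        if a != b {
            return Some((i, a, b));
        }
    }
    None
}
```

Under this sketch, two programs count as equivalent only when `first_divergence` returns `None`, which is a stricter criterion than a production tool would probably use.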
batuta build (Coming Soon)
Build optimized Rust binaries with cross-compilation support.
# Release build
# Cross-compile
# WebAssembly
batuta report (Coming Soon)
Generate comprehensive migration reports.
# HTML report (default)
# Markdown for documentation
# JSON for CI/CD
5-Phase Workflow
Batuta implements a 5-phase Kanban workflow based on Toyota Way principles:
Phase 1: Analysis
- Detect project languages and structure
- Calculate technical debt grade (TDG)
- Identify dependencies and frameworks
- Recommend transpilation strategy
Phase 2: Transpilation
- Convert code to Rust/Ruchy using appropriate transpiler
- Preserve semantics through IR analysis
- Generate human-readable output
- Support incremental compilation
Phase 3: Optimization
- Apply SIMD vectorization (via Trueno)
- Enable GPU acceleration for compute-heavy code
- Optimize memory layout
- Select backends via Mixture-of-Experts routing
Phase 4: Validation
- Trace syscalls to verify equivalence (via Renacer)
- Run original test suite
- Compare outputs and performance
- Generate diff reports
Phase 5: Deployment
- Build optimized binaries
- Cross-compile for target platforms
- Package for distribution
- Generate migration documentation
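The five phases above can be modeled as an ordered enum, as in the following sketch. The `Phase` type and the `can_start` gate are hypothetical; in particular, the rule that a phase may start only after every earlier phase is complete is an assumption made for illustration, not documented Batuta behavior.

```rust
/// The five workflow phases, in pipeline order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Analysis,
    Transpilation,
    Optimization,
    Validation,
    Deployment,
}

impl Phase {
    /// All phases in the order they run.
    pub const ALL: [Phase; 5] = [
        Phase::Analysis,
        Phase::Transpilation,
        Phase::Optimization,
        Phase::Validation,
        Phase::Deployment,
    ];

    /// The phase that follows this one, or None after Deployment.
    pub fn next(self) -> Option<Phase> {
        match self {
            Phase::Analysis => Some(Phase::Transpilation),
            Phase::Transpilation => Some(Phase::Optimization),
            Phase::Optimization => Some(Phase::Validation),
            Phase::Validation => Some(Phase::Deployment),
            Phase::Deployment => None,
        }
    }
}

/// Assumed gating rule: a phase can start only when every
/// earlier phase appears in `completed`.
pub fn can_start(phase: Phase, completed: &[Phase]) -> bool {
    Phase::ALL
        .iter()
        .take_while(|p| **p != phase)
        .all(|p| completed.contains(p))
}
```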
Toyota Way Principles
Batuta applies Lean Manufacturing principles to code migration:
Muda (Waste Elimination)
- StaticFixer integration - Eliminate duplicate static analysis (~40% reduction)
- PMAT adaptive analysis - Focus on critical code, skip boilerplate
- Decy diagnostics - Clear, actionable error messages reduce confusion
Jidoka (Built-in Quality)
- Ruchy strictness levels - Gradual quality at migration boundaries
- Pipeline validation - Quality checks at each phase
- Semantic equivalence - Automated verification via syscall tracing
Kaizen (Continuous Improvement)
- MoE optimization - Continuous performance tuning
- Incremental features - Deliver value progressively
- Feedback loops - Learn from each migration
Heijunka (Level Scheduling)
- Batuta orchestrator - Balanced load across transpilers
- Parallel processing - Efficient resource utilization
Kanban (Visual Workflow)
- 5-phase tracking - Clear stage visibility
- Dependency management - Automatic task ordering
Andon (Problem Visualization)
- Renacer integration - Runtime behavior analysis
- TDG scoring - Quality visibility
Academic Foundation
Every specification cites peer-reviewed research (50+ papers total):
| Component | Papers | Key Citations |
|---|---|---|
| Pacha | 20 | Model Cards [Mitchell 2019], Datasheets [Gebru 2021], PROV-DM [W3C] |
| Trueno-RAG | 10 | RAG [Lewis 2020], DPR [Karpukhin 2020], BM25 [Robertson 2009] |
| Trueno-DB | HNSW | [Malkov 2020] IEEE TPAMI |
This isn't marketing; it's engineering rigor applied to every design decision.
Example: Python ML Project
# 1. Analyze the project
# 2. Transpile to Rust (coming soon)
# 3. Optimize (coming soon)
# 4. Validate (coming soon)
Development Status
Current Version: 0.1.0 (Alpha)
- Phase 1: Analysis - Complete
  - Language detection
  - Dependency analysis
  - TDG scoring
  - Transpiler recommendations
- Phase 2: Core Orchestration - In Progress
  - CLI scaffolding (complete)
  - Transpilation engine
  - 5-phase workflow
  - PMAT integration
- Phase 3: Advanced Pipelines - Planned
  - NumPy → Trueno
  - sklearn → Aprender
  - PyTorch → Realizar
- Phase 4: Enterprise Features - Future
  - Renacer tracing
  - PARF reference finder
See roadmap.yaml for complete ticket breakdown (12 tickets, 572 hours).
Documentation
- Specification - Complete technical specification
- Roadmap - PMAT-tracked development roadmap
- PMAT Bug Report - Known issues with PMAT workflow
Contributing
Batuta is part of the Pragmatic AI Labs ecosystem. Contributions are welcome!
# Clone and build
# Run tests
# Install locally
License
MIT License - see LICENSE for details.
Related Projects
Transpilers:
Compute & AI:
- Trueno - SIMD/GPU compute primitives
- Trueno-RAG - RAG pipeline (10 peer-reviewed papers)
- Realizar - LLM inference (GGUF, safetensors)
MLOps & Quality:
- Pacha - Model/Data/Recipe registry (20 peer-reviewed papers)
- PMAT - AI context & code quality
- Renacer - Syscall tracing & golden traces
Acknowledgments
Batuta applies principles from:
- Toyota Production System - Muda, Jidoka, Kaizen, Heijunka, Kanban, Andon
- Lean Software Development - Value stream optimization
- First Principles Thinking - Rebuild from fundamental truths
Batuta - Because every great orchestra needs a conductor.