batuta 0.1.2

Orchestration framework for converting ANY project (Python, C/C++, Shell) to modern Rust

Batuta 🎵

Orchestration framework for converting ANY project (Python, C/C++, Shell) to modern, first-principles Rust

🔒 Quality Standards

Batuta enforces rigorous quality standards:

  • ✅ 675+ total tests (639 unit + 36 integration + benchmarks)
  • 🚀 Coverage target: 90% minimum, 95% preferred - approaching target
  • ✅ Core modules: 90-100% coverage (all converters, plugin, parf, backend, tools, types, report) - TARGET MET
  • ✅ Mutation testing validates test quality (100% on converters)
  • ✅ Zero defects tolerance via Certeza validation
  • ✅ Performance benchmarks (sub-nanosecond backend selection)
  • ✅ Security audits (0 vulnerabilities)

Coverage Breakdown:

  • Config module: 100% coverage
  • Analyzer module: 82.76% coverage
  • Types module: ~95% coverage
  • Report module: ~95% coverage
  • Backend module: ~95% coverage
  • Tools module: ~95% coverage
  • ML Converters (NumPy, sklearn, PyTorch): ~90-95% coverage
  • Plugin architecture: ~90% coverage
  • PARF analyzer: ~90% coverage
  • CLI (main.rs): 0% unit (covered by 36 integration tests)

Quality Validation:

# Run certeza quality checks before committing
cd ../certeza && cargo run -- check ../Batuta

See IMPLEMENTATION.md for full quality metrics and improvement plans.


Batuta orchestrates the 20-component Sovereign AI Stack to enable semantics-preserving conversion of legacy codebases to high-performance Rust, complete with GPU acceleration, SIMD optimization, and ML inference capabilities.

🚀 Quick Start

# Install Batuta
cargo install batuta

# Analyze your project
batuta analyze --languages --dependencies --tdg

# Convert to Rust (coming soon)
batuta transpile --incremental --cache

# Optimize with GPU/SIMD (coming soon)
batuta optimize --enable-gpu --profile aggressive

# Validate equivalence (coming soon)
batuta validate --trace-syscalls --benchmark

# Build final binary (coming soon)
batuta build --release

📖 Documentation

Read The Batuta Book - Comprehensive guide covering:

  • Philosophy and core principles (Toyota Way applied to code migration)
  • The 5-phase workflow (Analysis → Transpilation → Optimization → Validation → Deployment)
  • Tool ecosystem deep-dives (all 20 Sovereign AI Stack components)
  • 50+ peer-reviewed academic references across specifications
  • Practical examples and case studies

🎯 What is Batuta?

Batuta is named after the conductor's baton. It orchestrates multiple specialized tools to convert legacy code to Rust while maintaining semantic equivalence. Unlike simple transpilers, Batuta:

  • Preserves semantics through IR-based analysis and validation
  • Optimizes automatically with SIMD/GPU acceleration via Trueno
  • Provides gradual migration through Ruchy scripting language
  • Applies Toyota Way principles (Muda, Jidoka, Kaizen) for quality

🧩 Sovereign AI Stack

Batuta orchestrates 20 components across 7 layers:

Transpilers (L3)

  • Depyler - Python → Rust with type inference
  • Decy - C/C++ → Rust with ownership inference
  • Bashrs v6.41.0 - Rust → Shell (bootstrap scripts)
  • Ruchy v3.213.0 - Script → Rust (systems scripting)

Foundation Libraries (L0-L2)

  • Trueno v0.7.3 - SIMD/GPU compute primitives, zero-copy
  • Trueno-DB v0.3.3 - Vector database with HNSW indexing
  • Trueno-Graph v0.1.1 - Graph analytics & lineage DAG
  • Trueno-RAG - RAG: BM25+dense hybrid, RRF fusion, cross-encoder reranking (10 papers)
  • Aprender v0.12.0 - First-principles ML, .apr encryption
  • Realizar - LLM inference: GGUF, safetensors, KV-cache

Quality & Orchestration (L4-L5)

  • Repartir v1.0.0 - Distributed computing
  • pforge v0.1.2 - Zero-boilerplate MCP server framework
  • Certeza - Quality validation framework
  • PMAT v2.205.0 - AI context generation & code quality
  • Renacer v0.6.5 - Syscall tracing & golden traces

Data & MLOps (L6)

  • Alimentar - Data loading with .ald AES-256-GCM encryption
  • Pacha - Model/Data/Recipe Registry: BLAKE3 deduplication, Model Cards, Datasheets, W3C PROV-DM lineage (20 papers)

🔮 Oracle Mode

Query the Sovereign AI Stack with natural language:

# Find the right component for your task
batuta oracle "How do I train random forest on 1M samples?"

# List all stack components
batuta oracle --list

# Show component details
batuta oracle --show aprender

# Interactive mode
batuta oracle --interactive

Oracle Mode uses Amdahl's Law and the PCIe 5× Rule (Gregg & Hazelwood, 2011) to recommend optimal backends (Scalar/SIMD/GPU/Distributed).
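To make the two rules concrete, here is a minimal sketch of how they could feed a backend recommendation. It is not Batuta's implementation: the `Backend` enum, the `recommend` function, and the 0.5 parallel-fraction cutoff are hypothetical. The 5× rule is assumed to mean that GPU offload pays off only when compute time exceeds five times the PCIe transfer time.

```rust
/// Backends named in the README; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
    /// Listed for completeness; this sketch never selects it.
    Distributed,
}

/// Amdahl's Law: upper bound on speedup when a fraction `p` of the work
/// can be parallelized across `n` workers.
pub fn amdahl_speedup(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

/// Assumed reading of the 5x rule: offloading is worthwhile only when
/// compute time is more than five times the host-device transfer time.
pub fn gpu_worthwhile(compute_secs: f64, transfer_secs: f64) -> bool {
    compute_secs > 5.0 * transfer_secs
}

/// Hypothetical recommendation logic. The 0.5 cutoff is an assumption
/// chosen for illustration only.
pub fn recommend(parallel_fraction: f64, compute_secs: f64, transfer_secs: f64) -> Backend {
    if parallel_fraction < 0.5 {
        Backend::Scalar
    } else if gpu_worthwhile(compute_secs, transfer_secs) {
        Backend::Gpu
    } else {
        Backend::Simd
    }
}
```

With these definitions, a workload that is 90% parallel but spends half as long on transfer as on compute would be routed to SIMD rather than GPU, because the transfer cost dominates.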

✍️ Content Creation Tooling

Generate structured prompts for educational content with Toyota Way quality constraints:

# List available content types
batuta content types

# Generate book chapter prompt
batuta content emit --type bch --title "Error Handling in Rust" --audience "Python developers"

# Generate high-level outline
batuta content emit --type hlo --title "ML Course" --show-budget

# Validate content against quality gates
batuta content validate --type bch chapter.md

Content Types:

  • HLO - High-Level Outline (YAML/Markdown, 50-200 lines)
  • DLO - Detailed Outline (YAML/Markdown, 200-1000 lines)
  • BCH - Book Chapter (Markdown/mdBook, 2000-8000 words)
  • BLP - Blog Post (Markdown + TOML, 500-3000 words)
  • PDM - Presentar Demo (HTML + YAML)

Quality Gates (Jidoka):

  • Meta-commentary detection ("In this chapter, we will...")
  • Code block language validation
  • Heading hierarchy enforcement
  • Token budget management (Heijunka)
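Two of the gates listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the checks Batuta ships: the function names and the phrase list are hypothetical, headings are assumed to be Markdown ATX headings, and the first heading is assumed to start at level 1.

```rust
/// Returns the 1-based line numbers where an ATX heading skips a level
/// (for example, `#` followed directly by `###`). Lines inside fenced
/// code blocks are ignored.
pub fn heading_skips(markdown: &str) -> Vec<usize> {
    let mut violations = Vec::new();
    let mut previous_level = 0usize;
    let mut in_fence = false;
    for (index, line) in markdown.lines().enumerate() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        let level = trimmed.chars().take_while(|c| *c == '#').count();
        // An ATX heading is 1-6 '#' characters followed by a space.
        let is_heading = (1..=6).contains(&level) && trimmed[level..].starts_with(' ');
        if !is_heading {
            continue;
        }
        if level > previous_level + 1 {
            violations.push(index + 1);
        }
        previous_level = level;
    }
    violations
}

/// Returns the 1-based line numbers containing meta-commentary phrases.
/// The phrase list is a hypothetical starting point.
pub fn meta_commentary_lines(markdown: &str) -> Vec<usize> {
    const PHRASES: [&str; 3] = ["in this chapter", "in this section", "we will"];
    markdown
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let lower = line.to_lowercase();
            PHRASES.iter().any(|phrase| lower.contains(phrase))
        })
        .map(|(index, _)| index + 1)
        .collect()
}
```

Returning line numbers rather than a boolean lets a caller stop at the first violation and point the author to the offending line.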

📊 Commands

batuta analyze

Analyze your project to understand languages, dependencies, and code quality.

# Full analysis
batuta analyze --languages --dependencies --tdg

# Just detect languages
batuta analyze --languages

# Calculate TDG score only
batuta analyze --tdg

Output includes:

  • Language breakdown with line counts and percentages
  • Primary language detection
  • Transpiler recommendations
  • Dependency manager detection (pip, Cargo, npm, etc.)
  • Package counts per dependency file
  • TDG quality score (0-100) with letter grade
  • ML framework detection
  • Next steps guidance
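The language breakdown and primary-language detection can be approximated as follows. This is a sketch only: the extension table is a small illustrative subset, and `line_totals` and `primary_language` are hypothetical names rather than Batuta functions.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Maps a file extension to a language name (illustrative subset).
pub fn language_for_extension(ext: &str) -> Option<&'static str> {
    match ext {
        "py" => Some("Python"),
        "c" | "h" => Some("C"),
        "cpp" | "cc" | "hpp" => Some("C++"),
        "sh" => Some("Shell"),
        "rs" => Some("Rust"),
        _ => None,
    }
}

/// Aggregates `(path, line_count)` pairs into per-language line totals.
/// Files with unknown or missing extensions are skipped.
pub fn line_totals(files: &[(&str, usize)]) -> BTreeMap<&'static str, usize> {
    let mut totals = BTreeMap::new();
    for &(path, lines) in files {
        let ext = Path::new(path).extension().and_then(|e| e.to_str());
        if let Some(lang) = ext.and_then(language_for_extension) {
            *totals.entry(lang).or_insert(0) += lines;
        }
    }
    totals
}

/// Returns the language with the most lines and its share in percent.
pub fn primary_language(totals: &BTreeMap<&'static str, usize>) -> Option<(&'static str, f64)> {
    let total: usize = totals.values().sum();
    if total == 0 {
        return None;
    }
    totals
        .iter()
        .max_by_key(|(_, lines)| **lines)
        .map(|(lang, lines)| (*lang, 100.0 * *lines as f64 / total as f64))
}
```

A real analyzer would walk the directory tree and count lines itself; here the counts are passed in so the aggregation logic stays visible.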

batuta init (Coming Soon)

Initialize a Batuta project and set up conversion configuration.

batuta init --source ./my-python-app --output ./my-rust-app

batuta transpile (Coming Soon)

Convert source code to Rust with incremental compilation and caching.

# Basic transpilation
batuta transpile

# Incremental mode with caching
batuta transpile --incremental --cache

# Specific modules only
batuta transpile --modules auth,api,db

# Generate Ruchy for gradual migration
batuta transpile --ruchy --repl

batuta optimize (Coming Soon)

Apply performance optimizations with GPU/SIMD acceleration.

# Balanced optimization (default)
batuta optimize

# Aggressive optimization
batuta optimize --profile aggressive --enable-gpu

# Custom GPU threshold
batuta optimize --enable-gpu --gpu-threshold 1000

Optimization profiles:

  • fast - Quick compilation, basic optimizations
  • balanced - Default, good compilation/performance trade-off
  • aggressive - Maximum performance, slower compilation
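As a rough sketch of how the profile names and the `--gpu-threshold` value might be modeled, consider the following. The `Profile` type and `dispatch_to_gpu` function are hypothetical, and treating the threshold as an inclusive element count is an assumption.

```rust
use std::str::FromStr;

/// The three profile names from the list above; `Balanced` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Profile {
    Fast,
    #[default]
    Balanced,
    Aggressive,
}

impl FromStr for Profile {
    type Err = String;

    /// Parses the lowercase profile name as it appears on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fast" => Ok(Profile::Fast),
            "balanced" => Ok(Profile::Balanced),
            "aggressive" => Ok(Profile::Aggressive),
            other => Err(format!("unknown profile: {other}")),
        }
    }
}

/// Assumed dispatch rule: use the GPU only when it is enabled and the
/// operation touches at least `gpu_threshold` elements.
pub fn dispatch_to_gpu(enable_gpu: bool, elements: usize, gpu_threshold: usize) -> bool {
    enable_gpu && elements >= gpu_threshold
}
```

Under this reading, `--enable-gpu --gpu-threshold 1000` would keep a 999-element operation on the CPU and send a 1000-element one to the GPU.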

batuta validate (Coming Soon)

Verify semantic equivalence between original and transpiled code.

# Full validation suite
batuta validate --trace-syscalls --diff-output --run-original-tests --benchmark

# Quick syscall validation
batuta validate --trace-syscalls
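One simple way to think about syscall-level equivalence is to compare the ordered syscall names recorded for the original and transpiled programs. The sketch below does a naive positional comparison. It is an assumption about the idea, not a description of how Renacer or `batuta validate` works, and both function names are hypothetical.

```rust
/// Fraction (0.0 to 1.0) of positions at which two syscall traces agree,
/// measured against the longer trace so missing calls count as mismatches.
pub fn trace_agreement(original: &[&str], transpiled: &[&str]) -> f64 {
    let longest = original.len().max(transpiled.len());
    if longest == 0 {
        return 1.0;
    }
    let matching = original
        .iter()
        .zip(transpiled.iter())
        .filter(|(a, b)| a == b)
        .count();
    matching as f64 / longest as f64
}

/// Index of the first position where the traces diverge, if any.
/// A trace that is a strict prefix of the other diverges at its end.
pub fn first_divergence(original: &[&str], transpiled: &[&str]) -> Option<usize> {
    let mismatch = original
        .iter()
        .zip(transpiled.iter())
        .position(|(a, b)| a != b);
    match mismatch {
        Some(index) => Some(index),
        None if original.len() != transpiled.len() => {
            Some(original.len().min(transpiled.len()))
        }
        None => None,
    }
}
```

A positional comparison is strict: one extra call early in a trace shifts every later position. A production tool would more likely align the traces or filter out allocator and runtime noise first.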

batuta build (Coming Soon)

Build optimized Rust binaries with cross-compilation support.

# Release build
batuta build --release

# Cross-compile
batuta build --target x86_64-unknown-linux-musl

# WebAssembly
batuta build --wasm

batuta report (Coming Soon)

Generate comprehensive migration reports.

# HTML report (default)
batuta report

# Markdown for documentation
batuta report --format markdown --output MIGRATION.md

# JSON for CI/CD
batuta report --format json --output report.json

🏗️ 5-Phase Workflow

Batuta implements a 5-phase Kanban workflow based on Toyota Way principles:

Phase 1: Analysis

  • Detect project languages and structure
  • Calculate technical debt grade (TDG)
  • Identify dependencies and frameworks
  • Recommend transpilation strategy

Phase 2: Transpilation

  • Convert code to Rust/Ruchy using appropriate transpiler
  • Preserve semantics through IR analysis
  • Generate human-readable output
  • Support incremental compilation

Phase 3: Optimization

  • Apply SIMD vectorization (via Trueno)
  • Enable GPU acceleration for compute-heavy code
  • Optimize memory layout
  • Select backends via Mixture-of-Experts routing

Phase 4: Validation

  • Trace syscalls to verify equivalence (via Renacer)
  • Run original test suite
  • Compare outputs and performance
  • Generate diff reports

Phase 5: Deployment

  • Build optimized binaries
  • Cross-compile for target platforms
  • Package for distribution
  • Generate migration documentation
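The five phases form a linear pipeline, which can be modeled as a small state machine. The sketch below is illustrative: the `Phase` enum mirrors the phase names above, while `run_pipeline` and its stop-on-first-failure behavior are assumptions, not Batuta's actual orchestration code.

```rust
/// The five phases in workflow order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Analysis,
    Transpilation,
    Optimization,
    Validation,
    Deployment,
}

impl Phase {
    /// Returns the phase that follows this one, or `None` after Deployment.
    pub fn next(self) -> Option<Phase> {
        match self {
            Phase::Analysis => Some(Phase::Transpilation),
            Phase::Transpilation => Some(Phase::Optimization),
            Phase::Optimization => Some(Phase::Validation),
            Phase::Validation => Some(Phase::Deployment),
            Phase::Deployment => None,
        }
    }
}

/// Runs phases in order and stops at the first one that fails.
/// `run_phase` returns `true` on success. On failure, the failing phase
/// is returned as the error so later phases never run.
pub fn run_pipeline<F>(mut run_phase: F) -> Result<Vec<Phase>, Phase>
where
    F: FnMut(Phase) -> bool,
{
    let mut completed = Vec::new();
    let mut current = Some(Phase::Analysis);
    while let Some(phase) = current {
        if !run_phase(phase) {
            return Err(phase);
        }
        completed.push(phase);
        current = phase.next();
    }
    Ok(completed)
}
```

Stopping at the first failed phase follows the stop-the-line idea behind Jidoka: a failed validation never reaches deployment.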

🎓 Toyota Way Principles

Batuta applies Lean Manufacturing principles to code migration:

Muda (Waste Elimination)

  • StaticFixer integration - Eliminate duplicate static analysis (~40% reduction)
  • PMAT adaptive analysis - Focus on critical code, skip boilerplate
  • Decy diagnostics - Clear, actionable error messages reduce confusion

Jidoka (Built-in Quality)

  • Ruchy strictness levels - Gradual quality at migration boundaries
  • Pipeline validation - Quality checks at each phase
  • Semantic equivalence - Automated verification via syscall tracing

Kaizen (Continuous Improvement)

  • MoE optimization - Continuous performance tuning
  • Incremental features - Deliver value progressively
  • Feedback loops - Learn from each migration

Heijunka (Level Scheduling)

  • Batuta orchestrator - Balanced load across transpilers
  • Parallel processing - Efficient resource utilization

Kanban (Visual Workflow)

  • 5-phase tracking - Clear stage visibility
  • Dependency management - Automatic task ordering
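Automatic task ordering is commonly implemented as a topological sort. The sketch below uses Kahn's algorithm. The `task_order` function and the task names in the usage note are hypothetical, and nothing here is taken from Batuta's source.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders tasks so that every task appears after its dependencies
/// (Kahn's algorithm). `deps` maps a task to the tasks it depends on.
/// Returns `None` if the dependencies contain a cycle.
pub fn task_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Count unmet dependencies per task, and record who depends on whom.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&task, requires) in deps {
        *pending.entry(task).or_insert(0) += requires.len();
        for &dep in requires {
            pending.entry(dep).or_insert(0);
            dependents.entry(dep).or_default().push(task);
        }
    }

    // Start with tasks that have no dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(task, _)| *task)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        for &next in dependents.get(task).map(Vec::as_slice).unwrap_or(&[]) {
            let count = pending.get_mut(next).expect("every dependent is tracked");
            *count -= 1;
            if *count == 0 {
                ready.push_back(next);
            }
        }
    }

    // Any task left unprocessed is part of a cycle.
    if order.len() == pending.len() {
        Some(order)
    } else {
        None
    }
}
```

For example, declaring that `transpile` depends on `analyze` and `optimize` depends on `transpile` yields the order `analyze`, `transpile`, `optimize`, while a circular dependency yields `None`.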

Andon (Problem Visualization)

  • Renacer integration - Runtime behavior analysis
  • TDG scoring - Quality visibility

📚 Academic Foundation

Every specification cites peer-reviewed research (50+ papers total):

| Component | Papers | Key Citations |
|-----------|--------|---------------|
| Pacha | 20 | Model Cards [Mitchell 2019], Datasheets [Gebru 2021], PROV-DM [W3C] |
| Trueno-RAG | 10 | RAG [Lewis 2020], DPR [Karpukhin 2020], BM25 [Robertson 2009] |
| Trueno-DB | | HNSW [Malkov 2020] IEEE TPAMI |

This isn't marketing; it's engineering rigor applied to every design decision.

📈 Example: Python ML Project

# 1. Analyze the project
$ batuta analyze --languages --dependencies --tdg

📊 Analysis Results
==================================================
Primary language: Python
Total files: 127
Total lines: 8,432

Dependencies:
  • pip (42 packages)
    File: "./requirements.txt"
  • ℹ ML frameworks detected - consider Aprender/Realizar for ML code

Quality Score:
  • TDG Score: 73.2/100 (B)

Recommended transpiler: Depyler (Python → Rust)

# 2. Transpile to Rust (coming soon)
$ batuta transpile --incremental

🔄 Transpiling with Depyler...
  ✓ Converted 127 files (3,891 warnings, 42 errors addressed)
  ✓ NumPy → Trueno: 23 operations
  ✓ sklearn → Aprender: 5 models
  ✓ PyTorch → Realizar: 2 inference pipelines

# 3. Optimize (coming soon)
$ batuta optimize --enable-gpu --profile aggressive

⚡ Optimizing...
  ✓ SIMD vectorization: 234 loops optimized
  ✓ GPU dispatch: 12 operations (threshold: 500 elements)
  ✓ Memory layout: 18 structs optimized

# 4. Validate (coming soon)
$ batuta validate --trace-syscalls --benchmark

✅ Validation passed!
  ✓ Syscall equivalence: 100%
  ✓ Output identical: ✓
  ✓ Performance: 4.2x faster, 62% less memory

🛠️ Development Status

Current Version: 0.1.2 (Alpha)

  • ✅ Phase 1: Analysis - Complete

    • ✅ Language detection
    • ✅ Dependency analysis
    • ✅ TDG scoring
    • ✅ Transpiler recommendations
  • 🚧 Phase 2: Core Orchestration - In Progress

    • ⏳ CLI scaffolding (complete)
    • ⏳ Transpilation engine
    • ⏳ 5-phase workflow
    • ⏳ PMAT integration
  • 📋 Phase 3: Advanced Pipelines - Planned

    • 📋 NumPy → Trueno
    • 📋 sklearn → Aprender
    • 📋 PyTorch → Realizar
  • 📋 Phase 4: Enterprise Features - Future

    • 📋 Renacer tracing
    • 📋 PARF reference finder

See roadmap.yaml for complete ticket breakdown (12 tickets, 572 hours).

🤝 Contributing

Batuta is part of the Pragmatic AI Labs ecosystem. Contributions are welcome!

# Clone and build
git clone https://github.com/paiml/Batuta.git
cd Batuta
cargo build --release

# Run tests
cargo test

# Install locally
cargo install --path .

📄 License

MIT License - see LICENSE for details.

🔗 Related Projects

Transpilers:

  • Depyler - Python → Rust with type inference
  • Decy - C/C++ → Rust with ownership inference

Compute & AI:

  • Trueno - SIMD/GPU compute primitives
  • Trueno-RAG - RAG pipeline (10 peer-reviewed papers)
  • Realizar - LLM inference (GGUF, safetensors)

MLOps & Quality:

  • Pacha - Model/Data/Recipe registry (20 peer-reviewed papers)
  • PMAT - AI context & code quality
  • Renacer - Syscall tracing & golden traces

🙏 Acknowledgments

Batuta applies principles from:

  • Toyota Production System - Muda, Jidoka, Kaizen, Heijunka, Kanban, Andon
  • Lean Software Development - Value stream optimization
  • First Principles Thinking - Rebuild from fundamental truths

Batuta - Because every great orchestra needs a conductor. 🎵