batuta 0.1.2

Orchestration framework for converting ANY project (Python, C/C++, Shell) to modern Rust
# Batuta 🎵

> Orchestration framework for converting **ANY** project (Python, C/C++, Shell) to modern, first-principles Rust

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org/)
[![CI/CD](https://github.com/paiml/Batuta/workflows/CI%2FCD%20Pipeline/badge.svg)](https://github.com/paiml/Batuta/actions)
[![Docker](https://github.com/paiml/Batuta/workflows/Docker%20Build%20%26%20Test/badge.svg)](https://github.com/paiml/Batuta/actions)
[![WASM](https://github.com/paiml/Batuta/workflows/WASM%20Build%20%26%20Test/badge.svg)](https://github.com/paiml/Batuta/actions)
[![Book](https://github.com/paiml/Batuta/workflows/Deploy%20Book/badge.svg)](https://paiml.github.io/Batuta/)
[![TDG Score](https://img.shields.io/badge/TDG-92.6%2F100%20(A)-brightgreen)](IMPLEMENTATION.md)
[![Unit Coverage](https://img.shields.io/badge/unit_coverage-31.45%25-orange)](IMPLEMENTATION.md)
[![Core Modules](https://img.shields.io/badge/core_modules-82--100%25-brightgreen)](IMPLEMENTATION.md)
[![Tests](https://img.shields.io/badge/tests-639_unit+36_integration-brightgreen)](tests/)
[![Pre-commit](https://img.shields.io/badge/pre--commit-%3C%2030s-brightgreen)](Makefile)
[![Quality](https://img.shields.io/badge/quality-certeza-purple)](https://github.com/paiml/certeza)

![Batuta Architecture](.github/batuta-architecture.svg)

## 🔒 Quality Standards

**Batuta enforces rigorous quality standards:**

- **675+ total tests** (639 unit + 36 integration + benchmarks)
- 🚀 **Coverage target: 90% minimum, 95% preferred** - approaching target
- **Core modules: 90-100% coverage** (all converters, plugin, parf, backend, tools, types, report) - TARGET MET
- **Mutation testing** validates test quality (100% on converters)
- **Zero defects tolerance** via [Certeza](https://github.com/paiml/certeza) validation
- **Performance benchmarks** (sub-nanosecond backend selection)
- **Security audits** (0 vulnerabilities)

**Coverage Breakdown:**
- Config module: **100%** coverage
- Analyzer module: **82.76%** coverage
- Types module: **~95%** coverage
- Report module: **~95%** coverage
- Backend module: **~95%** coverage
- Tools module: **~95%** coverage
- ML Converters (NumPy, sklearn, PyTorch): **~90-95%** coverage
- Plugin architecture: **~90%** coverage
- PARF analyzer: **~90%** coverage
- CLI (main.rs): **0%** unit (covered by 36 integration tests)

**Quality Validation:**
```bash
# Run certeza quality checks before committing
cd ../certeza && cargo run -- check ../Batuta
```

See [IMPLEMENTATION.md](IMPLEMENTATION.md#quality-validation-with-certeza) for full quality metrics and improvement plans.

---

Batuta orchestrates the **20-component Sovereign AI Stack** to enable **semantic-preserving** conversion of legacy codebases to high-performance Rust, complete with GPU acceleration, SIMD optimization, and ML inference capabilities.

## 🚀 Quick Start

```bash
# Install Batuta
cargo install batuta

# Analyze your project
batuta analyze --languages --dependencies --tdg

# Convert to Rust (coming soon)
batuta transpile --incremental --cache

# Optimize with GPU/SIMD (coming soon)
batuta optimize --enable-gpu --profile aggressive

# Validate equivalence (coming soon)
batuta validate --trace-syscalls --benchmark

# Build final binary (coming soon)
batuta build --release
```

## 📖 Documentation

**[Read The Batuta Book](https://paiml.github.io/Batuta/)** - Comprehensive guide covering:
- Philosophy and core principles (Toyota Way applied to code migration)
- The 5-phase workflow (Analysis → Transpilation → Optimization → Validation → Deployment)
- Tool ecosystem deep-dives (all 20 Sovereign AI Stack components)
- 50+ peer-reviewed academic references across specifications
- Practical examples and case studies

## 🎯 What is Batuta?

Batuta is named after the **conductor's baton** – it orchestrates multiple specialized tools to convert legacy code to Rust while maintaining semantic equivalence. Unlike simple transpilers, Batuta:

- **Preserves semantics** through IR-based analysis and validation
- **Optimizes automatically** with SIMD/GPU acceleration via Trueno
- **Provides gradual migration** through the Ruchy scripting language
- **Applies Toyota Way principles** (Muda, Jidoka, Kaizen) for quality

## 🧩 Sovereign AI Stack

Batuta orchestrates **20 components** across 7 layers:

### Transpilers (L3)
- **[Depyler](https://github.com/paiml/depyler)** - Python → Rust with type inference
- **[Decy](https://github.com/paiml/decy)** - C/C++ → Rust with ownership inference
- **[Bashrs](https://github.com/paiml/bashrs)** v6.41.0 - Rust → Shell (bootstrap scripts)
- **[Ruchy](https://github.com/paiml/ruchy)** v3.213.0 - Script → Rust (systems scripting)

### Foundation Libraries (L0-L2)
- **[Trueno](https://github.com/paiml/trueno)** v0.7.3 - SIMD/GPU compute primitives, zero-copy
- **[Trueno-DB](https://github.com/paiml/trueno-db)** v0.3.3 - Vector database with HNSW indexing
- **[Trueno-Graph](https://github.com/paiml/trueno-graph)** v0.1.1 - Graph analytics & lineage DAG
- **[Trueno-RAG](https://github.com/paiml/trueno-rag)** - RAG: BM25+dense hybrid, RRF fusion, cross-encoder reranking ([10 papers](https://github.com/paiml/trueno-rag/blob/main/docs/specifications/rag-pipeline-spec.md))
- **[Aprender](https://github.com/paiml/aprender)** v0.12.0 - First-principles ML, .apr encryption
- **[Realizar](https://github.com/paiml/realizar)** - LLM inference: GGUF, safetensors, KV-cache
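The RRF fusion step listed for Trueno-RAG can be illustrated with a short sketch of Reciprocal Rank Fusion in plain Rust. This is not Trueno-RAG's API: the function name `rrf_fuse`, the string document IDs, and the smoothing constant `k` are assumptions for illustration (`k = 60` is the value commonly used in the RRF literature). Each ranked list contributes `1 / (k + rank)` for every document it contains, so documents that rank well in both the BM25 list and the dense list rise to the top.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion (hypothetical helper, not Trueno-RAG's API).
/// Each input list is ranked best-first; a document at 1-based rank r
/// contributes 1 / (k + r) to its fused score.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; tie-break on ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses only ranks, it needs no score normalization between the BM25 and dense retrievers, which is one reason it is a common choice for hybrid search.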

### Quality & Orchestration (L4-L5)
- **[Repartir](https://github.com/paiml/repartir)** v1.0.0 - Distributed computing
- **[pforge](https://github.com/paiml/pforge)** v0.1.2 - Zero-boilerplate MCP server framework
- **[Certeza](https://github.com/paiml/certeza)** - Quality validation framework
- **[PMAT](https://github.com/paiml/paiml-mcp-agent-toolkit)** v2.205.0 - AI context generation & code quality
- **[Renacer](https://github.com/paiml/renacer)** v0.6.5 - Syscall tracing & golden traces

### Data & MLOps (L6)
- **[Alimentar](https://github.com/paiml/alimentar)** - Data loading with .ald AES-256-GCM encryption
- **[Pacha](https://github.com/paiml/pacha)** - Model/Data/Recipe Registry: BLAKE3 deduplication, Model Cards, Datasheets, W3C PROV-DM lineage ([20 papers](https://github.com/paiml/pacha/blob/main/docs/specifications/model-data-recipe-spec.md))
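Pacha's BLAKE3 deduplication is a form of content addressing: a blob's key is derived from a hash of its bytes, so identical uploads collapse into one stored copy. The sketch below shows the idea using only the standard library. `BlobStore` is a hypothetical type, not Pacha's API, and `DefaultHasher` is a stand-in for BLAKE3 because the standard library has no cryptographic hash; it is not collision-resistant and should not be used for a real registry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content-addressed blob store (hypothetical type, not Pacha's API).
/// DefaultHasher stands in for BLAKE3 here; it is NOT collision-resistant,
/// so a real registry should use a cryptographic hash.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Store `data` and return its content address. Identical bytes map
    /// to the same address, so they are stored only once.
    pub fn put(&mut self, data: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let addr = hasher.finish();
        self.blobs.entry(addr).or_insert_with(|| data.to_vec());
        addr
    }

    /// Look up a blob by its content address.
    pub fn get(&self, addr: u64) -> Option<&[u8]> {
        self.blobs.get(&addr).map(|v| v.as_slice())
    }

    /// Number of distinct blobs actually stored.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```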

## 🔮 Oracle Mode

Query the Sovereign AI Stack with natural language:

```bash
# Find the right component for your task
batuta oracle "How do I train random forest on 1M samples?"

# List all stack components
batuta oracle --list

# Show component details
batuta oracle --show aprender

# Interactive mode
batuta oracle --interactive
```

Oracle Mode uses **Amdahl's Law** and the **PCIe 5× Rule** (Gregg & Hazelwood, 2011) to recommend optimal backends (Scalar/SIMD/GPU/Distributed).
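Both rules can be written down directly. Amdahl's Law gives the overall speedup when a fraction `p` of the runtime is accelerated by a factor `s`: `1 / ((1 - p) + p / s)`. The PCIe 5× rule is read here as "offload to the GPU only when the estimated compute time is at least five times the host-to-device transfer time". The selector below is a hypothetical sketch under those assumptions and covers Scalar, SIMD, and GPU only; `select_backend`, `SIMD_MIN_ELEMENTS`, and its threshold value are illustrative and are not Oracle Mode's actual logic.

```rust
/// Three of the backends named above (Distributed is omitted in this sketch).
#[derive(Debug, PartialEq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// Amdahl's Law: overall speedup when a fraction `p` of the runtime
/// is accelerated by a factor `s`.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Hypothetical selector: use the GPU only when estimated compute time is at
/// least 5x the PCIe transfer time; otherwise SIMD above a small input size,
/// scalar for tiny inputs. The SIMD threshold is an illustrative assumption.
pub fn select_backend(elements: usize, compute_secs: f64, transfer_secs: f64) -> Backend {
    const PCIE_RULE: f64 = 5.0;
    const SIMD_MIN_ELEMENTS: usize = 16;
    if compute_secs >= PCIE_RULE * transfer_secs {
        Backend::Gpu
    } else if elements >= SIMD_MIN_ELEMENTS {
        Backend::Simd
    } else {
        Backend::Scalar
    }
}
```

Amdahl's Law also bounds what any backend can achieve: with `p = 0.9`, even an infinitely fast accelerator yields at most a 10× overall speedup, because `1 / (1 - 0.9) = 10`.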

## ✍️ Content Creation Tooling

Generate structured prompts for educational content with Toyota Way quality constraints:

```bash
# List available content types
batuta content types

# Generate book chapter prompt
batuta content emit --type bch --title "Error Handling in Rust" --audience "Python developers"

# Generate high-level outline
batuta content emit --type hlo --title "ML Course" --show-budget

# Validate content against quality gates
batuta content validate --type bch chapter.md
```

**Content Types:**
- **HLO** - High-Level Outline (YAML/Markdown, 50-200 lines)
- **DLO** - Detailed Outline (YAML/Markdown, 200-1000 lines)
- **BCH** - Book Chapter (Markdown/mdBook, 2000-8000 words)
- **BLP** - Blog Post (Markdown + TOML, 500-3000 words)
- **PDM** - Presentar Demo (HTML + YAML)

**Quality Gates (Jidoka):**
- Meta-commentary detection ("In this chapter, we will...")
- Code block language validation
- Heading hierarchy enforcement
- Token budget management (Heijunka)
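A rough sketch of how these gates might be checked on a Markdown chapter: meta-commentary phrases, fenced code blocks without a language, headings that skip a level, and a word budget (for a BCH, 2000-8000 words per the content types above). The function name, the phrase list, and the assumption that a chapter starts at a top-level `#` heading are all hypothetical; this is not how `batuta content validate` is implemented. Words are counted as whitespace-separated tokens, a crude proxy for a token budget.

```rust
/// Hypothetical Jidoka-style checks for a Markdown chapter. The phrase list
/// and function name are illustrative, not Batuta's implementation.
/// Assumes the chapter's first heading is a top-level `#` heading.
pub fn validate_chapter(md: &str, min_words: usize, max_words: usize) -> Vec<String> {
    const META_PHRASES: [&str; 2] = ["in this chapter, we will", "in this section, we will"];
    let mut issues = Vec::new();
    let mut prev_level = 0usize;
    let mut in_fence = false;

    for (i, line) in md.lines().enumerate() {
        let n = i + 1;
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") {
            // An opening fence must be followed by a language name.
            if !in_fence && trimmed.trim_start_matches('`').trim().is_empty() {
                issues.push(format!("line {n}: code block without language"));
            }
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue; // ignore headings and phrases inside code
        }
        let level = trimmed.chars().take_while(|&c| c == '#').count();
        if level > 0 && trimmed[level..].starts_with(' ') {
            // Headings may go deeper by at most one level at a time.
            if level > prev_level + 1 {
                issues.push(format!("line {n}: heading jumps from h{prev_level} to h{level}"));
            }
            prev_level = level;
        }
        let lower = trimmed.to_lowercase();
        if META_PHRASES.iter().any(|p| lower.contains(p)) {
            issues.push(format!("line {n}: meta-commentary"));
        }
    }

    let words = md.split_whitespace().count();
    if words < min_words || words > max_words {
        issues.push(format!("word count {words} outside {min_words}-{max_words}"));
    }
    issues
}
```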

## 📊 Commands

### `batuta analyze`

Analyze your project to understand languages, dependencies, and code quality.

```bash
# Full analysis
batuta analyze --languages --dependencies --tdg

# Just detect languages
batuta analyze --languages

# Calculate TDG score only
batuta analyze --tdg
```

**Output includes:**
- Language breakdown with line counts and percentages
- Primary language detection
- Transpiler recommendations
- Dependency manager detection (pip, Cargo, npm, etc.)
- Package counts per dependency file
- TDG quality score (0-100) with letter grade
- ML framework detection
- Next steps guidance
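The language breakdown and transpiler recommendation can be approximated from file extensions and line counts. The sketch below is illustrative only: the extension table, `breakdown`, and `recommend` are hypothetical, and the recommendation map covers only the Python → Depyler and C/C++ → Decy pairings stated in this README. A real analyzer would likely also need to handle shebang lines, generated files, and vendored code.

```rust
use std::collections::HashMap;

/// Hypothetical extension-to-language table (illustrative, not Batuta's).
fn language_for(path: &str) -> Option<&'static str> {
    let ext = path.rsplit_once('.')?.1;
    match ext {
        "py" => Some("Python"),
        "c" | "h" | "cc" | "cpp" | "hpp" => Some("C/C++"),
        "sh" | "bash" => Some("Shell"),
        "rs" => Some("Rust"),
        _ => None,
    }
}

/// Line counts and percentages per language, largest first.
/// Files with unrecognized extensions are ignored.
pub fn breakdown(files: &[(&str, usize)]) -> Vec<(&'static str, usize, f64)> {
    let mut lines: HashMap<&'static str, usize> = HashMap::new();
    for &(path, count) in files {
        if let Some(lang) = language_for(path) {
            *lines.entry(lang).or_insert(0) += count;
        }
    }
    let total: usize = lines.values().sum();
    let mut out: Vec<(&'static str, usize, f64)> = lines
        .into_iter()
        .map(|(lang, n)| (lang, n, 100.0 * n as f64 / total as f64))
        .collect();
    // Sort by line count (desc), then by name so ties are deterministic.
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    out
}

/// Transpiler suggestions drawn from the stack list above.
pub fn recommend(primary: &str) -> Option<&'static str> {
    match primary {
        "Python" => Some("Depyler"),
        "C/C++" => Some("Decy"),
        _ => None,
    }
}
```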

### `batuta init` (Coming Soon)

Initialize a Batuta project and set up conversion configuration.

```bash
batuta init --source ./my-python-app --output ./my-rust-app
```

### `batuta transpile` (Coming Soon)

Convert source code to Rust with incremental compilation and caching.

```bash
# Basic transpilation
batuta transpile

# Incremental mode with caching
batuta transpile --incremental --cache

# Specific modules only
batuta transpile --modules auth,api,db

# Generate Ruchy for gradual migration
batuta transpile --ruchy --repl
```

### `batuta optimize` (Coming Soon)

Apply performance optimizations with GPU/SIMD acceleration.

```bash
# Balanced optimization (default)
batuta optimize

# Aggressive optimization
batuta optimize --profile aggressive --enable-gpu

# Custom GPU threshold
batuta optimize --enable-gpu --gpu-threshold 1000
```

**Optimization profiles:**
- `fast` - Quick compilation, basic optimizations
- `balanced` - Default, good compilation/performance trade-off
- `aggressive` - Maximum performance, slower compilation
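One way to model the `--profile` flag is a small enum that parses the three documented values and falls back to `balanced` when the flag is omitted. `Profile` and `resolve_profile` are hypothetical names, and the sketch deliberately says nothing about what each profile changes internally, since that is not documented here.

```rust
use std::str::FromStr;

/// The three `--profile` values listed above; `Balanced` is the default.
/// Type and function names are hypothetical, not Batuta's internals.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Profile {
    Fast,
    #[default]
    Balanced,
    Aggressive,
}

impl FromStr for Profile {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "fast" => Ok(Profile::Fast),
            "balanced" => Ok(Profile::Balanced),
            "aggressive" => Ok(Profile::Aggressive),
            other => Err(format!("unknown profile `{other}` (expected fast, balanced, aggressive)")),
        }
    }
}

/// Parse an optional `--profile` value, falling back to the default.
pub fn resolve_profile(arg: Option<&str>) -> Result<Profile, String> {
    arg.map_or(Ok(Profile::default()), |s| s.parse())
}
```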

### `batuta validate` (Coming Soon)

Verify semantic equivalence between original and transpiled code.

```bash
# Full validation suite
batuta validate --trace-syscalls --diff-output --run-original-tests --benchmark

# Quick syscall validation
batuta validate --trace-syscalls
```
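As a simplified illustration of what a syscall-equivalence percentage could mean, the sketch below compares two traces reduced to syscall names and scores them as the length of their longest common subsequence divided by the length of the longer trace. The function and the scoring rule are assumptions, not Renacer's trace format or Batuta's metric; real traces also carry arguments, return values, and nondeterministic calls (for example, memory mapping) that would need normalizing first.

```rust
/// Hypothetical syscall-equivalence score between two traces (lists of
/// syscall names): longest-common-subsequence length divided by the longer
/// trace's length, so 1.0 means identical sequences.
pub fn syscall_equivalence(original: &[&str], transpiled: &[&str]) -> f64 {
    let (n, m) = (original.len(), transpiled.len());
    if n == 0 && m == 0 {
        return 1.0;
    }
    // Classic O(n*m) LCS dynamic program, keeping only two rows.
    let mut prev = vec![0usize; m + 1];
    let mut curr = vec![0usize; m + 1];
    for i in 1..=n {
        for j in 1..=m {
            curr[j] = if original[i - 1] == transpiled[j - 1] {
                prev[j - 1] + 1
            } else {
                prev[j].max(curr[j - 1])
            };
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` holds the last computed row.
    prev[m] as f64 / n.max(m) as f64
}
```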

### `batuta build` (Coming Soon)

Build optimized Rust binaries with cross-compilation support.

```bash
# Release build
batuta build --release

# Cross-compile
batuta build --target x86_64-unknown-linux-musl

# WebAssembly
batuta build --wasm
```

### `batuta report` (Coming Soon)

Generate comprehensive migration reports.

```bash
# HTML report (default)
batuta report

# Markdown for documentation
batuta report --format markdown --output MIGRATION.md

# JSON for CI/CD
batuta report --format json --output report.json
```

## 🏗️ 5-Phase Workflow

Batuta implements a **5-phase Kanban workflow** based on Toyota Way principles:

### Phase 1: Analysis
- Detect project languages and structure
- Calculate technical debt grade (TDG)
- Identify dependencies and frameworks
- Recommend transpilation strategy

### Phase 2: Transpilation
- Convert code to Rust/Ruchy using appropriate transpiler
- Preserve semantics through IR analysis
- Generate human-readable output
- Support incremental compilation

### Phase 3: Optimization
- Apply SIMD vectorization (via Trueno)
- Enable GPU acceleration for compute-heavy code
- Optimize memory layout
- Select backends via Mixture-of-Experts routing

### Phase 4: Validation
- Trace syscalls to verify equivalence (via Renacer)
- Run original test suite
- Compare outputs and performance
- Generate diff reports

### Phase 5: Deployment
- Build optimized binaries
- Cross-compile for target platforms
- Package for distribution
- Generate migration documentation
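The phase ordering and the Jidoka rule of stopping on a defect can be sketched as a loop over the five phases that halts at the first failing phase. `Phase`, `run_pipeline`, and the `String` error type are hypothetical; each phase's real work is supplied by the caller as a closure.

```rust
/// The five phases, in Kanban order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Analysis,
    Transpilation,
    Optimization,
    Validation,
    Deployment,
}

impl Phase {
    pub const ALL: [Phase; 5] = [
        Phase::Analysis,
        Phase::Transpilation,
        Phase::Optimization,
        Phase::Validation,
        Phase::Deployment,
    ];
}

/// Run each phase in order and stop at the first failure (Jidoka: stop the
/// line instead of passing a defect downstream). Returns the phases that
/// completed, plus the failing phase and its error, if any.
pub fn run_pipeline<F>(mut run_phase: F) -> (Vec<Phase>, Option<(Phase, String)>)
where
    F: FnMut(Phase) -> Result<(), String>,
{
    let mut done = Vec::new();
    for phase in Phase::ALL {
        match run_phase(phase) {
            Ok(()) => done.push(phase),
            Err(e) => return (done, Some((phase, e))),
        }
    }
    (done, None)
}
```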

## 🎓 Toyota Way Principles

Batuta applies **Lean Manufacturing** principles to code migration:

### Muda (Waste Elimination)
- **StaticFixer integration** - Eliminate duplicate static analysis (~40% reduction)
- **PMAT adaptive analysis** - Focus on critical code, skip boilerplate
- **Decy diagnostics** - Clear, actionable error messages reduce confusion

### Jidoka (Built-in Quality)
- **Ruchy strictness levels** - Gradual quality at migration boundaries
- **Pipeline validation** - Quality checks at each phase
- **Semantic equivalence** - Automated verification via syscall tracing

### Kaizen (Continuous Improvement)
- **MoE optimization** - Continuous performance tuning
- **Incremental features** - Deliver value progressively
- **Feedback loops** - Learn from each migration

### Heijunka (Level Scheduling)
- **Batuta orchestrator** - Balanced load across transpilers
- **Parallel processing** - Efficient resource utilization

### Kanban (Visual Workflow)
- **5-phase tracking** - Clear stage visibility
- **Dependency management** - Automatic task ordering
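Automatic task ordering from declared dependencies is a topological sort. Below is a sketch using Kahn's algorithm; `task_order` and the task names are hypothetical, not Batuta's scheduler. `BTreeMap` and `BTreeSet` keep the output deterministic, and a dependency cycle (which has no valid order) is reported as `None`.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Kahn's algorithm (hypothetical helper): order tasks so every task comes
/// after its dependencies. `deps` maps task -> tasks it depends on.
/// Returns None if the dependencies contain a cycle.
pub fn task_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Collect every task, including ones that only appear as dependencies.
    let mut tasks: BTreeSet<&'a str> = deps.keys().copied().collect();
    for ds in deps.values() {
        tasks.extend(ds.iter().copied());
    }
    // indegree = number of unfinished dependencies per task.
    let mut indegree: BTreeMap<&'a str, usize> = tasks.iter().map(|&t| (t, 0)).collect();
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&task, ds) in deps {
        for &d in ds {
            *indegree.get_mut(task).unwrap() += 1;
            dependents.entry(d).or_default().push(task);
        }
    }
    // Start with tasks that have no dependencies.
    let mut ready: VecDeque<&'a str> =
        indegree.iter().filter(|&(_, &n)| n == 0).map(|(&t, _)| t).collect();
    let mut order = Vec::new();
    while let Some(t) = ready.pop_front() {
        order.push(t);
        for &next in dependents.get(t).map(Vec::as_slice).unwrap_or(&[]) {
            let n = indegree.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }
    // If some tasks were never released, they sit on a cycle.
    (order.len() == tasks.len()).then_some(order)
}
```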

### Andon (Problem Visualization)
- **Renacer integration** - Runtime behavior analysis
- **TDG scoring** - Quality visibility

## 📚 Academic Foundation

Every specification cites peer-reviewed research (50+ papers total):

| Component | Papers | Key Citations |
|-----------|--------|---------------|
| **Pacha** | 20 | Model Cards [Mitchell 2019], Datasheets [Gebru 2021], PROV-DM [W3C] |
| **Trueno-RAG** | 10 | RAG [Lewis 2020], DPR [Karpukhin 2020], BM25 [Robertson 2009] |
| **Trueno-DB** | - | HNSW [Malkov 2020] IEEE TPAMI |

This isn't marketing—it's engineering rigor applied to every design decision.

## 📈 Example: Python ML Project

```bash
# 1. Analyze the project
$ batuta analyze --languages --dependencies --tdg

📊 Analysis Results
==================================================
Primary language: Python
Total files: 127
Total lines: 8,432

Dependencies:
  • pip (42 packages)
    File: "./requirements.txt"
  • ℹ ML frameworks detected - consider Aprender/Realizar for ML code

Quality Score:
  • TDG Score: 73.2/100 (B)

Recommended transpiler: Depyler (Python → Rust)

# 2. Transpile to Rust (coming soon)
$ batuta transpile --incremental

🔄 Transpiling with Depyler...
  ✓ Converted 127 files (3,891 warnings, 42 errors addressed)
  ✓ NumPy → Trueno: 23 operations
  ✓ sklearn → Aprender: 5 models
  ✓ PyTorch → Realizar: 2 inference pipelines

# 3. Optimize (coming soon)
$ batuta optimize --enable-gpu --profile aggressive

⚡ Optimizing...
  ✓ SIMD vectorization: 234 loops optimized
  ✓ GPU dispatch: 12 operations (threshold: 500 elements)
  ✓ Memory layout: 18 structs optimized

# 4. Validate (coming soon)
$ batuta validate --trace-syscalls --benchmark

✅ Validation passed!
  ✓ Syscall equivalence: 100%
  ✓ Output identical: ✓
  ✓ Performance: 4.2x faster, 62% less memory
```

## 🛠️ Development Status

**Current Version:** 0.1.2 (Alpha)

- ✅ **Phase 1: Analysis** - Complete
  - ✅ Language detection
  - ✅ Dependency analysis
  - ✅ TDG scoring
  - ✅ Transpiler recommendations

- 🚧 **Phase 2: Core Orchestration** - In Progress
  - ✅ CLI scaffolding
  - ⏳ Transpilation engine
  - ⏳ 5-phase workflow
  - ⏳ PMAT integration

- 📋 **Phase 3: Advanced Pipelines** - Planned
  - 📋 NumPy → Trueno
  - 📋 sklearn → Aprender
  - 📋 PyTorch → Realizar

- 📋 **Phase 4: Enterprise Features** - Future
  - 📋 Renacer tracing
  - 📋 PARF reference finder

See [roadmap.yaml](docs/roadmaps/roadmap.yaml) for complete ticket breakdown (12 tickets, 572 hours).

## 📖 Documentation

- [Specification](docs/specifications/batuta-orchestration-decy-depyler-trueno-aprender-realizar-ruchy-spec.md) - Complete technical specification
- [Roadmap](docs/roadmaps/roadmap.yaml) - PMAT-tracked development roadmap
- [PMAT Bug Report](PMAT_BUG_REPORT.md) - Known issues with PMAT workflow

## 🤝 Contributing

Batuta is part of the [Pragmatic AI Labs](https://github.com/paiml) ecosystem. Contributions are welcome!

```bash
# Clone and build
git clone https://github.com/paiml/Batuta.git
cd Batuta
cargo build --release

# Run tests
cargo test

# Install locally
cargo install --path .
```

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🔗 Related Projects

**Transpilers:**
- [Depyler](https://github.com/paiml/depyler) - Python → Rust with type inference
- [Decy](https://github.com/paiml/decy) - C/C++ → Rust with ownership inference

**Compute & AI:**
- [Trueno](https://github.com/paiml/trueno) - SIMD/GPU compute primitives
- [Trueno-RAG](https://github.com/paiml/trueno-rag) - RAG pipeline (10 peer-reviewed papers)
- [Realizar](https://github.com/paiml/realizar) - LLM inference (GGUF, safetensors)

**MLOps & Quality:**
- [Pacha](https://github.com/paiml/pacha) - Model/Data/Recipe registry (20 peer-reviewed papers)
- [PMAT](https://github.com/paiml/paiml-mcp-agent-toolkit) - AI context & code quality
- [Renacer](https://github.com/paiml/renacer) - Syscall tracing & golden traces

## 🙏 Acknowledgments

Batuta applies principles from:
- **Toyota Production System** - Muda, Jidoka, Kaizen, Heijunka, Kanban, Andon
- **Lean Software Development** - Value stream optimization
- **First Principles Thinking** - Rebuild from fundamental truths

---

**Batuta** - Because every great orchestra needs a conductor. 🎵