<div align="center">
# Culebra
**/koo-LEH-brah/**
**Compiler diagnostics for self-hosting languages that target LLVM.**
*ABI. IR. Binary. Bootstrap. One binary catches what no debugger will.*
Born from [Mapanare](https://github.com/Mapanare-Research/Mapanare)'s bootstrap, where every bug was a mystery with no safety net. Culebra ships a Nuclei-style template engine so every compiler bug you survive becomes a pattern nobody else has to debug.
<br>



[](LICENSE)
[](Cargo.toml)
[]()
[](https://github.com/Mapanare-Research/Culebra/stargazers)
<br>
[Why Culebra?](#why-culebra) · [Install](#install) · [Quick Start](#quick-start) · [Template Engine](#template-engine) · [All Commands](#all-commands) · [Shipped Templates](#shipped-templates) · [Configuration](#configuration-culebratoml) · [Architecture](#architecture) · [Full Docs](docs.md) · [Contributing](#contributing)
</div>
---
## Why Culebra?
Most languages bootstrapped on top of a mature compiler:
- **Rust** started in OCaml before self-hosting about a year later.
- **Go** was written in C until v1.5, then used an automated C-to-Go translator.
- **C++** bootstrapped through Cfront, which translated C++ to C and let C compilers handle code generation.
[Mapanare](https://github.com/Mapanare-Research/Mapanare) doesn't have that luxury. It's an AI-native compiled language targeting LLVM IR, building its own backend from scratch: lexer, AST, type inference, LLVM IR emission. The bootstrap compiler (Stage 0) is written in Python, but there's no mature compiler underneath to fall back on.
That means every ABI mismatch, every string byte-count error, every struct layout divergence between IR and C, every bootstrap stage regression hits directly with no safety net.
**Culebra is the safety net.**
It exists because Mapanare needed it to survive its own bootstrap. It turns out every compiler project that targets LLVM needs the same thing, but nobody packaged it before.
We didn't just build a linter. We built a pattern engine. Every compiler bug we survived became a template so nobody else has to.
> The name: *Mapanare* is a Venezuelan pit viper. *Culebra* is the common snake. Same family, different role. Mapanare is the language, Culebra is the utility tool any compiler developer can pick up.
---
## Install
### Linux / macOS
```bash
cargo install --git https://github.com/Mapanare-Research/Culebra
```
### Windows
```powershell
cargo install --git https://github.com/Mapanare-Research/Culebra
```
### From source
```bash
git clone https://github.com/Mapanare-Research/Culebra.git
cd Culebra
cargo build --release
# Binary at target/release/culebra
```
Verify:
```bash
culebra --version
```
---
## Quick Start
You just emitted `stage2.ll` from your compiler and something is wrong at runtime. Here's how you hunt it down:
```bash
# 1. Scan for all known bug patterns at once
culebra scan stage2.ll
# 2. Focus on critical ABI bugs only
culebra scan stage2.ll --tags abi --severity critical
# 3. Auto-fix what can be fixed
culebra scan stage2.ll --autofix --dry-run # preview
culebra scan stage2.ll --autofix # apply
# 4. Is the IR even valid?
culebra check stage2.ll
# 5. Are string constants correct?
culebra strings stage2.ll
# 6. Any known pathologies?
culebra audit stage2.ll
# 7. What changed between stage1 and stage2?
culebra diff stage1.ll stage2.ll
# 8. Drill into one function
culebra extract stage2.ll my_broken_function
# 9. Cross-reference struct layouts against C runtime
culebra abi stage2.ll --header runtime/mapanare_core.c
# 10. Inspect the compiled binary's .rodata
culebra binary ./my_compiler --ir stage2.ll --find "hello world"
# 11. Run the full pipeline end-to-end
culebra pipeline
```
---
## Real bugs Culebra catches
These are real bugs from Mapanare's bootstrap. Every one wasted hours of debugging.
### Unaligned string constant (the bootstrap killer)
String constants without `align 2` land at odd addresses. Pointer tagging shifts the pointer by -1 byte. Every string comparison fails silently. Tokenizer produces 0 tokens. Compiler outputs empty IR. No crash, no error.
```bash
$ culebra scan stage2.ll --id unaligned-string-constant
CRITICAL [unaligned-string-constant] String constant missing alignment -- stage2.ll:47
@.str.0 is a 6-byte string constant without alignment.
If your runtime uses pointer bit-tagging, this constant's
pointer will be silently shifted by -1 byte.
fix: Add ', align 2' to all [N x i8] constant declarations
```
### String byte-count mismatch
Your escape-sequence handler emits `\n` as two bytes instead of one but the `[N x i8]` type says `N`. The IR assembles, the binary links, and the string silently contains garbage.
```bash
$ culebra strings stage2.ll
ERROR: @.str.47 declared [14 x i8] but content is 13 bytes
-> c"Hello, world!\00"
Fix: change to [13 x i8]
```
### List push without writeback (alias analysis trap)
Pushing to a list via GEP directly into a struct field. LLVM caches the pre-push struct state. The mutation is lost. Stage 1 works, stage 2 accumulates 0 lines.
```bash
$ culebra scan stage2.ll --id direct-push-no-writeback
HIGH [direct-push-no-writeback] List push without temp writeback -- stage2.ll:142 (in emit_line)
List push at struct field 2 goes directly through GEP without
temp alloca + writeback. LLVM may optimize away the mutation at -O1+.
```
### ABI struct layout mismatch
Your IR passes a struct by value. The C runtime expects it via `sret` pointer. It compiles, links, and segfaults at runtime.
```bash
$ culebra abi stage2.ll --header runtime/mapanare_core.c
WARNING: @create_string returns {i8*, i64} (16 bytes) by value
-> Consider sret for structs > 8 bytes on x86_64
```
### Bootstrap stage divergence
Stage 2 and Stage 3 should produce identical output (fixed-point). They don't, and you can't tell where the divergence started.
```bash
$ culebra diff stage2.ll stage3.ll
DIVERGED: 3 functions differ
-> @emit_call: 47 vs 52 instructions
-> @type_check: register allocation differs
-> @codegen_if: missing branch in stage3
```
---
## Template Engine
Culebra ships a Nuclei-style pattern engine. Bug patterns are YAML templates. The Rust binary is the engine. The templates are the knowledge base.
Every template in the initial pack comes from a real bug hit during Mapanare's bootstrap. Not hypothetical patterns -- documented battlefield scars with commit references, impact descriptions, and proven remediations.
### Scan
```bash
# Run all templates
culebra scan file.ll
# Filter by tag, severity, or specific template
culebra scan file.ll --tags abi,string
culebra scan file.ll --severity critical,high
culebra scan file.ll --id unaligned-string-constant
# Cross-file ABI check
culebra scan file.ll --header runtime.c
# Auto-fix
culebra scan file.ll --autofix --dry-run
culebra scan file.ll --autofix
# Custom template
culebra scan file.ll --template ./my-check.yaml
# Output formats
culebra scan file.ll --format json
culebra scan file.ll --format sarif # GitHub Code Scanning
culebra scan file.ll --format markdown # CI reports
```
### Browse templates
```bash
culebra templates list
culebra templates list --tags abi
culebra templates show unaligned-string-constant
```
### Run workflows
Workflows chain templates with stop conditions for multi-step validation:
```bash
culebra workflow bootstrap-health-check \
--input stage1_output=stage1.ll
culebra workflow pre-commit \
--input ir_file=main.ll
culebra workflow ci-full \
--input ir_file=main.ll --format sarif
```
### Write your own templates
Templates are YAML files in `culebra-templates/`. A minimal example:
```yaml
id: my-custom-check
info:
name: My custom check
severity: high
author: yourname
description: Catches a specific bug pattern.
tags:
- ir
- custom
scope:
file_type: llvm-ir
section: functions
match:
matchers:
- type: regex
name: pattern_name
pattern:
- 'some regex pattern'
condition: or
remediation:
suggestion: "How to fix this"
```
Anyone building a language targeting LLVM can open a PR adding their own bug template. The engine never changes, the knowledge base grows. Same model that made Nuclei dominant in security scanning.
See [docs.md](docs.md) for the full template specification, matcher types (regex, sequence, cross-reference, byte scanner), extractors, autofix, and workflow definitions.
---
## Shipped Templates
29 templates across 4 categories, every one from a real Mapanare bug.
| **ABI** | `unaligned-string-constant` | Critical | String constants at odd addresses corrupt pointer tagging |
| **ABI** | `struct-layout-mismatch` | Critical | IR struct vs C header field count/type divergence |
| **ABI** | `return-type-divergence` | Critical | Runtime function return type differs between stages (e.g., ptr vs {i64, i64}) |
| **ABI** | `direct-push-no-writeback` | High | List push through GEP without temp alloca writeback |
| **ABI** | `sret-input-output-alias` | High | sret pointer aliasing input corrupts data mid-computation |
| **ABI** | `tagged-pointer-odd-address` | High | Odd-sized constants without alignment break pointer tagging |
| **ABI** | `missing-byval-large-struct` | Medium | Large structs passed as bare ptr without byval |
| **ABI** | `large-struct-by-value` | High | Structs >56 bytes passed by value via load/store instead of sret/memcpy |
| **ABI** | `list-element-size-undercount` | High | `__mn_list_new(N)` with N smaller than actual element struct |
| **IR** | `empty-switch-body` | Critical | Switch with 0 cases -- match arms not generated |
| **IR** | `break-inside-nested-control` | Critical | Break inside if-inside-for dropped — infinite loop |
| **IR** | `option-type-pun-zeroinit` | Critical | Option discriminant clobbered by inner type store over zeroinitializer |
| **IR** | `ret-type-mismatch` | Critical | Return type doesn't match function signature |
| **IR** | `byte-count-mismatch` | High | `[N x i8]` declared size vs actual content differs |
| **IR** | `phi-predecessor-mismatch` | High | PHI node references non-existent predecessor block |
| **IR** | `internal-linkage-dce` | High | Internal-linkage functions stripped by LLVM -O1 optimizer |
| **IR** | `dynamic-alloca-non-entry` | High | Allocas in non-entry blocks misalign RSP, crash libc SSE calls |
| **IR** | `return-inside-nested-block` | High | Return inside match/if doesn't terminate -- execution falls through |
| **IR** | `phi-operand-type-mismatch` | High | PHI operand type differs from declared type (dead if_result PHIs) |
| **IR** | `raw-control-byte-in-constant` | Medium | Raw control bytes in c"..." break line-based tooling |
| **IR** | `unreachable-after-branch` | Medium | Instructions after terminator (dead code) |
| **IR** | `dropped-else-branch` | Medium | if_then without corresponding else block -- branch silently dropped |
| **Binary** | `missing-symbol` | Critical | Runtime symbol missing from binary symbol table |
| **Binary** | `odd-address-rodata` | High | String at odd address in .rodata section |
| **Bootstrap** | `function-count-drop` | Critical | Stage N+1 has fewer functions than Stage N |
| **Bootstrap** | `stage-output-divergence` | High | Stage output doesn't converge toward fixed-point |
| **Bootstrap** | `fixed-point-delta` | High | Compiler output doesn't stabilize after N iterations |
| **Bootstrap** | `call-count-divergence` | High | Function calls runtime helper fewer times than stage1 (branches dropped) |
| **Bootstrap** | `body-size-shrinkage` | High | Function body drastically smaller in self-compiled output |
4 shipped workflows: `bootstrap-health-check`, `pre-commit`, `ci-full`, `playground-mapanare`.
---
## All Commands
| `culebra scan file.ll` | Scan IR with YAML pattern templates. `--tags`, `--severity`, `--id`, `--format`, `--autofix`. |
| `culebra templates list` | List all available scan templates with severity and tags. |
| `culebra templates show <id>` | Show full details of a template: description, impact, remediation, CWE. |
| `culebra workflow <id>` | Run a multi-step scan workflow with stop conditions. |
| `culebra strings file.ll` | Validate `[N x i8] c"..."` byte counts. Catches escape-sequence miscounting. |
| `culebra audit file.ll` | Detect IR pathologies: empty switch, ret mismatch, missing `%`, duplicate case. |
| `culebra check file.ll` | Validate IR with `llvm-as`. |
| `culebra phi-check file.ll` | Validate transform scripts preserve IR structure. |
| `culebra diff a.ll b.ll` | Per-function structural diff, register-normalized. |
| `culebra extract file.ll fn` | Extract a single function from a massive IR file. |
| `culebra table file.ll` | Per-function metrics table (instructions, allocas, calls, etc.). |
| `culebra abi file.ll` | Detect sret/byref misuse, struct layout validation, C header cross-ref. |
| `culebra binary ./binary` | ELF/PE inspection, .rodata analysis, IR cross-referencing. |
| `culebra run compiler source` | Compile, run, check expected output. |
| `culebra test` | Run all `[[tests]]` from `culebra.toml`. |
| `culebra watch` | Watch files, re-run a command on change. |
| `culebra pipeline` | Run full stage pipeline end-to-end via `culebra.toml`. |
| `culebra fixedpoint compiler source` | Detect fixed-point convergence in self-hosting compilers. |
| `culebra status` | Show bootstrap self-hosting progress. |
| `culebra init` | Generate a `culebra.toml` template. |
---
## Layers
| **Scan** | `scan`, `templates`, `workflow` | Template-driven pattern matching, autofix, SARIF output |
| **IR** | `strings`, `audit`, `check`, `diff`, `extract`, `table` | Byte-level IR validation, pathology detection, structural comparison |
| **ABI** | `abi` | Calling convention mismatches, sret/byref analysis, struct layout |
| **Binary** | `binary` | ELF/PE inspection, .rodata cross-referencing against IR |
| **Pipeline** | `phi-check`, `pipeline`, `fixedpoint` | Transform validation, stage orchestration, convergence detection |
| **Runtime** | `run`, `test` | Compile-and-run, expected-output diffing |
| **Bootstrap** | `status` | Self-hosting progress tracking |
| **Config** | `init`, `watch` | Project setup, file-watching |
---
## Architecture
```
culebra scan file.ll --tags abi
|
+---------------+---------------+
| |
Template Loader IR Parser
(culebra-templates/) (ir.rs -> IRModule)
| |
Filter by tags, Parse functions, globals,
severity, id string constants, structs
| |
+----------- Engine ------------+
|
+---------------+---------------+
| | |
Regex Matcher Sequence Matcher Cross-Ref Matcher
(single-line) (multi-line with (IR vs C header)
captures, absence)
| | |
+----------- Findings ----------+
|
+---------------+---------------+
| | |
Text JSON SARIF
(colored) (structured) (GitHub Code
Scanning)
```
**Template directory resolution:**
1. `./culebra-templates/` (project-local)
2. `<binary_dir>/culebra-templates/` (next to binary)
3. `~/.culebra/templates/` (user-global)
**Template directory structure:**
```
culebra-templates/
abi/
unaligned-string-constant.yaml
direct-push-no-writeback.yaml
sret-input-output-alias.yaml
missing-byval-large-struct.yaml
tagged-pointer-odd-address.yaml
struct-layout-mismatch.yaml
return-type-divergence.yaml # NEW — playground
large-struct-by-value.yaml # NEW — playground
list-element-size-undercount.yaml # NEW — playground
ir/
byte-count-mismatch.yaml
empty-switch-body.yaml
ret-type-mismatch.yaml
raw-control-byte.yaml
phi-predecessor-mismatch.yaml
unreachable-after-branch.yaml
dropped-else-branch.yaml # NEW — playground
option-type-pun-zeroinit.yaml # NEW — playground
internal-linkage-dce.yaml # NEW — playground
dynamic-alloca-non-entry.yaml # NEW — playground
return-inside-nested-block.yaml # NEW — playground
phi-operand-type-mismatch.yaml # NEW — playground
break-inside-nested-control.yaml # NEW — playground
binary/
odd-address-rodata.yaml
missing-symbol.yaml
bootstrap/
stage-output-divergence.yaml
function-count-drop.yaml
fixed-point-delta.yaml
call-count-divergence.yaml # NEW — v2.2.0 playground
body-size-shrinkage.yaml # NEW — v2.2.0 playground
workflows/
bootstrap-health-check.yaml
pre-commit.yaml
ci-full.yaml
playground-mapanare.yaml # NEW — v2.2.0 playground
```
---
## Configuration: `culebra.toml`
Run `culebra init` to generate a starter config:
```toml
[project]
name = "my-compiler"
source_lang = "my-lang"
target = "llvm"
compiler = "./my-compiler"
runtime = "runtime/my_runtime.c"
[[stages]]
name = "bootstrap"
cmd = "python bootstrap/compile.py {input}"
input = "src/compiler.ml"
output = "/tmp/stage1.ll"
validate = true
[[stages]]
name = "stage1"
cmd = "{prev_output} {input}"
input = "src/compiler.ml"
output = "/tmp/stage2.ll"
validate = true
[[stages]]
name = "stage2"
cmd = "{prev_output} {input}"
input = "src/compiler.ml"
output = "/tmp/stage3.ll"
validate = true
[[tests]]
name = "hello"
source = 'fn main() { print("hello") }'
expect = "hello"
[[tests]]
name = "math"
source = "fn main() { print(2 + 3) }"
expect = "5"
```
---
## CI/CD Integration
### GitHub Actions with SARIF
```yaml
name: Culebra Scan
on: [push, pull_request]
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Culebra
run: cargo install --git https://github.com/Mapanare-Research/Culebra
- name: Run scan
run: culebra scan output.ll --format sarif > culebra.sarif
- name: Upload SARIF
if: always()
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: culebra.sarif
```
### Pre-commit hook
```bash
#!/bin/bash
culebra scan output.ll --severity critical,high
```
---
## Built for
- Anyone building a language that targets LLVM IR
- Anyone self-hosting a compiler
- Anyone debugging ABI and calling convention issues between IR and native code
- Anyone running a multi-stage bootstrap and needing to know where divergence starts
- Anyone who's lost hours to a string byte-count being off by one
- Anyone who wants their hard-won compiler bugs turned into reusable detection templates
---
## Contributing
Contributions welcome. Two ways to contribute:
1. **Code** -- Rust engine improvements, new matcher types, output formats
2. **Templates** -- Add YAML templates for compiler bugs you've encountered
Every bug you've hit with your LLVM-targeting compiler can become a template. The tool gets smarter without touching Rust code.
---
## License
MIT License -- see [LICENSE](LICENSE) for details.
---
<div align="center">
**Culebra** -- The safety net your compiler needs.
[Full Documentation](docs.md) · [Report Bug](https://github.com/Mapanare-Research/Culebra/issues) · [Mapanare](https://github.com/Mapanare-Research/Mapanare)
Made with care by [Juan Denis](https://juandenis.com)
</div>