Culebra
/koo-LEH-brah/
Compiler diagnostics for self-hosting languages that target LLVM.
ABI. IR. Binary. Bootstrap. One binary catches what no debugger will.
Born from Mapanare's bootstrap, where every bug was a mystery with no safety net. Culebra ships a Nuclei-style template engine so every compiler bug you survive becomes a pattern nobody else has to debug.
English | Espanol | 中文版 | Portugues
Why Culebra?
Most languages bootstrapped on top of a mature compiler:
- Rust's first compiler was written in OCaml; the Rust compiler became self-hosting in 2011.
- Go's compiler was written in C until Go 1.5, when an automated C-to-Go translator converted it to Go.
- C++ bootstrapped through Cfront, which translated C++ to C and let C compilers handle code generation.
Mapanare doesn't have that luxury. It's an AI-native compiled language targeting LLVM IR, and it builds its entire compiler from scratch: lexer, AST, type inference, LLVM IR emission. The bootstrap compiler (Stage 0) is written in Python, but there's no mature compiler underneath to fall back on.
That means every ABI mismatch, every string byte-count error, every struct layout divergence between IR and C, every bootstrap stage regression hits directly with no safety net.
Culebra is the safety net.
It exists because Mapanare needed it to survive its own bootstrap. It turns out every compiler project that targets LLVM needs the same thing, but nobody packaged it before.
We didn't just build a linter. We built a pattern engine. Every compiler bug we survived became a template so nobody else has to.
The name: Mapanare is a Venezuelan pit viper. Culebra is the common snake. Same family, different role. Mapanare is the language, Culebra is the utility tool any compiler developer can pick up.
Install
Linux / macOS
Windows
cargo install --git https://github.com/Mapanare-Research/Culebra
From source
git clone https://github.com/Mapanare-Research/Culebra
cd Culebra
cargo build --release
# Binary at target/release/culebra
Verify:
Quick Start
You just emitted stage2.ll from your compiler and something is wrong at runtime. Here's how you hunt it down:
# 1. Scan for all known bug patterns at once
# 2. Focus on critical ABI bugs only
# 3. Auto-fix what can be fixed
# 4. Is the IR even valid?
# 5. Are string constants correct?
# 6. Any known pathologies?
# 7. What changed between stage1 and stage2?
# 8. Drill into one function
# 9. Cross-reference struct layouts against C runtime
# 10. Inspect the compiled binary's .rodata
# 11. Run the full pipeline end-to-end
Real bugs Culebra catches
These are real bugs from Mapanare's bootstrap. Every one wasted hours of debugging.
Unaligned string constant (the bootstrap killer)
String constants without align 2 can land at odd addresses, where pointer tagging shifts the pointer by one byte. Every string comparison fails silently. Tokenizer produces 0 tokens. Compiler outputs empty IR. No crash, no error.
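The failure chain above can be sketched in Rust. This is a minimal, line-based illustration, not Culebra's shipped unaligned-string-constant template. It flags [N x i8] c"..." globals whose align is missing or below 2. The untag helper assumes a hypothetical runtime that stores its tag in bit 0.

```rust
/// Sketch only: flags `@name = ... [N x i8] c"..."` globals whose explicit
/// `align` is missing or below 2 (an i8 array on its own only needs 1-byte alignment).
fn find_unaligned_strings(ir: &str) -> Vec<String> {
    let mut hits = Vec::new();
    for line in ir.lines().map(str::trim) {
        if !(line.starts_with('@') && line.contains(" x i8] c\"")) {
            continue; // not a string-constant global
        }
        // Parse the digits after the last ", align " (treated as 1 when absent).
        let align = line
            .rfind(", align ")
            .and_then(|pos| {
                line[pos + ", align ".len()..]
                    .split(|c: char| !c.is_ascii_digit())
                    .next()
            })
            .and_then(|digits| digits.parse::<u64>().ok())
            .unwrap_or(1);
        if align < 2 {
            let name = line.split_whitespace().next().unwrap_or(line);
            hits.push(name.to_string());
        }
    }
    hits
}

/// Hypothetical low-bit tagging scheme: bit 0 is the tag, so untagging clears it.
fn untag(ptr: usize) -> usize {
    ptr & !1
}

fn main() {
    let ir = "@.str.0 = private constant [3 x i8] c\"abc\"\n\
              @.str.1 = private constant [2 x i8] c\"hi\", align 2\n";
    println!("{:?}", find_unaligned_strings(ir)); // ["@.str.0"]
    // A string placed at odd address 0x1001 is read from 0x1000 after untagging.
    println!("{:#x}", untag(0x1001)); // 0x1000
}
```

Requiring align 2 keeps bit 0 of every string address clear. Under this assumed scheme, that leaves the bit free for the tag.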
String byte-count mismatch
Your escape-sequence handler emits \n as two bytes instead of one, but the [N x i8] type still says N. The IR assembles, the binary links, and the string silently contains garbage.
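Here is a sketch of a byte-count check. It assumes textual IR where each string constant sits on one line.

It decodes the c"..." payload the way LLVM's IR lexer does:
- a doubled backslash is one byte;
- a backslash followed by two hex digits is one byte;
- anything else is literal.

It then compares the decoded length with N. The function names are illustrative, not Culebra's API.

```rust
/// Bytes encoded by the payload of an LLVM `c"..."` literal. LLVM reads a doubled
/// backslash as one byte and `\XX` (two hex digits) as one byte; everything else is
/// literal, so a stray `\n` counts as two bytes: '\' and 'n'.
fn decoded_len(payload: &str) -> usize {
    let b = payload.as_bytes();
    let (mut i, mut count) = (0, 0);
    while i < b.len() {
        if b[i] == b'\\' && i + 1 < b.len() && b[i + 1] == b'\\' {
            i += 2; // escaped backslash
        } else if b[i] == b'\\'
            && i + 2 < b.len()
            && b[i + 1].is_ascii_hexdigit()
            && b[i + 2].is_ascii_hexdigit()
        {
            i += 3; // hex escape
        } else {
            i += 1; // literal byte (including a lone backslash)
        }
        count += 1;
    }
    count
}

/// Returns `Some((declared, actual))` when `[N x i8]` disagrees with the payload.
fn byte_count_mismatch(line: &str) -> Option<(usize, usize)> {
    const MARKER: &str = " x i8] c\"";
    let ty_end = line.find(MARKER)?;
    let ty_start = line[..ty_end].rfind('[')? + 1;
    let declared: usize = line[ty_start..ty_end].trim().parse().ok()?;
    let body_start = ty_end + MARKER.len();
    let body_end = line.rfind('"')?;
    if body_end < body_start {
        return None;
    }
    let actual = decoded_len(&line[body_start..body_end]);
    (declared != actual).then_some((declared, actual))
}

fn main() {
    let good = r#"@.s0 = private constant [6 x i8] c"hello\0A""#;
    let bad = r#"@.s1 = private constant [6 x i8] c"hello\n""#;
    println!("{:?}", byte_count_mismatch(good)); // None
    println!("{:?}", byte_count_mismatch(bad)); // Some((6, 7))
}
```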
List push without writeback (alias analysis trap)
The compiler pushes to a list through a GEP that points directly into a struct field, without writing back through a temporary alloca. LLVM caches the pre-push struct state, so the mutation is lost. Stage 1 works; stage 2 accumulates 0 lines.
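One way to express this pattern as a line-based check works in two passes over the lines:
- remember every SSA name defined by a getelementptr;
- flag list-push calls that pass one of those names straight through instead of a temporary alloca.

The push symbol __mn_list_push is a hypothetical stand-in (only __mn_list_new appears in this README). Real IR would also need sturdier operand parsing than splitting on commas.

```rust
use std::collections::HashSet;

/// Hypothetical runtime symbol; substitute your own runtime's list-push function.
const PUSH_FN: &str = "@__mn_list_push(";

/// Returns push calls whose list argument is the direct result of a `getelementptr`.
fn direct_pushes(ir: &str) -> Vec<String> {
    let mut gep_results: HashSet<&str> = HashSet::new();
    let mut hits = Vec::new();
    for line in ir.lines().map(str::trim) {
        // Remember SSA names defined by a GEP, e.g. `%f = getelementptr ...`.
        if let Some((lhs, rhs)) = line.split_once(" = ") {
            if rhs.starts_with("getelementptr") {
                gep_results.insert(lhs.trim());
            }
        }
        // Flag push calls that pass one of those GEP results straight through.
        if let Some(pos) = line.find(PUSH_FN) {
            let args = line[pos + PUSH_FN.len()..].split(')').next().unwrap_or("");
            let pushes_gep = args
                .split(',')
                .filter_map(|arg| arg.split_whitespace().last())
                .any(|name| gep_results.contains(name));
            if pushes_gep {
                hits.push(line.to_string());
            }
        }
    }
    hits
}

fn main() {
    let ir = "%f = getelementptr inbounds %State, ptr %s, i32 0, i32 2
call void @__mn_list_push(ptr %f, ptr %item)
%tmp = alloca %List
call void @__mn_list_push(ptr %tmp, ptr %item)";
    for hit in direct_pushes(ir) {
        println!("direct push: {hit}"); // only the first call is reported
    }
}
```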
ABI struct layout mismatch
Your IR passes a struct by value. The C runtime expects it via sret pointer. It compiles, links, and segfaults at runtime.
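Whether a struct is returned by value or through an sret pointer depends on its C layout and the target ABI. This sketch assumes x86-64 System V and applies its simplified rule: aggregates larger than 16 bytes are returned in memory through a hidden pointer. Other targets, and edge cases such as unaligned fields or vector types, follow different rules. Fields are given as hypothetical (size, align) pairs.

```rust
/// C layout of a struct from its fields' (size, align) pairs, in declaration order.
/// Returns (size, align). Every align must be a power of two, at least 1.
fn c_layout(fields: &[(usize, usize)]) -> (usize, usize) {
    let mut offset: usize = 0;
    let mut max_align: usize = 1;
    for &(size, align) in fields {
        offset = (offset + align - 1) / align * align; // pad to the field's alignment
        offset += size;
        max_align = max_align.max(align);
    }
    // Tail padding rounds the total size up to the struct's alignment.
    ((offset + max_align - 1) / max_align * max_align, max_align)
}

/// Simplified x86-64 System V rule: aggregates larger than 16 bytes are classified
/// MEMORY and returned through a hidden sret pointer supplied by the caller.
fn returned_via_sret(fields: &[(usize, usize)]) -> bool {
    c_layout(fields).0 > 16
}

fn main() {
    let pair: [(usize, usize); 2] = [(8, 8), (8, 8)]; // struct { i64, i64 }
    let triple: [(usize, usize); 3] = [(8, 8), (8, 8), (8, 8)]; // struct { i64, i64, i64 }
    println!("{:?} sret={}", c_layout(&pair), returned_via_sret(&pair)); // (16, 8) sret=false
    println!("{:?} sret={}", c_layout(&triple), returned_via_sret(&triple)); // (24, 8) sret=true
}
```

If the IR returns the three-field struct as a plain value while the C side expects an sret pointer, the two sides disagree about where the result lives.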
Bootstrap stage divergence
Stage 2 and Stage 3 should produce identical output (fixed-point). They don't, and you can't tell where the divergence started.
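A fixed-point check can start with the simplest comparison: split both stages' IR into functions, then list every function that is missing from one side or differs textually.

This sketch assumes each function begins with a define line and ends with a line containing only }. It does no register normalization, because stage N and stage N+1 output are expected to be byte-identical at the fixed point.

```rust
use std::collections::BTreeMap;

/// Splits textual IR into `name -> body`, assuming each function starts with a
/// `define ... @name(` line and ends with a line that is exactly `}`.
fn functions(ir: &str) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    let mut current: Option<(String, String)> = None;
    for line in ir.lines() {
        if line.starts_with("define ") {
            let name = line
                .split('@')
                .nth(1)
                .and_then(|rest| rest.split('(').next())
                .unwrap_or("")
                .to_string();
            current = Some((name, String::new()));
        }
        if let Some((_, body)) = current.as_mut() {
            body.push_str(line);
            body.push('\n');
        }
        if line == "}" {
            if let Some((name, body)) = current.take() {
                out.insert(name, body);
            }
        }
    }
    out
}

/// Functions that exist in only one stage or whose text differs between stages.
fn divergent(stage_a: &str, stage_b: &str) -> Vec<String> {
    let (a, b) = (functions(stage_a), functions(stage_b));
    let mut names: Vec<&String> = a.keys().chain(b.keys()).collect();
    names.sort();
    names.dedup();
    names
        .into_iter()
        .filter(|name| a.get(*name) != b.get(*name))
        .cloned()
        .collect()
}

fn main() {
    let stage2 = "define i32 @main() {\n  ret i32 0\n}\ndefine void @f() {\n  ret void\n}\n";
    let stage3 = "define i32 @main() {\n  ret i32 0\n}\ndefine void @f() {\n  unreachable\n}\n";
    println!("{:?}", divergent(stage2, stage3)); // ["f"]
}
```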
Template Engine
Culebra ships a Nuclei-style pattern engine. Bug patterns are YAML templates. The Rust binary is the engine. The templates are the knowledge base.
Every template in the initial pack comes from a real bug hit during Mapanare's bootstrap. Not hypothetical patterns -- documented battlefield scars with commit references, impact descriptions, and proven remediations.
Scan
# Run all templates
# Filter by tag, severity, or specific template
# Cross-file ABI check
# Auto-fix
# Custom template
# Output formats
Browse templates
Run workflows
Workflows chain templates with stop conditions for multi-step validation:
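A workflow can be modeled as an ordered list of steps, each with an optional stop condition. In this sketch the step functions are hypothetical stand-ins for template runs. The stop rule (halt once any finding reaches a given severity) is an assumption about how stop conditions behave, not the shipped workflow schema.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Medium,
    High,
    Critical,
}

/// One workflow step. `run` stands in for scanning with one or more templates.
struct Step {
    name: &'static str,
    run: fn(&str) -> Vec<Severity>,
    /// Halt the workflow when any finding is at or above this severity.
    stop_at: Option<Severity>,
}

/// Runs steps in order and returns the names of the steps that actually ran.
fn run_workflow(steps: &[Step], ir: &str) -> Vec<&'static str> {
    let mut executed = Vec::new();
    for step in steps {
        let findings = (step.run)(ir);
        executed.push(step.name);
        if let Some(limit) = step.stop_at {
            if findings.iter().any(|&severity| severity >= limit) {
                break; // stop condition met: skip the remaining steps
            }
        }
    }
    executed
}

// Hypothetical step implementations for the example.
fn clean(_ir: &str) -> Vec<Severity> {
    Vec::new()
}

fn one_critical(_ir: &str) -> Vec<Severity> {
    vec![Severity::Critical]
}

fn main() {
    let steps = [
        Step { name: "check", run: clean, stop_at: Some(Severity::Critical) },
        Step { name: "abi", run: one_critical, stop_at: Some(Severity::Critical) },
        Step { name: "strings", run: clean, stop_at: None },
    ];
    println!("{:?}", run_workflow(&steps, "")); // ["check", "abi"]
}
```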
Write your own templates
Templates are YAML files in culebra-templates/. A minimal example:
id: my-custom-check
info:
  name: My custom check
  severity: high
  author: yourname
  description: Catches a specific bug pattern.
  tags:
    - ir
    - custom
scope:
  file_type: llvm-ir
  section: functions
match:
  matchers:
    - type: regex
      name: pattern_name
      pattern:
        - 'some regex pattern'
  condition: or
remediation:
  suggestion: "How to fix this"
Anyone building a language that targets LLVM can open a PR adding their own bug template. The engine stays the same while the knowledge base grows: the same model that made Nuclei dominant in security scanning.
See docs.md for the full template specification, matcher types (regex, sequence, cross-reference, byte scanner), extractors, autofix, and workflow definitions.
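This sketch illustrates what a condition such as or means when a template lists several patterns: it combines the patterns per line with or/and. It uses plain substring matching as a stand-in for regex, because the Rust standard library has no regex engine. It does not reproduce the full template semantics documented in docs.md.

```rust
#[derive(Clone, Copy)]
enum Condition {
    Or,
    And,
}

/// Substring stand-in for a regex matcher: do the patterns fire on this line?
fn line_matches(line: &str, patterns: &[&str], condition: Condition) -> bool {
    match condition {
        Condition::Or => patterns.iter().any(|p| line.contains(*p)),
        Condition::And => patterns.iter().all(|p| line.contains(*p)),
    }
}

/// 1-based line numbers where the combined patterns fire.
fn scan(ir: &str, patterns: &[&str], condition: Condition) -> Vec<usize> {
    ir.lines()
        .enumerate()
        .filter(|(_, line)| line_matches(line, patterns, condition))
        .map(|(i, _)| i + 1)
        .collect()
}

fn main() {
    let ir = "ret i32 %x\nswitch i32 %x, label %d []\ncall void @f()";
    println!("{:?}", scan(ir, &["switch", "unreachable"], Condition::Or)); // [2]
    println!("{:?}", scan(ir, &["call", "@f"], Condition::And)); // [3]
}
```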
Shipped Templates
29 templates across 4 categories, every one from a real Mapanare bug.
| Category | ID | Severity | What it catches |
|---|---|---|---|
| ABI | unaligned-string-constant | Critical | String constants at odd addresses corrupt pointer tagging |
| ABI | struct-layout-mismatch | Critical | IR struct vs C header field count/type divergence |
| ABI | return-type-divergence | Critical | Runtime function return type differs between stages (e.g., ptr vs {i64, i64}) |
| ABI | direct-push-no-writeback | High | List push through GEP without temp alloca writeback |
| ABI | sret-input-output-alias | High | sret pointer aliasing input corrupts data mid-computation |
| ABI | tagged-pointer-odd-address | High | Odd-sized constants without alignment break pointer tagging |
| ABI | missing-byval-large-struct | Medium | Large structs passed as bare ptr without byval |
| ABI | large-struct-by-value | High | Structs >56 bytes passed by value via load/store instead of sret/memcpy |
| ABI | list-element-size-undercount | High | __mn_list_new(N) with N smaller than actual element struct |
| IR | empty-switch-body | Critical | Switch with 0 cases -- match arms not generated |
| IR | break-inside-nested-control | Critical | Break inside if-inside-for dropped -- infinite loop |
| IR | option-type-pun-zeroinit | Critical | Option discriminant clobbered by inner type store over zeroinitializer |
| IR | ret-type-mismatch | Critical | Return type doesn't match function signature |
| IR | byte-count-mismatch | High | [N x i8] declared size vs actual content differs |
| IR | phi-predecessor-mismatch | High | PHI node references non-existent predecessor block |
| IR | internal-linkage-dce | High | Internal-linkage functions stripped by LLVM -O1 optimizer |
| IR | dynamic-alloca-non-entry | High | Allocas in non-entry blocks misalign RSP, crash libc SSE calls |
| IR | return-inside-nested-block | High | Return inside match/if doesn't terminate -- execution falls through |
| IR | phi-operand-type-mismatch | High | PHI operand type differs from declared type (dead if_result PHIs) |
| IR | raw-control-byte-in-constant | Medium | Raw control bytes in c"..." break line-based tooling |
| IR | unreachable-after-branch | Medium | Instructions after terminator (dead code) |
| IR | dropped-else-branch | Medium | if_then without corresponding else block -- branch silently dropped |
| Binary | missing-symbol | Critical | Runtime symbol missing from binary symbol table |
| Binary | odd-address-rodata | High | String at odd address in .rodata section |
| Bootstrap | function-count-drop | Critical | Stage N+1 has fewer functions than Stage N |
| Bootstrap | stage-output-divergence | High | Stage output doesn't converge toward fixed-point |
| Bootstrap | fixed-point-delta | High | Compiler output doesn't stabilize after N iterations |
| Bootstrap | call-count-divergence | High | Function calls runtime helper fewer times than stage1 (branches dropped) |
| Bootstrap | body-size-shrinkage | High | Function body drastically smaller in self-compiled output |
4 shipped workflows: bootstrap-health-check, pre-commit, ci-full, playground-mapanare.
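As an example of what a template like empty-switch-body looks for, this sketch reports LLVM switch instructions whose case list [ ... ] is empty. It also handles lists split across lines. It is an illustrative line scanner, not the shipped template.

```rust
/// 1-based line numbers of `switch` instructions whose case list `[ ... ]` is empty.
fn empty_switches(ir: &str) -> Vec<usize> {
    let lines: Vec<&str> = ir.lines().collect();
    let mut hits = Vec::new();
    for (i, &line) in lines.iter().enumerate() {
        let t = line.trim_start();
        if !t.starts_with("switch ") {
            continue;
        }
        let Some(open) = t.find('[') else { continue };
        // Collect the text between '[' and ']', which may span several lines.
        let mut cases = String::new();
        let mut rest = &t[open + 1..];
        let mut j = i;
        loop {
            if let Some(close) = rest.find(']') {
                cases.push_str(&rest[..close]);
                break;
            }
            cases.push_str(rest);
            j += 1;
            match lines.get(j) {
                Some(&next) => rest = next,
                None => break, // unterminated list: judge what was collected
            }
        }
        if cases.trim().is_empty() {
            hits.push(i + 1);
        }
    }
    hits
}

fn main() {
    let ir = "  switch i32 %x, label %d [\n  ]\n  switch i32 %y, label %d [\n    i32 0, label %a\n  ]";
    println!("{:?}", empty_switches(ir)); // [1]
}
```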
All Commands
| Command | What it does |
|---|---|
| culebra scan file.ll | Scan IR with YAML pattern templates. --tags, --severity, --id, --format, --autofix. |
| culebra templates list | List all available scan templates with severity and tags. |
| culebra templates show <id> | Show full details of a template: description, impact, remediation, CWE. |
| culebra workflow <id> | Run a multi-step scan workflow with stop conditions. |
| culebra strings file.ll | Validate [N x i8] c"..." byte counts. Catches escape-sequence miscounting. |
| culebra audit file.ll | Detect IR pathologies: empty switch, ret mismatch, missing %, duplicate case. |
| culebra check file.ll | Validate IR with llvm-as. |
| culebra phi-check file.ll | Validate that transform scripts preserve IR structure. |
| culebra diff a.ll b.ll | Per-function structural diff, register-normalized. |
| culebra extract file.ll fn | Extract a single function from a massive IR file. |
| culebra table file.ll | Per-function metrics table (instructions, allocas, calls, etc.). |
| culebra abi file.ll | Detect sret/byref misuse, struct layout validation, C header cross-ref. |
| culebra binary ./binary | ELF/PE inspection, .rodata analysis, IR cross-referencing. |
| culebra run compiler source | Compile, run, check expected output. |
| culebra test | Run all [[tests]] from culebra.toml. |
| culebra watch | Watch files, re-run a command on change. |
| culebra pipeline | Run full stage pipeline end-to-end via culebra.toml. |
| culebra triage file.ll | Group findings by root cause, deduplicate, show actionable summary. --format json for AI. |
| culebra compare a.ll b.ll | Per-function metric comparison. --metric calls/blocks/pushes, --threshold 0.3. |
| culebra explain file.ll <id> | Show matched IR in context with template description + remediation. --function <name>. |
| culebra bisect a.ll b.ll | Find divergent functions between stages, ranked by impact (callers * delta). |
| culebra verify file.ll <id> | Verify a specific fix: re-scan one template, PASS/FAIL output. --function <name>. |
| culebra fixedpoint compiler source | Detect fixed-point convergence in self-hosting compilers. |
| culebra status | Show bootstrap self-hosting progress. |
| culebra init | Generate a culebra.toml template. |
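Register normalization lets a structural diff treat two functions as equal when only their SSA names differ. This is a minimal sketch that assumes normalization means renaming every %identifier to %v0, %v1, ... in order of first appearance. It has three known gaps:
- it does not handle quoted names;
- it leaves label definitions such as bb1: untouched;
- it also renames named types such as %State, which is harmless only when both sides are normalized the same way.

```rust
use std::collections::HashMap;

/// Rewrites every `%identifier` to `%v0`, `%v1`, ... in order of first appearance.
fn normalize_registers(body: &str) -> String {
    let chars: Vec<char> = body.chars().collect();
    let mut renames: HashMap<String, String> = HashMap::new();
    let mut out = String::with_capacity(body.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '%' {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Consume an LLVM identifier: ASCII letters, digits, and - $ . _
        let mut j = i + 1;
        while j < chars.len()
            && (chars[j].is_ascii_alphanumeric() || matches!(chars[j], '-' | '$' | '.' | '_'))
        {
            j += 1;
        }
        let original: String = chars[i..j].iter().collect();
        let next_id = renames.len();
        let renamed = renames.entry(original).or_insert_with(|| format!("%v{next_id}"));
        out.push_str(renamed.as_str());
        i = j;
    }
    out
}

fn main() {
    let stage1 = "%1 = add i32 %0, 1\nret i32 %1";
    let stage2 = "%sum = add i32 %arg, 1\nret i32 %sum";
    // Both print "%v0 = add i32 %v1, 1" / "ret i32 %v0".
    println!("{}", normalize_registers(stage1));
    println!("{}", normalize_registers(stage2));
}
```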
Layers
| Layer | Commands | What it covers |
|---|---|---|
| Scan | scan, templates, workflow | Template-driven pattern matching, autofix, SARIF output |
| IR | strings, audit, check, diff, extract, table | Byte-level IR validation, pathology detection, structural comparison |
| ABI | abi | Calling convention mismatches, sret/byref analysis, struct layout |
| Binary | binary | ELF/PE inspection, .rodata cross-referencing against IR |
| Pipeline | phi-check, pipeline, fixedpoint | Transform validation, stage orchestration, convergence detection |
| Runtime | run, test | Compile-and-run, expected-output diffing |
| Bootstrap | status | Self-hosting progress tracking |
| Config | init, watch | Project setup, file-watching |
Architecture
                 culebra scan file.ll --tags abi
                                |
          +---------------------+---------------------+
          |                                           |
   Template Loader                                IR Parser
(culebra-templates/)                         (ir.rs -> IRModule)
          |                                           |
   Filter by tags,                        Parse functions, globals,
    severity, id                          string constants, structs
          |                                           |
          +----------------- Engine ------------------+
                                |
          +---------------------+---------------------+
          |                     |                     |
    Regex Matcher       Sequence Matcher      Cross-Ref Matcher
    (single-line)       (multi-line with      (IR vs C header)
                       captures, absence)
          |                     |                     |
          +---------------- Findings -----------------+
                                |
          +---------------------+---------------------+
          |                     |                     |
        Text                  JSON                  SARIF
      (colored)           (structured)          (GitHub Code
                                                  Scanning)
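The "Filter by tags, severity, id" stage of the diagram can be sketched as three optional filters applied in sequence. The Template struct and the tag values are hypothetical. Treating --severity as a minimum rather than an exact match is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Medium,
    High,
    Critical,
}

/// Hypothetical in-memory view of a loaded template.
struct Template {
    id: &'static str,
    severity: Severity,
    tags: &'static [&'static str],
}

/// Keeps templates that pass every filter that was supplied.
fn select<'a>(
    all: &'a [Template],
    tags: &[&str],
    min_severity: Option<Severity>,
    id: Option<&str>,
) -> Vec<&'a Template> {
    all.iter()
        .filter(|t| tags.is_empty() || t.tags.iter().any(|tag| tags.contains(tag)))
        .filter(|t| min_severity.map_or(true, |min| t.severity >= min))
        .filter(|t| id.map_or(true, |want| t.id == want))
        .collect()
}

fn main() {
    let all = [
        Template { id: "unaligned-string-constant", severity: Severity::Critical, tags: &["abi"] },
        Template { id: "missing-byval-large-struct", severity: Severity::Medium, tags: &["abi"] },
        Template { id: "byte-count-mismatch", severity: Severity::High, tags: &["ir"] },
    ];
    let picked: Vec<&str> = select(&all, &["abi"], Some(Severity::High), None)
        .iter()
        .map(|t| t.id)
        .collect();
    println!("{picked:?}"); // ["unaligned-string-constant"]
}
```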
Template directory resolution:
- ./culebra-templates/ (project-local)
- <binary_dir>/culebra-templates/ (next to binary)
- ~/.culebra/templates/ (user-global)
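Here is a sketch of the template directory lookup, assuming the directories are tried in the order listed and the first existing one wins. Rust's standard library does not expand ~, so the user-global path is built from HOME, falling back to USERPROFILE on Windows. Both the priority order and the home lookup are assumptions about the real implementation.

```rust
use std::env;
use std::path::PathBuf;

/// Candidate template directories, in the assumed priority order.
fn template_dir_candidates() -> Vec<PathBuf> {
    let mut dirs = vec![PathBuf::from("./culebra-templates")];
    if let Ok(exe) = env::current_exe() {
        if let Some(bin_dir) = exe.parent() {
            dirs.push(bin_dir.join("culebra-templates"));
        }
    }
    // std does not expand `~`, so build the user-global path from the home variable.
    if let Some(home) = env::var_os("HOME").or_else(|| env::var_os("USERPROFILE")) {
        dirs.push(PathBuf::from(home).join(".culebra").join("templates"));
    }
    dirs
}

/// The first candidate that exists as a directory, if any.
fn resolve_template_dir() -> Option<PathBuf> {
    template_dir_candidates().into_iter().find(|dir| dir.is_dir())
}

fn main() {
    for dir in template_dir_candidates() {
        println!("candidate: {}", dir.display());
    }
    match resolve_template_dir() {
        Some(dir) => println!("using {}", dir.display()),
        None => println!("no template directory found"),
    }
}
```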
Template directory structure:
culebra-templates/
  abi/
    unaligned-string-constant.yaml
    direct-push-no-writeback.yaml
    sret-input-output-alias.yaml
    missing-byval-large-struct.yaml
    tagged-pointer-odd-address.yaml
    struct-layout-mismatch.yaml
    return-type-divergence.yaml          # NEW -- playground
    large-struct-by-value.yaml           # NEW -- playground
    list-element-size-undercount.yaml    # NEW -- playground
  ir/
    byte-count-mismatch.yaml
    empty-switch-body.yaml
    ret-type-mismatch.yaml
    raw-control-byte.yaml
    phi-predecessor-mismatch.yaml
    unreachable-after-branch.yaml
    dropped-else-branch.yaml             # NEW -- playground
    option-type-pun-zeroinit.yaml        # NEW -- playground
    internal-linkage-dce.yaml            # NEW -- playground
    dynamic-alloca-non-entry.yaml        # NEW -- playground
    return-inside-nested-block.yaml      # NEW -- playground
    phi-operand-type-mismatch.yaml       # NEW -- playground
    break-inside-nested-control.yaml     # NEW -- playground
  binary/
    odd-address-rodata.yaml
    missing-symbol.yaml
  bootstrap/
    stage-output-divergence.yaml
    function-count-drop.yaml
    fixed-point-delta.yaml
    call-count-divergence.yaml           # NEW -- v2.2.0 playground
    body-size-shrinkage.yaml             # NEW -- v2.2.0 playground
  workflows/
    bootstrap-health-check.yaml
    pre-commit.yaml
    ci-full.yaml
    playground-mapanare.yaml             # NEW -- v2.2.0 playground
Configuration: culebra.toml
Run culebra init to generate a starter config:
[]
= "my-compiler"
= "my-lang"
= "llvm"
= "./my-compiler"
= "runtime/my_runtime.c"
[[]]
= "bootstrap"
= "python bootstrap/compile.py {input}"
= "src/compiler.ml"
= "/tmp/stage1.ll"
= true
[[]]
= "stage1"
= "{prev_output} {input}"
= "src/compiler.ml"
= "/tmp/stage2.ll"
= true
[[]]
= "stage2"
= "{prev_output} {input}"
= "src/compiler.ml"
= "/tmp/stage3.ll"
= true
[[]]
= "hello"
= 'fn main() { print("hello") }'
= "hello"
[[]]
= "math"
= "fn main() { print(2 + 3) }"
= "5"
CI/CD Integration
GitHub Actions with SARIF
name: Culebra Scan
on:
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Culebra
        run: cargo install --git https://github.com/Mapanare-Research/Culebra
      - name: Run scan
        run: culebra scan output.ll --format sarif > culebra.sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: culebra.sarif
Pre-commit hook
#!/bin/bash
Built for
- Anyone building a language that targets LLVM IR
- Anyone self-hosting a compiler
- Anyone debugging ABI and calling convention issues between IR and native code
- Anyone running a multi-stage bootstrap and needing to know where divergence starts
- Anyone who's lost hours to a string byte-count being off by one
- Anyone who wants their hard-won compiler bugs turned into reusable detection templates
Contributing
Contributions welcome. Two ways to contribute:
- Code -- Rust engine improvements, new matcher types, output formats
- Templates -- Add YAML templates for compiler bugs you've encountered
Every bug you've hit with your LLVM-targeting compiler can become a template. The tool gets smarter without touching Rust code.
License
MIT License -- see LICENSE for details.
Culebra -- The safety net your compiler needs.
Made with care by Juan Denis