relon_codegen_llvm/
lib.rs

//! LLVM-backed AOT evaluator for Relon. **Phase B production envelope.**
//!
//! This crate is the second slice of the dual-backend strategy:
//! Cranelift covers the default native AOT route and the LLVM AOT
//! pipeline here chases Rust-native peak performance for the `#main`
//! entry path.
//!
//! ## Scope (Phase B)
//!
//! - Two entry shapes accepted:
//!   - **Legacy-i64** (`(I64...) -> I64`) for `from_ir_direct`
//!     callers (tests, bench fixtures) — the Phase A bootstrap
//!     envelope, retained for cross-backend comparison.
//!   - **Buffer-protocol** (`(*state, i32, i32, i32, i32, i64) -> i32`)
//!     for `from_source` callers. Matches the cranelift backend's
//!     `EntryShape::BufferProtocol` so the runtime envelopes line up.
//! - Source-driven pipeline (`from_source`): parse + analyze +
//!   lower (`relon_ir::lower_workspace_single`) + LLVM emit + JIT
//!   compile + per-call arena dispatch. The cmp_lua W1 / W2
//!   workloads (`list.sum(range(n))` /
//!   `list.sum(range(n).map(...))`) go end-to-end through this path.
//! - The op set started with what `lower_workspace_single` synthesised
//!   for the W1 / W2 shape after the IR's `range_pipeline` peephole
//!   collapsed `range.map.sum` into a single accumulator loop:
//!   `LocalGet`, `ConstI64` / `ConstI32` / `ConstBool`, `LetGet` /
//!   `LetSet`, `LoadField` / `StoreField` (scalar slots),
//!   `Add` / `Sub` / `Mul` / `Div` / `Mod` / `BitAnd` (I32 + I64),
//!   `Eq` / `Ne` / `Lt` / `Le` / `Gt` / `Ge`, structured control flow
//!   (`Block` / `Loop` / `Br` / `BrIf` / `If`), and `Return`. The
//!   emitter has since widened into strings, lists, pointer-indirect
//!   fields, host calls, closures, object emission, wasm32 object
//!   emission, and several stdlib surfaces; the tests are the
//!   authoritative coverage map for those later slices.
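/*!

As a rough illustration of the single-accumulator-loop shape that the
`range.map.sum` collapse describes, here is a hand-written Rust sketch. It is
not the emitter's output: `sum_range_map` is a hypothetical name, `range(n)`
is assumed to mean the half-open interval `0..n`, and overflow is reported as
`None` only to echo the checked `Int` arithmetic of the production path.

```rust
/// Sums `f(i)` for `i` in `0..n` with one accumulator and one counter,
/// without materialising the range or the mapped list.
fn sum_range_map(n: i64, f: impl Fn(i64) -> i64) -> Option<i64> {
    let mut acc: i64 = 0;
    let mut i: i64 = 0;
    while i < n {
        // Checked add: bail out with `None` instead of wrapping on overflow.
        acc = acc.checked_add(f(i))?;
        i += 1;
    }
    Some(acc)
}
```

Under these assumptions, the W1 shape is the identity map
(`sum_range_map(n, |x| x)`), and the W2 shape passes a non-trivial closure.
*/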
//!
//! ## Safety contract
//!
//! The source-driven buffer-protocol path is the production entry and
//! carries the backend sandbox contract: capability gates, div/mod
//! guards, checked signed `Int` arithmetic, arena bounds checks before
//! host-pointer formation, dynamic host-call trap lifting, and
//! deterministic step-budget fuel all report through typed
//! `RuntimeError`s. The legacy / typed-fast i64 entries remain for
//! focused tests and benchmark kernels; they have no `ArenaState` error
//! lane, and public `run_main` routes trap-capable bodies through the
//! buffer entry.
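/*!

A minimal sketch of three of the guards listed above, written as plain Rust
rather than emitted LLVM IR. Everything here is an assumption made for
illustration: the `Trap` enum stands in for the crate's typed `RuntimeError`s
(whose real variants may differ), and `checked_add`, `guarded_div`, and
`arena_slice` are hypothetical helper names.

```rust
/// Hypothetical trap kinds; a stand-in for the real typed runtime errors.
#[derive(Debug, PartialEq)]
enum Trap {
    IntOverflow,
    DivideByZero,
    OutOfBounds,
}

/// Checked signed addition: overflow becomes a typed error, never a wrap.
fn checked_add(a: i64, b: i64) -> Result<i64, Trap> {
    a.checked_add(b).ok_or(Trap::IntOverflow)
}

/// Division guard: reject a zero divisor first, then catch the one
/// remaining overflow case (the minimum value divided by minus one).
fn guarded_div(a: i64, b: i64) -> Result<i64, Trap> {
    if b == 0 {
        return Err(Trap::DivideByZero);
    }
    a.checked_div(b).ok_or(Trap::IntOverflow)
}

/// Bounds check performed before any slice (pointer) into the arena exists.
fn arena_slice(arena: &[u8], offset: u32, len: u32) -> Result<&[u8], Trap> {
    let start = offset as usize;
    let end = start.checked_add(len as usize).ok_or(Trap::OutOfBounds)?;
    arena.get(start..end).ok_or(Trap::OutOfBounds)
}
```

The common thread is that each guard runs before the risky operation and
reports through the return value, so no caller observes a wrapped integer or
an out-of-range pointer.
*/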
//!
//! ## Remaining limits
//!
//! - **Full language surface** — tree-walk remains the complete
//!   reference implementation. LLVM AOT covers the explicitly tested
//!   compiled-backend surface and rejects shapes outside that envelope
//!   loudly rather than fabricating partial results.
//! - **`.o` / `.so` emit + dlopen** — Phase B still uses the
//!   in-process MCJIT engine. The single-knob `OptimizationLevel`
//!   API hides the engine choice so Phase C / ORC migration is a
//!   localised diff.
//!
//! ## Decision log (Phase A.1)
//!
//! Picked `inkwell` over `llvm-sys` and external `clang`/`llc`:
//!
//! - `inkwell` 0.9.0 with the `llvm18-1` feature pins llvm-sys
//!   181.3.0 against the system LLVM 18.1.3 install at
//!   `/usr/lib/llvm-18`. Safe Rust wrappers eliminate the per-op
//!   `unsafe` block the raw FFI path would impose.
//! - `llvm-sys` would force every IR-builder call through `unsafe`
//!   raw pointer arithmetic; for the same target set, that
//!   maintainability cost is too high across the Phase B/C AOT
//!   widening.
//! - `clang`/`llc` shell-out drops in-memory JIT verification (we
//!   want a smoke test to round-trip without writing a file) and
//!   bloats cold-start with subprocess fork/exec latency. `opt`
//!   piping also forces stringly-typed IR generation that's awkward
//!   to debug.

pub mod cocompile;
mod codegen;
mod error;
mod evaluator;
mod mcjit_mm;
mod sandbox;
mod state;
// `pub` so the wasm parity harness can `func_wrap` the exact same
// `relon_llvm_f64_to_str` Rust fn the native MCJIT leg maps — one
// Display byte producer across backends by construction.
pub mod str_helpers;
mod vtable;
mod wasi_host;
pub mod wasm_link;

/// Generator stamp for the LLVM-AOT codegen, the mirror of
/// `relon_codegen_cranelift::GENERATOR_VERSION`.
///
/// **Today this is a forward-looking placeholder, not yet wired into any
/// cache key.** The LLVM backend ships no object / ELF cache — every
/// dispatch JIT-compiles in-process via MCJIT — so there is presently no
/// persisted byte stream that could go stale against newer codegen.
///
/// THE INVARIANT THIS PINS, for whoever adds an LLVM object / ELF /
/// bitcode cache later: this version string **MUST** be folded into that
/// cache's integrity key (the HMAC / hash that gates a cache hit), exactly
/// as the cranelift backend folds its `GENERATOR_VERSION` into the object
/// cache HMAC (`object_cache_integration::cache_signature`). Bump it on
/// every codegen-incompatible change (op lowering, ABI / arena layout,
/// marshalling-seam, entry-shape changes). If a future cache omits this
/// key, stale machine code from an older generator will be silently
/// loaded and executed against new host-side decode assumptions — a
/// silent-wrong-result / memory-safety footgun. See
/// `docs/internal/adr/capability-and-trust-model.md` for the recorded
/// rationale.
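/**

A hedged sketch of what "folding the version into the integrity key" could
look like for a future cache. This is not the cranelift backend's
`cache_signature`: `cache_key` is a hypothetical helper, and the standard
library's `DefaultHasher` is only a stand-in for the HMAC a real cache would
use (it is neither keyed nor stable across Rust releases).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache-key helper: mixes the generator version into the
/// key ahead of the cached input, so any version bump changes the key.
fn cache_key(generator_version: &str, input_bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Fold the generator stamp in first; omitting this line is exactly
    // the stale-cache hazard described above.
    generator_version.hash(&mut hasher);
    input_bytes.hash(&mut hasher);
    hasher.finish()
}
```

With this shape, the same input hashed under two different generator
versions is expected to produce two different keys, so an entry written by an
older generator misses instead of being loaded.
*/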
pub const GENERATOR_VERSION: &str = "relon-codegen-llvm v0 (no object cache yet)";

pub use codegen::WorldMode;
pub use error::LlvmError;
pub use evaluator::{
    CodegenTarget, EmitObjectInfo, EmittedEntryShape, EmittedField, EmittedFieldType,
    LlvmAotEvaluator, WasmBufferDispatch,
};
pub use relon_eval_api::inplace_return::ArenaRegions;
pub use sandbox::{CapabilityVtable, SandboxConfig, SandboxTrapKind};
pub use state::HostFnRegistry;
pub use vtable::{populate_global_mappings, VtableSlot};