//! LLVM-backed AOT evaluator for Relon. **Phase B production envelope.**
//!
//! This crate is the second slice of the dual-backend strategy:
//! Cranelift covers the default native AOT route, while the LLVM AOT
//! pipeline here chases Rust-native peak performance for the `#main`
//! entry path.
//!
//! ## Scope (Phase B)
//!
//! - Two entry shapes accepted:
//!   - **Legacy-i64** (`(I64...) -> I64`) for `from_ir_direct`
//!     callers (tests, bench fixtures): the Phase A bootstrap
//!     envelope, retained for cross-backend comparison.
//!   - **Buffer-protocol** (`(*state, i32, i32, i32, i32, i64) -> i32`)
//!     for `from_source` callers. Matches the cranelift backend's
//!     `EntryShape::BufferProtocol` so the runtime envelopes line up.
//! - Source-driven pipeline (`from_source`): parse + analyze +
//! lower (`relon_ir::lower_workspace_single`) + LLVM emit + JIT
//! compile + per-call arena dispatch. The cmp_lua W1 / W2
//! workloads (`list.sum(range(n))` / `list.sum(range(n).map(...))`)
//! go end-to-end through this path.
//! - The op set started with what `lower_workspace_single` synthesised
//! for the W1 / W2 shape after the IR's `range_pipeline` peephole
//! collapsed `range.map.sum` into a single accumulator loop:
//! `LocalGet`, `ConstI64` / `ConstI32` / `ConstBool`, `LetGet` /
//! `LetSet`, `LoadField` / `StoreField` (scalar slots),
//! `Add` / `Sub` / `Mul` / `Div` / `Mod` / `BitAnd` (I32 + I64),
//! `Eq` / `Ne` / `Lt` / `Le` / `Gt` / `Ge`, structured control flow
//! (`Block` / `Loop` / `Br` / `BrIf` / `If`), and `Return`. The
//! emitter has since widened into strings, lists, pointer-indirect
//! fields, host calls, closures, object emission, wasm32 object
//! emission, and several stdlib surfaces; the tests are the
//! authoritative coverage map for those later slices.
//!
//! ## Safety contract
//!
//! The source-driven buffer-protocol path is the production entry and
//! carries the backend sandbox contract: capability gates, div/mod
//! guards, checked signed `Int` arithmetic, arena bounds checks before
//! host-pointer formation, dynamic host-call trap lifting, and
//! deterministic step-budget fuel all report through typed
//! `RuntimeError`s. The legacy / typed-fast i64 entries remain for
//! focused tests and benchmark kernels; they have no `ArenaState` error
//! lane, and public `run_main` routes trap-capable bodies through the
//! buffer entry.
//!
//! ## Remaining limits
//!
//! - **Full language surface** — tree-walk remains the complete
//! reference implementation. LLVM AOT covers the explicitly tested
//! compiled-backend surface and rejects shapes outside that envelope
//! loudly rather than fabricating partial results.
//! - **`.o` / `.so` emit + dlopen** — Phase B still uses the
//! in-process MCJIT engine. The single-knob `OptimizationLevel`
//! API hides the engine choice so Phase C / ORC migration is a
//! localised diff.
//!
//! ## Decision log (Phase A.1)
//!
//! Picked `inkwell` over `llvm-sys` and external `clang`/`llc`:
//!
//! - `inkwell` 0.9.0 with the `llvm18-1` feature pins llvm-sys
//! 181.3.0 against the system LLVM 18.1.3 install at
//! `/usr/lib/llvm-18`. Safe Rust wrappers eliminate the per-op
//! `unsafe` block the raw FFI path would impose.
//! - `llvm-sys` would force every IR-builder call through `unsafe`
//! raw pointer arithmetic; that maintainability cost is too high
//! for the Phase B/C AOT widening when the target set is the same.
//! - `clang`/`llc` shell-out drops in-memory JIT verification (we
//! want a smoke test to round-trip without writing a file) and
//! bloats cold-start with subprocess fork/exec latency. `opt`
//! piping also forces stringly-typed IR generation that's awkward
//! to debug.
// `pub` so the wasm parity harness can `func_wrap` the exact same
// `relon_llvm_f64_to_str` Rust fn the native MCJIT leg maps — one
// Display byte producer across backends by construction.
/// Generator stamp for the LLVM-AOT codegen, the mirror of
/// `relon_codegen_cranelift::GENERATOR_VERSION`.
///
/// **Today this is a forward-looking placeholder, not yet wired into any
/// cache key.** The LLVM backend ships no object / ELF cache — every
/// dispatch JIT-compiles in-process via MCJIT — so there is presently no
/// persisted byte stream that could go stale against newer codegen.
///
/// THE INVARIANT THIS PINS, for whoever adds an LLVM object / ELF /
/// bitcode cache later: this version string **MUST** be folded into that
/// cache's integrity key (the HMAC / hash that gates a cache hit), exactly
/// as the cranelift backend folds its `GENERATOR_VERSION` into the object
/// cache HMAC (`object_cache_integration::cache_signature`). Bump it on
/// every codegen-incompatible change (op lowering, ABI / arena layout,
/// marshalling-seam, entry-shape changes). If a future cache omits this
/// key, stale machine code from an older generator will be silently
/// loaded and executed against new host-side decode assumptions — a
/// silent-wrong-result / memory-safety footgun. See
/// `docs/internal/adr/capability-and-trust-model.md` for the recorded
/// rationale.
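///
/// A minimal sketch of that invariant, under stated assumptions: the
/// FNV-1a hash below is only a dependency-free stand-in for the real
/// HMAC / hash, and `fnv1a`, `cache_key` and `is_cache_hit` are
/// hypothetical names that do not exist in this crate. The constant is
/// redeclared locally so the example is self-contained.
///
/// ```rust
/// // Local stand-in for this crate's `GENERATOR_VERSION`.
/// const GENERATOR_VERSION: &str = "relon-codegen-llvm v0 (no object cache yet)";
///
/// // FNV-1a over length-prefixed chunks. Assumption: a real cache would
/// // use a keyed HMAC here; this only illustrates what gets folded in.
/// fn fnv1a(chunks: &[&[u8]]) -> u64 {
///     let mut h: u64 = 0xcbf2_9ce4_8422_2325;
///     for chunk in chunks {
///         // Prefix each chunk with its length so chunk boundaries matter.
///         let len = (chunk.len() as u64).to_le_bytes();
///         for &b in len.iter().chain(chunk.iter()) {
///             h ^= u64::from(b);
///             h = h.wrapping_mul(0x0000_0100_0000_01b3);
///         }
///     }
///     h
/// }
///
/// // Hypothetical integrity key: generator stamp plus source bytes.
/// fn cache_key(generator: &str, source: &[u8]) -> u64 {
///     fnv1a(&[generator.as_bytes(), source])
/// }
///
/// // A stored entry is a hit only if it was keyed by the same generator.
/// fn is_cache_hit(stored_key: u64, generator: &str, source: &[u8]) -> bool {
///     stored_key == cache_key(generator, source)
/// }
/// ```
///
/// Because the stamp is hashed together with the source bytes, bumping
/// it makes every entry written by an older generator miss instead of
/// loading stale machine code.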
pub const GENERATOR_VERSION: &str = "relon-codegen-llvm v0 (no object cache yet)";
pub use WorldMode;
pub use LlvmError;
pub use ;
pub use ArenaRegions;
pub use ;
pub use HostFnRegistry;
pub use ;