librlm-rs

A Rust implementation of the Recursive Language Models (RLM) algorithm from "Recursive Language Models" (Zhang, Kraska, Khattab — MIT CSAIL, Jan 2026).

RLM enables LLMs to handle arbitrarily long prompts (10M+ tokens) by treating them as part of an external environment rather than feeding them directly into the context window. The LLM interacts with the prompt through a persistent REPL, writing code to explore, decompose, and recursively invoke sub-LLMs over manageable chunks.

Architecture

┌─────────────────────────────────────────────────────┐
│                    RLM Loop                         │
│                                                     │
│  ┌──────────┐    code     ┌──────────────────────┐  │
│  │ Root LLM │───────────▶│  Embedded Lua REPL    │  │
│  │ (e.g.    │◀───────────│                       │  │
│  │  GPT-5)  │  metadata   │  context = <prompt>  │  │
│  └──────────┘   (stdout)  │  llm_query() ─────┐  │  │
│                           │  re.*, fs.*, json.*│ │  │
│                           └───────────────────│──┘  │
│                                               │     │
│                           ┌───────────────────▼──┐  │
│                           │  Sub LLM             │  │
│                           │  (e.g. GPT-5-mini)   │  │
│                           └──────────────────────┘  │
└─────────────────────────────────────────────────────┘

How It Works

1. The (potentially very long) prompt is loaded into the REPL as a context variable, not into the LLM's context window.
2. The LLM generates Lua code to explore the context (peek, search, chunk).
3. Code executes in the REPL; only metadata (length + prefix of output) goes back to the LLM.
4. llm_query() enables recursive sub-LLM calls from within code.
5. FINAL("answer") or FINAL_VAR("var_name") signals completion.
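The loop can be sketched in a few dozen lines of plain Rust. This is an illustration under stated assumptions, not librlm's actual internals: `output_metadata`, `parse_final`, `run_loop`, and the `Final` enum are hypothetical names, the two closures stand in for the root LLM and the Lua REPL, and the metadata format is invented for the example.

```rust
/// What goes back to the root LLM after a REPL step: the output's
/// length and a short prefix, never the full output.
/// (The "[N chars] prefix" format is an assumption for this sketch.)
fn output_metadata(output: &str, prefix_chars: usize) -> String {
    let total = output.chars().count();
    let prefix: String = output.chars().take(prefix_chars).collect();
    format!("[{} chars] {}", total, prefix)
}

/// The two completion signals described above.
#[derive(Debug, PartialEq)]
enum Final {
    Answer(String), // FINAL("answer")
    Var(String),    // FINAL_VAR("var_name")
}

/// Returns the text between `marker` and the next `")`, if both are present.
fn quoted_arg<'a>(reply: &'a str, marker: &str) -> Option<&'a str> {
    let start = reply.find(marker)? + marker.len();
    let len = reply[start..].find("\")")?;
    Some(&reply[start..start + len])
}

/// Hypothetical parser for the completion signals in a model reply.
fn parse_final(reply: &str) -> Option<Final> {
    if let Some(name) = quoted_arg(reply, "FINAL_VAR(\"") {
        return Some(Final::Var(name.to_string()));
    }
    quoted_arg(reply, "FINAL(\"").map(|a| Final::Answer(a.to_string()))
}

/// Drives the loop: ask the model for code, run it, feed back metadata only.
/// Returns None when the iteration budget runs out (a simplification; the
/// real library is described as forcing a final answer at that point).
fn run_loop(
    mut root_llm: impl FnMut(&[String]) -> String,
    mut repl_exec: impl FnMut(&str) -> String,
    max_iterations: usize,
) -> Option<Final> {
    let mut history: Vec<String> = Vec::new();
    for _ in 0..max_iterations {
        let reply = root_llm(&history);
        if let Some(done) = parse_final(&reply) {
            return Some(done);
        }
        let output = repl_exec(&reply);
        history.push(output_metadata(&output, 40));
    }
    None
}
```

The key property is that `history` only ever holds metadata strings, so the root model's context grows with the number of iterations rather than with the size of the prompt.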

Quick Start

use librlm::Rlm;

#[tokio::main]
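// Box<dyn Error> lets `?` work on both the std::io::Error from reading the file and RlmError.
async fn main() -> Result<(), Box<dyn std::error::Error>> {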
    let rlm = Rlm::builder()
        .root_model("gpt-5")
        .root_api_key("sk-...")
        // Optional: cheaper model for sub-calls from within the REPL
        .sub_model("gpt-5-mini")
        .build()?;

    let result = rlm.completion(
        &std::fs::read_to_string("very_long_document.txt")?,
        Some("What are the key findings?"),
    ).await?;

    println!("{}", result.response);
    println!("Iterations: {}, Tokens: {}", result.iterations, result.total_usage.total_tokens);
    Ok(())
}

Builder API

Method	Required	Description
`root_model(name)`	Yes*	Root LLM model name
`root_api_key(key)`	Yes*	API key for root LLM
`root_base_url(url)`	No	Base URL (defaults to OpenAI)
`sub_model(name)`	No	Sub-call LLM model name
`sub_api_key(key)`	No	Sub-call API key (defaults to root)
`sub_base_url(url)`	No	Sub-call base URL (defaults to root)
`root_backend(impl)`	No	Custom `LlmBackend` trait object
`sub_backend(impl)`	No	Custom sub-call backend
`max_iterations(n)`	No	Max REPL loop iterations (default: 30)
`max_depth(n)`	No	Max recursion depth (default: 1)
`max_output_chars(n)`	No	Max REPL output chars in history (default: 20000)
`max_timeout(duration)`	No	Overall timeout

* Not required when using root_backend() instead.
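The "defaults to root" rows in the table amount to a simple fallback rule. The sketch below shows one way such a rule could be written; `Endpoint`, `SubOverrides`, and `resolve_sub` are hypothetical names for illustration, not types exported by librlm, and falling back to the root model when no `sub_model` is set is an assumption.

```rust
/// Fully resolved connection settings for one LLM (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Endpoint {
    model: String,
    api_key: String,
    base_url: String,
}

/// Whatever the caller set through sub_model / sub_api_key / sub_base_url
/// (hypothetical type; None means "not set").
#[derive(Debug, Default)]
struct SubOverrides {
    model: Option<String>,
    api_key: Option<String>,
    base_url: Option<String>,
}

/// Each unset sub-call field inherits the root value.
/// Assumption: an unset sub model also falls back to the root model.
fn resolve_sub(root: &Endpoint, sub: &SubOverrides) -> Endpoint {
    Endpoint {
        model: sub.model.clone().unwrap_or_else(|| root.model.clone()),
        api_key: sub.api_key.clone().unwrap_or_else(|| root.api_key.clone()),
        base_url: sub.base_url.clone().unwrap_or_else(|| root.base_url.clone()),
    }
}
```

Under this rule, setting only `sub_model("gpt-5-mini")` would send sub-calls to the same provider with the same key as the root model, which matches the Quick Start configuration.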

Custom LLM Backend

Implement the LlmBackend trait for non-OpenAI providers:

use librlm::{LlmBackend, Message, CompletionResponse, RlmError};
use async_trait::async_trait;

struct MyBackend;

#[async_trait]
impl LlmBackend for MyBackend {
    fn model_name(&self) -> &str { "my-model" }

    async fn completion(&self, messages: &[Message]) -> Result<CompletionResponse, RlmError> {
        // Your implementation here
        todo!()
    }
}

Configuration

Parameter	Default	Description
`max_depth`	1	Recursion depth. 0 = plain LLM call, 1 = one level of REPL interaction
`max_iterations`	30	Maximum REPL loop iterations before forcing a final answer
`max_output_chars`	20000	Truncation limit for REPL output included in LLM history
`max_timeout`	None	Overall timeout for the completion
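To make `max_output_chars` concrete, here is a minimal sketch of character-based truncation. The function name `truncate_output` and the trailing marker format are assumptions for illustration; the library's real truncation message may differ.

```rust
/// Keeps at most `max_output_chars` characters of REPL output and appends
/// a note saying how many were dropped. Counts Unicode scalar values, not
/// bytes, so the cut never lands inside a multi-byte character.
fn truncate_output(output: &str, max_output_chars: usize) -> String {
    // nth(n) yields the byte offset of the character at index n,
    // which exists only when the output is longer than the limit.
    match output.char_indices().nth(max_output_chars) {
        None => output.to_string(),
        Some((byte_idx, _)) => {
            let omitted = output.chars().count() - max_output_chars;
            format!("{}... [{} chars truncated]", &output[..byte_idx], omitted)
        }
    }
}
```

Slicing at a byte offset taken from `char_indices` avoids the panic that a naive `&output[..max_output_chars]` would cause on non-ASCII output.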

Prerequisites

Rust toolchain (edition 2021+)
No external runtime needed — the Lua VM (Luau) compiles from source into the binary

Design Decisions

Embedded Lua REPL: Uses mlua with the `luau` feature. The Lua VM compiles from source, so no system Lua installation is needed. Rust-backed extensions (re.*, fs.*, json.*) compensate for Lua's minimal stdlib.
Two-LLM architecture: Root LLM drives the main loop; optional cheaper sub-LLM handles llm_query() calls (matching the paper's GPT-5 + GPT-5-mini approach).
Pure library: All configuration is programmatic. No config files, no environment variables.
Async API: Built on tokio for non-blocking LLM HTTP calls.

License

MIT

librlm 0.1.0