librlm 0.1.0

Recursive Language Models (RLM) - inference-time paradigm for arbitrarily long prompts
Documentation

librlm-rs

A Rust implementation of the Recursive Language Models (RLM) algorithm from "Recursive Language Models" (Zhang, Kraska, Khattab — MIT CSAIL, Jan 2026).

RLM enables LLMs to handle arbitrarily long prompts (10M+ tokens) by treating them as part of an external environment rather than feeding them directly into the context window. The LLM interacts with the prompt through a persistent REPL, writing code to explore, decompose, and recursively invoke sub-LLMs over manageable chunks.

Architecture

┌─────────────────────────────────────────────────────┐
│                    RLM Loop                         │
│                                                     │
│  ┌──────────┐    code     ┌──────────────────────┐  │
│  │ Root LLM │───────────▶│  Embedded Lua REPL    │  │
│  │ (e.g.    │◀───────────│                       │  │
│  │  GPT-5)  │  metadata   │  context = <prompt>  │  │
│  └──────────┘   (stdout)  │  llm_query() ─────┐  │  │
│                           │  re.*, fs.*, json.*│ │  │
│                           └───────────────────│──┘  │
│                                               │     │
│                           ┌───────────────────▼──┐  │
│                           │  Sub LLM             │  │
│                           │  (e.g. GPT-5-mini)   │  │
│                           └──────────────────────┘  │
└─────────────────────────────────────────────────────┘

How It Works

  1. The (potentially very long) prompt is loaded into the REPL as a context variable — not into the LLM's context window
  2. The LLM generates Lua code to explore the context (peek, search, chunk)
  3. Code executes in the REPL; only metadata (length + prefix of output) goes back to the LLM
  4. llm_query() enables recursive sub-LLM calls from within code
  5. FINAL("answer") or FINAL_VAR("var_name") signals completion

Quick Start

use librlm::Rlm;

#[tokio::main]
async fn main() -> Result<(), librlm::RlmError> {
    let rlm = Rlm::builder()
        .root_model("gpt-5")
        .root_api_key("sk-...")
        // Optional: cheaper model for sub-calls from within the REPL
        .sub_model("gpt-5-mini")
        .build()?;

    let result = rlm.completion(
        &std::fs::read_to_string("very_long_document.txt")?,
        Some("What are the key findings?"),
    ).await?;

    println!("{}", result.response);
    println!("Iterations: {}, Tokens: {}", result.iterations, result.total_usage.total_tokens);
    Ok(())
}

Builder API

Method Required Description
root_model(name) Yes* Root LLM model name
root_api_key(key) Yes* API key for root LLM
root_base_url(url) No Base URL (defaults to OpenAI)
sub_model(name) No Sub-call LLM model name
sub_api_key(key) No Sub-call API key (defaults to root)
sub_base_url(url) No Sub-call base URL (defaults to root)
root_backend(impl) No Custom LlmBackend trait object
sub_backend(impl) No Custom sub-call backend
max_iterations(n) No Max REPL loop iterations (default: 30)
max_depth(n) No Max recursion depth (default: 1)
max_output_chars(n) No Max REPL output chars in history (default: 20000)
max_timeout(duration) No Overall timeout

* Not required when using root_backend() instead.

Custom LLM Backend

Implement the LlmBackend trait for non-OpenAI providers:

use librlm::{LlmBackend, Message, CompletionResponse, RlmError};
use async_trait::async_trait;

struct MyBackend;

#[async_trait]
impl LlmBackend for MyBackend {
    fn model_name(&self) -> &str { "my-model" }

    async fn completion(&self, messages: &[Message]) -> Result<CompletionResponse, RlmError> {
        // Your implementation here
        todo!()
    }
}

Configuration

Parameter Default Description
max_depth 1 Recursion depth. 0 = plain LLM call, 1 = one level of REPL interaction
max_iterations 30 Maximum REPL loop iterations before forcing a final answer
max_output_chars 20000 Truncation limit for REPL output included in LLM history
max_timeout None Overall timeout for the completion

Prerequisites

  • Rust toolchain (edition 2021+)
  • No external runtime needed — the Lua VM (Luau) compiles from source into the binary

Design Decisions

  • Embedded Lua REPL: Uses mlua with the Luau feature. The Lua VM compiles from source — zero external dependencies. Rust-backed extensions (re.*, fs.*, json.*) compensate for Lua's minimal stdlib.
  • Two-LLM architecture: Root LLM drives the main loop; optional cheaper sub-LLM handles llm_query() calls (matching the paper's GPT-5 + GPT-5-mini approach).
  • Pure library: All configuration is programmatic. No config files, no environment variables.
  • Async API: Built on tokio for non-blocking LLM HTTP calls.

License

MIT