Laburnum

Laburnum is a Language Server Protocol (LSP) framework built on an incremental query tree. It provides the foundation for implementing language servers whose analyses — parsing, name resolution, type checking, diagnostics, completions — are expressed as tasks over a content-addressed, incrementally recomputed database. As the editor sends file changes, only the tasks whose inputs have actually changed are re-run; everything else is reused from prior chunks.

The LSP layer (protocol types, request routing, position encoding, capability negotiation) is part of the framework, but Laburnum itself is language-agnostic: language-specific behaviour lives in the implementing crate.

Incremental Query Tree

The big idea at the heart of Laburnum is to create an incrementally computed tree, where updates are worked on in parallel then merged back into the tree.

To make this work a computation task has access to a read only view of the tree, and can query the tree in a number of different ways. A task can wait for queries to return records, or timeout.

New records are created inside a task, and stored in a Chunk. When the task completes, the chunk is merged into the tree, and can be queried by other tasks.

The database is queryable as a HashMap of BTrees similar to DynamoDB. With each Record being identified with a partition_key and a sort_key.

Each Chunk is immutable once created, and can contain records from many different partitions.

It is important that Tasks are as deterministic as practical. For the same input and query results, the same records should be returned. The values of the records can be different, but the partition_key and sort_key should be the same.

This is a constraint to some systems, but a natural rule for incremental systems like compilers. Where computation Tasks form a dataflow DAG.

Because our data ends up distributed across Chunks we can often parallelise queries with decent performance improvements, see benchmarks.

Tasks can spawn other tasks, creating a task graph that is automatically tracked. You can check for the results of a task, and spawn it if they're not there.

It's expected that Tasks form a graph, and will be recomputed from the start of the graph each time. As all the expensive work is stored in the database, we can walk over the graph and only recompute what's changed.

Architecture Overview

Task System

Laburnum's computation model is built around asynchronous tasks that incrementally compute data:

LaburnumTask: A struct that wraps an async future producing records. Each task executes via poll_once() and writes results as chunks to the database.
TaskContext: Given to tasks during execution. Provides a query_client() method to read from the database and a spawn_task() method to create child tasks.
Parent Tracking: Parent-child relationships between tasks are tracked via the parent_task_id field on chunks. When a task spawns another task via TaskContext::spawn_task(), the parent is recorded on the resulting chunk.
Scheduler: Manages task execution. You add tasks with queue() and call run() to execute them. Tasks are executed by a worker pool according to lane priority.

The parent tracking enables incremental recompilation - when inputs change, only tasks that depend on those inputs need to be recomputed.

Chunks and Storage

All data in Laburnum is stored in immutable Chunks:

Each chunk is identified by a content hash computed from its inputs and dependencies
Chunks contain records organized by partition_key and sort_key
Multiple chunks can contain records for the same partition key - this is expected and beneficial for parallelism
Chunks are never modified - new computations create new chunks

The database is generic over storage via the RecordStorage trait, allowing different backends for different use cases (compiler IR, metadata, etc.).

Query System

Tasks query the database through QueryClient:

Direct queries: get_record(partition_key, sort_key) for exact lookups
Fluent queries: query(partition_key).sort_key_begins_with(prefix).execute() for range and prefix queries
Batch queries: batch_get_items() for multiple lookups in one call

When a task queries for data, the QueryClient automatically:

Tracks which chunks were read (building dependency graph)
Records missing data as "pending dependencies"
Decides whether to execute sequentially or in parallel (see Parallelization)

If a query returns no results, the task blocks until another task produces that data.

Supporting Components

Identifiers (Ident):

Deterministic hash-based identifiers from strings (using rapidhash v3)
Used for partition_key, task_id, and other keys
Enables fast equality checks and hashing

Spans (Span & SpanCache):

Track source code locations for error reporting
SpanCache is a per-file cache (created with file_id and version)
Supports creating rich, user-friendly diagnostics

Key Design Features

Content-Addressed Storage

Chunks are immutable and identified by content hash. New source changes create new chunks, not replacements. Old chunks remain until garbage collected.

Dependency Tracking

Automatic tracking of which chunks were read during computation. Forms a DAG from source files through compilation stages. Enables precise incremental recompilation.

Query Patterns

Location-based: Query by primary key + sort key (gets latest data)
Content-based: Query by content hash (immutable lookup)
Prefix search: Query by sort key prefix for hierarchical data
Range queries: Query by sort key ranges (between, less than, greater than)

Queries naturally return all matching records across all chunks - this is the correct behavior. The query system aggregates results from multiple chunks and returns them sorted.

Performance Optimizations

Parallel queries across chunks when beneficial. Adaptive thresholds based on chunk count. No defragmentation needed - embraces distributed chunks as a performance feature.

Garbage Collection (partially implemented)

Mark-and-sweep from entry points (source files). Lazy GC during idle periods. Maintains referential integrity across indexes.

laburnum 1.17.1