Laburnum
Laburnum is a Language Server Protocol (LSP) framework built on an incremental query tree. It provides the foundation for implementing language servers whose analyses — parsing, name resolution, type checking, diagnostics, completions — are expressed as tasks over a content-addressed, incrementally recomputed database. As the editor sends file changes, only the tasks whose inputs have actually changed are re-run; everything else is reused from prior chunks.
The LSP layer (protocol types, request routing, position encoding, capability negotiation) is part of the framework, but Laburnum itself is language-agnostic: language-specific behaviour lives in the implementing crate.
Incremental Query Tree
The big idea at the heart of Laburnum is to create an incrementally computed tree, where updates are worked on in parallel then merged back into the tree.
To make this work a computation task has access to a read only view of the tree, and can query the tree in a number of different ways. A task can wait for queries to return records, or timeout.
New records are created inside a task, and stored in a Chunk. When the task
completes, the chunk is merged into the tree, and can be queried by other tasks.
The database is queryable as a HashMap of BTrees similar to DynamoDB. With each
Record being identified with a partition_key and a sort_key.
Each Chunk is immutable once created, and can contain records from many different
partitions.
It is important that Tasks are as deterministic as practical. For the same input and query results, the same records should be returned. The values of the records can be different, but the partition_key and sort_key should be the same.
This is a constraint to some systems, but a natural rule for incremental systems like compilers. Where computation Tasks form a dataflow DAG.
Because our data ends up distributed across Chunks we can often parallelise
queries with decent performance improvements, see benchmarks.
Tasks can spawn other tasks, creating a task graph that is automatically tracked. You can check for the results of a task, and spawn it if they're not there.
It's expected that Tasks form a graph, and will be recomputed from the start of the graph each time. As all the expensive work is stored in the database, we can walk over the graph and only recompute what's changed.
Architecture Overview
Task System
Laburnum's computation model is built around asynchronous tasks that incrementally compute data:
-
LaburnumTask: A struct that wraps an async future producing records. Each task executes via
poll_once()and writes results as chunks to the database. -
TaskContext: Given to tasks during execution. Provides a
query_client()method to read from the database and aspawn_task()method to create child tasks. -
Parent Tracking: Parent-child relationships between tasks are tracked via the
parent_task_idfield on chunks. When a task spawns another task viaTaskContext::spawn_task(), the parent is recorded on the resulting chunk. -
Scheduler: Manages task execution. You add tasks with
queue()and callrun()to execute them. Tasks are executed by a worker pool according to lane priority.
The parent tracking enables incremental recompilation - when inputs change, only tasks that depend on those inputs need to be recomputed.
Chunks and Storage
All data in Laburnum is stored in immutable Chunks:
- Each chunk is identified by a content hash computed from its inputs and dependencies
- Chunks contain records organized by
partition_keyandsort_key - Multiple chunks can contain records for the same partition key - this is expected and beneficial for parallelism
- Chunks are never modified - new computations create new chunks
The database is generic over storage via the RecordStorage trait, allowing different backends for different use cases (compiler IR, metadata, etc.).
Query System
Tasks query the database through QueryClient:
- Direct queries:
get_record(partition_key, sort_key)for exact lookups - Fluent queries:
query(partition_key).sort_key_begins_with(prefix).execute()for range and prefix queries - Batch queries:
batch_get_items()for multiple lookups in one call
When a task queries for data, the QueryClient automatically:
- Tracks which chunks were read (building dependency graph)
- Records missing data as "pending dependencies"
- Decides whether to execute sequentially or in parallel (see Parallelization)
If a query returns no results, the task blocks until another task produces that data.
Supporting Components
Identifiers (Ident):
- Deterministic hash-based identifiers from strings (using rapidhash v3)
- Used for
partition_key,task_id, and other keys - Enables fast equality checks and hashing
Spans (Span & SpanCache):
- Track source code locations for error reporting
- SpanCache is a per-file cache (created with file_id and version)
- Supports creating rich, user-friendly diagnostics
Key Design Features
Content-Addressed Storage
Chunks are immutable and identified by content hash. New source changes create new chunks, not replacements. Old chunks remain until garbage collected.
Dependency Tracking
Automatic tracking of which chunks were read during computation. Forms a DAG from source files through compilation stages. Enables precise incremental recompilation.
Query Patterns
- Location-based: Query by primary key + sort key (gets latest data)
- Content-based: Query by content hash (immutable lookup)
- Prefix search: Query by sort key prefix for hierarchical data
- Range queries: Query by sort key ranges (between, less than, greater than)
Queries naturally return all matching records across all chunks - this is the correct behavior. The query system aggregates results from multiple chunks and returns them sorted.
Performance Optimizations
Parallel queries across chunks when beneficial. Adaptive thresholds based on chunk count. No defragmentation needed - embraces distributed chunks as a performance feature.
Garbage Collection (partially implemented)
Mark-and-sweep from entry points (source files). Lazy GC during idle periods. Maintains referential integrity across indexes.