Ogham
Ogham — the early-Irish script that carved language into compact notches. This SDK does the same to your LLM context.
Ogham is a pure-Rust SDK for LLM context engineering: it compresses, prunes, and budgets conversation context before it reaches the model — entirely in-process. Agentic workloads routinely burn 60–90% of their tokens on stale tool output; Ogham reclaims them without losing the things that matter (errors, decisions, file paths), and everything it removes stays retrievable.
No subprocesses. No network calls. No background tasks. Every operation is a deterministic library call.
What it does
| Capability | In one line |
|---|---|
| Content compression | detects JSON / logs / code / diffs / prose and routes each to a purpose-built compressor |
| Reversible by default | originals live in a CCR store (in-memory, SQLite, fjall); compressed text carries <<ccr:HASH>> markers the model can dereference |
| Agent-aware rules | clears stale successful tool results to one-line stubs; never touches errors, system prompts, the latest user query, or pinned messages |
| Token budgets | a degradation cascade (compress → summarize → drop) that fits any history into a limit — or refuses with BudgetExceeded rather than overflow |
| Honest token counting | exact for OpenAI encodings (tiktoken feature); calibrated estimates with an explicit safety margin elsewhere — Claude tokenizers aren't public and Ogham doesn't pretend otherwise |
| Structured summaries | deterministic extraction of files / decisions / errors / next-steps, plus a Summarizer trait for LLM-backed memory |
| Prompt-cache support | byte-stable prefixes + provider breakpoint annotation for KV-cache reuse |
| Observable | every decision streams through Observer/Metrics traits — auditable, testable |
Install
[]
= "0.3"
# exact OpenAI token counting:
# ogham = { version = "0.3", features = ["tiktoken"] }
Prebuilt ogham-server binaries (Linux, macOS universal, Windows; sha256 checksums) are on the
releases page.
Sixty seconds
Compress a tool result (content level):
use ;
let out = compress_messages.await?;
println!;
Fit a whole conversation into a budget (conversation level):
use Arc;
use ;
use ;
use InMemoryCcrStore;
use DefaultCompressionPipeline;
use counter_for_model;
let ccr = new;
let pipeline = with_ccr_store;
let policy = default;
// 1. clear stale tool results to retrievable CCR stubs
apply_agent_compression.await?;
// 2. guarantee the prompt fits — or get Err(BudgetExceeded), never an oversized send
enforce_budget.await?;
A cleared tool result looks like this — and the original is one
ccr.retrieve(hash) away:
[tool:file_read] result cleared (512 tokens) — original retrievable via <<ccr:86a33abc...>>
Guarantees
- Fail-closed. Any internal error returns the original content unchanged. A budget that can't be met is an error, not a truncated prompt.
- Deterministic. Same input + same config ⇒ byte-identical output. No clocks, no randomness.
- Protected content. Tool errors, system instructions, and the latest user query survive every pass verbatim — enforced by tests, not convention.
- Measured, not fabricated. Stats reflect events that actually fired; estimated token counts are labelled as estimates.
Documentation
| Doc | Contents |
|---|---|
| docs/architecture.md | crate layering, data flow, invariants, design decisions |
| docs/compression.md | detection, the six compressors, CCR stores, cache alignment |
| docs/agent-context.md | classification, tool clearing, budgets, summaries, cache strategy |
| docs/server.md | the optional HTTP server: env vars, endpoints, embedding |
| docs/integration.md | wiring Ogham into an agent runtime, end to end |
| DESIGN.md | the full engineering record: build plan, work packages, ADRs |
| CONTRIBUTING.md · RELEASING.md · CHANGELOG.md | process |
API reference: docs.rs/ogham, or locally via
cargo doc --workspace --no-deps --open.
Crates
| Crate | What | When to depend on it |
|---|---|---|
ogham |
the SDK: compressors, CCR, pipeline, agent rules, budgets, summaries | almost always |
ogham-core |
traits + types only (5 small deps, no runtime) | implementing custom compressors/observers |
ogham-server |
embeddable Axum HTTP server + standalone binary | non-Rust clients, standalone deployment |
Status
0.3.0 — APIs may change before 1.0 (semver-minor may break). The test
suite covers fail-closed behavior, determinism, fuzzing, golden-file
regression, and needle-in-haystack survival probes; see
CONTRIBUTING.md for the gate every change must pass.
License & attribution
Apache-2.0 — see LICENSE.
Ogham's architecture is substantially derived from Headroom (Apache-2.0): the CCR reversible-storage concept, content-type routing, and the SmartCrusher / LogStripper compressor designs originate there. The agent-aware rules, token budgets, tiered summaries, and cache integration are informed by published research from Anthropic, Factory.ai, and Microsoft (LLMLingua). Full attribution in NOTICE.