atlas
Deterministic knowledge base indexer for AI agents
Generates multi-resolution markdown indexes of knowledge bases, solving the "AI doesn't know what it knows" problem through cheap, deterministic, static analysis.
Status
Experimental. atlas is useful enough to inspect and try, but command behavior, generated view formats, plugin surfaces, and release channels may change.
The official crates.io package is agent-atlas, and the installed command is atlas.
The Problem
AI agents working with knowledge bases (folders of markdown, PDFs, notes) face a chicken-and-egg problem: they need to know what exists before they can search effectively, but the usual way to find out what exists is to search.
Existing solutions (embeddings, auto-memories, RAG) optimize for retrieval given a query, but don't solve the fundamental discovery problem.
The Solution
atlas creates a persistent, human-readable atlas of your knowledge base:
- ROOT_ATLAS.md — Top-level map with folder signatures and key files
- Per-folder INDEX.md — Detailed listings with snippets and top terms
- TERMS.md — Concept-to-file mapping for topic navigation
The atlas is:
- Deterministic — No LLM, no randomness, fully reproducible
- Incremental — Only reprocesses changed files
- Portable — Plain files, works with any AI tool
- Fast — Sub-second for unchanged corpora
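One way to picture the incremental property is a comparison of per-file fingerprints against the ones recorded at the last build. The sketch below is an illustration only: the SHA-256 content hash and the names `fingerprint` and `changed_files` are assumptions made for this example, not atlas's actual cache format or change-detection logic.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable hex digest for a file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def changed_files(previous: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """List paths whose fingerprint is new or differs from the stored one.

    `previous` maps path -> fingerprint recorded at the last build;
    `current` maps path -> file bytes read during this scan.
    """
    return sorted(
        path
        for path, content in current.items()
        if previous.get(path) != fingerprint(content)
    )


# Hypothetical corpus: only notes/b.md changed since the last build.
last_build = {
    "notes/a.md": fingerprint(b"alpha"),
    "notes/b.md": fingerprint(b"beta"),
}
this_scan = {"notes/a.md": b"alpha", "notes/b.md": b"beta, revised"}
to_reprocess = changed_files(last_build, this_scan)
```

Because the digest depends only on file bytes, an unchanged corpus yields an empty list and no extraction work.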
Quick Start
# Install from crates.io
# Or install from a local checkout for development
# Initialize in your knowledge base
# Build the index
# View the atlas
Commands
atlas init # Initialize .atlas directory
atlas scan # Scan for changes (fast fingerprint check)
atlas build # Build/update index and generate views
atlas search # Lexical search with deterministic ranking and highlights
atlas doctor # Report issues (extraction failures, stale cache)
atlas clean # Remove cached data
Search
atlas search stays lexical and deterministic. Results are ranked by relevance score descending,
then by path ascending for exact score ties.
# Human-readable output with one excerpt per hit
# Stable JSON envelope with matched fields, reasons, and highlight payloads
# Include raw Tantivy explanation trees without changing the default JSON shape
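The ordering rule stated above (score descending, then path ascending for exact ties) can be expressed as a single sort key. This is a minimal sketch in Python rather than atlas's own code; the `Hit` type and the sample paths and scores are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    path: str
    score: float


def rank(hits: list[Hit]) -> list[Hit]:
    """Sort by score descending, then by path ascending for exact score ties."""
    return sorted(hits, key=lambda h: (-h.score, h.path))


hits = [
    Hit("notes/zeta.md", 2.5),
    Hit("notes/alpha.md", 2.5),
    Hit("docs/intro.md", 4.0),
]
ranked = [h.path for h in rank(hits)]
```

The path tie-breaker keeps the order of equally scored hits independent of the order in which they were collected.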
What Gets Indexed
For each file, atlas extracts:
- Title — First heading or derived from filename
- Snippet — First paragraph (~400 chars)
- Top terms — TF-IDF weighted distinctive words
- Top phrases — Frequent bigrams and trigrams
- Links — Internal links, external URLs, citations (DOI, arXiv, ISBN)
- Word/char count — For quick size estimates
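The title and snippet rules in the list above can be sketched as follows. This is an assumption-laden illustration for Markdown-like text, not atlas's extractor: the function names, the filename clean-up, and the blank-line paragraph splitting are all choices made for this example.

```python
from pathlib import PurePosixPath


def extract_title(text: str, filename: str) -> str:
    """Use the first Markdown heading, else derive a title from the filename."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            heading = stripped.lstrip("#").strip()
            if heading:
                return heading
    # Fallback (assumed convention): turn separators in the file stem into spaces.
    stem = PurePosixPath(filename).stem
    return stem.replace("-", " ").replace("_", " ").strip().title()


def extract_snippet(text: str, limit: int = 400) -> str:
    """Return the first non-heading paragraph, truncated to `limit` characters."""
    for block in text.split("\n\n"):
        paragraph = " ".join(block.split())  # collapse line wraps and extra spaces
        if paragraph and not paragraph.startswith("#"):
            return paragraph[:limit]
    return ""


doc = "# Release Notes\n\nFirst paragraph of the\nnotes.\n\nSecond paragraph."
title = extract_title(doc, "release-notes.md")
snippet = extract_snippet(doc)
fallback_title = extract_title("no heading here", "meeting_notes-2024.md")
```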
Across the corpus:
- Global term frequencies — Which terms are distinctive
- Folder signatures — Top terms/phrases per folder
- Cross-references — Term-to-file and phrase-to-file mappings
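To make "distinctive" concrete, here is a small TF-IDF sketch that combines per-file term counts with corpus-wide document frequencies. The exact weighting atlas uses is not specified here, so the formula (relative term frequency times `log(N / df)`), the tokenizer, and the sample corpus are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms(docs: dict[str, str], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest-scoring TF-IDF terms for each file."""
    tokens = {path: tokenize(text) for path, text in docs.items()}
    # Document frequency: the number of files in which each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    result = {}
    for path, toks in tokens.items():
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Score descending, then alphabetical, so the output is reproducible.
        result[path] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return result


docs = {
    "rust/ownership.md": "ownership and borrowing rules in rust",
    "rust/traits.md": "traits and generics in rust",
    "cooking/bread.md": "flour and yeast and water",
}
terms = top_terms(docs)
```

A term such as "and" appears in every file, so its inverse document frequency is zero and it never outranks terms that are specific to one file.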
Supported File Types
By default, atlas indexes:
- Markdown (.md)
- Plain text and notes (.txt, .rst, .org)
- PDF (.pdf), via pdftotext (Poppler)
- Rust (.rs)
- TypeScript / TSX (.ts, .tsx)
- JavaScript / JSX (.js, .jsx, .mjs, .cjs), using the same code extractor as TypeScript/TSX when possible
- Common config/text files (.json, .yml, .yaml, .toml, .sh, .sql), plaintext fallback
Structured formats such as Markdown and code get richer extraction (headings, symbols, links) where supported. Config-style files still contribute snippets and terms through plaintext extraction.
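The split between richer extraction and plaintext fallback amounts to a lookup from file extension to extractor. The sketch below mirrors the groupings listed above; the extractor labels and the `pick_extractor` helper are hypothetical names for this example, and treating .txt, .rst, and .org as plaintext is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extractor labels; groupings follow the supported-types list.
EXTRACTORS = {
    "md": "markdown",
    "txt": "plaintext", "rst": "plaintext", "org": "plaintext",
    "pdf": "pdftotext",
    "rs": "rust",
    "ts": "typescript", "tsx": "typescript",
    # JS variants reuse the TypeScript/TSX code extractor when possible.
    "js": "typescript", "jsx": "typescript", "mjs": "typescript", "cjs": "typescript",
    # Config-style files fall back to plaintext extraction.
    "json": "plaintext", "yml": "plaintext", "yaml": "plaintext",
    "toml": "plaintext", "sh": "plaintext", "sql": "plaintext",
}


def pick_extractor(path: str) -> Optional[str]:
    """Return the extractor label for a path, or None if the type is not indexed."""
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    return EXTRACTORS.get(ext)
```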
Configuration
Edit .atlas/config.toml:
[]
= [".git", ".atlas", "node_modules", "__pycache__", "*.pyc", ".DS_Store"]
= [
"md",
"txt",
"pdf",
"rst",
"org",
"rs",
"ts",
"tsx",
"js",
"jsx",
"mjs",
"cjs",
"json",
"yml",
"yaml",
"toml",
"sh",
"sql",
]
[]
= 10000000 # 10MB
= 400
[]
= 20
= 10
= 3
= 25
= 0.4
= 2
= 0.5
[]
= 3
= 10
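As a rough illustration of how an ignore list and an extension allowlist like the ones above could combine, the sketch below filters candidate paths. The matching semantics (glob patterns tested against each path component) and the `should_index` helper are assumptions for this example; atlas's real matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Values copied from the configuration above; the extension set is shortened.
IGNORE = [".git", ".atlas", "node_modules", "__pycache__", "*.pyc", ".DS_Store"]
EXTENSIONS = {"md", "txt", "pdf", "rs", "toml"}


def should_index(path: str) -> bool:
    """Skip paths with an ignored component, then require an allowed extension."""
    pure = PurePosixPath(path)
    if any(fnmatch(part, pattern) for part in pure.parts for pattern in IGNORE):
        return False
    return pure.suffix.lstrip(".").lower() in EXTENSIONS


candidates = [
    "notes/a.md",
    "node_modules/pkg/readme.md",
    "cache/mod.pyc",
    ".git/config",
    "img/logo.png",
]
kept = [p for p in candidates if should_index(p)]
```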
PDF Support
PDF extraction requires pdftotext from Poppler:
# macOS
# Ubuntu/Debian
# Fedora
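To check the PDF path outside atlas, the sketch below shells out to pdftotext directly from Python. Passing `-` as the output file asks pdftotext to write to stdout. The helper names are hypothetical, and the arguments atlas itself passes to pdftotext are not shown here.

```python
import shutil
import subprocess


def pdftotext_available() -> bool:
    """Report whether a pdftotext executable is on PATH."""
    return shutil.which("pdftotext") is not None


def pdftotext_command(pdf_path: str) -> list[str]:
    """Build the argv; '-' as the output file means write text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_pdf_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text, or raise if it is missing."""
    if not pdftotext_available():
        raise RuntimeError("pdftotext not found on PATH; install Poppler first")
    done = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return done.stdout
```

Running `pdftotext_available()` first gives a clear error on machines where Poppler is not installed.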
Use with AI Agents
The generated markdown files are designed to be loaded as context:
- Always-on context: Include ROOT_ATLAS.md in every conversation
- On-demand: Load folder INDEX.md or TERMS.md when exploring specific topics
- Deep dive: Reference specific file cards when needed
Example agent instruction:
You have access to the knowledge base atlas in .atlas/views/ROOT_ATLAS.md.
Use it to understand what exists before searching. When exploring a topic,
check the relevant folder INDEX.md for detailed file listings.
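For agents assembled in code, the always-on pattern can be as simple as appending the root atlas to the system prompt. The sketch below assumes the `.atlas/views/ROOT_ATLAS.md` location used in the instruction above; the `build_system_prompt` helper and the prompt wording are hypothetical.

```python
import tempfile
from pathlib import Path


def build_system_prompt(kb_root: Path, base_instruction: str) -> str:
    """Append ROOT_ATLAS.md to the instruction when the view file exists."""
    atlas_path = kb_root / ".atlas" / "views" / "ROOT_ATLAS.md"
    if not atlas_path.is_file():
        return base_instruction  # no atlas built yet; leave the prompt unchanged
    atlas_text = atlas_path.read_text(encoding="utf-8")
    return f"{base_instruction}\n\nKnowledge base atlas:\n{atlas_text}"


# Demonstration against a throwaway directory standing in for a knowledge base.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    views = root / ".atlas" / "views"
    views.mkdir(parents=True)
    (views / "ROOT_ATLAS.md").write_text("# Root Atlas\n", encoding="utf-8")
    prompt = build_system_prompt(root, "You are a research assistant.")
    missing = build_system_prompt(root / "nope", "You are a research assistant.")
```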
Plugins
Atlas also ships optional npm packages for agent runtime plugins:
- @skastr0/atlas-codex-plugin: Codex lifecycle hooks that initialize Atlas and refresh changed indexes after edit tools run.
- @skastr0/atlas-opencode-plugin: OpenCode event plugin that initializes Atlas and debounces changed-only rebuilds after file edit events.
Both plugins expect the atlas binary to be available on PATH.
Project Status
- Phase 1 (Complete): Scaffold, CLI, configuration, types
- Phase 2 (In Progress): Text extraction, analysis pipeline
- Phase 3 (Planned): Global aggregation, view rendering
- Phase 4 (Planned): Doctor diagnostics, polish
Building from Source
# Development build
# Release build (optimized, stripped)
# Install locally
Install the published agent-atlas package from crates.io with: cargo install agent-atlas
Verification
The GitHub Actions CI workflow runs tests, package file-list inspection, and package verification. The protected crate release workflow runs the publish dry-run before any real upload, then uses crates.io trusted publishing to request a short-lived publish token through GitHub Actions OIDC. The npm plugin publish workflow uses npm trusted publishing for the Codex and OpenCode packages. Real package publication is gated by the protected release environment and requires explicit maintainer approval.
Contributing And Support
This is an issues-first, solo-maintained project. Reproducible bugs, documentation corrections, and scoped proposals are the best contribution path. See CONTRIBUTING.md and SUPPORT.md for boundaries and expectations.
Security
Do not report suspected vulnerabilities in public issues. Use the private process described in SECURITY.md.
License
MIT