Context Builder
A blazing-fast CLI for creating LLM context from your entire codebase.
Tired of manually copy-pasting files into your LLM prompts? Context Builder automates this tedious process, creating a single, clean, and context-rich markdown file from any directory.
Why Context Builder?
Providing broad context to Large Language Models (LLMs) is key to getting high-quality, relevant responses. This tool was built to solve one problem exceptionally well: packaging your project's source code into a clean, LLM-friendly format with zero fuss.
It's a command-line utility that recursively processes directories and creates comprehensive markdown documentation, optimized for AI conversations.
Core Features
- ⚡ Blazing Fast & Parallel by Default: Processes thousands of files in seconds by leveraging all available CPU cores.
- 🧠 Smart & Efficient File Discovery: Respects .gitignore and custom ignore patterns out of the box using optimized, parallel directory traversal.
- 💾 Memory-Efficient Streaming: Handles massive files with ease by reading and writing line by line, keeping memory usage low.
- 🌳 Clear File Tree Visualization: Generates an easy-to-read directory structure at the top of the output file.
- 🔍 Powerful Filtering & Preview: Easily include only the file extensions you need and use the instant --preview mode to see what will be processed.
- ⚙️ Configuration-First: Use a .context-builder.toml file to store your preferences for consistent, repeatable outputs.
- 🔁 Automatic Per-File Diffs: When enabled, automatically generates a clean, noise-reduced diff showing what changed between snapshots.
- ✂️ Diff-Only Mode: Output only the change summary and modified file diffs (no full file bodies) to minimize token usage.
- 🧪 Accurate Token Counting: Get real tokenizer-based estimates with --token-count to plan your prompt budgets.
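To make the streaming feature above concrete, here is a minimal Rust sketch of copying a file line by line, so that only the current line is held in memory. It is an illustration under stated assumptions, not Context Builder's actual implementation: the function name stream_lines and the line-number prefix format are hypothetical.

```rust
use std::io::{self, BufRead, Write};

/// Copies `input` to `output` one line at a time, so only the current line
/// is held in memory. When `line_numbers` is true, each line is prefixed
/// with a right-aligned, 1-based number (this prefix format is an assumption).
/// Returns the number of lines written.
fn stream_lines<R: BufRead, W: Write>(
    input: R,
    output: &mut W,
    line_numbers: bool,
) -> io::Result<usize> {
    let mut count = 0;
    for line in input.lines() {
        let line = line?; // propagates I/O and invalid-UTF-8 errors
        count += 1;
        if line_numbers {
            writeln!(output, "{:>4} | {}", count, line)?;
        } else {
            writeln!(output, "{}", line)?;
        }
    }
    Ok(count)
}
```

In practice the reader would typically be a std::io::BufReader wrapping a file, and the writer a std::io::BufWriter wrapping the output file, so memory use stays flat regardless of file size.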
Installation
From crates.io (Recommended)
From source
Usage
Basic Usage
# Process current directory and create output.md
# Process a specific directory
# Specify an output file
Advanced Options
# Filter by file extensions (e.g., only Rust and TOML files)
# Ignore specific folders/files by name
# Preview mode (shows the file tree without generating output)
# Token count mode (count the tokens in the final document using a real tokenizer)
# Add line numbers to all code blocks
# Combine multiple options for a powerful workflow
Configuration
For more complex projects, you can use a .context-builder.toml file in your project's root directory to store your preferences. This is great for ensuring consistent outputs and avoiding repetitive command-line flags.
Example .context-builder.toml
# Default output file name
output = "context.md"
# Default output folder
output_folder = "docs/context"
# Create timestamped versions of the output file (e.g., context_20250912123000.md)
timestamped_output = true
# Automatically compute per-file diffs against the previous timestamped snapshot
auto_diff = true
# Emit only change summary + modified file diffs (omit full file bodies)
# Set to true to greatly reduce token usage when you just need what's changed.
diff_only = false
# File extensions to include
filter = ["rs", "toml", "md"]
# Folders or file names to ignore
ignore = ["target", "node_modules", ".git"]
# Add line numbers to code blocks
line_numbers = true
Auto-diff
When using timestamped_output = true together with auto_diff = true, Context Builder compares the previous canonical snapshot to the newly generated one and produces:
- A Change Summary (Added / Removed / Modified files)
- A File Differences section containing only modified files (added & removed are summarized but not diffed)
If you also set diff_only = true (or pass --diff-only), the full “## Files” section is omitted to conserve tokens: you get just the header + tree, the Change Summary, and per-file diffs for modified files.
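The Change Summary classification described above can be sketched in Rust as follows. This is a simplified illustration rather than the tool's real code: it assumes each snapshot is available as a map from file path to file contents, and the names ChangeSummary and summarize are hypothetical.

```rust
use std::collections::BTreeMap;

/// Paths grouped by how they changed between two snapshots.
#[derive(Debug, Default, PartialEq)]
struct ChangeSummary {
    added: Vec<String>,
    removed: Vec<String>,
    modified: Vec<String>,
}

/// Compares two snapshots, each mapping a file path to its contents.
/// (Representing a snapshot as a map is an assumption of this sketch.)
fn summarize(
    previous: &BTreeMap<String, String>,
    current: &BTreeMap<String, String>,
) -> ChangeSummary {
    let mut summary = ChangeSummary::default();
    for (path, new_body) in current {
        match previous.get(path) {
            // Present now, absent before.
            None => summary.added.push(path.clone()),
            // Present in both, contents differ: a per-file diff candidate.
            Some(old_body) if old_body != new_body => summary.modified.push(path.clone()),
            // Unchanged files are left out of the summary.
            Some(_) => {}
        }
    }
    for path in previous.keys() {
        if !current.contains_key(path) {
            summary.removed.push(path.clone());
        }
    }
    summary
}
```

Only the paths in modified would then be passed to a diff routine; added and removed files are listed but not diffed, matching the behavior described above.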
Note: Command-line arguments will always override the settings in the configuration file.
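One common way to implement this precedence is to treat every option as optional in both sources and let the command-line value win whenever it is present. The Rust sketch below illustrates the idea; the Options struct and resolve function are hypothetical and cover only three settings, so they should not be read as Context Builder's actual internals.

```rust
/// Options that may come from either source; `None` means "not specified".
/// (Hypothetical struct covering only a few settings.)
#[derive(Debug, Clone, Default, PartialEq)]
struct Options {
    output: Option<String>,
    line_numbers: Option<bool>,
    filter: Option<Vec<String>>,
}

/// Resolves each option independently: the command-line value wins when
/// present, otherwise the config-file value is used.
fn resolve(cli: Options, file: Options) -> Options {
    Options {
        output: cli.output.or(file.output),
        line_numbers: cli.line_numbers.or(file.line_numbers),
        filter: cli.filter.or(file.filter),
    }
}
```

For example, if the config file sets an output of context.md and the command line passes a different output path, the resolved output is the command-line path, while settings given only in the file are kept.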
Command Line Options
- -d, --input <PATH> - Directory path to process (default: current directory).
- -o, --output <FILE> - Output file path (default: output.md).
- -f, --filter <EXT> - File extensions to include (can be used multiple times).
- -i, --ignore <NAME> - Folder or file names to ignore (can be used multiple times).
- --preview - Preview mode: only show the file tree, don't generate output.
- --token-count - Token count mode: accurately count the total token count of the final document using a real tokenizer.
- --line-numbers - Add line numbers to code blocks in the output.
- --diff-only - With --auto-diff + --timestamped-output, output only change summary + modified file diffs (omit full file bodies).
- -h, --help - Show help information.
- -V, --version - Show version information.
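To illustrate how the --filter and --ignore options might combine, here is a small Rust sketch. The exact matching rules are assumptions: it treats an ignore name as matching any single path component, and an empty filter list as accepting every extension. The function name should_include is hypothetical.

```rust
use std::path::Path;

/// Returns true when `path` should be included in the output.
/// Assumed semantics: a path is skipped if any of its components equals an
/// ignored name; an empty `filters` list accepts every extension.
fn should_include(path: &Path, filters: &[&str], ignores: &[&str]) -> bool {
    // Check every component, so "target/debug/build.rs" is caught by "target".
    let ignored = path
        .components()
        .any(|c| c.as_os_str().to_str().map_or(false, |name| ignores.contains(&name)));
    if ignored {
        return false;
    }
    if filters.is_empty() {
        return true;
    }
    // Compare the extension (without the dot) against the allowed list.
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| filters.contains(&ext))
}
```

Under these assumptions, filters of rs and toml with an ignore of target would keep src/main.rs but skip both target/debug/build.rs and README.md.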
Token Counting
Context Builder uses the tiktoken-rs library to count tokens with the tokenizers used by OpenAI models, so the reported count closely matches the number of tokens those models will actually consume. Other model families use different tokenizers, so treat the count as an estimate for them.
Documentation
- DEVELOPMENT.md: For contributors. Covers setup, testing, linting, and release process.
- BENCHMARKS.md: For performance enthusiasts. Details on running benchmarks and generating datasets.
- CHANGELOG.md: A complete history of releases and changes.
Contributing
Contributions are welcome! Please see DEVELOPMENT.md for setup instructions and guidelines. For major changes, please open an issue first to discuss what you would like to change.
Changelog
See CHANGELOG.md for a complete history of releases and changes.
License
This project is licensed under the MIT License. See the LICENSE file for details.