tokenix 0.23.2

Local semantic index CLI for LLM token optimization
tokenix-0.23.2 is not a library.

tokenix is a local-first Rust CLI that helps AI coding agents understand a repository without dumping huge files into the prompt. It indexes your code, finds relevant chunks by meaning, returns compact file outlines, and can hook into AI tools to replace noisy reads and command output with smaller, more useful context. Works with Claude Code, GitHub Copilot, OpenAI Codex CLI, Gemini, and any MCP client. No Ollama or external server required.

Without tokenix:  Read(src/auth/middleware.rs) → 800 lines → ~2,400 tokens  (illustrative)
With tokenix:     tokenix read src/auth/middleware.rs → symbol outline → ~180 tokens

Savings depend on codebase size, AI behavior, and file sizes. Run tokenix gain to see your real numbers.


What Is tokenix?

AI coding agents often waste context on the wrong shape of information: entire files, long grep output, repeated build logs, and directory listings that are much larger than the useful signal inside them. tokenix is a context layer between the agent and your repository.

It does four jobs:

Job What tokenix does Why it matters
Index the repository Walks source files, splits them into symbol-aware chunks, and stores local embeddings in SQLite The agent can search by intent instead of opening files blindly
Read files compactly Returns outlines, symbols, or line ranges instead of full files when possible Large files stop consuming thousands of unnecessary tokens
Intercept assistant tools Hooks into supported tools before large reads and rewrites noisy command output Optimization happens automatically during normal AI sessions
Measure savings Logs hook decisions and estimates token/cost reduction with tokenix gain You can see whether it is actually helping on your codebase

tokenix is not a cloud service, not a vector database server, and not a replacement for your AI assistant. It is a local repository index plus a set of CLI and hook integrations that make the assistant's context smaller and more targeted.


⚡ Quick Install

Pre-built binary (recommended)

Every release ships a binary per platform. The version-less releases/latest/download/<asset> URL always resolves to the newest release, so you never pin a version.

Platform Asset
Linux x86_64 tokenix-linux-x86_64
Linux arm64 tokenix-linux-aarch64
macOS x86_64 tokenix-macos-x86_64
macOS arm64 (Apple Silicon) tokenix-macos-aarch64
Windows x86_64 tokenix-windows-x86_64.exe
Windows x86_64 (GPU / DirectML) tokenix-windows-x86_64-directml.exe

Each release also publishes sha256sums.txt and SLSA build-provenance attestations — see SECURITY.md for how to verify a download.

macOS / Linux — swap the asset for your platform from the table:

curl -fsSL https://github.com/juninmd/tokenix/releases/latest/download/tokenix-linux-x86_64 -o tokenix
chmod +x tokenix && sudo mv tokenix /usr/local/bin/tokenix
tokenix doctor   # verify

Windows (PowerShell):

irm https://github.com/juninmd/tokenix/releases/latest/download/tokenix-windows-x86_64.exe -OutFile tokenix.exe

🤖 For AI agents / LLMs: prefer this prebuilt binary over cargo install (no Rust toolchain, no compile step). Always fetch the version-less URL https://github.com/juninmd/tokenix/releases/latest/download/tokenix-<platform> — it redirects to the latest release, so never hard-code a version. Detect the platform, download the matching asset from the table, mark it executable, then run tokenix doctor to confirm. The embedding model downloads automatically on first use.

From crates.io

cargo install tokenix --locked

From source

git clone https://github.com/juninmd/tokenix
cd tokenix
cargo install --path . --locked

Use --locked. It builds against the committed Cargo.lock; without it cargo install re-resolves dependencies and can pull an incompatible ureq into the ort-sys build script.

Requirements: a recent stable Rust toolchain (edition 2021). No Ollama, no Python, no external services.

The embedding model (nomic-embed-text-v1.5, ~130 MB) is downloaded automatically on first use and cached locally.


✨ Features

Feature Description
Semantic search Find relevant code by meaning, not just keywords (tokenix query)
Exact search Regex/literal search over indexed content, no embedding (tokenix grep)
One-call task context tokenix context combines semantic search, entry points, and compact outlines so agents do not burn calls chaining search/read loops
Graph-aware explore tokenix explore returns related symbols, relationship maps, and grouped source in one capped call
Symbol graph tokenix symbols, callers, callees, and impact trace relationships between indexed symbols
Interactive HTML graphs tokenix impact --format html and tokenix tokenmap --format html export vis.js / tree visualizations
Token map tokenix tokenmap shows a directory tree with token counts per file/folder
Preference memory tokenix memory add/list stores global and project preferences in editable Markdown; context/explore include saved preferences
Dynamic language detection Map custom file extensions to any built-in parser via a project .tokenix.toml — no recompile needed
Symbol-aware chunking AST Tree-sitter parsers for Rust, Python, TypeScript, JavaScript, Go, C/C++
Smart file reader Outlines large files; supports --symbol and --lines reads
Hook-based interception PreToolUse intercepts large reads and rewrites noisy Bash commands before execution
RTK-grade compression Fuzzy grouping (collapses Removing…, Compiling… lines), NDJSON/JSON compaction, and ANSI/Emoji stripping
Local project filters Drop .toml files in .tokenix/filters/ for project-scoped compression rules — highest priority over user and bundled filters
Output filters 72 RTK-compatible TOML filters embedded in the binary — auto-applied to Bash output for uv, cargo, terraform, ansible, and more
Filter generation tokenix filter generate writes a TOML filter for a command; tokenix filter record captures real output for richer generation
GPU acceleration (opt-in) Build with --features directml (Windows) or --features cuda to run embeddings on GPU; GPU is used by default at runtime with automatic CPU fallback, or force CPU with --only-cpu
Environment diagnostics tokenix doctor reports the compiled backend, detected GPU, CUDA/cuDNN status, model cache, and daemon
In-memory daemon tokenix serve keeps model + index in RAM so repeated hook calls avoid reloading the model each invocation
Graceful fallback Exits 0 on errors — your AI session is never broken
Token budget Results fit within a configurable token budget (default 1200)
Savings analytics tokenix gain — token summary and by-tool histogram; --cost-estimate adds a per-model cost table (9 reference models across Anthropic / OpenAI / Google)
Local-first, no dependencies fastembed ONNX in-process — no Ollama, no server, no internet after first run

🔌 Supported AI Tools

Tool Integration
Claude Code PreToolUse hooks in ~/.claude/settings.json or project .claude/settings.local.json
GitHub Copilot .github/copilot-instructions.md + VS Code-compatible .github/hooks/hooks.json
OpenAI Codex CLI ~/.codex/hooks.json for PreToolUse Bash rewrites + optional shell helpers
Gemini tokenix install-hook --tool gemini
Any MCP client tokenix mcp — Model Context Protocol server over stdin/stdout (--tool mcp)

🚀 How It Works

tokenix has two modes:

  1. Manual mode: run tokenix query, tokenix read, tokenix context, etc. directly when you want compact context.
  2. Hook mode: install hooks so supported AI tools call tokenix automatically before large reads and before noisy Bash commands execute.

Output compression (RTK mode)

tokenix includes output-filtering logic inspired by RTK (Rust Token Killer). It doesn't just truncate output; it understands the structure of common CLI tools.

  • Fuzzy grouping: collapses hundreds of Compiling… or Removing… lines into a single summary line.
  • Structural compaction: compacts pretty-printed JSON and NDJSON into single-line formats.
  • Signal preservation: keeps error messages and summaries even when the middle of a log is truncated.

🛠 Usage

1. Index your repository

cd my-project
tokenix index .

First run: the model (~130 MB) is downloaded automatically. Subsequent runs use the local cache.

2. Search

tokenix query "how does JWT validation work"      # semantic
tokenix query "database connection pooling" --budget 2000
tokenix grep "fn validate_token" --ignore-case    # exact regex/literal

3. One-call task context

tokenix context "fix login refresh token bug"
tokenix context "how does the indexer batch embeddings" --budget 2000
tokenix explore "run_hook hook_post compression" --budget 4000

4. Smart file reader

tokenix read src/auth/middleware.rs                           # symbol outline
tokenix read src/auth/middleware.rs --symbol validate_token   # targeted
tokenix read src/auth/middleware.rs --lines 45-80             # line range

5. Symbol graph & maps

tokenix symbols validate_token
tokenix callers validate_token
tokenix callees run_hook
tokenix impact update_user --depth 2
tokenix impact update_user --format html --output update_user.html   # vis.js graph
tokenix tokenmap                                                     # token tree
tokenix rebuild-graph   # recompute relationships without re-embedding

6. Token savings analytics

tokenix gain                  # token summary + by-tool histogram
tokenix gain --history        # include per-call history
tokenix gain --cost-estimate  # add the per-model cost table

tokenix gain --cost-estimate prices the savings against 9 reference models across Anthropic, OpenAI, and Google. Prices are shown with their collection date (currently 2026-06-01) so the numbers stay auditable.


🔧 Setup by Tool

Claude Code

tokenix install-hook --tool claude-code

Writes a PreToolUse hook to ~/.claude/settings.json (or .claude/settings.local.json with --local). Large reads, semantic greps, and noisy Bash commands are intercepted automatically — no changes to your prompts needed. hook-post (PostToolUse) remains a compatibility handler, not a default Claude install, because it cannot replace the original tool output.

GitHub Copilot

cd my-project
tokenix install-hook --tool copilot
git add .github/
git commit -m "chore: add tokenix context instructions"

Creates .github/copilot-instructions.md and .github/hooks/hooks.json.

OpenAI Codex CLI

tokenix install-hook --tool codex
# bash / zsh
echo 'source ~/.codex/tokenix-init.sh' >> ~/.bashrc
# PowerShell
echo '. ~/.codex/tokenix-init.ps1' >> $PROFILE

Then use tx-read and tx-query as shell helpers. On Windows this also installs ~/.codex/hooks.json and a PowerShell wrapper that forwards PreToolUse intercepts for Bash-like terminal tools (Bash, run_in_terminal) and normalizes grep_search to the same semantic path as Grep.

All tools at once

tokenix install-hook --tool all

📖 Commands Reference

Command Description
tokenix index [PATH] Index the repo at PATH (default .)
tokenix query TEXT Semantic search over indexed chunks
tokenix grep PATTERN Exact regex/literal search over indexed content (no embedding)
tokenix context TEXT One-call task context: entry points, relevant source, compact outlines
tokenix explore TEXT Graph-aware exploration: entry points, relationships, grouped source
tokenix memory add TEXT Save a preference (--global or --project) for future context
tokenix memory list List global and project preferences
tokenix memory remove TEXT Remove preferences matching text
tokenix memory edit TEXT Replace preferences matching text
tokenix read FILE Smart reader — outline for large files, full for small
tokenix symbols QUERY Find indexed symbols by name or path
tokenix callers SYMBOL Show symbols that call/reference a symbol
tokenix callees SYMBOL Show symbols called/referenced by a symbol
tokenix impact SYMBOL Bidirectional impact graph (--format html for a vis.js graph)
tokenix tokenmap Directory tree map with token counts (--format html supported)
tokenix rebuild-graph Rebuild graph tables from existing chunks without re-embedding
tokenix gain Token savings analytics (--cost-estimate adds a per-model cost table)
tokenix benchmark Reproducible token-savings and retrieval-quality benchmark
tokenix stats Index statistics (files, chunks, tokens, age)
tokenix serve Start the background embedding daemon (keeps model + index in RAM)
tokenix stop Stop the background daemon
tokenix doctor Diagnose embedding backend, GPU availability, model cache, and daemon
tokenix filter list Show top Bash commands by tokens wasted (no filter yet)
tokenix filter active Show active user and bundled output filters
tokenix filter generate [CMD] AI-generate a TOML output filter for a command
tokenix filter record [CMD] Record real command output for richer filter generation
tokenix run -- CMD Run a command and compress its output through tokenix filters
tokenix install-hook Install assistant hook/instructions (default --tool all)
tokenix remove-hook Remove assistant hook/instructions (default --tool all)
tokenix hook PreToolUse handler — intercepts large reads, semantic grep, and noisy Bash commands (called by AI tools)
tokenix hook-post Legacy PostToolUse compatibility handler
tokenix mcp MCP server exposing context, read/search, graph, and gain tools

Global

Flag Description
--only-cpu Force CPU embedding even on a GPU-enabled build (no-op on CPU-only builds)

tokenix index--force/-f, --cpu-profile <low\|default\|max>, --jobs N, --embed-batch N (default 16 CPU / 64 GPU), --if-stale, --path/-p

tokenix query--budget/-b (1200), --k (20), --file/-f, --path/-p

tokenix grep--limit/-l (20), --ignore-case/-i, --file/-f, --path/-p

tokenix impact--depth/-d (2), --limit/-l (50), --format <text\|html>, --output/-o, --path/-p

tokenix install-hook / remove-hook--tool <claude-code\|copilot\|codex\|mcp\|gemini\|all> (default all), --local (claude-code only)


🧠 Supported Languages

Language Extensions Symbol types
Rust .rs fn, struct, enum, impl, trait, mod
Python .py def, async def, class
TypeScript .ts, .tsx function, class, interface, type, arrow functions
JavaScript .js, .jsx, .mjs, .cjs function, class, arrow functions
Go .go func, type
C / C++ .c, .cpp, .h, .hpp, .cc, .cxx function, class, struct, namespace
Config / Docs .toml, .md, .txt, .sh, .bash line blocks
Data files (opt-in) .json, .yaml, .yml Indexed only when data_files = true in .tokenix.toml
Custom any extension Mapped to an existing parser via .tokenix.toml

Languages without a symbol-aware chunker (Java, C#, Ruby, Swift, Kotlin, …) are not indexed by default — blind line-block chunking produces low-quality search results.

Custom language mapping

Create a .tokenix.toml (or tokenix.toml) in the project root:

[languages]
pyi = "python"       # Python stub files
mts = "typescript"   # TypeScript module files
lua = "generic"      # use sliding-window chunks

Valid parser values: rust, python, typescript, javascript, go, cpp, c, generic.


🔧 Output Filters

tokenix reduces noisy shell output by rewriting matching Bash commands in PreToolUse so they run through tokenix run before the agent sees the result. Filtering happens in three layers (highest priority first):

  1. Local project filters.toml files in .tokenix/filters/ inside the repo. Scoped to the project, committed to version control.
  2. User filters.toml files in ~/.tokenix/filters/. Apply to all projects, override bundled filters.
  3. Bundled filters — 72 RTK-compatible TOML filters shipped inside the binary, covering uv sync, cargo build, gradle, terraform plan, make, npm, poetry, docker, and more. Applied automatically — no setup needed.

Filter format

[filters.uv-sync]
description = "Compact uv sync output"
match_command = "^uv\\s+(sync|pip\\s+install)\\b"
strip_ansi = true
strip_lines_matching = ["^\\s*$", "^\\s+Downloading ", "^\\s+Using cached "]
match_output = [
  { pattern = "Audited \\d+ package", message = "ok (up to date)" },
]
max_lines = 20
on_empty = "uv: ok"
Field Description
match_command Rust regex matched against the full Bash command line
strip_ansi Remove ANSI colour codes before filtering
strip_lines_matching Drop lines matching any of these regex patterns
keep_lines_matching Keep only lines matching these patterns
match_output Short-circuit: if output matches pattern, return message immediately
max_lines / head_lines / tail_lines Truncate output
truncate_lines_at Truncate individual lines at N characters
on_empty Message to return when filtering produces empty output

AI-assisted filter generation

tokenix filter list                  # commands wasting the most tokens (no filter yet)
tokenix filter active                # all active user + bundled filters
tokenix filter record "cargo test"   # capture real output for richer generation
tokenix filter generate "cargo test" # generate a TOML filter via a local AI CLI

🏗 Architecture

src/
├── main.rs        CLI entry (clap), command dispatch, install-hook helpers
├── chunker.rs     Symbol-aware AST chunking (Tree-sitter) + dynamic language config (.tokenix.toml)
├── embed.rs       fastembed ONNX: embed_documents(), embed_query() — optional GPU via ort features
├── store.rs       SQLite schema, CRUD, FTS5, hybrid search, incremental branch fingerprint check
├── indexer.rs     File walker + incremental index pipeline (parallel chunking + batch embedding)
├── query.rs       Hybrid semantic + sparse FTS5 ranking, token-budget selection, result formatting
├── graph.rs       Symbol relationship graph + HTML export for vis.js output
├── hook.rs        PreToolUse handler — Claude-, Copilot-, and grep_search/run_in_terminal-style JSON input
├── daemon.rs      Background TCP server — holds model + in-memory embedding cache
├── compress.rs    Legacy PostToolUse compatibility pipeline for tool-output rewriting
├── filters.rs     FilterDef, load_local/user/bundled_filters(), priority merge, apply_filter()
├── cmd_filter.rs  `tokenix filter` subcommands (list, active, generate, record)
├── recordings.rs  Capture/replay real command output for filter generation
├── memory.rs      Global/project preference memory (editable Markdown)
├── gain.rs        Analytics from the hook log — per-model cost table
├── benchmark.rs   Reproducible savings + retrieval-quality benchmark
├── doctor.rs      Backend / GPU / model-cache / daemon diagnostics
└── mcp.rs         Model Context Protocol server

assets/
└── filters/       72 RTK-compatible TOML filters, embedded in the binary via rust-embed

GPU acceleration (opt-in)

A default build runs embeddings on CPU. Compile with a GPU feature to use the GPU — it then becomes the default at runtime, with automatic CPU fallback if the provider is unavailable:

# Windows — DirectML (works with any D3D12-capable GPU, no CUDA toolkit required)
cargo install --path . --features directml --locked

# Linux / Windows — CUDA (needs CUDA 12.x + cuDNN 9.x installed and on PATH;
# ort rc.9 does not support CUDA 13 yet)
cargo install --path . --features cuda --locked

On a GPU build, force CPU per-invocation with the global --only-cpu flag:

tokenix index .              # uses the GPU
tokenix --only-cpu index .   # forces CPU on a GPU build

--embed-batch drives peak memory (default 16 on CPU, 64 on GPU) — lower it if RAM/VRAM is tight. Run tokenix doctor to see the compiled backend, detected GPU, CUDA/cuDNN status, and tailored recommendations.

Daemon

The background daemon (tokenix serve) keeps the ONNX model and project embeddings in RAM. Hook calls route over TCP loopback instead of re-loading the model on each subprocess invocation, and it auto-starts on the first Grep hook call — you don't need to run it manually.

Embedding model

Property Value
Model nomic-embed-text-v1.5 (quantized)
Dimensions 768
File size ~130 MB
Cache location %LOCALAPPDATA%\tokenix\models (Windows) / ~/.cache/tokenix/models (Linux/macOS)
Download Automatic on first run
Runtime fastembed (ONNX Runtime, in-process)

Index storage lives at ~/.tokenix/<project-id>.db (one DB per project). Embeddings are stored as raw float32 blobs and cosine similarity is computed in Rust — no external vector database needed.


🔒 Security

tokenix's build and release pipeline is hardened against supply-chain attacks: SHA-pinned GitHub Actions, least-privilege workflow permissions, cargo-deny (advisories + license + crates.io-only sources), zizmor workflow analysis, OpenSSF Scorecard, SLSA build-provenance attestations, and tokenless crates.io publishing via OIDC. See SECURITY.md for the disclosure policy and release-verification steps.


🤝 Contributing

Contributions are welcome! See CONTRIBUTING.md for how to get started.


📄 License

MIT