syntext
A faster grep for agentic AI. About 20x faster than ripgrep on average once the repository is indexed.
Hybrid code search index for agent workflows, built in Rust. Indexes repositories using sparse n-grams, then narrows to a small candidate set before verification. Drop-in replacement for rg in AI agent loops where grep is called repeatedly and in parallel.
Status: stable (v1.0).
Installation
Quick install (macOS and Linux)
|
Installs st to /usr/local/bin. On macOS, uses Homebrew cask if brew is available. On Debian/Ubuntu (x86_64), installs the .deb package. All other Linux targets get the raw binary. Checksums are verified against SHA256SUMS from the release.
Override defaults with environment variables:
INSTALL_DIR=/.local/bin SYNTEXT_VERSION=1.0.1 \
|
VERSION=1.0.1
# Debian/Ubuntu (x86_64)
# Any Linux (x86_64 or arm64)
ARCH=amd64 # or arm64
&&
From source
Benchmarks
Search latency across five real-world repositories (v1.0, macOS, Apple Silicon).
| Repo | st avg | rg avg | grep avg | Speedup vs rg |
|---|---|---|---|---|
| React | 20.7 ms | 112.9 ms | 314.3 ms | 5.5x |
| Rust compiler | 99.9 ms | 2183.2 ms | 2412.8 ms | 21.9x |
| TypeScript | 111.9 ms | 3093.8 ms | 3171.8 ms | 27.7x |
| Node.js | 69.5 ms | 1492.6 ms | 3186.4 ms | 21.5x |
| Linux kernel | 154.5 ms | 3681.3 ms | n/a | 23.8x |
Average speedup across five presets: 20.1x versus rg. Search time excludes index build time.
See docs/BENCHMARKS.md for methodology, index build times, query discipline, and historical runs.
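The speedup column can be re-derived from the latency columns. The sketch below assumes each speedup is the rg average divided by the st average, and that the headline figure is the arithmetic mean of the five ratios; neither definition is spelled out here, so treat both as assumptions. The constant and function names are illustrative only.

```rust
/// Mean search latencies in milliseconds, copied from the table above:
/// (repository, st average, rg average).
const LATENCIES_MS: [(&str, f64, f64); 5] = [
    ("React", 20.7, 112.9),
    ("Rust compiler", 99.9, 2183.2),
    ("TypeScript", 111.9, 3093.8),
    ("Node.js", 69.5, 1492.6),
    ("Linux kernel", 154.5, 3681.3),
];

/// Assumed definition: speedup = rg latency / st latency.
fn speedup(st_ms: f64, rg_ms: f64) -> f64 {
    rg_ms / st_ms
}

/// Assumed definition: arithmetic mean of the per-repository speedups.
fn mean_speedup(rows: &[(&str, f64, f64)]) -> f64 {
    let total: f64 = rows.iter().map(|&(_, st, rg)| speedup(st, rg)).sum();
    total / rows.len() as f64
}

fn main() {
    for &(repo, st, rg) in LATENCIES_MS.iter() {
        println!("{repo}: {:.1}x", speedup(st, rg));
    }
    println!("mean: {:.1}x", mean_speedup(&LATENCIES_MS));
}
```

Because the published latencies are already rounded, a recomputed ratio can differ from the table in the last digit (TypeScript comes out near 27.6x rather than 27.7x). The mean still rounds to 20.1x.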
Usage
# Build the index (run once per repo, then only after large changes)
# Index is stored in .syntext/ at the repo root (nearest .git ancestor).
# Not run automatically -- you must run this before the first search.
# Override where the index is stored or which root to index
# After editing files, sync the index incrementally (faster than full rebuild)
# Search the whole repo (index must exist)
# Restrict search scope with positional paths
# Additional filters and output modes
# Status
Notes:
- Search is the default command; there is no `st search` subcommand.
- Like ripgrep, file names are shown by default when searching a directory, the whole repo, or multiple positional paths.
- Like ripgrep, line numbers are off by default when stdout is not a TTY. Use `-n` to force them on.
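For tooling that wants to check for an index programmatically, the default layout described above (a `.syntext/` directory at the repo root, where the root is the nearest ancestor containing `.git`) can be probed with the standard library alone. This is a minimal sketch under that assumption. The function names are hypothetical and not part of syntext, and it does not account for an overridden index location.

```rust
use std::path::{Path, PathBuf};

/// Walk upward from `start` (inclusive) to the nearest directory containing `.git`.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

/// Where the index would live under the default layout (assumption: no override).
fn index_dir(repo_root: &Path) -> PathBuf {
    repo_root.join(".syntext")
}

/// True when a repo root is found and it contains a `.syntext` directory.
fn index_exists(start: &Path) -> bool {
    find_repo_root(start)
        .map(|root| index_dir(&root).is_dir())
        .unwrap_or(false)
}

fn main() {
    let cwd = std::env::current_dir().expect("current directory is readable");
    if index_exists(&cwd) {
        println!("index found; safe to search");
    } else {
        println!("no index found; run `st index` first");
    }
}
```

The check only confirms that the directory exists. It says nothing about whether the index is up to date with the working tree.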
Agent configuration
To tell an AI agent to use st instead of rg or grep, add the following to your CLAUDE.md, AGENTS.md, or equivalent agent instruction file. The key constraint: check for the index once, not on every search.
Use `st` instead of `rg` or `grep` for all code searches. `st` is a
drop-in replacement for ripgrep: same flags, identical output, but searches
a pre-built index and is significantly faster on repeated queries.
Before the first search in a session, check whether the index exists:
Do not check for the index on every search. Once built, assume it is valid
for the session. If files change mid-task, run `st update` to sync
incrementally instead of rebuilding.
Common usage (same flags as rg):
Architecture
Query -> Router -> [Literal | Indexed Regex | Full Scan]
|
Gram extraction
|
Posting list intersection (smallest-first)
|
Candidate file IDs
|
Verifier (memchr or regex against file content)
|
Results
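The literal path through this pipeline can be illustrated with a toy in-memory index. The sketch below is a simplification and not syntext's implementation: it uses plain overlapping trigrams instead of sparse n-grams, keeps posting lists as `BTreeSet`s, and verifies with `str::contains` rather than memchr. `ToyIndex` and its methods are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy inverted index: each trigram maps to the set of file IDs containing it.
struct ToyIndex {
    files: Vec<String>,
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
}

/// Gram extraction: every overlapping 3-byte window of `text`.
fn trigrams(text: &str) -> Vec<[u8; 3]> {
    text.as_bytes().windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

impl ToyIndex {
    fn build(files: &[&str]) -> Self {
        let mut postings: HashMap<[u8; 3], BTreeSet<usize>> = HashMap::new();
        for (id, content) in files.iter().enumerate() {
            for gram in trigrams(content) {
                postings.entry(gram).or_default().insert(id);
            }
        }
        ToyIndex {
            files: files.iter().map(|s| s.to_string()).collect(),
            postings,
        }
    }

    /// Intersect the posting lists of all query grams, smallest list first.
    /// Returns None when the query is too short to yield a gram.
    fn candidates(&self, literal: &str) -> Option<Vec<usize>> {
        let grams = trigrams(literal);
        if grams.is_empty() {
            return None;
        }
        let empty: BTreeSet<usize> = BTreeSet::new();
        let mut lists: Vec<&BTreeSet<usize>> = grams
            .iter()
            .map(|g| self.postings.get(g).unwrap_or(&empty))
            .collect();
        lists.sort_by_key(|list| list.len());
        let mut acc: Vec<usize> = lists[0].iter().copied().collect();
        for list in &lists[1..] {
            acc.retain(|id| list.contains(id));
            if acc.is_empty() {
                break; // nothing can survive further intersections
            }
        }
        Some(acc)
    }

    /// Verify each candidate against file content; fall back to a full scan
    /// when no grams could be extracted.
    fn search(&self, literal: &str) -> Vec<usize> {
        let ids = match self.candidates(literal) {
            Some(ids) => ids,
            None => (0..self.files.len()).collect(),
        };
        ids.into_iter()
            .filter(|&id| self.files[id].contains(literal))
            .collect()
    }
}

fn main() {
    let index = ToyIndex::build(&[
        "fn parse_config() {}",
        "parse_cache(); use_config();",
        "struct Parser;",
    ]);
    println!("candidates: {:?}", index.candidates("parse_config"));
    println!("matches: {:?}", index.search("parse_config"));
}
```

The second file contains every trigram of `parse_config` without containing the string itself, so it survives the intersection and is only rejected by the verifier. Starting from the smallest posting list keeps the working set small, since each further intersection can only shrink it.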
Three index components:
- Content index: sparse n-gram posting lists. Trigram augmentation ensures no false negatives for token-aligned queries.
- Path index: Roaring bitmap component sets for path/type filtering.
- Symbol index (optional): Tree-sitter extraction into SQLite.
Segments are immutable single-file mmap structures (SNTX format). Updates go through an in-memory overlay with atomic batch commit via ArcSwap.
See docs/ARCHITECTURE.md for the full quantitative analysis: selectivity math, index size estimates, posting list encoding tradeoffs.
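The overlay-and-swap idea described above can be sketched with the standard library. This is an illustration under stated assumptions, not syntext's code: a `RwLock<Arc<Snapshot>>` stands in for ArcSwap (so it is lock-based rather than lock-free), a `BTreeMap` stands in for the mmap'd segments, and all type and method names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Immutable view of the index at one point in time.
struct Snapshot {
    /// Stand-in for the immutable on-disk segments: path -> content.
    base: Arc<BTreeMap<String, String>>,
    /// In-memory overlay: Some(content) = added or changed, None = deleted.
    overlay: BTreeMap<String, Option<String>>,
}

impl Snapshot {
    /// Overlay entries shadow the base segment.
    fn get(&self, path: &str) -> Option<&str> {
        match self.overlay.get(path) {
            Some(entry) => entry.as_deref(),
            None => self.base.get(path).map(String::as_str),
        }
    }
}

struct Index {
    current: RwLock<Arc<Snapshot>>,
}

impl Index {
    fn new(base: BTreeMap<String, String>) -> Self {
        let snapshot = Snapshot { base: Arc::new(base), overlay: BTreeMap::new() };
        Index { current: RwLock::new(Arc::new(snapshot)) }
    }

    /// Readers take a cheap Arc clone and release the lock immediately.
    fn snapshot(&self) -> Arc<Snapshot> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Apply a whole batch to a copy of the overlay, then publish the new
    /// snapshot with a single pointer replacement.
    fn commit(&self, batch: Vec<(String, Option<String>)>) {
        let mut slot = self.current.write().unwrap();
        let mut overlay = slot.overlay.clone();
        for (path, change) in batch {
            overlay.insert(path, change);
        }
        let next = Arc::new(Snapshot { base: Arc::clone(&slot.base), overlay });
        *slot = next;
    }
}

fn main() {
    let mut base = BTreeMap::new();
    base.insert("a.rs".to_string(), "old".to_string());
    let index = Index::new(base);

    let before = index.snapshot();
    index.commit(vec![("a.rs".to_string(), Some("new".to_string()))]);
    let after = index.snapshot();

    println!("before: {:?}, after: {:?}", before.get("a.rs"), after.get("a.rs"));
}
```

A reader that took its snapshot before the commit keeps seeing the old state, while any snapshot taken afterwards sees the whole batch at once. Because the overlay lives only in memory in this sketch, it would also be lost if the process exited before being persisted.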
WASM
The wasm Cargo feature compiles syntext to a fully in-memory index with no filesystem access. See the releases page for prebuilt syntext-wasm-<version>.tar.gz, or build from source:
# output: pkg/ (JS glue + .wasm + TypeScript types)
Project status
All phases are complete (v1.0). The core `st index && st "pattern"` workflow has been validated against ripgrep. Symbol search is available behind `--features symbols`.
| Phase | Status | What it delivers |
|---|---|---|
| 1. Setup | Complete | Cargo project, dependencies, module structure |
| 2. Foundational | Complete | Weight table, tokenizer, posting lists, correctness harness |
| 3. US5 -- Build | Complete | Full index build from scratch |
| 4. US1 -- Search | Complete | Literal + regex search, ripgrep correctness validation |
| 5. US2 -- Incremental | Complete | Overlay, batch commit, read-your-writes |
| 6. US3 -- Path scoping | Complete | Path/type filters with Roaring bitmaps |
| 7. US4 -- Symbols | Complete | Tree-sitter symbol extraction, SQLite storage |
| 8. CLI | Complete | st binary with grep-compatible output |
| 9. Polish | Complete | Bug fixes, security hardening, benchmarks, documentation |
Known limitations
- Crash recovery: Overlay state is lost on unclean shutdown. Run `st update` or `st index` after a crash.
- Invert match scope: `st -v` inverts within candidate files only, not the full corpus.
- Non-aligned substring coverage: ~16% false-negative rate for queries that don't align with token boundaries. Token-aligned queries (identifiers, keywords) have 0% false negatives.
- Network filesystems: The index directory must be on a local filesystem. NFS/SMB behavior is undefined.
- Case-insensitive overhead: ~15-20% more candidates due to lowercase normalization. Correct results are guaranteed by the verifier.
- `\r`-only line endings: Treated as a single line (matches ripgrep behavior).
- Symbol search accuracy: Tier 3 (heuristic) results are approximate. Tree-sitter failures fall back silently.
- One root per index: Each index covers exactly one `--repo-root`. There is no way to merge multiple directories into a single index. To search across two repos, build and query each index separately with `--repo-root`. `st update` requires a git repo; non-git directories must be re-indexed with `st index`.
Design documents
- docs/ARCHITECTURE.md -- Quantitative analysis: selectivity math, index size estimates, posting list encoding, design tradeoffs
- specs/001-hybrid-code-search-index/spec.md -- Feature specification with user stories and acceptance criteria
- specs/001-hybrid-code-search-index/research.md -- 19-section architecture research covering every subsystem
- specs/001-hybrid-code-search-index/data-model.md -- Entity definitions and relationships
- specs/001-hybrid-code-search-index/contracts/ -- Library API, CLI, and segment format contracts
- specs/001-hybrid-code-search-index/tasks.md -- Implementation plan with dependency graph
License
MIT