# rawq
Context retrieval engine for AI agents.
Semantic + lexical search over codebases. Single Rust binary. Fully offline.
## Why
AI agents waste tokens reading irrelevant files. rawq returns only the relevant code — with file paths, line ranges, scope names, and confidence scores. Searching a 10k-file codebase yields 5-10 relevant chunks instead of 50+ full files.
## Install
Quick install (prebuilt binary, auto-adds to PATH):
```sh
# macOS / Linux
|
```

```powershell
# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/auyelbekov/rawq/main/scripts/install.ps1 | iex"
```
Or download manually from GitHub Releases.
Cargo (requires Rust toolchain):
GPU acceleration — prebuilt binaries include GPU support. For cargo installs, enable with feature flags:
Build from source:
## Quick start
```sh
# Search a codebase (auto-downloads snowflake-arctic-embed-s + indexes on first run)

# Structured JSON output

# Lexical BM25 only

# Semantic only
```
## Search output
```
src/db/connection.py:23-41 [91%] DatabaseClient.reconnect
23 | def reconnect(self, max_retries=3):
24 |     """Attempt to re-establish database connection"""
25 |     for attempt in range(max_retries):
```
With --json:
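Agents that consume the plain-text format can recover the structured fields from the header line with a small parser. A sketch; the pattern is inferred from the example above and is not an official format guarantee:

```python
import re

# Parse a rawq plain-text result header of the form shown above:
#   src/db/connection.py:23-41 [91%] DatabaseClient.reconnect
# (pattern inferred from the example; the field names here are illustrative)
HEADER = re.compile(
    r"^(?P<path>[^:]+):(?P<start>\d+)-(?P<end>\d+) \[(?P<score>\d+)%\] (?P<scope>.+)$"
)

def parse_header(line: str) -> dict:
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a result header: {line!r}")
    d = m.groupdict()
    return {
        "path": d["path"],
        "start": int(d["start"]),
        "end": int(d["end"]),
        "score": int(d["score"]) / 100,  # confidence as a 0-1 fraction
        "scope": d["scope"],
    }

hit = parse_header("src/db/connection.py:23-41 [91%] DatabaseClient.reconnect")
```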
## Features
- Hybrid search — RRF-fused semantic (ONNX embeddings) + lexical (tantivy BM25) with adaptive query weighting
- 16 languages — tree-sitter AST chunking for Rust, Python, TypeScript, JavaScript, Go, Java, C, C++, C#, Ruby, PHP, Swift, Bash, Lua, Scala, Dart
- Universal fallback — any text file automatically indexed with its real extension as the language label (`.sql`, `.yaml`, `.proto`, `.tf`, etc.)
- Incremental indexing — SHA-256 per chunk, git-aware change detection, sub-second re-index
- Fully offline — ONNX Runtime inference, no network calls after initial model download
- Agent-friendly — `--json`, `--stream` (NDJSON), `--token-budget`, exit codes (0 = found, 1 = none, 2 = error)
- GPU acceleration — auto-detects best GPU and computes batch sizes from actual VRAM. DirectML, CUDA, CoreML with automatic CPU fallback
- Daemon mode — holds ONNX model hot in background, auto-starts on first search, auto-exits after 30 min idle
- Diff-scoped search — `rawq diff "query"` searches only within the current git diff
- Re-ranking — `--rerank` applies a keyword-overlap heuristic for two-pass result ordering
- Codebase map — `rawq map` shows AST-based structure with real hierarchy (impl > methods)
- Terminal UX — syntax highlighting via `bat`, paged output, context lines around matches
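The hybrid search above fuses the semantic and lexical rankings with Reciprocal Rank Fusion (RRF). Below is a generic sketch of how RRF merges two ranked lists; it is not rawq's actual implementation, and `k=60` is the conventional constant from the original RRF paper, not necessarily what rawq uses:

```python
# Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
# so items ranked highly by either retriever rise to the top of the fused list.
def rrf_fuse(rankings: list, k: int = 60) -> list:
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)

semantic = ["chunk_a", "chunk_b", "chunk_c"]  # ordered by embedding similarity
lexical  = ["chunk_b", "chunk_d", "chunk_a"]  # ordered by BM25 score
fused = rrf_fuse([semantic, lexical])
```

Here `chunk_b` wins because both retrievers rank it near the top, even though neither puts it first.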
## Commands
## Models
rawq auto-downloads the default model on first use. Available models:
| Model | Dimensions | Sequence Length | Notes |
|---|---|---|---|
| snowflake-arctic-embed-s | 384 | 512 | Default. Small, fast. |
| snowflake-arctic-embed-m-v1.5 | 768 | 512 | Recommended. Better quality. |
| jina-embeddings-v2-base-code | 768 | 8192 | Code-specialized, long context. |
Switch models with `rawq model download <name>` and `rawq model default <name>`.
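The Dimensions column above is the length of the embedding vector each model produces; semantic retrieval then scores a chunk by the similarity between its embedding and the query's. A toy, model-agnostic sketch using cosine similarity (the 4-dim vectors and chunk names are invented stand-ins for real 384/768-dim model outputs):

```python
import math

# Cosine similarity: dot product of the vectors divided by the product of
# their lengths; 1.0 means identical direction, ~0 means unrelated.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [0.1, 0.9, 0.0, 0.2]
chunks = {
    "db/reconnect": [0.1, 0.8, 0.1, 0.3],  # semantically close to the query
    "ui/render":    [0.9, 0.0, 0.4, 0.0],  # unrelated
}
best = max(chunks, key=lambda name: cosine(query, chunks[name]))
```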
## Environment variables
| Variable | Description |
|---|---|
| `RAWQ_MODEL` | Override default model |
| `RAWQ_NO_GPU` | Force CPU mode (`=1`) |
| `RAWQ_NO_DAEMON` | Disable daemon (`=1`) |
| `RAWQ_NO_BAT` | Disable syntax highlighting (`=1`) |
| `RAWQ_NO_PAGER` | Disable paged output (`=1`) |
| `RAWQ_OFFLINE` | Skip network calls (`=1`) |
| `RAWQ_DML_DEVICE` | DirectML device index |
| `RAWQ_CUDA_DEVICE` | CUDA device index |
| `RAWQ_VRAM_BUDGET` | Override VRAM budget (bytes) |
## AI agent usage
Set `SKILL.md` as context for your AI agent to teach it how to use rawq effectively — query strategies, filtering options, and common patterns.
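The documented exit codes (0 = found, 1 = none, 2 = error) let an agent branch without parsing any output. A minimal sketch; the exact invocation shape below is an assumption for illustration, though `--json` is documented above:

```python
import subprocess

# Map rawq's documented exit codes to outcomes:
# 0 = results found, 1 = no results, anything else = error.
def classify_exit(code: int) -> str:
    return {0: "found", 1: "none"}.get(code, "error")

def search(query: str):
    # Argument order here is illustrative, not guaranteed by the docs.
    proc = subprocess.run(["rawq", query, "--json"],
                          capture_output=True, text=True)
    return classify_exit(proc.returncode), proc.stdout
```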