vtcode-core 0.18.2

Core library for VTCode - a Rust-based terminal coding agent

VT Code


cargo install vtcode or brew install vinhnx/tap/vtcode (macOS) or npm install -g vtcode

VT Code is a Rust-based terminal coding agent with semantic code intelligence via Tree-sitter (parsers for Rust, Python, JavaScript/TypeScript, Go, Java) and ast-grep (structural pattern matching and refactoring).

It supports multiple LLM providers (OpenAI, Anthropic, xAI, DeepSeek, Gemini, and OpenRouter) with automatic failover, prompt caching, and token-efficient context management. Configuration lives entirely in vtcode.toml, sourcing constants from vtcode-core/src/config/constants.rs and model IDs from docs/models.json to keep runs reproducible and avoid hardcoded values.

Technical Motivation

VT Code addresses limitations in existing coding agents by prioritizing Rust's type safety, zero-cost abstractions, and async ecosystem for reliable, high-performance execution. Motivated by agentic AI research (e.g., Anthropic's context-engineering principles), it integrates Tree-sitter for precise parsing and MCP for extensible tooling, enabling long-running sessions with maintained context integrity, error resilience, and minimal token overhead. It builds on foundational work such as perg while incorporating lessons from OpenAI's codex-cli.

System Architecture

The architecture is split between vtcode-core (reusable library) and src/ (CLI executable), leveraging Tokio's multi-threaded async runtime (#[tokio::main(flavor = "multi_thread")] for CPU-intensive tasks), anyhow for contextual error propagation, and clap for derive-based CLI parsing. Key design tenets include atomic operations, metadata-driven tool calls (to conserve context tokens), and phase-aware context curation.
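
As a point of reference, a minimal entry-point sketch under these tenets (the Cli struct here is a hypothetical stand-in, not the crate's actual CLI definition):

use anyhow::Result;
use clap::Parser;

// Hypothetical, pared-down CLI; the real src/ defines subcommands like `ask`.
#[derive(Parser)]
struct Cli {
    /// LLM provider override (falls back to vtcode.toml).
    #[arg(long)]
    provider: Option<String>,
}

#[tokio::main(flavor = "multi_thread")]
async fn main() -> Result<()> {
    let cli = Cli::parse();
    // Concurrent tool calls and streaming I/O run on the multi-threaded executor.
    println!("provider override: {:?}", cli.provider);
    Ok(())
}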

Core Components (vtcode-core/)

  • LLM Abstractions (llm/): Provider traits enable uniform async interfaces:

    #[async_trait::async_trait]
    pub trait Provider: Send + Sync {
        async fn complete(&self, prompt: &str) -> anyhow::Result<Completion>;
        fn supports_caching(&self) -> bool;
    }
    

    Features: Streaming responses, model-specific optimizations (e.g., Anthropic's cache_control: { ttl: "5m" } for 5-minute TTL; OpenAI's prompt_tokens_details.cached_tokens reporting ~40% savings). Tokenization via tiktoken-rs ensures accurate budgeting across models.
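
    A hedged sketch of a provider behind this trait (EchoProvider and the Completion shape are illustrative stand-ins, not the crate's real types):

    // Illustrative only: a trivial provider to show the trait contract.
    pub struct EchoProvider;

    pub struct Completion {
        pub text: String,
    }

    #[async_trait::async_trait]
    impl Provider for EchoProvider {
        async fn complete(&self, prompt: &str) -> anyhow::Result<Completion> {
            // A real provider would stream from an HTTP API and track token usage here.
            Ok(Completion { text: format!("echo: {prompt}") })
        }

        fn supports_caching(&self) -> bool {
            false // Providers without prompt-caching support opt out here.
        }
    }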

  • Modular Tools (tools/): Trait-based extensibility:

    #[async_trait]
    pub trait Tool: Send + Sync {
        fn name(&self) -> &'static str;
        fn description(&self) -> &'static str;
        async fn execute(&self, args: serde_json::Value) -> anyhow::Result<serde_json::Value>;
    }
    

    Built-ins include read_file (chunked at 2000 lines, metadata-first), ast_grep_search (operations: search/transform/lint/refactor with preview_only=true), and run_terminal_cmd (modes: terminal/pty/streaming; 30s default timeout). Git integration via list_files uses walkdir with the ignore crate for .gitignore-aware traversal and nucleo-matcher for fuzzy scoring.
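
    For example, a minimal custom tool against this trait (word_count is a hypothetical tool, not a built-in):

    use serde_json::{json, Value};

    // Hypothetical tool: counts words in a "text" argument.
    pub struct WordCount;

    #[async_trait::async_trait]
    impl Tool for WordCount {
        fn name(&self) -> &'static str {
            "word_count"
        }

        fn description(&self) -> &'static str {
            "Count whitespace-separated words in the `text` argument"
        }

        async fn execute(&self, args: Value) -> anyhow::Result<Value> {
            let text = args["text"].as_str().unwrap_or_default();
            Ok(json!({ "words": text.split_whitespace().count() }))
        }
    }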

  • Configuration Engine (config/): Deserializes vtcode.toml into structs with validation:

    [context.curation]
    enabled = true
    max_tokens_per_turn = 100000  # Enforce per-provider limits
    phase_detection = true  # Auto-classify: exploration/implementation/etc.
    

    Sections cover agents, tools (allow/deny), MCP (provider URLs), caching (quality_threshold=0.7), and safety (workspace_paths, max_file_size=1MB).
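
    A minimal sketch of the deserialization side, assuming serde-derived structs mirroring the [context.curation] table above (field set abbreviated; the crate's real structs carry more sections and validation):

    use serde::Deserialize;

    #[derive(Deserialize)]
    struct ContextSection {
        curation: CurationSection,
    }

    #[derive(Deserialize)]
    struct CurationSection {
        enabled: bool,
        max_tokens_per_turn: u32,
        phase_detection: bool,
    }

    fn parse_context(raw: &str) -> anyhow::Result<ContextSection> {
        // toml + serde do the heavy lifting; validation runs after this step.
        Ok(toml::from_str(raw)?)
    }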

  • Context Engineering System: Implements iterative, per-turn curation based on conversation-phase detection (e.g., the exploration phase prioritizes search tools).

    Token budgeting: real-time tracking with tiktoken-rs (~10μs/message), thresholds at 0.75 (warn) and 0.85 (compact) of budget, and automatic LLM-driven summarization that preserves the decision ledger and error history (targets a 30% compression ratio, saving ~29% of tokens per turn).

    Decision ledger: a structured audit trail (Vec<DecisionEntry> with status pending/in_progress/completed and confidence 0-1).

    Error recovery: pattern matching on failure modes (e.g., parse failures) with fallback strategies and context preservation.
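
    A sketch of the ledger entry shape described above (field names are assumptions drawn from this description; the crate's actual definition may differ):

    // Illustrative ledger types; real fields may differ.
    pub enum Status {
        Pending,
        InProgress,
        Completed,
    }

    pub struct DecisionEntry {
        pub summary: String, // what was decided and why
        pub status: Status,
        pub confidence: f32, // 0.0..=1.0
    }

    pub struct DecisionLedger {
        // Capped (max 12 entries, per Key Capabilities) before injection into context.
        entries: Vec<DecisionEntry>,
    }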

  • Code Intelligence: Tree-sitter integration for AST traversal (e.g., symbol resolution in tools/ast_grep_search); ast-grep for rule-based transforms:

    # Example pattern in tool call
    pattern: "fn $NAME($PARAMS) { $BODY }"
    replacement: "async fn $NAME($PARAMS) -> Result<()> { $BODY.await }"
    

    Supports preview mode to avoid destructive applies.
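
    A hedged sketch of building such a tool call from Rust (argument keys follow the built-in description above and may not match the crate exactly):

    use serde_json::{json, Value};

    fn preview_transform_args() -> Value {
        json!({
            "operation": "transform",
            "pattern": "fn $NAME($PARAMS) { $BODY }",
            "replacement": "async fn $NAME($PARAMS) -> Result<()> { $BODY.await }",
            "preview_only": true // render the diff without touching files
        })
    }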

  • MCP Integration: Client uses official Rust SDK for protocol-compliant calls:

    let client = McpClient::new("ws://localhost:8080");
    let docs = client.call("get-library-docs", json!({
        "context7CompatibleLibraryID": "/tokio/docs",
        "tokens": 5000,
        "topic": "async runtime"
    })).await?;
    

    Discovers tools dynamically (e.g., mcp_resolve-library-id for Context7 IDs, mcp_sequentialthinking for chain-of-thought reasoning with branch/revision support, mcp_get_current_time for timezone-aware ops). Connection pooling and failover for multi-provider setups.

CLI Execution (src/)

  • User Interface: Ratatui for a reactive TUI (mouse-enabled, ANSI escape sequences for colors, e.g., \x1b[34m for blue tool banners). Real-time PTY output via the vte crate for command streaming; slash commands parsed with fuzzy matching.
  • Runtime: Tokio executor handles concurrent tool calls; human-in-the-loop via confirmation prompts for high-risk ops (e.g., rm -rf denials).
  • Observability: Logs to file/console with structured format; metrics (e.g., cache hit rate, token usage) exposed via debug flags.

Performance notes: Multi-threaded Tokio reduces latency for I/O-bound tasks (~20% faster than single-thread); context compression yields 50-80% token savings in long sessions. See docs/ARCHITECTURE.md for dependency graph and profiling data.

Key Capabilities

  • LLM Orchestration: Failover logic (e.g., Gemini primary, OpenAI fallback; a sketch of this pattern follows the list); reasoning control (low/medium/high effort via provider params); caching with quality gating (cache only above 70% confidence, TTL=30 days).
  • Code Analysis & Editing: Semantic search (AST-grep similarity mode, threshold=0.7); targeted edits (exact string match in edit_file, preserving whitespace); multi-file patches via apply_patch.
  • Context & Session Management: Phase-adaptive tool selection (e.g., validation phase favors run_terminal_cmd with cargo test); ledger injection for coherence (max 12 entries); summarization triggers at 20 turns or 85% budget.
  • Extensibility: Custom tools via trait impls; MCP for domain-specific extensions (e.g., library docs resolution: resolve-library-id → get-library-docs with max_tokens=5000).
  • Security Posture: Path validation (no escapes outside WORKSPACE_DIR); sandboxed network (curl HTTPS only, no localhost); allowlists (e.g., deny rm, permit cargo); env-var secrets (no file storage).
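
A minimal sketch of the failover pattern referenced above, reusing the Provider trait from Core Components (priority ordering and retry policy here are illustrative assumptions):

// Try providers in priority order; return the first successful completion.
async fn complete_with_failover(
    providers: &[Box<dyn Provider>],
    prompt: &str,
) -> anyhow::Result<Completion> {
    let mut last_err = anyhow::anyhow!("no providers configured");
    for provider in providers {
        match provider.complete(prompt).await {
            Ok(completion) => return Ok(completion),
            Err(err) => last_err = err, // note the failure, fall through to the next
        }
    }
    Err(last_err)
}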

Installation and Initialization

Binaries on GitHub Releases support macOS (aarch64/x86_64-apple-darwin), Linux (x86_64/aarch64-unknown-linux-gnu), Windows (x86_64-pc-windows-msvc).

Initialize environment:

# API keys (required for providers)
export OPENAI_API_KEY="your_api_key"  # Validate with a quick curl request
# Recommended: keep keys in a .env file at the project root, e.g. OPENAI_API_KEY=sk-...

# Workspace setup
cd my-project
cp /path/to/vtcode.toml.example .vtcode/vtcode.toml  # Customize [mcp.enabled=true] etc.

Execution:

vtcode  # Launch TUI (loads vtcode.toml defaults)
vtcode --provider openai --model gpt-5-codex ask "Refactor async fn in src/lib.rs using ast-grep"  # Tool-enabled query
vtcode --debug --no-tools ask "Compute token budget for current context"  # Dry-run analysis

Configuration validation: On load, checks TOML against schema (e.g., model in docs/models.json); logs warnings for deprecated keys.
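
An illustrative shape for that validation pass (the real loader reads docs/models.json and warns on deprecated keys; this helper is an assumption):

// Hypothetical check: reject model IDs that are not in the known list.
fn validate_model(model: &str, known_models: &[String]) -> anyhow::Result<()> {
    if known_models.iter().any(|m| m == model) {
        Ok(())
    } else {
        anyhow::bail!("unknown model '{model}'; see docs/models.json")
    }
}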

Command-Line Interface

Parsed via clap derives:

vtcode [OPTIONS] <SUBCOMMAND>

SUBCOMMANDS:
    ask <QUERY>     Non-interactive query with optional tools
    OPTIONS:
        --provider <PROVIDER>    LLM provider (default: from config)
        --model <MODEL>          Model ID (e.g., gemini-1.5-pro; from constants.rs)
        --no-tools               Disable tool execution (analysis only)
        --debug                  Enable verbose logging and metrics
        --help                   Print usage

Development runs: ./run.sh (release profile: lto=true, codegen-units=1 for opt); ./run-debug.sh (debug symbols, hot-reload via cargo-watch).

Development Practices

  • Validation Pipeline: cargo check for incremental builds; cargo clippy with custom lints (e.g., no unsafe, prefer ? over unwrap); cargo fmt --check for style enforcement (4-space indents, max_width=100).
  • Testing: Unit tests in #[cfg(test)] modules; integration in tests/ (e.g., mock LLM responses with wiremock); coverage via cargo tarpaulin. Property testing for tools (e.g., proptest for path sanitization).
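
    For instance, a property-test sketch for path sanitization (sanitize_path is a hypothetical helper standing in for the real tool internals):

    use proptest::prelude::*;

    // Hypothetical sanitizer: strips parent-directory components.
    fn sanitize_path(raw: &str) -> String {
        raw.replace("..", "")
    }

    proptest! {
        #[test]
        fn sanitized_paths_never_traverse_upward(raw in ".*") {
            let clean = sanitize_path(&raw);
            prop_assert!(!clean.contains(".."));
        }
    }
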
  • Error Handling: Uniform anyhow::Result<T> with context:

    use anyhow::Context;

    fn load_config(path: &Path) -> anyhow::Result<Config> {
        let raw = std::fs::read_to_string(path)
            .with_context(|| format!("Failed to read {}", path.display()))?;
        toml::from_str(&raw)
            .with_context(|| format!("Invalid TOML in {}", path.display()))
    }
    
  • Benchmarking: Criterion in benches/ (e.g., token counting: <50μs for 1k tokens); flamegraphs for async bottlenecks.
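
    An illustrative benchmark shape (the counting closure is a whitespace stand-in, not tiktoken-rs):

    use criterion::{criterion_group, criterion_main, Criterion};

    fn bench_token_count(c: &mut Criterion) {
        let text = "fn main() { println!(\"hello\"); } ".repeat(100);
        c.bench_function("token_count_1k", |b| {
            // Stand-in for tiktoken-rs counting; swap in the real tokenizer to profile it.
            b.iter(|| text.split_whitespace().count())
        });
    }

    criterion_group!(benches, bench_token_count);
    criterion_main!(benches);
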
  • Documentation: Rustdoc for public APIs; Markdown in ./docs/ (e.g., context engineering impl). No root-level Markdown except README.

Adhere to CONTRIBUTING.md: Conventional commits, PR templates with benchmarks.

License

MIT License. See LICENSE for full terms.