ck - Semantic Code Search
ck (seek) finds code by meaning, not just keywords. It's grep that understands what you're looking for: search for "error handling" and find try/catch blocks, error returns, and exception-handling code even when those exact words aren't present.
Quick Start
# Install from crates.io
cargo install ck-search
# Just search: ck builds and updates indexes automatically
# Traditional grep-compatible search still works
# Combine both: semantic relevance + keyword filtering
Full Documentation: installation guides, tutorials, feature deep-dives, and API reference
Headline Features
AI Agent Integration (MCP Server)
Connect ck directly to Claude Desktop, Cursor, or any MCP-compatible AI client for seamless code search integration:
# Start MCP server for AI agent integration
Claude Desktop Setup:
# Install via Claude Code CLI (recommended)
# Note: You may need to restart Claude Code after installation
# Verify installation with:
Manual Configuration (alternative):
Tool Permissions: When prompted by Claude Code, approve permissions for ck-search tools (semantic_search, regex_search, hybrid_search, etc.)
Available MCP Tools:
- semantic_search: Find code by meaning using embeddings
- regex_search: Traditional grep-style pattern matching
- hybrid_search: Combined semantic and keyword search
- index_status: Check indexing status and metadata
- reindex: Force rebuild of search index
- health_check: Server status and diagnostics
Built-in Pagination: Handles large result sets gracefully with page_size controls, cursors, and snippet length management.
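
The pagination behavior described above can be pictured with a minimal sketch. Only the page_size name comes from this README; the cursor encoding (an offset stored as a string), the function names, and the "..." truncation marker are assumptions made for illustration, not the server's actual wire format.

```python
def paginate(results, page_size, cursor=None):
    """Return one page of results plus a cursor for the next page.

    The cursor is assumed to be an offset encoded as a string;
    None means "start from the beginning" or "no more pages".
    """
    start = int(cursor) if cursor else 0
    page = results[start:start + page_size]
    next_index = start + page_size
    next_cursor = str(next_index) if next_index < len(results) else None
    return page, next_cursor


def truncate_snippet(text, max_len=80):
    """Shorten a snippet so large matches do not flood the response."""
    if len(text) <= max_len:
        return text
    return text[: max(max_len - 3, 0)] + "..."
```

A client would keep calling paginate with the returned cursor until it comes back as None.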
Interactive TUI (Terminal User Interface)
Launch an interactive search interface with real-time results and multiple preview modes:
# Start TUI for current directory
# Start with initial query
Features:
- Multiple Search Modes: Toggle between Semantic, Regex, and Hybrid search with Tab
- Preview Modes: Switch between Heatmap, Syntax highlighting, and Chunk view with Ctrl+V
- View Options: Toggle between snippet and full-file view with Ctrl+F
- Multi-select: Select multiple files with Ctrl+Space, open all in editor with Enter
- Search History: Navigate with Ctrl+Up/Down
- Editor Integration: Opens files in $EDITOR with line numbers (Vim, VS Code, Cursor, etc.)
- Progress Tracking: Live indexing progress with file and chunk counts
- Config Persistence: Preferences saved to ~/.config/ck/tui.json
See TUI.md for keyboard shortcuts and detailed usage.
Semantic Search
Find code by concept, not keywords. Understands synonyms, related terms, and conceptual similarity:
# These find related code even without exact keywords:
# Get complete functions/classes containing matches
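
Conceptually, semantic search ranks code chunks by how close their embedding vectors are to the query's embedding. The sketch below shows that ranking step with cosine similarity over toy three-dimensional vectors. The vectors, chunk names, and function names are made up for illustration; real embeddings come from the configured model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, chunk_vecs):
    """Return (score, chunk_name) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in chunk_vecs.items()]
    return sorted(scored, reverse=True)


# Toy data: a query about "error handling" and two indexed chunks.
query = [1.0, 0.0, 1.0]
chunks = {
    "try_catch_block": [0.9, 0.1, 0.8],
    "render_button": [0.0, 1.0, 0.0],
}
ranked = rank_by_similarity(query, chunks)
```

The chunk whose vector points in nearly the same direction as the query ranks first, even though no keyword was compared.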
Drop-in grep Compatibility
All your muscle memory works. Same flags, same behavior, same output format:
Hybrid Search
Combine keyword precision with semantic understanding using Reciprocal Rank Fusion:
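
Reciprocal Rank Fusion merges several ranked lists by giving each document a score of 1 / (k + rank) per list and summing. The sketch below is a generic implementation of that formula, not ck's internal code; the constant k=60 is the value commonly used for RRF and is an assumption here, as are the file names.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword pass and a semantic pass.
keyword_hits = ["auth.rs", "login.rs", "session.rs"]
semantic_hits = ["session.rs", "auth.rs", "token.rs"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents that rank well in both lists rise to the top, while a document found by only one method still appears lower down.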
Automatic Delta Indexing with Chunk-Level Caching
Semantic and hybrid searches transparently create and refresh their indexes before running. The first search builds what it needs; subsequent searches intelligently reuse cached embeddings:
- Chunk-level incremental indexing: Only changed chunks are re-embedded (80-90% cache hit rate for typical code changes)
- Content-aware invalidation: Doc comments and whitespace changes properly invalidate cache
- Model consistency: Prevents silent embedding corruption when switching models
- Smart caching: Hash-based invalidation using blake3(text + trivia) for reliable change detection
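
The caching rules above can be sketched as a content-addressed embedding cache. This is an illustration under stated assumptions: hashlib.blake2b from the Python standard library stands in for blake3, folding the model name into the key is one possible way to keep models from mixing, and EmbeddingCache and chunk_key are hypothetical names.

```python
import hashlib


def chunk_key(text, trivia, model):
    """Hash the chunk text, its trivia (comments, whitespace), and the model name.

    blake2b is a stand-in for blake3. Each part is length-prefixed so that
    ("a", "b") and ("ab", "") cannot collide.
    """
    h = hashlib.blake2b(digest_size=16)
    for part in (model, text, trivia):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


class EmbeddingCache:
    """Re-embed a chunk only when its key has not been seen before."""

    def __init__(self, embed_fn, model):
        self.embed_fn = embed_fn
        self.model = model
        self.store = {}
        self.hits = 0
        self.misses = 0

    def embed_chunks(self, chunks):
        """chunks is a list of (text, trivia) pairs; returns one vector per chunk."""
        vectors = []
        for text, trivia in chunks:
            key = chunk_key(text, trivia, self.model)
            if key in self.store:
                self.hits += 1
            else:
                self.misses += 1
                self.store[key] = self.embed_fn(text)
            vectors.append(self.store[key])
        return vectors
```

Editing one function changes only that chunk's key, so every other chunk in the file is served from the cache; changing a doc comment changes the trivia and therefore also invalidates the entry.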
Smart File Filtering
Automatically excludes cache directories, build artifacts, and respects .gitignore and .ckignore files:
# ck respects multiple exclusion layers (all are additive):
# .ckignore file (created automatically on first index):
# - Excludes images, videos, audio, binaries, archives by default
# - Excludes JSON/YAML config files (issue #27)
# - Uses same syntax as .gitignore (glob patterns, ! for negation)
# - Persists across searches (issue #67)
# - Located at repository root, editable for custom patterns
# Exclusion patterns use .gitignore syntax:
# Note: Patterns are relative to the search root
Why .ckignore? While .gitignore handles version control exclusions, many files that should be in your repo aren't ideal for semantic search. Config files (package.json, tsconfig.json), images, videos, and data files add noise to search results and slow down indexing. .ckignore lets you focus semantic search on actual code while keeping everything else in git. Think of it as "what should I search" vs "what should I commit".
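
To make the pattern semantics concrete, here is a deliberately simplified matcher: later patterns override earlier ones, and a leading ! re-includes a file. It is a sketch only. Real .gitignore syntax also covers directory anchoring and ** globs, which this version ignores, and is_ignored is a hypothetical helper rather than part of ck.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Return True if the last matching pattern excludes the path.

    Patterns are tried against the full relative path and the file name.
    Blank lines and lines starting with # are skipped.
    """
    ignored = False
    name = PurePosixPath(path).name
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(path, pattern) or fnmatchcase(name, pattern):
            ignored = not negated
    return ignored


# Hypothetical .ckignore contents.
ckignore = ["# media and config noise", "*.png", "*.json", "!schema.json"]
```

With these patterns, config/package.json is skipped, api/schema.json is searched because of the negation, and src/main.rs is unaffected.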
Advanced Usage
AI Agent Integration
MCP Server (Recommended)
# Example usage in AI agents
# Handle pagination
JSONL Output (Custom Workflows)
Perfect structured output for LLMs, scripts, and automation:
# JSONL format - one JSON object per line (recommended for agents)
# Traditional JSON (single array)
Why JSONL for AI agents?
- ✅ Streaming friendly: Process results as they arrive
- ✅ Memory efficient: Parse one result at a time
- ✅ Error resilient: One malformed line doesn't break the entire response
- ✅ Standard format: Used by OpenAI API, Anthropic API, and modern ML pipelines
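
The streaming and error-resilience points can be seen in a few lines of Python. The sketch below consumes JSONL one line at a time and skips anything that fails to parse. The field names in the sample (file, score) are placeholders, since the exact output schema is not shown here.

```python
import json


def parse_jsonl(lines):
    """Yield one dict per valid JSONL line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # One bad line does not abort the rest of the stream.
            continue
        if isinstance(record, dict):
            yield record


# Hypothetical output with one corrupted line in the middle.
sample = [
    '{"file": "src/auth.rs", "score": 0.91}',
    '{"file": "src/broken',
    '',
    '{"file": "src/session.rs", "score": 0.74}',
]
records = list(parse_jsonl(sample))
```

In a pipeline the same generator can read from sys.stdin, so results are handled as soon as each line arrives.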
Search & Filter Options
# Threshold filtering
# Limit results
# Complete code sections
# Relevance scoring
# [0.847] ./ai_guide.txt: Machine learning introduction...
# [0.732] ./statistics.txt: Statistical learning methods...
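
The filtering options above amount to three small steps: drop results below a threshold, keep the best few, and print the score in brackets. The sketch below mirrors the output format shown in the two sample lines; the function names and the tuple layout are assumptions for illustration.

```python
def select_results(results, threshold=0.0, top_k=None):
    """Keep (score, path, snippet) tuples at or above threshold, best first."""
    kept = [r for r in results if r[0] >= threshold]
    kept.sort(key=lambda r: r[0], reverse=True)
    return kept if top_k is None else kept[:top_k]


def format_result(score, path, snippet):
    """Render one result with a three-decimal relevance score."""
    return f"[{score:.3f}] {path}: {snippet}"


results = [
    (0.732, "./statistics.txt", "Statistical learning methods..."),
    (0.410, "./notes.txt", "Grocery list..."),
    (0.847, "./ai_guide.txt", "Machine learning introduction..."),
]
top = select_results(results, threshold=0.6, top_k=5)
lines = [format_result(*r) for r in top]
```

A higher threshold trades recall for precision, while the result limit caps how much output a script or agent has to read.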
Language Coverage
| Language | Indexing | Chunking | AST-aware | Notes |
|---|---|---|---|---|
| Zig | ✅ | ✅ | ✅ | contributed by @Nevon (PR #72) |
Model Selection
Choose the right embedding model for your needs:
# Default: BGE-Small (fast, precise chunking)
# Mixedbread xsmall: Optimized for local semantic search (4K context, 384 dims)
# Enhanced: Nomic V1.5 (8K context, optimal for large functions)
# Code-specialized: Jina Code (optimized for programming languages)
Model Comparison:
- bge-small (default): 400-token chunks, fast indexing, good for most code
- mxbai-xsmall: 4K context window, 384 dimensions, optimized for local inference (Mixedbread)
- nomic-v1.5: 1024-token chunks with 8K model capacity, better for large functions
- jina-code: 1024-token chunks with 8K model capacity, specialized for code understanding
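
The chunk sizes in the comparison determine how much code lands in each embedding. As a rough illustration, the sketch below splits text into chunks under a per-model token budget taken from the list above. Whitespace splitting stands in for the real model tokenizer, mxbai-xsmall is left out because its chunk size is not stated, and chunk_by_tokens is a hypothetical helper.

```python
# Chunk budgets in tokens, as listed in the model comparison.
CHUNK_TOKENS = {
    "bge-small": 400,
    "nomic-v1.5": 1024,
    "jina-code": 1024,
}


def chunk_by_tokens(text, model="bge-small"):
    """Split text into chunks of at most CHUNK_TOKENS[model] tokens.

    str.split() is only a stand-in tokenizer; real token counts differ.
    """
    budget = CHUNK_TOKENS[model]
    tokens = text.split()
    return [
        " ".join(tokens[i:i + budget])
        for i in range(0, len(tokens), budget)
    ]
```

A function of about 1000 tokens would be cut into three pieces under the 400-token budget but fits in a single 1024-token chunk, which is why the larger models suit long functions.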
Index Management
# Check index status
# Clean up and rebuild / switch models
# Add single file to index
# File inspection (analyze chunking and token usage)
Interrupting Operations: Indexing can be safely interrupted with Ctrl+C. The partial index is saved, and the next operation will resume from where it stopped, only processing new or changed files.
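
Resumable indexing of this kind usually rests on a manifest that records a content hash per finished file. The sketch below shows the idea under stated assumptions: SHA-256 as the file hash, an in-memory dict as the manifest, and hypothetical function names. It is not a description of ck's on-disk format.

```python
import hashlib


def file_digest(content):
    """Content hash used to detect new or changed files (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


def plan_index_run(files, manifest):
    """Return (paths that need indexing, paths removed since the last run).

    files maps path -> bytes; manifest maps path -> digest of indexed content.
    """
    todo = [path for path, content in sorted(files.items())
            if manifest.get(path) != file_digest(content)]
    removed = sorted(path for path in manifest if path not in files)
    return todo, removed


def run_indexing(files, manifest, index_one):
    """Index pending files, recording each one only after it succeeds."""
    todo, _ = plan_index_run(files, manifest)
    for path in todo:
        index_one(path, files[path])
        manifest[path] = file_digest(files[path])
    return todo
```

Because the manifest is updated file by file, an interrupt leaves completed work recorded, and the next run plans only what is still missing or has changed.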
Language Support
| Language | Indexing | Tree-sitter Parsing | Semantic Chunking |
|---|---|---|---|
| Python | ✅ | ✅ | ✅ Functions, classes |
| JavaScript/TypeScript | ✅ | ✅ | ✅ Functions, classes, methods |
| Rust | ✅ | ✅ | ✅ Functions, structs, traits |
| Go | ✅ | ✅ | ✅ Functions, types, methods |
| Ruby | ✅ | ✅ | ✅ Classes, methods, modules |
| Haskell | ✅ | ✅ | ✅ Functions, types, instances |
| C# | ✅ | ✅ | ✅ Classes, interfaces, methods |
| Dart | ✅ | ✅ | ✅ Classes, mixins, methods |
Text Formats: Markdown, JSON, YAML, TOML, XML, HTML, CSS, shell scripts, SQL, log files, config files, and any other text format.
Smart Binary Detection: Uses ripgrep-style content analysis, automatically indexing any text file while correctly excluding binary files.
Unsupported File Types: Text files with unrecognized extensions (like .org, .adoc, etc.) are automatically indexed as plain text. ck detects text vs binary based on file contents, not extensions.
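
Content-based detection of this kind typically looks for a NUL byte near the start of the file, which is the heuristic ripgrep is known for. The sketch below shows that check; the 8192-byte sample size and the function names are assumptions, and ck's actual detection may consider more than this.

```python
def looks_binary(data, sample_size=8192):
    """Treat data as binary if a NUL byte appears in the first sample_size bytes."""
    return b"\x00" in data[:sample_size]


def is_text_file(path, sample_size=8192):
    """Read only the head of the file and apply the NUL-byte heuristic."""
    with open(path, "rb") as handle:
        return not looks_binary(handle.read(sample_size), sample_size)
```

Because only contents are inspected, a notes.org or guide.adoc file is indexed as text without any extension list.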
Installation
From crates.io
From Source
Package Managers
# Currently available:
# Coming soon:
Examples
Finding Code Patterns
# Find authentication/authorization code
# Find error handling strategies
# Find performance-related code
Team Workflows
# Find related test files
# Identify refactoring candidates
# Security audit
Integration Examples
# Git hooks
# CI/CD pipeline
# Code review prep
# Documentation generation
Performance
Field-tested on real codebases:
- Indexing: ~1M LOC in under 2 minutes
- Incremental indexing: 80-90% cache hit rate for typical code changes (only changed chunks re-embedded)
- Search: Sub-500ms queries on typical codebases
- Index size: ~2x source code size with compression
- Memory: Efficient streaming for large repositories
- Token precision: HuggingFace tokenizers for exact model-specific token counting
Architecture
ck uses a modular Rust workspace:
- ck-cli: Command-line interface and MCP server
- ck-tui: Interactive terminal user interface (ratatui-based)
- ck-core: Shared types, configuration, and utilities
- ck-engine: Search engine implementations (regex, semantic, hybrid)
- ck-index: File indexing, hashing, and sidecar management
- ck-embed: Text embedding providers (FastEmbed, API backends)
- ck-ann: Approximate nearest neighbor search indices
- ck-chunk: Text segmentation and language-aware parsing (query-based chunking)
- ck-models: Model registry and configuration management
Index Storage
Indexes are stored in .ck/ directories alongside your code:
project/
├── src/
├── docs/
└── .ck/           # Semantic index (can be safely deleted)
    ├── embeddings.json
    ├── ann_index.bin
    └── tantivy_index/
The .ck/ directory is a cache: safe to delete and rebuild anytime.
Testing
# Run the full test suite
# Test with each feature combination
Contributing
ck is actively developed and welcomes contributions:
- Issues: Report bugs, request features
- Code: Submit PRs for bug fixes, new features
- Documentation: Improve examples, guides, tutorials
- Testing: Help test on different codebases and languages
Development Setup
CI Requirements
Before submitting a PR, ensure your code passes all CI checks:
# Format code (required)
# Run clippy linter (required - must have no warnings)
# Run tests (required)
# Check minimum supported Rust version (MSRV)
The CI pipeline runs on Ubuntu, Windows, and macOS to ensure cross-platform compatibility.
Roadmap
Current (v0.7+)
- ✅ MCP (Model Context Protocol) server for AI agent integration
- ✅ Chunk-level incremental indexing with smart embedding reuse
- ✅ grep-compatible CLI with semantic search and file listing flags
- ✅ FastEmbed integration with BGE models and enhanced model selection
- ✅ File exclusion patterns and glob support
- ✅ Threshold filtering and relevance scoring with visual highlighting
- ✅ Tree-sitter parsing and intelligent chunking for 7+ languages
- ✅ Complete code section extraction (--full-section)
- ✅ Clean stdout/stderr separation for reliable scripting
- ✅ Token-aware chunking with HuggingFace tokenizers
- ✅ Published to crates.io (cargo install ck-search)
Next (v0.6+)
- 🚧 Configuration file support
- 🚧 Package manager distributions (brew, apt)
- 🚧 Enhanced MCP tools (file writing, refactoring assistance)
- 🚧 VS Code extension
- 🚧 JetBrains plugin
- 🚧 Additional language chunkers (Java, PHP, Swift)
FAQ
Q: How is this different from grep/ripgrep/silver-searcher?
A: ck keeps the grep-compatible interface of traditional search tools and adds semantic understanding. Search for "error handling" and find relevant code even when those exact words aren't used.
Q: Does it work offline?
A: Yes, completely offline. The embedding model runs locally with no network calls.
Q: How big are the indexes?
A: Typically 1-3x the size of your source code. The .ck/ directory can be safely deleted to reclaim space.
Q: Is it fast enough for large codebases?
A: Yes. The first semantic search builds the index automatically; after that only changed files are reprocessed, keeping searches sub-second even on large projects.
Q: Can I use it in scripts/automation?
A: Absolutely. The --json and --jsonl flags provide structured output perfect for automated processing and AI agent integration.
Q: What about privacy/security?
A: Everything runs locally. No code or queries are sent to external services. The embedding model is downloaded once and cached locally.
Q: Where are the embedding models cached?
A: Models are cached in platform-specific directories:
- Linux/macOS: ~/.cache/ck/models/
- Windows: %LOCALAPPDATA%\ck\cache\models\
- Fallback: .ck_models/models/ in current directory
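
The lookup order for these locations can be expressed as a small pure function. The paths come from the list above. The conditions for choosing each one (for example, falling back when no home directory is known) are assumptions, and model_cache_dir is a hypothetical helper rather than ck's own code.

```python
from typing import Mapping, Optional


def model_cache_dir(system: str, env: Mapping[str, str], home: Optional[str]) -> str:
    """Pick a model cache directory for the given platform.

    system follows platform.system() naming: "Linux", "Darwin", or "Windows".
    """
    local_app_data = env.get("LOCALAPPDATA")
    if system == "Windows" and local_app_data:
        return local_app_data.rstrip("\\") + "\\ck\\cache\\models\\"
    if system in ("Linux", "Darwin") and home:
        return home.rstrip("/") + "/.cache/ck/models/"
    # Assumed fallback: a directory relative to the current working directory.
    return ".ck_models/models/"
```

In a real script the arguments would come from platform.system(), os.environ, and the user's home directory.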
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
Credits
Built with:
- Rust - Systems programming language
- FastEmbed - Fast text embeddings
- Tantivy - Full-text search engine
- clap - Command line argument parsing
Inspired by the need for better code search tools in the age of AI-assisted development.
Start finding code by what it does, not what it says.