ck - Semantic Code Search
ck (seek) finds code by meaning, not just keywords. It's grep that understands what you're looking for: search for "error handling" and find try/catch blocks, error returns, and exception handling code even when those exact words aren't present.
Quick Start
# Install from crates.io
cargo install ck-search
# Just search: ck builds and updates indexes automatically
# Traditional grep-compatible search still works
# Combine both: semantic relevance + keyword filtering
Headline Features
AI Agent Integration (MCP Server)
Connect ck directly to Claude Desktop, Cursor, or any MCP-compatible AI client for seamless code search integration:
# Start MCP server for AI agent integration
ck --serve
Claude Desktop Setup:
# Install via Claude Code CLI (recommended)
# Note: You may need to restart Claude Code after installation
# Verify installation with:
Codex Setup:
# Install via Codex CLI
# Verify installation
Manual Configuration:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
Codex (~/.config/codex/config.toml):
# IMPORTANT: the top-level key is `mcp_servers` rather than `mcpServers`
[mcp_servers.ck]
command = "ck"
args = ["--serve"]
# Optional: override the default 10s startup timeout
startup_timeout_ms = 20_000
Tool Permissions: When prompted by Claude Code, approve permissions for the ck-search tools (semantic_search, regex_search, hybrid_search, etc.).
Available MCP Tools:
- semantic_search - Find code by meaning using embeddings
- regex_search - Traditional grep-style pattern matching
- hybrid_search - Combined semantic and keyword search
- index_status - Check indexing status and metadata
- reindex - Force rebuild of search index
- health_check - Server status and diagnostics
Built-in Pagination: Handles large result sets gracefully with page_size controls, cursors, and snippet length management.
Semantic Search
Find code by concept, not keywords. Understands synonyms, related terms, and conceptual similarity:
# These find related code even without exact keywords:
# Get complete functions/classes containing matches
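To make "find by concept" concrete, here is a minimal sketch of embedding-based ranking using cosine similarity. The three-dimensional vectors, the chunk names, and the `rank_by_similarity` helper are made up for illustration; real embeddings come from the configured model and have far more dimensions, and ck's actual scoring may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(query_vec, chunks):
    """Return (name, score) pairs sorted by descending similarity."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy "embeddings" (hypothetical values, not produced by any real model).
chunks = {
    "try_catch_block": [0.9, 0.1, 0.0],
    "render_button": [0.0, 0.2, 0.9],
    "return_err_result": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "error handling"

ranking = rank_by_similarity(query, chunks)
for name, score in ranking:
    print(f"{score:.3f} {name}")
```

The two error-related chunks rank above the unrelated one even though none of the names contain the query words, which is the behavior the section describes.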
Drop-in grep Compatibility
All your muscle memory works. Same flags, same behavior, same output format:
Hybrid Search
Combine keyword precision with semantic understanding using Reciprocal Rank Fusion:
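Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each item across the lists. The sketch below shows the general technique only; the constant k = 60 is the value commonly used in the literature and is an assumption here, as are the file names. ck's internal constants and tie-breaking are not documented in this README.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists from a keyword pass and a semantic pass.
keyword_hits = ["db/pool.rs", "net/client.rs", "config.rs"]
semantic_hits = ["net/client.rs", "net/retry.rs", "db/pool.rs"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
for doc, score in fused:
    print(f"{score:.4f} {doc}")
```

Files that appear in both lists accumulate two contributions, so they rise above files that only one search method found.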
Automatic Delta Indexing
Semantic and hybrid searches transparently create and refresh their indexes before running. The first search builds what it needs; subsequent searches only touch files that changed.
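The "only touch files that changed" behavior can be sketched as a comparison between stored content hashes and current ones. This is an illustration of hash-based change detection in general: the `plan_delta` helper, the manifest layout, and the choice of SHA-256 are assumptions, not ck's actual index format.

```python
import hashlib

def file_digest(content: bytes) -> str:
    """Hex digest of file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()

def plan_delta(files, previous):
    """Compare current contents with the stored manifest.

    files: path -> bytes, previous: path -> hex digest.
    Returns (to_index, to_remove, new_manifest).
    """
    manifest = {path: file_digest(data) for path, data in files.items()}
    to_index = sorted(p for p, h in manifest.items() if previous.get(p) != h)
    to_remove = sorted(p for p in previous if p not in manifest)
    return to_index, to_remove, manifest

# First search: nothing is stored yet, so every file is indexed.
files_v1 = {
    "src/main.rs": b"fn main() {}",
    "src/lib.rs": b"pub fn a() {}",
    "src/old.rs": b"pub fn gone() {}",
}
first_index, first_remove, manifest = plan_delta(files_v1, {})

# Later search: one file edited, one deleted, one untouched.
files_v2 = {
    "src/main.rs": b"fn main() { run(); }",
    "src/lib.rs": b"pub fn a() {}",
}
second_index, second_remove, manifest = plan_delta(files_v2, manifest)
```

On the second pass only `src/main.rs` needs re-embedding and `src/old.rs` is dropped, while the unchanged file is skipped entirely.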
Smart File Filtering
Automatically excludes cache directories, build artifacts, and respects .gitignore files:
# Exclusion patterns use .gitignore syntax:
# Note: Patterns are relative to the search root
Advanced Usage
AI Agent Integration
MCP Server (Recommended)
# Example usage in AI agents
# Handle pagination
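A rough sketch of cursor-based pagination from an agent's point of view follows. The tool name `semantic_search` and the `page_size` control come from this README; the `FakeClient` class, the `cursor` and `next_cursor` fields, and the response shape are hypothetical stand-ins so the example can run without a server. Check the actual tool schema before relying on any of these names.

```python
import asyncio

class FakeClient:
    """Hypothetical stand-in for an MCP client. A real client would send
    the request to a running `ck --serve` process instead of slicing a list."""

    def __init__(self, results):
        self._results = results

    async def call_tool(self, name, arguments):
        start = int(arguments.get("cursor") or 0)
        end = start + arguments["page_size"]
        next_cursor = str(end) if end < len(self._results) else None
        return {"results": self._results[start:end], "next_cursor": next_cursor}

async def collect_all(client, query, page_size=2):
    """Follow cursors until the server stops returning one."""
    hits, cursor = [], None
    while True:
        args = {"query": query, "page_size": page_size}
        if cursor is not None:
            args["cursor"] = cursor  # assumed parameter name
        response = await client.call_tool("semantic_search", args)
        hits.extend(response["results"])
        cursor = response["next_cursor"]  # assumed field name
        if cursor is None:
            return hits

client = FakeClient(["a.rs", "b.rs", "c.rs", "d.rs", "e.rs"])
all_hits = asyncio.run(collect_all(client, "authentication"))
print(all_hits)
```

Treating the cursor as an opaque token, rather than computing offsets on the client, keeps the agent code independent of how the server pages its results.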
JSONL Output (Custom Workflows)
Perfect structured output for LLMs, scripts, and automation:
# JSONL format - one JSON object per line (recommended for agents)
# Traditional JSON (single array)
Why JSONL for AI agents?
- ✅ Streaming friendly: Process results as they arrive
- ✅ Memory efficient: Parse one result at a time
- ✅ Error resilient: One malformed line doesn't break the entire response
- ✅ Standard format: Used by OpenAI API, Anthropic API, and modern ML pipelines
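The streaming and error-resilience points can be illustrated with a small line-by-line parser. The `path` and `score` field names in the sample are placeholders, not ck's documented output schema.

```python
import json

def parse_jsonl(lines):
    """Yield one parsed object per line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line does not abort the stream

# Hypothetical output with a truncated record in the middle.
sample = [
    '{"path": "src/auth.rs", "score": 0.91}',
    '{"path": "src/broken',
    '{"path": "src/login.rs", "score": 0.84}',
]
records = list(parse_jsonl(sample))
print([r["path"] for r in records])
```

Because `parse_jsonl` is a generator, it works equally well on a file object or a subprocess pipe, handling each result as soon as its line arrives.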
Search & Filter Options
# Threshold filtering
# Limit results
# Complete code sections
# Relevance scoring
# [0.847] ./ai_guide.txt: Machine learning introduction...
# [0.732] ./statistics.txt: Statistical learning methods...
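The options above amount to filtering by a minimum score, capping the number of results, and printing the score in front of each match. The sketch below mimics that behavior on in-memory data. `select_results` and `format_result` are illustrative helpers, and the last two sample rows are invented; only the output format is taken from the example lines above.

```python
def select_results(results, threshold=0.0, limit=None):
    """Keep results scoring at least `threshold`, best first, at most `limit`."""
    kept = [r for r in results if r[0] >= threshold]
    kept.sort(key=lambda r: r[0], reverse=True)
    return kept if limit is None else kept[:limit]

def format_result(score, path, preview):
    """Render a result with its relevance score in brackets."""
    return f"[{score:.3f}] {path}: {preview}"

results = [
    (0.732, "./statistics.txt", "Statistical learning methods..."),
    (0.412, "./cooking.txt", "Learning to bake bread..."),      # invented row
    (0.847, "./ai_guide.txt", "Machine learning introduction..."),
    (0.655, "./notes.txt", "Study notes on learning theory..."),  # invented row
]

lines = [format_result(*r) for r in select_results(results, threshold=0.6, limit=2)]
print("\n".join(lines))
```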
Model Selection
Choose the right embedding model for your needs:
# Default: BGE-Small (fast, precise chunking)
# Enhanced: Nomic V1.5 (8K context, optimal for large functions)
# Code-specialized: Jina Code (optimized for programming languages)
Model Comparison:
- bge-small (default): 400-token chunks, fast indexing, good for most code
- nomic-v1.5: 1024-token chunks with 8K model capacity, better for large functions
- jina-code: 1024-token chunks with 8K model capacity, specialized for code understanding
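To see what the chunk sizes mean in practice, the sketch below splits a token sequence into fixed-size chunks and counts how many each model setting would produce. It is a simplification under stated assumptions: the tokens are fake, and the splitting ignores function and class boundaries, which the language-aware chunker described elsewhere in this README takes into account.

```python
def chunk_tokens(tokens, max_tokens):
    """Split a token list into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

# Chunk sizes as listed in the model comparison above.
CHUNK_SIZES = {"bge-small": 400, "nomic-v1.5": 1024, "jina-code": 1024}

# Pretend tokenizer output for one large source file (hypothetical).
tokens = [f"tok{i}" for i in range(2500)]

counts = {model: len(chunk_tokens(tokens, size)) for model, size in CHUNK_SIZES.items()}
print(counts)
```

Smaller chunks mean more embeddings per file and more precise matches, while larger chunks keep long functions together in a single embedding.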
Index Management
# Check index status
# Clean up and rebuild / switch models
# Add single file to index
# File inspection (analyze chunking and token usage)
Interrupting Operations: Indexing can be safely interrupted with Ctrl+C. The partial index is saved, and the next operation will resume from where it stopped, only processing new or changed files.
Language Support
| Language | Indexing | Tree-sitter Parsing | Semantic Chunking |
|---|---|---|---|
| Python | ✅ | ✅ | ✅ Functions, classes |
| JavaScript/TypeScript | ✅ | ✅ | ✅ Functions, classes, methods |
| Rust | ✅ | ✅ | ✅ Functions, structs, traits |
| Go | ✅ | ✅ | ✅ Functions, types, methods |
| Ruby | ✅ | ✅ | ✅ Classes, methods, modules |
| Haskell | ✅ | ✅ | ✅ Functions, types, instances |
| C# | ✅ | ✅ | ✅ Classes, interfaces, methods |
Text Formats: Markdown, JSON, YAML, TOML, XML, HTML, CSS, shell scripts, SQL, log files, config files, and any other text format.
Smart Binary Detection: Uses ripgrep-style content analysis, automatically indexing any text file while correctly excluding binary files.
Unsupported File Types: Text files with unrecognized extensions (like .org, .adoc, etc.) are automatically indexed as plain text. ck detects text vs binary based on file contents, not extensions.
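Content-based detection can be approximated with a widely used heuristic: treat a file as binary if a NUL byte appears near its start. The sketch below shows that heuristic only. The 8192-byte sample size and the `looks_binary` name are assumptions, and ck's exact rules may be more involved.

```python
def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Treat content as binary if a NUL byte appears in the first sample_size bytes."""
    return b"\x00" in data[:sample_size]

# An .org file has an unrecognized extension but plain-text content.
org_notes = b"* Heading\nSome notes in Org mode\n"
# The first bytes of a typical ELF executable header contain NUL bytes.
elf_header = b"\x7fELF\x02\x01\x01\x00"

print(looks_binary(org_notes))
print(looks_binary(elf_header))
```

Because the decision depends only on the bytes, a text file is indexed regardless of its extension, and a binary with a misleading extension is still skipped.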
Installation
From crates.io
cargo install ck-search
From Source
Package Managers
# Currently available:
# Coming soon:
Examples
Finding Code Patterns
# Find authentication/authorization code
# Find error handling strategies
# Find performance-related code
Team Workflows
# Find related test files
# Identify refactoring candidates
# Security audit
Integration Examples
# Git hooks
# CI/CD pipeline
# Code review prep
# Documentation generation
Performance
Field-tested on real codebases:
- Indexing: ~1M LOC in under 2 minutes
- Search: Sub-500ms queries on typical codebases
- Index size: ~2x source code size with compression
- Memory: Efficient streaming for large repositories
- Token precision: HuggingFace tokenizers for exact model-specific token counting
Architecture
ck uses a modular Rust workspace:
- ck-cli - Command-line interface and MCP server
- ck-core - Shared types, configuration, and utilities
- ck-engine - Search engine implementations (regex, semantic, hybrid)
- ck-index - File indexing, hashing, and sidecar management
- ck-embed - Text embedding providers (FastEmbed, API backends)
- ck-ann - Approximate nearest neighbor search indices
- ck-chunk - Text segmentation and language-aware parsing
- ck-models - Model registry and configuration management
Index Storage
Indexes are stored in .ck/ directories alongside your code:
project/
├── src/
├── docs/
└── .ck/           # Semantic index (can be safely deleted)
    ├── embeddings.json
    ├── ann_index.bin
    └── tantivy_index/
The .ck/ directory is a cache: safe to delete and rebuild anytime.
Testing
# Run the full test suite
# Test with each feature combination
Contributing
ck is actively developed and welcomes contributions:
- Issues: Report bugs, request features
- Code: Submit PRs for bug fixes, new features
- Documentation: Improve examples, guides, tutorials
- Testing: Help test on different codebases and languages
Development Setup
CI Requirements
Before submitting a PR, ensure your code passes all CI checks:
# Format code (required)
# Run clippy linter (required - must have no warnings)
# Run tests (required)
# Check minimum supported Rust version (MSRV)
The CI pipeline runs on Ubuntu, Windows, and macOS to ensure cross-platform compatibility.
Roadmap
Current (v0.5+)
- ✅ MCP (Model Context Protocol) server for AI agent integration
- ✅ grep-compatible CLI with semantic search and file listing flags
- ✅ FastEmbed integration with BGE models and enhanced model selection
- ✅ File exclusion patterns and glob support
- ✅ Threshold filtering and relevance scoring with visual highlighting
- ✅ Tree-sitter parsing and intelligent chunking for 7+ languages
- ✅ Complete code section extraction (--full-section)
- ✅ Clean stdout/stderr separation for reliable scripting
- ✅ Incremental index updates with hash-based change detection
- ✅ Token-aware chunking with HuggingFace tokenizers
- ✅ Published to crates.io (cargo install ck-search)
Next (v0.6+)
- 🚧 Configuration file support
- 🚧 Package manager distributions (brew, apt)
- 🚧 Enhanced MCP tools (file writing, refactoring assistance)
- 🚧 VS Code extension
- 🚧 JetBrains plugin
- 🚧 Additional language chunkers (Java, PHP, Swift)
FAQ
Q: How is this different from grep/ripgrep/silver-searcher?
A: ck includes all the features of traditional search tools, but adds semantic understanding. Search for "error handling" and find relevant code even when those exact words aren't used.
Q: Does it work offline?
A: Yes, completely offline. The embedding model runs locally with no network calls.
Q: How big are the indexes?
A: Typically 1-3x the size of your source code. The .ck/ directory can be safely deleted to reclaim space.
Q: Is it fast enough for large codebases?
A: Yes. The first semantic search builds the index automatically; after that only changed files are reprocessed, keeping searches sub-second even on large projects.
Q: Can I use it in scripts/automation?
A: Absolutely. The --json and --jsonl flags provide structured output suited to automated processing and AI agent integration.
Q: What about privacy/security?
A: Everything runs locally. No code or queries are sent to external services. The embedding model is downloaded once and cached locally.
Q: Where are the embedding models cached?
A: Models are cached in platform-specific directories:
- Linux/macOS: ~/.cache/ck/models/
- Windows: %LOCALAPPDATA%\ck\cache\models\
- Fallback: .ck_models/models/ in the current directory
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
Credits
Built with:
- Rust - Systems programming language
- FastEmbed - Fast text embeddings
- Tantivy - Full-text search engine
- clap - Command line argument parsing
Inspired by the need for better code search tools in the age of AI-assisted development.
Start finding code by what it does, not what it says.