KODEGEN.ᴀɪ Candle Agent
Memory-efficient, blazing-fast MCP tools for code generation agents.
A high-performance Model Context Protocol (MCP) server that provides cognitive memory capabilities for AI agents. Built with Rust and the Candle ML framework, it delivers semantic memory storage, retrieval, and quantum-inspired routing for intelligent code generation workflows.
Features
- 🧠 Cognitive Memory System - Store and retrieve code context with semantic understanding
- ⚡ High-Performance - Rust + SIMD optimizations for blazing-fast embeddings and retrieval
- 🔄 Async Operations - Non-blocking memory ingestion with progress tracking
- 🎯 Multiple Retrieval Strategies - Semantic, temporal, and hybrid search modes
- 🌊 Quantum-Inspired Routing - Advanced memory importance scoring with entanglement
- 📊 Vector Storage - Support for FAISS, HNSW, and instant-distance backends
- 💾 Persistent Storage - SurrealDB with embedded SurrealKV for ACID transactions
- 🚀 Hardware Acceleration - CUDA, Metal, MKL, and Accelerate support
- 🔧 MCP Compatible - Works with Claude Desktop, Cline, and other MCP clients
Quick Start
Prerequisites
- Rust nightly toolchain (automatically configured via `rust-toolchain.toml`)
- For GPU acceleration:
  - CUDA 12+ (NVIDIA GPUs)
  - Metal (Apple Silicon)
  - MKL (Intel CPUs)
Installation
# Clone the repository
# Build with default features
# Or build with hardware acceleration
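Assuming a standard Cargo workflow, the steps above look like the following; the acceleration feature names (`cuda`, `metal`, `mkl`) are assumptions — check Cargo.toml for the exact feature list:

```shell
# Clone the repository
git clone https://github.com/cyrup-ai/kodegen-candle-agent
cd kodegen-candle-agent

# Build with default features
cargo build --release

# Or build with hardware acceleration
# (feature names are assumptions; see Cargo.toml for the real ones)
cargo build --release --features cuda    # NVIDIA GPUs
cargo build --release --features metal   # Apple Silicon
cargo build --release --features mkl     # Intel CPUs
```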
Running the Server
# Start the MCP server (HTTP transport)
# The server will start on http://localhost:3000 by default
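A minimal sketch of the command above, assuming the package's default binary starts the HTTP transport:

```shell
# Start the MCP server (HTTP transport)
cargo run --release

# The server will start on http://localhost:3000 by default
```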
Configuration for MCP Clients
Add to your MCP client configuration (e.g., Claude Desktop's claude_desktop_config.json):
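As an illustrative sketch, an HTTP-capable client could be pointed at the server like this; the exact field names depend on your client and transport, so consult your client's documentation:

```json
{
  "mcpServers": {
    "kodegen-candle-agent": {
      "url": "http://localhost:3000"
    }
  }
}
```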
Usage
The server provides four MCP tools for memory operations:
1. Memorize Content
Ingest files or directories into a named memory library:
Returns a session_id for tracking the async operation.
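A memorize call might look like this; the argument names (`library`, `path`) are illustrative — the tool's advertised JSON schema is authoritative:

```json
{
  "name": "memorize",
  "arguments": {
    "library": "my-project",
    "path": "./src"
  }
}
```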
2. Check Memorization Status
Poll the progress of a memorization task:
Returns status (IN_PROGRESS, COMPLETED, FAILED) with progress details.
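A sketch of polling with the session_id returned by memorize (argument names are illustrative):

```json
{
  "name": "check_status",
  "arguments": {
    "session_id": "<session_id from memorize>"
  }
}
```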
3. Recall Memories
Search for relevant memories using semantic similarity:
Returns ranked memories with similarity scores and importance metrics.
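A recall call might look like this; `library`, `query`, and `top_k` are illustrative argument names:

```json
{
  "name": "recall",
  "arguments": {
    "library": "my-project",
    "query": "how is the vector index initialized?",
    "top_k": 5
  }
}
```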
4. List Memory Libraries
Enumerate all available memory libraries:
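This tool needs no arguments; a sketch of the call:

```json
{
  "name": "list_libraries",
  "arguments": {}
}
```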
Architecture
┌─────────────────────────────────────────────────────────┐
│                     MCP Tools Layer                     │
│    (memorize, recall, check_status, list_libraries)     │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                 Memory Coordinator Pool                 │
│          (Per-library coordinator management)           │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
┌───────▼─────┐  ┌────▼────┐ ┌──────▼──────┐
│  Graph DB   │  │ Vector  │ │  Cognitive  │
│ (SurrealDB) │  │ Storage │ │   Workers   │
└─────────────┘  └─────────┘ └─────────────┘
Key Components
- Memory Coordinator - Orchestrates operations across graph DB, vector store, and cognitive workers
- Quantum Routing - Uses quantum-inspired algorithms for intelligent memory importance scoring
- Committee Evaluation - Multiple evaluators vote on memory relevance and importance
- Background Workers - Async processing for embeddings, indexing, and memory decay
- Transaction Manager - ACID guarantees for memory operations
Development
Building with Features
# Full cognitive capabilities
# Specific vector backends
# API server (HTTP endpoint for memory operations)
# Development mode (debug + desktop features)
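The feature names below are assumptions mapped onto the comments above — check Cargo.toml for the real feature list:

```shell
# Full cognitive capabilities
cargo build --release --features cognitive

# Specific vector backends
cargo build --release --features faiss
cargo build --release --features hnsw

# API server (HTTP endpoint for memory operations)
cargo build --release --features api

# Development mode (debug + desktop features)
cargo build --features debug,desktop
```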
Running Tests
# Run all tests
# Run specific test module
# Run with output
# Run a single test
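With a standard Cargo setup, the commands above are as follows; the module and test names are placeholders:

```shell
# Run all tests
cargo test

# Run specific test module
cargo test memory::coordinator

# Run with output
cargo test -- --nocapture

# Run a single test
cargo test test_semantic_recall -- --exact
```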
Running the Example
# Run the demo that exercises all tools
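Assuming the demo lives under examples/ (the example name is a guess — run `ls examples/` to see what ships):

```shell
# Run the demo that exercises all tools
cargo run --release --example demo
```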
Performance
The system is optimized for production use:
- Zero-allocation patterns with `arrayvec` and `smallvec` for hot paths
- SIMD optimizations via `kodegen_simd` for vector operations
- Lazy loading of embedding models and coordinators
- Connection pooling for efficient database access
- Async architecture throughout for maximum concurrency
Typical performance on Apple M1 Pro:
- Embedding generation: ~500 tokens/sec (Stella 400M)
- Memory ingestion: ~1000 files/min (with chunking and indexing)
- Semantic search: <10ms for top-5 retrieval (1M+ memories)
Configuration
Memory system behavior can be configured via environment variables or the MemoryConfig struct:
use kodegen_candle_agent::MemoryConfig; // import path is an assumption
let config = MemoryConfig::default();
Embedding Models
The system uses the Stella embedding model family by default:
- stella_en_400M_v5 - 400M parameter English model (default)
- High quality semantic representations optimized for code and text
Models are automatically downloaded from HuggingFace Hub on first use.
Contributing
Contributions are welcome! Please see our contributing guidelines.
License
This project is dual-licensed under:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
You may choose either license for your purposes.
Links
- Homepage: https://kodegen.ai
- Repository: https://github.com/cyrup-ai/kodegen-candle-agent
- MCP Protocol: https://modelcontextprotocol.io
Built with ❤️ by the KODEGEN.ᴀɪ team