KODEGEN.ᴀɪ Candle Agent
Memory-efficient, blazing-fast MCP tools for code generation agents.
A high-performance Model Context Protocol (MCP) server that provides cognitive memory capabilities for AI agents. Built with Rust and the Candle ML framework, it delivers semantic memory storage, retrieval, and quantum-inspired routing for intelligent code generation workflows.
Features
- 🧠 Cognitive Memory System - Store and retrieve code context with semantic understanding
- ⚡ High-Performance - Rust + SIMD optimizations for blazing-fast embeddings and retrieval
- 🔄 Async Operations - Non-blocking memory ingestion with progress tracking
- 🎯 Multiple Retrieval Strategies - Semantic, temporal, and hybrid search modes
- 🌊 Quantum-Inspired Routing - Advanced memory importance scoring with entanglement
- 📊 Vector Storage - Support for FAISS, HNSW, and instant-distance backends
- 💾 Persistent Storage - SurrealDB with embedded SurrealKV for ACID transactions
- 🚀 Hardware Acceleration - CUDA, Metal, MKL, and Accelerate support
- 🔧 MCP Compatible - Works with Claude Desktop, Cline, and other MCP clients
Quick Start
Prerequisites
- Rust nightly toolchain (automatically configured via `rust-toolchain.toml`)
- For GPU acceleration:
  - CUDA 12+ (NVIDIA GPUs)
  - Metal (Apple Silicon)
  - MKL (Intel CPUs)
Installation
# Clone the repository
# Build with default features
# Or build with hardware acceleration
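Assuming a standard Cargo workflow, the steps above look like the following; the acceleration feature names (`cuda`, `metal`, `mkl`) are assumptions — check Cargo.toml for the exact feature list:

```shell
# Clone the repository
git clone https://github.com/cyrup-ai/kodegen-candle-agent
cd kodegen-candle-agent

# Build with default features
cargo build --release

# Or build with hardware acceleration
# (feature names are assumptions; see Cargo.toml for the real ones)
cargo build --release --features cuda    # NVIDIA GPUs
cargo build --release --features metal   # Apple Silicon
cargo build --release --features mkl     # Intel CPUs
```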
Running the Server
# Start the MCP server (HTTP transport)
# The server will start on http://localhost:3000 by default
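A minimal sketch of the command above, assuming the package's default binary starts the HTTP transport:

```shell
# Start the MCP server (HTTP transport)
cargo run --release

# The server will start on http://localhost:3000 by default
```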
Configuration for MCP Clients
Add to your MCP client configuration (e.g., Claude Desktop's claude_desktop_config.json):
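As an illustrative sketch, an HTTP-capable client could be pointed at the server like this; the exact field names depend on your client and transport, so consult your client's documentation:

```json
{
  "mcpServers": {
    "kodegen-candle-agent": {
      "url": "http://localhost:3000"
    }
  }
}
```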
Usage
The server provides four MCP tools for memory operations:
1. Memorize Content
Ingest files or directories into a named memory library:
Returns a session_id for tracking the async operation.
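A memorize call might look like this; the argument names (`library`, `path`) are illustrative — the tool's advertised JSON schema is authoritative:

```json
{
  "name": "memorize",
  "arguments": {
    "library": "my-project",
    "path": "./src"
  }
}
```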
2. Check Memorization Status
Poll the progress of a memorization task:
Returns status (IN_PROGRESS, COMPLETED, FAILED) with progress details.
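A sketch of polling with the session_id returned by memorize (argument names are illustrative):

```json
{
  "name": "check_status",
  "arguments": {
    "session_id": "<session_id from memorize>"
  }
}
```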
3. Recall Memories
Search for relevant memories using semantic similarity:
Returns ranked memories with similarity scores and importance metrics.
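A recall call might look like this; `library`, `query`, and `top_k` are illustrative argument names:

```json
{
  "name": "recall",
  "arguments": {
    "library": "my-project",
    "query": "how is the vector index initialized?",
    "top_k": 5
  }
}
```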
4. List Memory Libraries
Enumerate all available memory libraries:
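This tool needs no arguments; a sketch of the call:

```json
{
  "name": "list_libraries",
  "arguments": {}
}
```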
Architecture
┌─────────────────────────────────────────────────────────┐
│                     MCP Tools Layer                     │
│    (memorize, recall, check_status, list_libraries)     │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                 Memory Coordinator Pool                 │
│          (Per-library coordinator management)           │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
┌───────▼─────┐  ┌────▼────┐ ┌──────▼──────┐
│  Graph DB   │  │ Vector  │ │  Cognitive  │
│ (SurrealDB) │  │ Storage │ │   Workers   │
└─────────────┘  └─────────┘ └─────────────┘
Key Components
- Memory Coordinator - Orchestrates operations across graph DB, vector store, and cognitive workers
- Quantum Routing - Uses quantum-inspired algorithms for intelligent memory importance scoring
- Committee Evaluation - Multiple evaluators vote on memory relevance and importance
- Background Workers - Async processing for embeddings, indexing, and memory decay
- Transaction Manager - ACID guarantees for memory operations
Development
Building with Features
# Full cognitive capabilities
# Specific vector backends
# API server (HTTP endpoint for memory operations)
# Development mode (debug + desktop features)
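The feature names below are assumptions mapped onto the comments above — check Cargo.toml for the real feature list:

```shell
# Full cognitive capabilities
cargo build --release --features cognitive

# Specific vector backends
cargo build --release --features faiss
cargo build --release --features hnsw

# API server (HTTP endpoint for memory operations)
cargo build --release --features api

# Development mode (debug + desktop features)
cargo build --features debug,desktop
```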
Running Tests
# Run all tests
# Run specific test module
# Run with output
# Run a single test
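With a standard Cargo setup, the commands above are as follows; the module and test names are placeholders:

```shell
# Run all tests
cargo test

# Run specific test module
cargo test memory::coordinator

# Run with output
cargo test -- --nocapture

# Run a single test
cargo test test_semantic_recall -- --exact
```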
Running the Example
# Run the demo that exercises all tools
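Assuming the demo lives under examples/ (the example name is a guess — run `ls examples/` to see what ships):

```shell
# Run the demo that exercises all tools
cargo run --release --example demo
```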
Performance
The system is optimized for production use:
- Zero-allocation patterns with `arrayvec` and `smallvec` for hot paths
- SIMD optimizations via `kodegen_simd` for vector operations
- Lazy loading of embedding models and coordinators
- Connection pooling for efficient database access
- Async architecture throughout for maximum concurrency
Typical performance on Apple M1 Pro:
- Embedding generation: ~500 tokens/sec (Stella 400M)
- Memory ingestion: ~1000 files/min (with chunking and indexing)
- Semantic search: <10ms for top-5 retrieval (1M+ memories)
Configuration
Memory system behavior can be configured via environment variables or the MemoryConfig struct:
use kodegen_candle_agent::MemoryConfig; // import path is an assumption
let config = MemoryConfig::default();
Embedding Models
The system uses the Stella embedding model family by default:
- stella_en_400M_v5 - 400M parameter English model (default)
- High quality semantic representations optimized for code and text
Models are automatically downloaded from HuggingFace Hub on first use.
Contributing
Contributions are welcome! Please see our contributing guidelines.
License
This project is dual-licensed under:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
You may choose either license for your purposes.
Links
- Homepage: https://kodegen.ai
- Repository: https://github.com/cyrup-ai/kodegen-candle-agent
- MCP Protocol: https://modelcontextprotocol.io
Built with ❤️ by the KODEGEN.ᴀɪ team