OxiRAG
A four-layer Retrieval-Augmented Generation (RAG) engine in Pure Rust with SMT-based logic verification and knowledge graph support.
Key Innovations:
- Speculative RAG: Treat cached results as "drafts", not "answers", and verify them with a small language model (SLM) before finalizing
- Context-Aware Prefix Caching: Efficiently manage KV Cache for "premise knowledge"
- On-the-fly Distillation: Automatically generate specialized lightweight models for frequent queries
- Hidden States Manipulation: Direct manipulation of transformer hidden states for verification
Overview
OxiRAG provides a robust RAG pipeline with four specialized layers:
        Query
          │
          ├─────────────────────────┐
          ▼                         ▼
┌─────────────────┐       ┌─────────────────┐
│  Layer 1: Echo  │       │ Layer 4: Graph  │
│ (Vector Store)  │       │   (GraphRAG)    │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └───────────┬─────────────┘
                     ▼
           ┌─────────────────────┐
           │ Layer 2: Speculator │ ← Draft verification with SLM
           │   (Draft Checker)   │
           └─────────┬───────────┘
                     │
                     ▼
            ┌─────────────────┐
            │ Layer 3: Judge  │ ← Logic verification with SMT solver
            │ (SMT Verifier)  │
            └────────┬────────┘
                     │
                     ▼
              Verified Answer
Features
- Layer 1 (Echo): Semantic search using vector embeddings
  - Cosine, Euclidean, and Dot Product similarity metrics
  - In-memory vector store with configurable capacity
  - Pluggable embedding providers (Candle BERT, mock for testing)
- Layer 2 (Speculator): Draft verification
  - Rule-based speculator for quick verification
  - Candle-based SLM for advanced verification (optional)
  - Accept/Revise/Reject decision pipeline
- Layer 3 (Judge): SMT-based logic verification
  - Claim extraction from natural language
  - SMT-LIB encoding of logical claims
  - OxiZ SMT solver integration for formal verification
  - Temporal, causal, and modal claim structures
- Layer 4 (Graph): Knowledge graph-based retrieval (GraphRAG)
  - Entity and relationship extraction from documents
  - In-memory graph store with traversal algorithms
  - BFS traversal, shortest path, and N-hop queries
  - Hybrid search combining vector and graph results
- Prefix Caching: Context-aware KV cache management
  - Efficient caching of processed document contexts
  - Context fingerprinting for cache key generation
  - LRU eviction with TTL support
  - Prefix matching for partial cache hits
- Distillation: On-the-fly model distillation support
  - Query pattern frequency tracking
  - Automatic Q&A pair collection
  - Distillation candidate detection
  - Training example export for LoRA fine-tuning
  - Feature-based distillation (FitNet, attention transfer)
  - Teacher-student architecture with progressive learning
  - Distillation loss functions (KL, MSE, cosine)
- Cross-Platform: Native and WASM support
  - Full async/await support with Tokio (native)
  - WASM bindings for browser/edge deployment
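The three distillation loss functions named in the list above (KL, MSE, cosine) can be illustrated with a small standard-library sketch. This is not OxiRAG's implementation: the function names, the use of `f64` slices, and the temperature parameter `t` are assumptions made for illustration only.

```rust
/// Softmax with temperature `t`, stabilised by subtracting the max logit.
pub fn softmax(logits: &[f64], t: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| ((z - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) over temperature-softened distributions.
pub fn kl_loss(teacher_logits: &[f64], student_logits: &[f64], t: f64) -> f64 {
    let p = softmax(teacher_logits, t);
    let q = softmax(student_logits, t);
    p.iter()
        .zip(&q)
        .map(|(pi, qi)| if *pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

/// Mean squared error between two equal-length feature vectors.
pub fn mse_loss(a: &[f64], b: &[f64]) -> f64 {
    let sum: f64 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    sum / a.len() as f64
}

/// Cosine loss: 1 - cosine similarity. Assumes both vectors are non-zero.
pub fn cosine_loss(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    1.0 - dot / (na * nb)
}
```

The KL term compares output distributions, while MSE and cosine compare intermediate features; a real training loop would typically weight and combine them.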
Installation
Add to your Cargo.toml:
[dependencies]
oxirag = "0.1"
Feature Flags
| Feature | Description | Default |
|---|---|---|
| native | Enable native runtime with Tokio | Yes |
| wasm | Enable WASM bindings | No |
| echo | Enable Layer 1 (semantic search) | Yes |
| speculator | Enable Layer 2 with Candle SLM | No |
| judge | Enable Layer 3 with OxiZ SMT solver | No |
| graphrag | Enable Layer 4 (knowledge graph) | No |
| prefix-cache | Enable prefix caching for KV cache management | No |
| distillation | Enable distillation tracking and Q&A collection | No |
| full | Enable all features (native only) | No |
| cuda | Enable CUDA acceleration for Candle | No |
| metal | Enable Metal acceleration for Candle | No |
Quick Start
use *;
async
Architecture
Layer 1: Echo
The Echo layer handles semantic search:
// Configure embedding provider
let provider = new;
// Configure vector store
let store = new
.with_metric
.with_max_capacity;
// Create Echo layer
let echo = new;
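To make the role of the similarity metric concrete, here is a minimal standard-library sketch of scoring and brute-force top-k search over an in-memory store. The names `Metric`, `score`, and `top_k` are hypothetical and are not taken from the OxiRAG API; the sketch only illustrates how cosine, Euclidean, and dot-product scoring differ.

```rust
/// Similarity metrics named in the feature list (illustrative enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Metric {
    Cosine,
    Euclidean,
    DotProduct,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score two equal-length vectors; higher always means "more similar".
pub fn score(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    match metric {
        Metric::DotProduct => dot(a, b),
        Metric::Cosine => {
            let norm = dot(a, a).sqrt() * dot(b, b).sqrt();
            if norm == 0.0 { 0.0 } else { dot(a, b) / norm }
        }
        // Negate the distance so that closer vectors rank higher.
        Metric::Euclidean => {
            let d2: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
            -d2.sqrt()
        }
    }
}

/// Brute-force top-k search over (id, embedding) pairs.
pub fn top_k(
    metric: Metric,
    query: &[f32],
    docs: &[(u32, Vec<f32>)],
    k: usize,
) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = docs
        .iter()
        .map(|(id, v)| (*id, score(metric, query, v)))
        .collect();
    // Sort descending by score.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Cosine ignores vector length, dot product rewards it, and Euclidean penalises any offset, so the right choice depends on whether the embedding provider normalises its output.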
Layer 2: Speculator
The Speculator layer verifies draft answers:
// Rule-based (default)
let speculator = default;
// Or with custom config
let config = SpeculatorConfig ;
let speculator = new;
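As a rough illustration of an Accept/Revise/Reject pipeline, the sketch below scores a draft by the fraction of its tokens that appear in the retrieved context and maps that score to a decision. The token-overlap rule, the `Thresholds` struct, and all function names are assumptions chosen for illustration; OxiRAG's actual rules and configuration fields may differ.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum Decision {
    Accept,
    Revise,
    Reject,
}

/// Hypothetical thresholds; not OxiRAG's actual configuration fields.
pub struct Thresholds {
    pub accept: f32,
    pub revise: f32,
}

/// Lowercased alphanumeric tokens of a text.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Fraction of draft tokens that also occur in the retrieved context.
pub fn support(draft: &str, context: &str) -> f32 {
    let d = tokens(draft);
    if d.is_empty() {
        return 0.0;
    }
    let c = tokens(context);
    d.intersection(&c).count() as f32 / d.len() as f32
}

/// Map the support score to a three-way decision.
pub fn verify(draft: &str, context: &str, t: &Thresholds) -> Decision {
    let s = support(draft, context);
    if s >= t.accept {
        Decision::Accept
    } else if s >= t.revise {
        Decision::Revise
    } else {
        Decision::Reject
    }
}
```

A rule like this is cheap enough to run on every draft; an SLM-based check would replace `support` with a model call while keeping the same three-way outcome.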
Layer 3: Judge
The Judge layer performs formal verification:
let judge = new;
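To show what "SMT-LIB encoding of logical claims" can look like, the following sketch turns a small propositional claim tree into an SMT-LIB 2 script. The `Claim` enum and `to_smtlib` function are hypothetical and much simpler than the temporal, causal, and modal structures the Judge layer is described as supporting; the sketch also does not call a solver, and it assumes atom names are already valid SMT-LIB symbols.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// A tiny propositional claim language (illustrative only).
pub enum Claim {
    Atom(String),
    Not(Box<Claim>),
    Implies(Box<Claim>, Box<Claim>),
}

impl Claim {
    /// Gather every atom name so each can be declared once.
    fn collect_atoms(&self, out: &mut BTreeSet<String>) {
        match self {
            Claim::Atom(name) => {
                out.insert(name.clone());
            }
            Claim::Not(inner) => inner.collect_atoms(out),
            Claim::Implies(a, b) => {
                a.collect_atoms(out);
                b.collect_atoms(out);
            }
        }
    }

    /// Render the claim as an SMT-LIB term.
    fn to_sexpr(&self) -> String {
        match self {
            Claim::Atom(name) => name.clone(),
            Claim::Not(inner) => format!("(not {})", inner.to_sexpr()),
            Claim::Implies(a, b) => format!("(=> {} {})", a.to_sexpr(), b.to_sexpr()),
        }
    }
}

/// Emit declarations, one assertion per claim, and a satisfiability check.
pub fn to_smtlib(claims: &[Claim]) -> String {
    let mut atoms = BTreeSet::new();
    for c in claims {
        c.collect_atoms(&mut atoms);
    }
    let mut script = String::from("(set-logic QF_UF)\n");
    for a in &atoms {
        writeln!(script, "(declare-const {} Bool)", a).unwrap();
    }
    for c in claims {
        writeln!(script, "(assert {})", c.to_sexpr()).unwrap();
    }
    script.push_str("(check-sat)\n");
    script
}
```

For the claims "rain", "rain implies wet", and "not wet", the generated script asserts all three; a conforming SMT solver should answer `unsat`, which is how a contradiction among extracted claims would surface.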
Layer 4: Graph (GraphRAG)
The Graph layer enables knowledge graph-based retrieval:
use *;
// Create extractors
let entity_extractor = new;
let relationship_extractor = new;
// Create graph store
let graph_store = new;
// Build the GraphLayer
let mut graph_layer = new
.with_entity_extractor
.with_relationship_extractor
.with_graph_store
.build?;
// Index documents (extracts entities and relationships)
graph_layer.index_document.await?;
// Query the graph
let query = new
.with_max_hops;
let paths = graph_layer.query.await?;
// Find related entities
let related = graph_layer.find_related.await?;
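The traversal primitives behind N-hop queries and shortest-path lookups can be sketched with the standard library alone. The `Graph` type and its methods below are hypothetical and unrelated to OxiRAG's graph store API; edges are treated as undirected and unlabeled to keep the example short.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Minimal in-memory entity graph (illustrative only).
#[derive(Default)]
pub struct Graph {
    adj: HashMap<String, Vec<String>>,
}

impl Graph {
    /// Add an undirected edge between two entities.
    pub fn add_edge(&mut self, a: &str, b: &str) {
        self.adj.entry(a.to_string()).or_default().push(b.to_string());
        self.adj.entry(b.to_string()).or_default().push(a.to_string());
    }

    /// Entities reachable within `max_hops` of `start`, with hop distance.
    pub fn n_hop(&self, start: &str, max_hops: usize) -> Vec<(String, usize)> {
        let mut seen: HashSet<&str> = HashSet::from([start]);
        let mut queue = VecDeque::from([(start, 0usize)]);
        let mut out = Vec::new();
        while let Some((node, dist)) = queue.pop_front() {
            if dist > 0 {
                out.push((node.to_string(), dist));
            }
            if dist == max_hops {
                continue; // do not expand beyond the hop limit
            }
            for next in self.adj.get(node).into_iter().flatten() {
                if seen.insert(next.as_str()) {
                    queue.push_back((next.as_str(), dist + 1));
                }
            }
        }
        out
    }

    /// BFS shortest path (fewest hops) from `from` to `to`, if one exists.
    pub fn shortest_path(&self, from: &str, to: &str) -> Option<Vec<String>> {
        let mut prev: HashMap<&str, &str> = HashMap::new();
        let mut seen: HashSet<&str> = HashSet::from([from]);
        let mut queue = VecDeque::from([from]);
        while let Some(node) = queue.pop_front() {
            if node == to {
                // Walk predecessor links back to the start.
                let mut path = vec![node.to_string()];
                let mut cur = node;
                while let Some(&p) = prev.get(cur) {
                    path.push(p.to_string());
                    cur = p;
                }
                path.reverse();
                return Some(path);
            }
            for next in self.adj.get(node).into_iter().flatten() {
                if seen.insert(next.as_str()) {
                    prev.insert(next.as_str(), node);
                    queue.push_back(next.as_str());
                }
            }
        }
        None
    }
}
```

Because BFS visits nodes in order of hop count, the first time the target is dequeued the recovered path is guaranteed to use the fewest edges.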
Prefix Caching
The Prefix Cache module enables efficient caching of processed document contexts:
use *;
// Create a prefix cache
let config = default
.with_max_entries
.with_default_ttl_secs;
let mut cache = new;
// Generate a fingerprint for context
let generator = new;
let fingerprint = generator.generate;
// Create and store a KV cache entry
let entry = new;
cache.put.await?;
// Retrieve cached entry
if let Some = cache.get.await
// Check cache statistics
let stats = cache.stats;
println!;
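The combination of fingerprinting, prefix matching, TTL, and LRU eviction can be sketched as follows. Everything here is an assumption made for illustration: the `PrefixCache` shape, hashing token IDs with the standard library's `DefaultHasher`, passing the clock in as `now` (seconds), and storing a `Vec<f32>` as a stand-in for real KV tensors. A production cache would also need to guard against hash collisions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Fingerprint of a token prefix (stand-in for a context fingerprint).
pub fn fingerprint(tokens: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    tokens.hash(&mut h);
    h.finish()
}

struct Entry {
    kv: Vec<f32>,    // placeholder for cached KV tensors
    expires_at: u64, // absolute expiry time in seconds
    last_used: u64,  // logical clock value used for LRU ordering
}

pub struct PrefixCache {
    entries: HashMap<u64, Entry>,
    max_entries: usize,
    ttl_secs: u64,
    tick: u64,
}

impl PrefixCache {
    pub fn new(max_entries: usize, ttl_secs: u64) -> Self {
        Self { entries: HashMap::new(), max_entries, ttl_secs, tick: 0 }
    }

    /// Store a payload for `tokens`, evicting the LRU entry when full.
    pub fn put(&mut self, tokens: &[u32], kv: Vec<f32>, now: u64) {
        self.tick += 1;
        let key = fingerprint(tokens);
        if !self.entries.contains_key(&key) && self.entries.len() >= self.max_entries {
            let oldest = self.entries.iter().min_by_key(|(_, e)| e.last_used).map(|(k, _)| *k);
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        let entry = Entry { kv, expires_at: now + self.ttl_secs, last_used: self.tick };
        self.entries.insert(key, entry);
    }

    /// Longest unexpired cached prefix of `tokens`: (tokens covered, payload).
    /// Expired entries are skipped here and linger until evicted.
    pub fn longest_prefix(&mut self, tokens: &[u32], now: u64) -> Option<(usize, &[f32])> {
        self.tick += 1;
        let hit = (1..=tokens.len()).rev().find(|&len| {
            matches!(self.entries.get(&fingerprint(&tokens[..len])), Some(e) if e.expires_at > now)
        })?;
        let e = self.entries.get_mut(&fingerprint(&tokens[..hit]))?;
        e.last_used = self.tick; // a hit refreshes the entry's LRU position
        Some((hit, e.kv.as_slice()))
    }
}
```

A partial hit tells the caller how many leading tokens are already covered, so only the remaining suffix needs to be processed.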
Distillation
The Distillation module tracks query patterns for model fine-tuning:
use *;
// Create a distillation tracker
let config = default
.with_min_frequency_threshold
.with_max_qa_pairs_per_pattern;
let mut tracker = new;
// Track queries with their answers
tracker.track_query.await?;
tracker.track_query.await?;
// Get candidates ready for distillation
let candidates = tracker.get_candidates.await;
for candidate in candidates
// Export training examples for LoRA fine-tuning
let examples = tracker.export_training_examples.await;
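Pattern frequency tracking and candidate detection can be approximated with a normalised query key and a counter. The sketch below is synchronous and uses hypothetical names (`Tracker`, `normalize`, `candidates`, `export`); it assumes that queries differing only in case, punctuation, or spacing belong to the same pattern, which is a simplification and not necessarily how OxiRAG groups queries.

```rust
use std::collections::HashMap;

/// Collapse case, punctuation, and whitespace so similar queries share a key.
pub fn normalize(query: &str) -> String {
    query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

#[derive(Default)]
struct Pattern {
    count: usize,
    qa_pairs: Vec<(String, String)>,
}

pub struct Tracker {
    patterns: HashMap<String, Pattern>,
    min_frequency: usize,
    max_pairs: usize,
}

impl Tracker {
    pub fn new(min_frequency: usize, max_pairs: usize) -> Self {
        Self { patterns: HashMap::new(), min_frequency, max_pairs }
    }

    /// Count the query's pattern and keep the Q&A pair up to the cap.
    pub fn track(&mut self, query: &str, answer: &str) {
        let p = self.patterns.entry(normalize(query)).or_default();
        p.count += 1;
        if p.qa_pairs.len() < self.max_pairs {
            p.qa_pairs.push((query.to_string(), answer.to_string()));
        }
    }

    /// Patterns seen at least `min_frequency` times, most frequent first.
    pub fn candidates(&self) -> Vec<(&str, usize)> {
        let mut out: Vec<(&str, usize)> = self
            .patterns
            .iter()
            .filter(|(_, p)| p.count >= self.min_frequency)
            .map(|(k, p)| (k.as_str(), p.count))
            .collect();
        out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        out
    }

    /// Flatten the stored pairs of all candidate patterns into examples.
    pub fn export(&self) -> Vec<(String, String)> {
        let mut out = Vec::new();
        for (pattern, _) in self.candidates() {
            out.extend(self.patterns[pattern].qa_pairs.iter().cloned());
        }
        out
    }
}
```

Capping the pairs per pattern keeps one very frequent query from dominating the exported training set.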
WASM Usage
Build for WASM:
Use in JavaScript:
import init from './pkg/oxirag.js';
await ;
const engine = ;
await engine.;
const results = await engine.;
Configuration
Pipeline Configuration
let config = PipelineConfig ;
Loading from JSON
let config = from_json?;
Testing
# Run all tests (1,500 tests)
# Run clippy (no warnings)
# Run rustdoc validation (no warnings)
RUSTDOCFLAGS="-D warnings"
# Run doc tests
Benchmarks
Project Statistics
| Metric | Value |
|---|---|
| Source Files | 91 Rust files |
| Lines of Code | 48,463 |
| Tests | 1,500 |
| Clippy Warnings | 0 |
| Rustdoc Warnings | 0 |
License
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).
Roadmap
See TODO.md for detailed progress on each layer:
| Feature | Progress |
|---|---|
| Speculative RAG | 99% |
| Context-Aware Prefix Caching | 95% |
| On-the-fly Distillation | 85% |
| Hidden States Manipulation | 90% |
COOLJAPAN Ecosystem
OxiRAG is part of the COOLJAPAN Pure Rust ecosystem:
- OxiZ: SMT solver for logic verification
- SciRS2: Scientific computing library
- NumRS2: Numerical computing primitives
- OxiBLAS: BLAS implementation in Pure Rust
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
COOLJAPAN OU (Team Kitasan)