Ruvector Tiny Dancer Core
Production-grade AI agent routing system with FastGRNN neural inference for 70-85% LLM cost reduction.
🚀 Introduction
The Problem: AI applications often send every request to expensive, powerful models, even when simpler models could handle the task. This wastes money and resources.
The Solution: Tiny Dancer acts as a smart traffic controller for your AI requests. It quickly analyzes each request and decides whether to route it to a fast, cheap model or a powerful, expensive one.
How It Works:
- You send a request with potential responses (candidates)
- Tiny Dancer scores each candidate in microseconds
- High-confidence candidates go to lightweight models (fast & cheap)
- Low-confidence candidates go to powerful models (accurate but expensive)
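The decision step above can be sketched as a simple threshold rule. This is a minimal illustration, not the crate's API: `Tier` and `route_candidate` are hypothetical, and combining a confidence check with an uncertainty check is an assumption based on the `confidence_threshold` (0.85) and `max_uncertainty` (0.15) defaults listed under Configuration.

```rust
/// Hypothetical model tiers a request can be routed to.
#[derive(Debug, PartialEq)]
pub enum Tier {
    Lightweight, // fast & cheap
    Powerful,    // accurate but expensive
}

/// Hypothetical routing rule (assumption, not the crate's code):
/// only candidates that are both confident and low-uncertainty go to the
/// lightweight model; everything else falls back to the powerful one.
pub fn route_candidate(
    confidence: f32,
    uncertainty: f32,
    confidence_threshold: f32, // e.g. 0.85
    max_uncertainty: f32,      // e.g. 0.15
) -> Tier {
    if confidence >= confidence_threshold && uncertainty <= max_uncertainty {
        Tier::Lightweight
    } else {
        Tier::Powerful
    }
}
```

Failing either check sends the candidate to the powerful model, so the cheap path is taken only when the scorer is sure of itself.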
The Result: Save 70-85% on AI costs while maintaining quality.
Real-World Example: Instead of sending 100 memory items to GPT-4 for evaluation, Tiny Dancer filters them down to the top 3-5 in microseconds, then sends only those to the expensive model.
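The filtering step in this example amounts to a top-k selection over candidate scores. A minimal sketch with a hypothetical `top_k` helper (not part of the crate's documented API), assuming scores are finite `f32` values where higher is better:

```rust
/// Return the indices of the `k` highest-scoring candidates, best first.
/// Hypothetical helper for illustration only.
pub fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // Sort indices by descending score; total_cmp gives a total order on f32.
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(k); // keeps everything if k exceeds the candidate count
    idx
}
```

A full sort is O(n log n), which is negligible for 100 candidates; for much larger sets `select_nth_unstable_by` avoids sorting the tail.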
✨ Features
- ⚡ Sub-millisecond Latency: 173-189ns per-candidate feature extraction (144ns cosine similarity), 7.5µs model inference
- 💰 70-85% Cost Reduction: Intelligent routing to appropriately-sized models
- 🧠 FastGRNN Architecture: <1MB models with 80-90% sparsity
- 🔒 Circuit Breaker: Graceful degradation with automatic recovery
- 📊 Uncertainty Quantification: Conformal prediction for reliable routing
- 🗄️ AgentDB Integration: Persistent SQLite storage with WAL mode
- 🎯 Multi-Signal Scoring: Semantic similarity, recency, frequency, success rate
- 🔧 Model Optimization: INT8 quantization, magnitude pruning
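For readers new to FastGRNN, one recurrent step follows the published cell equations from Microsoft Research: the gate and the candidate state share the same `W` and `U` matrices, and two learned scalars `zeta` and `nu` rescale the update. The sketch below is a dense, unoptimized illustration; `FastGrnnCell`, its row-major field layout, and the assumption that `zeta` and `nu` are already squashed into [0, 1] are hypothetical, and the crate's sparse or quantized representation is not shown.

```rust
/// Minimal dense FastGRNN cell (hypothetical struct, not the crate's type).
/// `w` is hidden x input, `u` is hidden x hidden, both row-major.
pub struct FastGrnnCell {
    pub w: Vec<Vec<f32>>,
    pub u: Vec<Vec<f32>>,
    pub b_z: Vec<f32>, // gate bias
    pub b_h: Vec<f32>, // candidate-state bias
    pub zeta: f32,     // assumed already in [0, 1]
    pub nu: f32,       // assumed already in [0, 1]
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Dense matrix-vector product.
fn matvec(m: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

impl FastGrnnCell {
    /// One step: z = sigmoid(Wx + Uh + b_z), h~ = tanh(Wx + Uh + b_h),
    /// h_t = (zeta * (1 - z) + nu) * h~ + z * h_prev.
    pub fn step(&self, x: &[f32], h_prev: &[f32]) -> Vec<f32> {
        let wx = matvec(&self.w, x);
        let uh = matvec(&self.u, h_prev);
        (0..h_prev.len())
            .map(|i| {
                let pre = wx[i] + uh[i]; // shared pre-activation
                let z = sigmoid(pre + self.b_z[i]);
                let h_tilde = (pre + self.b_h[i]).tanh();
                (self.zeta * (1.0 - z) + self.nu) * h_tilde + z * h_prev[i]
            })
            .collect()
    }
}
```

Sharing `W` and `U` between the gate and the candidate is what keeps the parameter count, and therefore the model size, small compared with a GRU.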
📊 Benchmark Results
Feature Extraction:
10 candidates: 1.73µs (173ns per candidate)
50 candidates: 9.44µs (189ns per candidate)
100 candidates: 18.48µs (185ns per candidate)
Model Inference:
Single: 7.50µs
Batch 10: 74.94µs (7.49µs per item)
Batch 100: 735.45µs (7.35µs per item)
Complete Routing:
10 candidates: 8.83µs
50 candidates: 48.23µs
100 candidates: 92.86µs
🚀 Quick Start
Installation
Add to your Cargo.toml:
[]
= "0.1.1"
Basic Usage
use ;
use HashMap;
// Create router
let config = RouterConfig ;
let router = new?;
// Prepare candidates
let candidates = vec!;
// Route request
let request = RoutingRequest ;
let response = router.route?;
// Process decisions
for decision in response.decisions
📚 Tutorials
Tutorial 1: Basic Routing
use ;
Tutorial 2: Feature Engineering
use ;
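The multi-signal scoring listed under Features (semantic similarity, recency, frequency, success rate) can be sketched as a weighted sum. Everything here is an assumption for illustration: `Signals`, `Weights`, and `score` are hypothetical names, the frequency squashing is one arbitrary choice, and the age unit is assumed to match whatever unit `recency_decay` (default 0.001) is expressed in.

```rust
/// Hypothetical per-candidate signals; field names are illustrative.
pub struct Signals {
    pub similarity: f32,   // cosine similarity to the query, in [-1, 1]
    pub age: f32,          // time since last use, same unit as the decay rate
    pub access_count: u32, // how often the candidate has been used
    pub success_rate: f32, // fraction of past uses that succeeded, in [0, 1]
}

/// Hypothetical weights; if they sum to 1.0 the score stays in [0, 1].
pub struct Weights {
    pub similarity: f32,
    pub recency: f32,
    pub frequency: f32,
    pub success: f32,
}

pub fn score(s: &Signals, w: &Weights, recency_decay: f32) -> f32 {
    let sim = (s.similarity + 1.0) / 2.0; // map [-1, 1] onto [0, 1]
    let recency = (-recency_decay * s.age).exp(); // exponential decay, 1.0 when fresh
    let freq = 1.0 - 1.0 / (1.0 + s.access_count as f32); // saturates toward 1.0
    w.similarity * sim + w.recency * recency + w.frequency * freq + w.success * s.success_rate
}
```

With a decay rate of 0.001, the recency term falls to about 37% (e^-1) after 1,000 time units.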
Tutorial 3: Circuit Breaker
use Router;
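A circuit breaker of the kind described under Features usually moves between three states: Closed (normal), Open (fail fast after too many failures), and HalfOpen (let one probe through after a cooldown). The sketch below is a generic version of that pattern, not the crate's implementation; `CircuitBreaker`, `State`, and the cooldown parameter are hypothetical, and the failure threshold mirrors the documented `circuit_breaker_threshold` default of 5. Time is passed in explicitly so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Generic circuit breaker sketch (hypothetical type).
pub struct CircuitBreaker {
    failure_threshold: u32, // e.g. 5
    cooldown: Duration,     // how long to stay open before probing (assumed)
    failures: u32,
    state: State,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, failures: 0, state: State::Closed, opened_at: None }
    }

    /// Should a request be attempted now? Moves Open -> HalfOpen once the cooldown elapses.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == State::Open {
            match self.opened_at {
                Some(t) if now.duration_since(t) >= self.cooldown => self.state = State::HalfOpen,
                _ => return false,
            }
        }
        true
    }

    /// Any success fully closes the circuit and resets the failure count.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
        self.opened_at = None;
    }

    /// A failed probe in HalfOpen, or reaching the threshold, opens the circuit.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == State::HalfOpen || self.failures >= self.failure_threshold {
            self.state = State::Open;
            self.opened_at = Some(now);
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

While the circuit is open, a router would skip the neural scorer and fall back to a safe default (for example, the powerful model), which is the "graceful degradation" half of the pattern.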
Tutorial 4: Model Optimization
use ;
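The two optimizations named under Features can be illustrated with their most common textbook variants: symmetric per-tensor INT8 quantization (scale = max|w| / 127) and unstructured magnitude pruning (zero the smallest-magnitude weights). These helpers are hypothetical; whether the crate uses per-tensor or per-channel scales, or a different pruning schedule, is not stated here.

```rust
/// Symmetric per-tensor INT8 quantization: q = round(w / scale), scale = max|w| / 127.
pub fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Use scale 1.0 for an all-zero tensor to avoid dividing by zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q: Vec<i8> = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Map INT8 values back to approximate f32 weights.
pub fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Magnitude pruning: zero exactly floor(len * sparsity) weights with the smallest |w|.
pub fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let n_prune = ((weights.len() as f32 * sparsity).floor() as usize).min(weights.len());
    let mut idx: Vec<usize> = (0..weights.len()).collect();
    // Order indices by ascending magnitude.
    idx.sort_by(|&a, &b| weights[a].abs().total_cmp(&weights[b].abs()));
    for &i in &idx[..n_prune] {
        weights[i] = 0.0;
    }
}
```

Quantization cuts weight storage from 4 bytes to 1 byte per parameter, and each dequantized weight is within half a quantization step of the original; pruning at 80-90% sparsity additionally lets a sparse format skip most weights.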
Tutorial 5: SQLite Storage
use Storage;
🎯 Advanced Usage
Hot Model Reloading
// Reload model without downtime
router.reload_model?;
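One common way to reload a model without downtime is to keep the active model behind an `Arc` and swap that pointer under a short-lived lock, so in-flight requests keep the snapshot they started with. Whether `reload_model` works this way is an assumption; `HotSwap` and `Model` below are hypothetical stand-ins.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical model handle; stands in for whatever the router loads.
pub struct Model {
    pub version: u32,
}

/// Active model behind RwLock<Arc<_>>: readers clone the Arc, reloads swap it.
pub struct HotSwap {
    current: RwLock<Arc<Model>>,
}

impl HotSwap {
    pub fn new(model: Model) -> Self {
        Self { current: RwLock::new(Arc::new(model)) }
    }

    /// Cheap snapshot of the active model; the read lock is released immediately.
    pub fn get(&self) -> Arc<Model> {
        Arc::clone(&*self.current.read().unwrap())
    }

    /// Build the new Arc outside the lock, then hold the write lock only for the swap.
    pub fn reload(&self, model: Model) {
        let fresh = Arc::new(model);
        *self.current.write().unwrap() = fresh;
    }
}
```

The old model is freed once the last request holding its `Arc` finishes, so no request ever observes a half-loaded model.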
Custom Configuration
let config = RouterConfig ;
Batch Processing
let inputs = vec!;
let scores = model.forward_batch?;
// Process 3 inputs in ~22µs total
📈 Performance Optimization
SIMD Acceleration
Feature extraction uses simsimd for hardware-accelerated similarity:
- Cosine similarity: 144ns (384-dim vectors)
- Batch processing: Linear scaling with candidate count
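For reference, this is the scalar computation that a SIMD similarity kernel vectorizes. The snippet does not call simsimd and is not the crate's code; `cosine_similarity` is a hypothetical plain-Rust helper that returns 0.0 when either vector has zero norm.

```rust
/// Scalar cosine similarity: dot(a, b) / (|a| * |b|).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    // One pass accumulates the dot product and both squared norms.
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a.sqrt() * norm_b.sqrt())
    }
}
```

For 384-dimensional embeddings this is 384 multiply-accumulate steps per term; SIMD processes several lanes per instruction, which is where the speedup comes from.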
Zero-Copy Operations
- Memory-mapped models with memmap2
- Zero-allocation inference paths
- Efficient buffer reuse
Parallel Processing
- Rayon-based parallel feature extraction
- Batch inference for multiple candidates
- Concurrent storage operations with WAL
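With Rayon, parallel feature extraction is typically a one-line `par_iter()` change. To stay dependency-free, the sketch below shows the same idea with `std::thread::scope`: candidates are split into chunks, each chunk is processed on its own thread, and results come back in input order. `extract` and `extract_parallel` are hypothetical; the per-candidate feature (an L2 norm) is a placeholder.

```rust
use std::thread;

/// Hypothetical per-candidate feature: the L2 norm of the embedding.
fn extract(embedding: &[f32]) -> f32 {
    embedding.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Extract features across `n_threads` scoped threads, preserving input order.
pub fn extract_parallel(candidates: &[Vec<f32>], n_threads: usize) -> Vec<f32> {
    let n = n_threads.max(1);
    // Ceiling division; chunks() panics on 0, so clamp to at least 1.
    let chunk = ((candidates.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|c| extract(c)).collect::<Vec<f32>>()))
            .collect();
        // Joining in spawn order keeps results aligned with the input.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Scoped threads can borrow `candidates` directly without cloning; for only a handful of candidates, thread start-up cost would outweigh the per-candidate work, so a real router would parallelize only above some batch size.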
🔧 Configuration
| Parameter | Default | Description |
|---|---|---|
| confidence_threshold | 0.85 | Minimum confidence for lightweight routing |
| max_uncertainty | 0.15 | Maximum uncertainty tolerance |
| circuit_breaker_threshold | 5 | Failures before circuit opens |
| recency_decay | 0.001 | Exponential decay rate for recency |
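The defaults in this table map naturally onto a Rust struct with a `Default` impl, which allows overriding a single field with struct-update syntax. `RoutingDefaults` is a hypothetical mirror of the table; the real `RouterConfig` may use different field names, types, and additional fields.

```rust
/// Hypothetical mirror of the documented defaults (not the crate's RouterConfig).
#[derive(Debug, Clone, PartialEq)]
pub struct RoutingDefaults {
    pub confidence_threshold: f32,      // minimum confidence for lightweight routing
    pub max_uncertainty: f32,           // maximum uncertainty tolerance
    pub circuit_breaker_threshold: u32, // failures before the circuit opens
    pub recency_decay: f32,             // exponential decay rate for recency
}

impl Default for RoutingDefaults {
    fn default() -> Self {
        Self {
            confidence_threshold: 0.85,
            max_uncertainty: 0.15,
            circuit_breaker_threshold: 5,
            recency_decay: 0.001,
        }
    }
}
```

For example, `RoutingDefaults { confidence_threshold: 0.95, ..Default::default() }` tightens only the confidence gate, sending more traffic to the powerful model.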
📊 Cost Analysis
For 10,000 daily queries at $0.02 per query:
| Scenario | Reduction | Daily Savings | Annual Savings |
|---|---|---|---|
| Conservative | 70% | $132 | $48,240 |
| Aggressive | 85% | $164 | $59,876 |
Break-even: ~2 months with typical engineering costs
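As a cross-check of the scenario arithmetic, the baseline spend is 10,000 × $0.02 = $200/day. The sketch below computes gross savings only, under the loud assumption that queries routed away from the expensive model cost nothing (in practice the lightweight model and router are not free), so its results are upper bounds and do not exactly match the table. `gross_savings` is a hypothetical helper.

```rust
/// Gross savings if `reduction` (0.0..=1.0) of the daily spend is avoided.
/// Assumes routed-away queries cost nothing, so this is an upper bound.
/// Returns (daily savings, annual savings over 365 days).
pub fn gross_savings(daily_queries: u32, cost_per_query: f64, reduction: f64) -> (f64, f64) {
    let baseline = daily_queries as f64 * cost_per_query; // $/day with no routing
    let daily = baseline * reduction;
    (daily, daily * 365.0)
}
```

At 70% this gives $140/day ($51,100/year) and at 85% it gives $170/day ($62,050/year), both above the table's figures, as expected for a gross estimate.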
🔗 Related Projects
- WASM: ruvector-tiny-dancer-wasm - Browser/edge deployment
- Node.js: ruvector-tiny-dancer-node - TypeScript bindings
- Ruvector: ruvector-core - Vector database
📚 Resources
- Documentation: docs.rs/ruvector-tiny-dancer-core
- GitHub: github.com/ruvnet/ruvector
- Website: ruv.io
- Examples: github.com/ruvnet/ruvector/tree/main/examples
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- FastGRNN architecture inspired by Microsoft Research
- RouteLLM for routing methodology
- Cloudflare Workers for WASM deployment patterns
Built with ❤️ by the Ruvector Team