# ruvector-attention
Advanced attention mechanisms for vector search and geometric AI.
## Features

- High-Performance: SIMD-accelerated attention computations
- Ergonomic API: Fluent builder pattern and preset configurations
- Modular Design: Mix and match attention mechanisms
- Flexible: Support for standard, sparse, graph, and geometric attention
- Advanced: MoE routing, hyperbolic attention, and more
## Supported Attention Mechanisms

### Standard Attention

- Scaled Dot-Product: softmax(QK^T / √d)V (see the sketch below)
- Multi-Head: Parallel attention heads with diverse representations
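For reference, here is a minimal, crate-independent sketch of the scaled dot-product formula for a single query vector:

```rust
/// Scaled dot-product attention for one query: softmax(q·K^T / √d) · V.
/// Minimal reference sketch, independent of this crate's own types.
fn scaled_dot_product(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let d = q.len() as f32;
    // Raw scores q·kᵢ / √d
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
        .collect();
    // Numerically stable softmax over the scores
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Weighted sum of the value vectors
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}
```

Multi-head attention runs this same computation in parallel over several learned projections of the queries, keys, and values, then concatenates the per-head outputs.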
### Sparse Attention (Memory Efficient)
- Flash Attention: O(n) memory complexity with tiled computation
- Linear Attention: O(n) complexity using kernel approximation (see the sketch below)
- Local-Global: Sliding window + global tokens (Longformer-style)
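The kernel approximation behind linear attention replaces softmax(QK^T)V with φ(Q)(φ(K)^T V), so the key-value summary can be accumulated in a single O(n) pass. Below is a minimal single-query sketch, independent of this crate, using the elu(x) + 1 feature map common in the linear-transformer literature:

```rust
/// φ(x) = elu(x) + 1, a positive feature map used for linear attention.
fn phi(x: &[f32]) -> Vec<f32> {
    x.iter().map(|&v| if v > 0.0 { v + 1.0 } else { v.exp() }).collect()
}

/// Linear attention for a single query:
/// out = φ(q)·S / (φ(q)·z), where S = Σ φ(kᵢ) vᵢᵀ and z = Σ φ(kᵢ).
fn linear_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let (d, dv) = (q.len(), values[0].len());
    let mut s = vec![vec![0.0f32; dv]; d]; // S, accumulated in one pass over the keys
    let mut z = vec![0.0f32; d];           // z, the feature-space normalizer
    for (k, v) in keys.iter().zip(values) {
        let fk = phi(k);
        for i in 0..d {
            z[i] += fk[i];
            for j in 0..dv {
                s[i][j] += fk[i] * v[j];
            }
        }
    }
    let fq = phi(q);
    let denom: f32 = fq.iter().zip(&z).map(|(a, b)| a * b).sum::<f32>().max(1e-6);
    (0..dv)
        .map(|j| fq.iter().enumerate().map(|(i, &f)| f * s[i][j]).sum::<f32>() / denom)
        .collect()
}
```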
### Geometric Attention
- Hyperbolic Attention: Attention in hyperbolic space for hierarchical data (see the distance sketch below)
- Mixed Curvature: Dynamic curvature for complex geometries
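Hyperbolic attention typically scores query-key pairs by their distance in a hyperbolic model such as the Poincaré ball rather than by dot product. A minimal, crate-independent distance function, assuming curvature -1:

```rust
/// Distance between two points inside the Poincaré ball (curvature -1):
/// d(x, y) = arcosh(1 + 2·‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²)))
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let sq = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - sq(x)) * (1.0 - sq(y));
    let arg = 1.0 + 2.0 * diff_sq / denom.max(1e-9);
    // arcosh(z) = ln(z + sqrt(z² − 1))
    (arg + (arg * arg - 1.0).max(0.0).sqrt()).ln()
}
```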
### Graph Attention
- Edge-Featured GAT: Graph attention with edge features
- RoPE: Rotary Position Embeddings for graphs
### Mixture-of-Experts
- MoE Attention: Learned routing to specialized expert modules
- Top-k Routing: Efficient expert selection (see the sketch below)
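In top-k routing, a gating network produces one logit per expert for each token; only the k highest-scoring experts are evaluated, and their outputs are mixed using the renormalized gate weights. A minimal, crate-independent sketch of the selection step:

```rust
/// Pick the top-k experts for one token from raw gate logits and return
/// (expert index, normalized weight) pairs. Illustration only.
fn top_k_route(gate_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Sort expert indices by logit, descending, and keep the first k.
    let mut idx: Vec<usize> = (0..gate_logits.len()).collect();
    idx.sort_by(|&a, &b| gate_logits[b].partial_cmp(&gate_logits[a]).unwrap());
    idx.truncate(k);
    // Softmax over the selected logits only, so the kept weights sum to 1.
    let max = idx.iter().map(|&i| gate_logits[i]).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = idx.iter().map(|&i| (gate_logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    idx.into_iter().zip(exps.into_iter().map(|e| e / sum)).collect()
}
```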
## Quick Start

```rust
use ruvector_attention::*;

// NOTE: builder entry points, type names, and numeric values below are
// illustrative; consult the crate documentation for the exact API.

// Simple multi-head attention (model dim 512, 8 heads)
let attention = multi_head(512, 8)
    .dropout(0.1)
    .causal(true)
    .build()?;

// Use preset configurations
let bert = Bert.builder().build()?;
let gpt = Gpt.builder().build()?;

// Build pipelines with normalization
let pipeline = Pipeline::new()
    .add_attention(attention)
    .add_norm()
    .add_residual();

// Compute attention (toy inputs)
let query = vec![0.1_f32; 512];
let keys = vec![vec![0.2_f32; 512]; 16];
let values = vec![vec![0.3_f32; 512]; 16];
let output = pipeline.run(&query, &keys, &values)?;
```
## Installation

Add to your Cargo.toml:

```toml
[dependencies]
ruvector-attention = "0.1"
```

Or with specific features:

```toml
[dependencies]
ruvector-attention = { version = "0.1", features = ["simd", "wasm"] }
```
## SDK Overview

### Builder API

The builder provides a fluent interface for configuring attention:
```rust
use ruvector_attention::*;

// NOTE: argument values are illustrative.

// Flash attention for long sequences
let flash = flash(512, 64) // dim, block_size
    .causal(true)
    .dropout(0.1)
    .build()?;

// Linear attention for O(n) complexity
let linear = linear(512, 256) // dim, num_features
    .build()?;

// MoE attention with 8 experts
let moe = moe(512, 8, 2) // dim, num_experts, top_k
    .expert_capacity(1.25)
    .jitter_noise(0.01)
    .build()?;

// Hyperbolic attention for hierarchies
let hyperbolic = hyperbolic(512, -1.0) // dim, curvature
    .build()?;
```
### Pipeline API

Compose attention with pre- and post-processing:
```rust
use ruvector_attention::*;

// NOTE: argument values are illustrative.
let attention = multi_head(512, 8).build()?;

let pipeline = Pipeline::new()
    .add_norm()               // Pre-normalization
    .add_attention(attention) // Attention layer
    .add_dropout(0.1)         // Dropout
    .add_residual()           // Residual connection
    .add_norm();              // Post-normalization

// `query`, `keys`, and `values` as in the Quick Start example
let output = pipeline.run(&query, &keys, &values)?;
```
### Preset Configurations

Pre-configured attention for popular models:
```rust
use ruvector_attention::*;

// NOTE: dimensions and model names are illustrative.

// Model-specific presets
let bert = Bert.builder().build()?;
let gpt = Gpt.builder().build()?;
let longformer = Longformer.builder().build()?;
let flash = FlashOptimized.builder().build()?;
let t5 = T5.builder().build()?;
let vit = ViT.builder().build()?;

// Smart selection based on use case
let attention = for_sequences(4096).build()?;  // Auto-select by length
let graph_attn = for_graphs(256).build()?;     // Graph attention
let fast_attn = for_large_scale(512).build()?; // Flash attention

// By model name
let bert = from_model_name("bert-base-uncased")?;
let gpt2 = from_model_name("gpt2")?;
```
## Architecture

```text
ruvector-attention/
├── src/
│   ├── lib.rs                      # Main crate entry
│   ├── error.rs                    # Error types
│   ├── traits.rs                   # Core attention traits
│   ├── attention/                  # Standard attention
│   │   ├── scaled_dot_product.rs
│   │   └── multi_head.rs
│   ├── sparse/                     # Sparse attention
│   │   ├── flash.rs
│   │   ├── linear.rs
│   │   └── local_global.rs
│   ├── graph/                      # Graph attention
│   │   ├── edge_featured.rs
│   │   └── rope.rs
│   ├── hyperbolic/                 # Geometric attention
│   │   ├── hyperbolic_attention.rs
│   │   └── poincare.rs
│   ├── moe/                        # Mixture-of-Experts
│   │   ├── expert.rs
│   │   ├── router.rs
│   │   └── moe_attention.rs
│   ├── training/                   # Training utilities
│   │   ├── loss.rs
│   │   ├── optimizer.rs
│   │   └── curriculum.rs
│   └── sdk/                        # High-level SDK
│       ├── builder.rs              # Fluent builder API
│       ├── pipeline.rs             # Composable pipelines
│       └── presets.rs              # Model presets
```
## Examples

### Transformer Block
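A minimal sketch of a pre-norm transformer-style attention block composed from the builder and pipeline APIs shown above; the `Pipeline` type, the free builder functions, the error type, and the numeric values are assumptions for illustration.

```rust
use ruvector_attention::*;

// Sketch only: names and values are assumptions, not the exact crate API.
fn transformer_block(dim: usize, heads: usize) -> Result<Pipeline, AttentionError> {
    let attention = multi_head(dim, heads)
        .dropout(0.1)
        .build()?;

    Ok(Pipeline::new()
        .add_norm()               // pre-normalization
        .add_attention(attention) // self-attention
        .add_dropout(0.1)
        .add_residual()           // residual connection
        .add_norm())
}
```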
### Long Context Processing
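A sketch of configuring memory-efficient attention for long inputs, built from the `for_sequences` selector and the flash builder shown earlier; signatures and values are assumptions.

```rust
use ruvector_attention::*;

// Sketch only: signatures and values are assumptions.
let seq_len = 16_384;

// Let the SDK pick a mechanism suited to the sequence length
// (e.g. flash or linear attention for long inputs).
let auto = for_sequences(seq_len).build()?;

// Or configure flash attention explicitly for tiled, O(n)-memory computation.
let flash = flash(512, 128) // dim, block_size
    .causal(true)
    .build()?;
```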
### Graph Neural Network
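A sketch of attending over a node's neighbourhood with the graph preset; the data layout and the pipeline `run` signature are assumptions.

```rust
use ruvector_attention::*;

// Sketch only: data layout and signatures are assumptions.
let graph_attn = for_graphs(256).build()?; // node feature dimension

let pipeline = Pipeline::new().add_attention(graph_attn);

// Query: the centre node's features; keys/values: its neighbours' features.
let node = vec![0.0_f32; 256];
let neighbours = vec![vec![0.0_f32; 256]; 12];

let updated = pipeline.run(&node, &neighbours, &neighbours)?;
```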
## Performance

### Complexity Comparison
| Mechanism | Time | Memory | Use Case |
|---|---|---|---|
| Scaled Dot-Product | O(nยฒ) | O(nยฒ) | Short sequences |
| Multi-Head | O(nยฒ) | O(nยฒ) | Standard transformers |
| Flash Attention | O(nยฒ) | O(n) | Long sequences |
| Linear Attention | O(n) | O(n) | Very long sequences |
| Local-Global | O(nยทw) | O(nยทw) | Document processing |
| Hyperbolic | O(nยฒ) | O(nยฒ) | Hierarchical data |
| MoE | O(nยฒ/E) | O(nยฒ) | Specialized tasks |
Where n = sequence length, w = local window size, and E = number of experts.
### Benchmarks
On a typical workload (batch_size=32, seq_len=512, dim=768):
- Flash Attention: 2.3× faster and 5× less memory than standard attention
- Linear Attention: O(n) scaling for sequences longer than 4096 tokens
- Local-Global: about 60% of the standard-attention cost at w = 256
## Feature Flags

- `simd` - SIMD acceleration (enabled by default)
- `wasm` - WebAssembly support
- `napi` - Node.js bindings
## Documentation
- SDK Guide - Comprehensive SDK usage guide
- API Documentation - Full API reference
- Examples - Working code examples
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
## Citation
If you use this crate in your research, please cite:
## Related Projects
- ruvector - Core vector search engine
- ruvector-graph - Graph neural networks
- ruvector-gnn - Geometric neural networks