kodegen_simd
Ultra-high-performance SIMD-accelerated operations for AI/ML workloads in Rust. Part of the KODEGEN.ᴀɪ ecosystem.
Features
- 🚀 Automatic SIMD Optimization: Runtime CPU feature detection with zero-overhead dispatch
  - x86_64: AVX-512, AVX2, SSE4.1 support
  - ARM64: NEON support
  - Automatic fallback to optimized scalar implementations
- 🎯 Vector Similarity Operations: High-performance cosine similarity with intelligent implementation selection
- 🔥 Logits Processing: Complete pipeline for LLM inference
  - Temperature scaling with SIMD acceleration
  - Top-k and nucleus (top-p) sampling
  - Repetition, frequency, and presence penalties
  - Numerically stable softmax and argmax operations
- 📐 Structured Generation: Type-safe constrained output
  - JSON syntax validation
  - JSON schema-based constraints
  - Generate from Rust types with #[derive(JsonSchema)]
  - Predefined constraint presets
- ⚡ Zero-Allocation Hot Paths: Stack-based buffers for inference-critical operations
- 🔧 Hardware Acceleration: Optional backends for neural network operations
  - CUDA / cuDNN support
  - Metal (Apple Silicon) support
  - Intel MKL and Apple Accelerate framework support
Installation
Add to your Cargo.toml:
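[dependencies]
kodegen_simd = "0.1"
Optional Hardware Acceleration
For CUDA support:
[dependencies]
kodegen_simd = { version = "0.1", features = ["cuda"] }
For Metal (Apple Silicon):
[dependencies]
kodegen_simd = { version = "0.1", features = ["metal"] }
For Intel MKL:
[dependencies]
kodegen_simd = { version = "0.1", features = ["mkl"] }
For Apple Accelerate:
[dependencies]
kodegen_simd = { version = "0.1", features = ["accelerate"] }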
Quick Start
Vector Similarity
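// Module path assumed; see the API documentation for the exact import.
use kodegen_simd::similarity::cosine_similarity;
// Illustrative values; any two equal-length f32 vectors work.
let embedding_a = vec![0.1_f32, 0.2, 0.3, 0.4];
let embedding_b = vec![0.2_f32, 0.3, 0.4, 0.5];
// Automatically uses best available SIMD implementation
let similarity = cosine_similarity(&embedding_a, &embedding_b);
println!("Similarity: {similarity}");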
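For reference, the quantity being computed is dot(a, b) / (|a| * |b|). The following is a minimal scalar sketch in plain Rust, useful for checking results from an accelerated path. It is not the crate's implementation: the name `cosine_similarity_scalar` is hypothetical, and returning 0.0 for mismatched lengths or zero-norm vectors is an assumed convention.

```rust
/// Scalar reference for cosine similarity (hypothetical helper, std only).
/// Assumed convention: mismatched lengths, empty input, or a zero-norm
/// vector yield 0.0 rather than NaN.
pub fn cosine_similarity_scalar(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    // Single pass: accumulate the dot product and both squared norms.
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = (norm_a * norm_b).sqrt();
    if denom == 0.0 { 0.0 } else { dot / denom }
}
```

A SIMD version would typically keep the same three accumulators, one vector register each, and reduce them horizontally at the end.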
Logits Processing
use ;
let mut logits = vec!;
// Apply temperature scaling
scale_temperature?;
// Compute softmax probabilities
let probabilities = softmax?;
// Find most likely token
let best_token = argmax?;
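To make the three steps above concrete, here is a minimal scalar sketch of temperature scaling, numerically stable softmax, and argmax in plain Rust. The `_ref` function names are hypothetical and the signatures are assumptions, not the crate's API. The stability trick is the usual one: subtract the maximum logit before exponentiating so that `exp` never overflows.

```rust
/// Divide every logit by `temperature` (hypothetical reference helper).
/// Panics if `temperature` is not strictly positive.
pub fn scale_temperature_ref(logits: &mut [f32], temperature: f32) {
    assert!(temperature > 0.0, "temperature must be positive");
    let inv = 1.0 / temperature;
    for l in logits.iter_mut() {
        *l *= inv;
    }
}

/// Numerically stable softmax: shift by the max logit before exp().
pub fn softmax_ref(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Index of the largest value, or None for an empty slice.
/// On exact ties the last maximal index is returned.
pub fn argmax_ref(values: &[f32]) -> Option<usize> {
    values
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

Without the max shift, logits around 1000 would overflow `exp` to infinity and produce NaN probabilities. With it, the largest exponent is always `exp(0) = 1`.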
Processing Context for Generation
use ProcessingContext;
use process_logits_scalar;
use ProcessorConfig;
let mut context = new
.with_temperature
.with_top_k
.with_top_p;
let mut logits = vec!;
let config = default;
// Apply temperature, top-k, and top-p filtering
process_logits_scalar?;
// Track token history
context.extend_history;
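The top-k and top-p filtering mentioned above can be sketched in plain Rust as follows. This is an illustrative scalar version under stated assumptions, not the crate's `process_logits_scalar`: the function names are hypothetical, filtered entries are masked to negative infinity, and ties at the top-k threshold are all kept.

```rust
/// Keep the `k` largest logits and mask the rest to -inf (hypothetical helper).
/// Ties at the threshold are kept, so slightly more than `k` may survive.
pub fn top_k_filter(logits: &mut [f32], k: usize) {
    if k == 0 || k >= logits.len() {
        return; // nothing to filter
    }
    let mut sorted = logits.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending
    let threshold = sorted[k - 1];
    for l in logits.iter_mut() {
        if *l < threshold {
            *l = f32::NEG_INFINITY;
        }
    }
}

/// Nucleus filtering: keep the smallest set of tokens whose cumulative
/// probability reaches `p`, mask everything else to -inf.
pub fn top_p_filter(logits: &mut [f32], p: f32) {
    if logits.is_empty() || p >= 1.0 {
        return;
    }
    // Stable softmax numerators (shifted by the max logit).
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Token indices ordered by descending probability.
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&i, &j| exps[j].total_cmp(&exps[i]));
    let mut cumulative = 0.0f32;
    let mut keep = 0usize;
    for &i in &order {
        cumulative += exps[i] / total;
        keep += 1;
        if cumulative >= p {
            break;
        }
    }
    for &i in &order[keep..] {
        logits[i] = f32::NEG_INFINITY;
    }
}
```

Masking to negative infinity means a later softmax assigns those tokens probability zero, so the two filters compose in either order.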
Structured Generation with Type Safety
use constraint_for_type;
use ;
use JsonSchema;
// Create constraint from Rust type
let constraint = ?;
// Use in ProcessingContext
let mut context = new
.with_schema_constraint;
// During generation, check if tokens are valid
for token in candidate_tokens
JSON Schema Constraints
use constraint_for_schema;
let schema = r#"{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer", "minimum": 0 },
"tags": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "age"]
}"#;
let constraint = constraint_for_schema?;
Predefined Constraint Presets
use presets;
// Array of strings
let constraint = array_of_strings?;
// Array of integers
let constraint = array_of_integers?;
// Generic object with string keys
let constraint = object_with_string_keys?;
Performance
The library uses runtime CPU feature detection to automatically select the fastest implementation:
- Cosine Similarity: Up to 8x faster than naive implementations on AVX2
- Temperature Scaling: SIMD vectorization across entire logits tensor
- Softmax: Numerically stable with minimal allocation overhead
- Constraint Validation: Efficient state machine with token lookahead
Run benchmarks:
cargo bench
Architecture
SIMD Abstraction
Operations are implemented multiple times for different CPU features:
- AVX-512 (16-wide vectors)
- AVX2 (8-wide vectors)
- SSE4.1 (4-wide vectors)
- NEON (4-wide vectors, ARM)
- Scalar fallback
The best implementation is selected once at startup and cached.
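The select-once-and-cache pattern described above can be sketched with the standard library alone. The example below is an assumption about how such dispatch might look, not the crate's code: `DotFn`, `dot_scalar`, `dot_avx2`, `select_dot`, and `dot` are hypothetical names, and only an AVX2 path and a scalar fallback are shown.

```rust
use std::sync::OnceLock;

/// Signature shared by every implementation (hypothetical).
type DotFn = fn(&[f32], &[f32]) -> f32;

/// Portable scalar fallback.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2_impl(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    unsafe {
        let mut acc = _mm256_setzero_ps();
        let mut i = 0;
        // Process 8 f32 lanes per iteration.
        while i + 8 <= n {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
            i += 8;
        }
        // Horizontal reduction, then the scalar tail.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        while i < n {
            sum += a[i] * b[i];
            i += 1;
        }
        sum
    }
}

#[cfg(target_arch = "x86_64")]
fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    // SAFETY: `select_dot` only returns this after detecting AVX2 at runtime.
    unsafe { dot_avx2_impl(a, b) }
}

/// Probe the CPU and pick the widest supported implementation.
fn select_dot() -> DotFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return dot_avx2;
        }
    }
    dot_scalar
}

/// Public entry point: detection runs on first call, then the cached
/// function pointer is reused.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    static DOT: OnceLock<DotFn> = OnceLock::new();
    let f = *DOT.get_or_init(select_dot);
    f(a, b)
}
```

After the first call, each invocation costs one atomic load plus an indirect call. Additional tiers (AVX-512, SSE4.1, NEON) would slot into `select_dot` in order of preference.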
Constraint System
The constrained generation system uses state machines to track valid token transitions:
- JSON Constraints: Validates JSON syntax during generation
- Schema Constraints: Enforces JSON schema structure (types, required fields, ranges)
- Type Constraints: Generated from Rust serde types with the JsonSchema derive
Constraints can force deterministic token sequences when only one valid path exists, improving generation efficiency.
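To illustrate how a state machine can both validate tokens and detect forced continuations, here is a deliberately small sketch. It is not the crate's constraint system: `LiteralConstraint` is a hypothetical type, each character stands in for a token, and the grammar is limited to the three JSON literals `true`, `false`, and `null`.

```rust
/// The only outputs this toy grammar accepts.
const LITERALS: [&str; 3] = ["true", "false", "null"];

/// Tracks the text generated so far (hypothetical; one char = one token).
pub struct LiteralConstraint {
    prefix: String,
}

impl LiteralConstraint {
    pub fn new() -> Self {
        Self { prefix: String::new() }
    }

    /// Every character that keeps the output a prefix of some literal.
    pub fn valid_next(&self) -> Vec<char> {
        let mut next: Vec<char> = LITERALS
            .iter()
            .copied()
            .filter(|lit| lit.starts_with(self.prefix.as_str()))
            .filter_map(|lit| lit[self.prefix.len()..].chars().next())
            .collect();
        next.sort_unstable();
        next.dedup();
        next
    }

    /// Whether `c` is an allowed next token.
    pub fn is_valid(&self, c: char) -> bool {
        self.valid_next().contains(&c)
    }

    /// If exactly one continuation exists, return it so the caller can
    /// emit it without sampling.
    pub fn forced(&self) -> Option<char> {
        let next = self.valid_next();
        if next.len() == 1 { Some(next[0]) } else { None }
    }

    /// Consume a token; returns false and leaves the state unchanged if invalid.
    pub fn advance(&mut self, c: char) -> bool {
        if self.is_valid(c) {
            self.prefix.push(c);
            true
        } else {
            false
        }
    }

    /// True once a complete literal has been produced.
    pub fn is_done(&self) -> bool {
        LITERALS.contains(&self.prefix.as_str())
    }
}
```

At the start three tokens are valid, so the model must sample. After `t`, only `r` remains, and the rest of `true` can be emitted deterministically.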
Requirements
- Rust: Nightly toolchain (specified in rust-toolchain.toml)
- Edition: 2024
Documentation
- API Documentation
- CLAUDE.md - Architecture and development guide
- Examples - Additional usage examples
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Acknowledgments
Part of the KODEGEN.ᴀɪ ecosystem for AI-powered database tools and MCP servers.