# kizzasi-inference

Unified autoregressive inference engine for Kizzasi AGSP.
## Overview

A production-grade inference pipeline with sampling strategies, batching, streaming, and constraint enforcement. Supports all Kizzasi model architectures.
## Features

- **Sampling Strategies**: Greedy, temperature, top-k, top-p, beam search
- **Batching**: Dynamic batching with continuous processing
- **Streaming**: Async streaming with backpressure handling
- **Constraints**: Integration with kizzasi-logic for safety guardrails
- **Multi-Modal**: Audio, video, and sensor fusion pipelines
- **Checkpointing**: Save/load inference state and configuration
- **Network Adapters**: WebSocket, MQTT, and gRPC for real-time inference
- **Hot-Swapping**: Runtime model switching without interruption
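To make the sampling strategies above concrete, here is a minimal sketch of top-p (nucleus) sampling over a logit vector. The function names and shapes are illustrative, not part of the kizzasi-inference API: the idea is to keep the smallest set of tokens whose cumulative probability reaches `p`, then sample only among those candidates.

```rust
// Numerically stable softmax over raw logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Return the indices of the smallest set of tokens whose cumulative
/// probability reaches `p`, ordered from most to least probable.
/// A sampler would then draw from this reduced candidate set.
fn top_p_candidates(logits: &[f64], p: f64) -> Vec<usize> {
    let probs = softmax(logits);
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by descending probability.
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let mut cum = 0.0;
    let mut kept = Vec::new();
    for i in idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p {
            break;
        }
    }
    kept
}

fn main() {
    // With p = 0.9, the lowest-probability token is cut from the pool.
    let kept = top_p_candidates(&[2.0, 1.0, 0.1, -1.0], 0.9);
    println!("{:?}", kept);
}
```

Greedy decoding is the special case of always taking the first candidate; temperature scaling would divide the logits by `T` before the softmax.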
## Quick Start
```rust
use kizzasi_inference::{Pipeline, Sampler};

// Create an inference pipeline (argument values are illustrative)
let pipeline = Pipeline::builder()
    .model(model)
    .tokenizer(tokenizer)
    .sampler(Sampler::TopP(0.9))
    .build()?;

// Single-step prediction
let input = Tensor::zeros(&[1, input_dim]);
let output = pipeline.predict(&input)?;

// Multi-step rollout
let predictions = pipeline.predict_n(&input, 32)?;

// Streaming inference
let mut stream = pipeline.stream(&input).await?;
while let Some(step) = stream.next().await {
    // handle each streamed prediction
}
```
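The streaming loop above relies on backpressure: the producer must not run unboundedly ahead of a slow consumer. A minimal synchronous sketch of the same idea, using only the standard library's bounded `sync_channel` (the names here are illustrative, not the crate's async API):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stream `n` prediction steps through a bounded buffer of `cap` slots.
/// `send` blocks whenever the buffer is full, so the producer can never
/// run more than `cap` steps ahead of the consumer: backpressure.
fn stream_with_backpressure(n: u32, cap: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(cap);
    let producer = thread::spawn(move || {
        for step in 0..n {
            // Stand-in for one pipeline.predict() step.
            tx.send(step).unwrap();
        }
    });
    // Draining the receiver releases buffer slots back to the producer.
    let received: Vec<u32> = rx.iter().collect();
    producer.join().unwrap();
    received
}

fn main() {
    let steps = stream_with_backpressure(16, 4);
    println!("streamed {} steps", steps.len());
}
```

An async stream with a bounded channel behaves the same way, except the producer task yields instead of blocking a thread.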
## Performance
- Single-step latency: <100μs (Mamba2)
- Throughput: 320K predictions/sec (16 workers)
- 177 comprehensive tests, all passing
## Documentation
## License

Licensed under either of the Apache License, Version 2.0, or the MIT license, at your option.