§kizzasi-inference
Unified autoregressive inference engine for Kizzasi AGSP.
This crate provides the core inference loop that combines:
- Signal tokenization (via kizzasi-tokenizer)
- Model forward pass (via kizzasi-model)
- Constraint enforcement (via kizzasi-logic)
§The Inference Pipeline
Raw Signal → Tokenize → Model → Constrain → Decode → Output
                          ↑                     ↓
                          └──── Hidden State ───┘
§Autoregressive Prediction
As described in a.md, the AGSP predicts “the next value” based on history, similar to how LLMs predict “the next token”. This crate implements:
- Single-step prediction: step(input) -> output
- Multi-step rollout: rollout(input, steps) -> [outputs]
- Streaming inference: Real-time continuous prediction
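The relationship between single-step prediction and multi-step rollout can be sketched as follows. This is a minimal illustration, not the crate's API: `ToyPredictor` and `LeakyPredictor` are hypothetical names, the "model" is a leaky accumulator, and clamping to a range stands in for constraint enforcement. The only thing taken from the description above is the shape of `step` and `rollout`.

```rust
/// Hypothetical stand-in for a predictor; not the crate's actual trait.
trait ToyPredictor {
    /// Consume one input value and return the predicted next value.
    fn step(&mut self, input: f32) -> f32;

    /// Feed each prediction back in as the next input, `steps` times.
    fn rollout(&mut self, input: f32, steps: usize) -> Vec<f32> {
        let mut outputs = Vec::with_capacity(steps);
        let mut current = input;
        for _ in 0..steps {
            current = self.step(current);
            outputs.push(current);
        }
        outputs
    }
}

/// Toy model (an assumption for illustration): a leaky accumulator kept as
/// hidden state, with the output clamped to [lo, hi].
struct LeakyPredictor {
    hidden: f32,
    decay: f32,
    lo: f32,
    hi: f32,
}

impl ToyPredictor for LeakyPredictor {
    fn step(&mut self, input: f32) -> f32 {
        // "Model" stage: update the hidden state from the new input.
        self.hidden = self.decay * self.hidden + input;
        // "Constrain" stage: clamp the raw prediction into the allowed range.
        self.hidden.clamp(self.lo, self.hi)
    }
}

fn main() {
    let mut p = LeakyPredictor { hidden: 0.0, decay: 0.5, lo: -2.0, hi: 2.0 };
    // Hidden state goes 1.0, 1.5, 2.25; the last output is clamped to 2.0.
    println!("{:?}", p.rollout(1.0, 3));
}
```

Because the hidden state lives inside the predictor, `rollout` only has to thread outputs back in as inputs; a streaming caller would invoke `step` once per incoming sample instead.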
§COOLJAPAN Ecosystem
This crate follows KIZZASI_POLICY.md and coordinates between all kizzasi-* crates.
Re-exports§
pub use temporal::LTLFormula;
pub use temporal::STLFormula;
pub use temporal::TemporalBound;
pub use temporal::TemporalConstraintEnforcer;
pub use versioning::FallbackStrategy as VersionFallbackStrategy;
pub use versioning::HealthCheck;
pub use versioning::HealthStatus;
pub use versioning::ModelMetadata;
pub use versioning::ModelStats as VersionModelStats;
pub use versioning::ModelVersion;
pub use versioning::ModelVersionManager;
pub use versioning::VersioningConfig;
Modules§
- temporal - Temporal logic constraints for inference
- versioning - Model versioning and fallback management
Structs§
- AdaptiveRejectionSampler - Adaptive rejection sampler that learns from rejections
- BatchConfig - Configuration for continuous batching
- BatchRequest - A single inference request in the batch
- BatchResponse - Response from a batch inference request
- BatchScheduler - Continuous batching scheduler
- Beam - Beam search state for multi-step prediction
- BeamSearch - Beam search manager
- BufferKey - A key identifying a buffer shape and type configuration.
- Checkpoint - Checkpoint containing full inference state
- CheckpointManager - Manager for checkpoint snapshots (for rollback)
- CheckpointMetadata - Metadata for checkpoints
- CompressedState - Compressed representation of a hidden state
- ConstrainedBeamSearch - Constrained beam search that only keeps beams satisfying constraints
- ContextConfig - Configuration for inference context
- EngineConfig - Configuration for the inference engine
- EnsembleBuilder - Builder for creating model ensembles
- EnsembleConfig - Configuration for model ensemble
- InferenceContext - Manages inference context including history and hidden states
- InferenceEngine - The main inference engine for AGSP
- InferenceMetrics - Performance metrics for inference operations
- InferenceProfiler - Profiler for tracking different stages of inference
- LoraAdapter - A LoRA adapter consisting of two low-rank matrices
- LoraAdapterBuilder - Builder for creating LoRA adapters from components
- LoraAdapterLoader - LoRA adapter loader for reading from disk
- LoraAdapterManager - Manager for multiple LoRA adapters
- LoraConfig - LoRA adapter configuration
- MetricsSummary - Summary of performance metrics
- ModalityConfig - Configuration for a single modality
- ModelBuilder - Builder for common model configurations
- ModelConfig - Configuration for model loading
- ModelEnsemble - Model ensemble for combining multiple models
- ModelInfo - Information about the loaded model
- ModelRegistry - Model registry for creating and managing model instances
- MultiModalPipeline - Multi-modal inference pipeline
- MultiModalPipelineBuilder - Builder for multi-modal pipelines
- Pipeline - A complete inference pipeline
- PipelineBuilder - Builder for constructing inference pipelines
- PoolStats
- PooledBuffer - A pooled buffer that returns itself to the pool when dropped.
- PrecisionConfig - Configuration for mixed precision inference
- PrecisionConverter - Precision converter for array operations
- PrecisionStats - Statistics about precision conversion
- ProfileBreakdown - Breakdown of time spent in different stages
- RejectionSampler - Rejection sampler that rejects samples violating constraints
- Sampler - Sampler for generating predictions from model outputs
- SamplingConfig - Configuration for sampling strategies
- SchedulerStats - Statistics about the batch scheduler
- SpeculativeConfig - Configuration for speculative decoding
- SpeculativeDecoder - Speculative decoding engine
- StateCompressor - State compressor with configurable compression method
- TensorPool - A thread-safe memory pool for tensor buffers.
- Timer - Timer for measuring operation duration
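The idea behind a pooled buffer that returns itself to a thread-safe pool when dropped can be sketched with the standard library alone. Everything named here (`ToyPool`, `ToyPooledBuffer`, `acquire`, `free_count`) is hypothetical and is not the crate's `TensorPool` or `PooledBuffer` API; the sketch also ignores shape and type keys and handles only flat `f32` buffers.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical pool of reusable f32 buffers; not the crate's `TensorPool`.
#[derive(Clone, Default)]
struct ToyPool {
    free: Arc<Mutex<Vec<Vec<f32>>>>,
}

/// Buffer handle that hands its storage back to the pool on drop.
struct ToyPooledBuffer {
    data: Option<Vec<f32>>,
    pool: ToyPool,
}

impl ToyPool {
    /// Reuse a free buffer if one exists, otherwise allocate a new one.
    fn acquire(&self, len: usize) -> ToyPooledBuffer {
        let mut data = self.free.lock().unwrap().pop().unwrap_or_default();
        data.clear();
        data.resize(len, 0.0);
        ToyPooledBuffer { data: Some(data), pool: self.clone() }
    }

    /// Number of buffers currently waiting for reuse.
    fn free_count(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

impl ToyPooledBuffer {
    fn as_mut_slice(&mut self) -> &mut [f32] {
        self.data.as_mut().expect("buffer present until drop").as_mut_slice()
    }
}

impl Drop for ToyPooledBuffer {
    fn drop(&mut self) {
        // Move the storage back into the pool instead of freeing it.
        if let Some(data) = self.data.take() {
            if let Ok(mut free) = self.pool.free.lock() {
                free.push(data);
            }
        }
    }
}

fn main() {
    let pool = ToyPool::default();
    {
        let mut buf = pool.acquire(4);
        buf.as_mut_slice()[0] = 1.0;
        println!("free while in use: {}", pool.free_count());
    } // `buf` is dropped here and its storage returns to the pool.
    println!("free after drop: {}", pool.free_count());
}
```

Holding the storage in an `Option` lets `drop` move the `Vec` out without cloning, and sharing the free list through `Arc<Mutex<..>>` keeps the handle usable from other threads.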
Enums§
- CompressionMethod - Compression method for hidden states
- ComputePrecision - Compute precision for mixed mode
- EnsembleStrategy - Ensemble combination strategy
- FallbackStrategy - Fallback strategy when rejection sampling fails
- FusionStrategy - Fusion strategy for combining multiple modalities
- InferenceError - Errors that can occur during inference
- InferenceMode - Memory-efficient inference modes
- ModalityType - Supported modality types
- ModelType - Enumeration of supported model architectures
- PrecisionMode - Precision mode for inference
- Priority - Priority level for inference requests
- SamplingStrategy - Available sampling strategies
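Rejection sampling with a fallback can be illustrated in a few lines. This is a sketch under stated assumptions: the variants of the crate's `FallbackStrategy` are not listed on this page, so `ToyFallback` and its two variants are invented for illustration, the constraint is a simple closed range, and proposals come from a caller-supplied closure rather than a real model.

```rust
/// Hypothetical fallback choices; not the crate's `FallbackStrategy` variants.
#[derive(Clone, Copy)]
enum ToyFallback {
    /// Project the last rejected sample into the allowed range.
    Clamp,
    /// Return a fixed safe value.
    Constant(f32),
}

/// Draw up to `max_attempts` proposals and keep the first one inside
/// [lo, hi]. Returns the chosen value and the number of rejections.
fn rejection_sample(
    mut propose: impl FnMut() -> f32,
    lo: f32,
    hi: f32,
    max_attempts: usize,
    fallback: ToyFallback,
) -> (f32, usize) {
    let mut last = lo;
    for attempt in 0..max_attempts {
        let candidate = propose();
        if (lo..=hi).contains(&candidate) {
            return (candidate, attempt);
        }
        last = candidate;
    }
    // Every proposal violated the constraint: apply the fallback.
    let value = match fallback {
        ToyFallback::Clamp => last.clamp(lo, hi),
        ToyFallback::Constant(v) => v,
    };
    (value, max_attempts)
}

fn main() {
    // Deterministic proposals stand in for samples drawn from a model.
    let mut proposals = vec![3.0_f32, -2.5, 0.7].into_iter();
    let (value, rejected) =
        rejection_sample(|| proposals.next().unwrap(), -1.0, 1.0, 5, ToyFallback::Clamp);
    println!("accepted {value} after {rejected} rejections");
}
```

Returning the rejection count alongside the value gives a caller the signal an adaptive sampler would need to adjust its proposals.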
Traits§
- AutoregressiveModel - Trait for model architectures that support autoregressive prediction
- SignalPredictor - Core trait for autoregressive signal prediction
- SignalTokenizer - Trait for signal tokenization
Type Aliases§
- Array1 - one-dimensional array
- Array2 - two-dimensional array
- ConstraintFn - Constraint function type for constrained beam search. Returns true if the sequence satisfies the constraint.
- CustomSamplingFn - Custom sampling function type. Takes logits and temperature, returns sampled index.
- InferenceResult - Result type alias for inference operations
- ModalityPreprocessor - Preprocessor for a specific modality
- PostprocessHook - Postprocessing hook that transforms output after model forward pass
- PreprocessHook - Preprocessing hook that transforms input before model forward pass
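A function matching the description of `CustomSamplingFn` (logits and temperature in, sampled index out) might look like the sketch below. The alias `ToySamplingFn`, its exact signature, and the helper `temperature_sampler` are assumptions made for illustration; the crate's real alias may use different argument types. Randomness comes from a small linear congruential generator so the example needs no external crates.

```rust
/// Hypothetical alias shape; the crate's actual `CustomSamplingFn` may differ.
type ToySamplingFn = Box<dyn FnMut(&[f32], f32) -> usize>;

/// Build a temperature sampler driven by a small linear congruential
/// generator. Assumes `logits` is non-empty and `temperature` is positive.
fn temperature_sampler(seed: u64) -> ToySamplingFn {
    let mut state = seed;
    Box::new(move |logits: &[f32], temperature: f32| {
        // Softmax weights with temperature, shifted by the max logit
        // for numerical stability.
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let weights: Vec<f32> = logits
            .iter()
            .map(|&l| ((l - max) / temperature).exp())
            .collect();
        let total: f32 = weights.iter().sum();

        // Advance the generator and map its top 24 bits to [0, 1).
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (state >> 40) as f32 / (1u64 << 24) as f32;

        // Inverse-CDF lookup over the unnormalised weights.
        let mut acc = 0.0_f32;
        for (i, w) in weights.iter().enumerate() {
            acc += *w;
            if u * total < acc {
                return i;
            }
        }
        // Guard against rounding leaving `acc` just below `u * total`.
        logits.len().saturating_sub(1)
    })
}

fn main() {
    let mut sample = temperature_sampler(42);
    // A very low temperature concentrates nearly all mass on the largest logit.
    let idx = sample(&[0.1, 2.0, 0.3], 0.01);
    println!("sampled index: {idx}");
}
```

Lower temperatures sharpen the distribution toward the largest logit, while higher temperatures flatten it toward uniform.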