Crate kizzasi_inference

§kizzasi-inference

Unified autoregressive inference engine for Kizzasi AGSP.

This crate provides the core inference loop that combines:

  • Signal tokenization (via kizzasi-tokenizer)
  • Model forward pass (via kizzasi-model)
  • Constraint enforcement (via kizzasi-logic)

§The Inference Pipeline

Raw Signal → Tokenize → Model → Constrain → Decode → Output
    ↑                     ↓
    └──── Hidden State ───┘
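
A minimal sketch of one pass through that loop is shown below. The Tokenizer and Model traits, the greedy constrained selection, and every signature here are assumptions made for illustration only; the crate's actual SignalTokenizer, AutoregressiveModel, and kizzasi-logic constraint types may look different.

    // A sketch of one pipeline pass. The Tokenizer/Model traits below are
    // placeholders, not the crate's real SignalTokenizer/AutoregressiveModel.
    struct Tokens(Vec<u32>);
    struct Hidden(Vec<f32>);

    trait Tokenizer {
        fn encode(&self, signal: &[f32]) -> Tokens;
        fn decode(&self, token: u32) -> f32;
    }

    trait Model {
        // Forward pass over the token history plus the previous hidden state;
        // returns logits over the vocabulary and the new hidden state.
        fn forward(&self, tokens: &Tokens, hidden: Option<Hidden>) -> (Vec<f32>, Hidden);
    }

    fn pipeline_step<T: Tokenizer, M: Model>(
        tokenizer: &T,
        model: &M,
        signal: &[f32],
        hidden: Option<Hidden>,
        constraint: impl Fn(u32) -> bool, // e.g. derived from kizzasi-logic rules
    ) -> (f32, Hidden) {
        let tokens = tokenizer.encode(signal);                      // Tokenize
        let (logits, next_hidden) = model.forward(&tokens, hidden); // Model
        let best = logits
            .iter()
            .enumerate()
            .filter(|(i, _)| constraint(*i as u32))                 // Constrain
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .expect("no token satisfies the constraints");
        (tokenizer.decode(best), next_hidden)                       // Decode
    }

    // Toy implementations so the sketch compiles and runs end to end.
    struct IdTok;
    impl Tokenizer for IdTok {
        fn encode(&self, signal: &[f32]) -> Tokens {
            Tokens(signal.iter().map(|x| *x as u32).collect())
        }
        fn decode(&self, token: u32) -> f32 {
            token as f32
        }
    }

    struct EchoModel;
    impl Model for EchoModel {
        fn forward(&self, tokens: &Tokens, _hidden: Option<Hidden>) -> (Vec<f32>, Hidden) {
            // "Logits" that favour the last observed token id.
            let last = *tokens.0.last().unwrap_or(&0) as usize;
            let mut logits = vec![0.0; 8];
            logits[last.min(7)] = 1.0;
            (logits, Hidden(vec![]))
        }
    }

    fn main() {
        let windows: [&[f32]; 2] = [&[1.0, 2.0, 3.0], &[2.0, 3.0, 4.0]];
        let mut state = None;
        for window in windows {
            let (out, next) = pipeline_step(&IdTok, &EchoModel, window, state, |t| t != 7);
            println!("predicted {out}");
            state = Some(next); // hidden state feeds the next pass
        }
    }

The hidden state returned by one call is fed back into the next call, which is the feedback edge drawn in the diagram above.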

§Autoregressive Prediction

As described in a.md, the AGSP predicts “the next value” based on history, similar to how LLMs predict “the next token”. This crate implements the following (a usage sketch follows the list):

  • Single-step prediction: step(input) -> output
  • Multi-step rollout: rollout(input, steps) -> [outputs]
  • Streaming inference: Real-time continuous prediction
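
The sketch below shows that API shape under some assumptions: a scalar signal, a mutable predictor, and a rollout that simply feeds each prediction back into step. The crate's actual SignalPredictor / InferenceEngine signatures may differ, and streaming inference is not shown.

    // Stand-in for the crate's SignalPredictor / InferenceEngine API; the
    // names and signatures here are assumptions made for this sketch.
    trait Predictor {
        // Single-step prediction: one input in, one prediction out.
        fn step(&mut self, input: f32) -> f32;

        // Multi-step rollout: feed each prediction back in as the next input.
        fn rollout(&mut self, input: f32, steps: usize) -> Vec<f32> {
            let mut outputs = Vec::with_capacity(steps);
            let mut current = input;
            for _ in 0..steps {
                current = self.step(current);
                outputs.push(current);
            }
            outputs
        }
    }

    // Toy predictor (exponential decay) used only to exercise the API shape.
    struct Decay {
        rate: f32,
    }

    impl Predictor for Decay {
        fn step(&mut self, input: f32) -> f32 {
            input * self.rate
        }
    }

    fn main() {
        let mut p = Decay { rate: 0.5 };
        assert_eq!(p.step(8.0), 4.0); // single-step prediction
        assert_eq!(p.rollout(8.0, 3), vec![4.0, 2.0, 1.0]); // multi-step rollout
    }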

§COOLJAPAN Ecosystem

This crate follows KIZZASI_POLICY.md and coordinates across all kizzasi-* crates.

Re-exports§

pub use temporal::LTLFormula;
pub use temporal::STLFormula;
pub use temporal::TemporalBound;
pub use temporal::TemporalConstraintEnforcer;
pub use versioning::FallbackStrategy as VersionFallbackStrategy;
pub use versioning::HealthCheck;
pub use versioning::HealthStatus;
pub use versioning::ModelMetadata;
pub use versioning::ModelStats as VersionModelStats;
pub use versioning::ModelVersion;
pub use versioning::ModelVersionManager;
pub use versioning::VersioningConfig;

Modules§

temporal
Temporal logic constraints for inference
versioning
Model versioning and fallback management

Structs§

AdaptiveRejectionSampler
Adaptive rejection sampler that learns from rejections
BatchConfig
Configuration for continuous batching
BatchRequest
A single inference request in the batch
BatchResponse
Response from a batch inference request
BatchScheduler
Continuous batching scheduler
Beam
Beam search state for multi-step prediction
BeamSearch
Beam search manager
BufferKey
A key identifying a buffer shape and type configuration.
Checkpoint
Checkpoint containing full inference state
CheckpointManager
Manager for checkpoint snapshots (for rollback)
CheckpointMetadata
Metadata for checkpoints
CompressedState
Compressed representation of a hidden state
ConstrainedBeamSearch
Constrained beam search that only keeps beams satisfying constraints
ContextConfig
Configuration for inference context
EngineConfig
Configuration for the inference engine
EnsembleBuilder
Builder for creating model ensembles
EnsembleConfig
Configuration for model ensemble
InferenceContext
Manages inference context including history and hidden states
InferenceEngine
The main inference engine for AGSP
InferenceMetrics
Performance metrics for inference operations
InferenceProfiler
Profiler for tracking different stages of inference
LoraAdapter
A LoRA adapter consisting of two low-rank matrices
LoraAdapterBuilder
Builder for creating LoRA adapters from components
LoraAdapterLoader
LoRA adapter loader for reading from disk
LoraAdapterManager
Manager for multiple LoRA adapters
LoraConfig
LoRA adapter configuration
MetricsSummary
Summary of performance metrics
ModalityConfig
Configuration for a single modality
ModelBuilder
Builder for common model configurations
ModelConfig
Configuration for model loading
ModelEnsemble
Model ensemble for combining multiple models
ModelInfo
Information about the loaded model
ModelRegistry
Model registry for creating and managing model instances
MultiModalPipeline
Multi-modal inference pipeline
MultiModalPipelineBuilder
Builder for multi-modal pipelines
Pipeline
A complete inference pipeline
PipelineBuilder
Builder for constructing inference pipelines
PoolStats
Statistics about the tensor buffer pool
PooledBuffer
A pooled buffer that returns itself to the pool when dropped.
PrecisionConfig
Configuration for mixed precision inference
PrecisionConverter
Precision converter for array operations
PrecisionStats
Statistics about precision conversion
ProfileBreakdown
Breakdown of time spent in different stages
RejectionSampler
Rejection sampler that rejects samples violating constraints
Sampler
Sampler for generating predictions from model outputs
SamplingConfig
Configuration for sampling strategies
SchedulerStats
Statistics about the batch scheduler
SpeculativeConfig
Configuration for speculative decoding
SpeculativeDecoder
Speculative decoding engine
StateCompressor
State compressor with configurable compression method
TensorPool
A thread-safe memory pool for tensor buffers.
Timer
Timer for measuring operation duration

Enums§

CompressionMethod
Compression method for hidden states
ComputePrecision
Compute precision for mixed mode
EnsembleStrategy
Ensemble combination strategy
FallbackStrategy
Fallback strategy when rejection sampling fails
FusionStrategy
Fusion strategy for combining multiple modalities
InferenceError
Errors that can occur during inference
InferenceMode
Memory-efficient inference modes
ModalityType
Supported modality types
ModelType
Enumeration of supported model architectures
PrecisionMode
Precision mode for inference
Priority
Priority level for inference requests
SamplingStrategy
Available sampling strategies

Traits§

AutoregressiveModel
Trait for model architectures that support autoregressive prediction
SignalPredictor
Core trait for autoregressive signal prediction
SignalTokenizer
Trait for signal tokenization

Type Aliases§

Array1
one-dimensional array
Array2
two-dimensional array
ConstraintFn
Constraint function type for constrained beam search. Returns true if the sequence satisfies the constraint (a sketch of both function types follows this list)
CustomSamplingFn
Custom sampling function type. Takes logits and temperature, returns sampled index
InferenceResult
Result type alias for inference operations
ModalityPreprocessor
Preprocessor for a specific modality
PostprocessHook
Postprocessing hook that transforms output after model forward pass
PreprocessHook
Preprocessing hook that transforms input before model forward pass
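
The exact alias definitions are not reproduced on this page. The sketch below assumes ConstraintFn boxes a Fn(&[u32]) -> bool and CustomSamplingFn boxes a Fn(&[f32], f32) -> usize, matching the one-line descriptions above; the crate's real definitions may differ in element types or pointer wrappers.

    // Assumed alias shapes (see the note above); not necessarily the
    // crate's actual definitions.
    type ConstraintFn = Box<dyn Fn(&[u32]) -> bool>;
    type CustomSamplingFn = Box<dyn Fn(&[f32], f32) -> usize>;

    fn main() {
        // Constraint: accept a candidate sequence only while it is
        // monotonically non-decreasing.
        let monotone: ConstraintFn =
            Box::new(|seq: &[u32]| seq.windows(2).all(|w| w[0] <= w[1]));

        // Custom sampling: ignore temperature and pick the arg-max of the
        // logits (greedy decoding).
        let greedy: CustomSamplingFn = Box::new(|logits: &[f32], _temperature: f32| {
            logits
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(i, _)| i)
                .unwrap_or(0)
        });

        assert!(monotone(&[1, 2, 2, 5]));
        assert!(!monotone(&[3, 1]));
        assert_eq!(greedy(&[0.1, 2.0, -0.5], 1.0), 1);
    }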