Crate kizzasi_model

§kizzasi-model

Model architectures for Kizzasi AGSP (Autoregressive General-Purpose Signal Predictor).

This crate implements several State Space Model architectures optimized for continuous signal prediction with O(1) per-step inference complexity, alongside a Transformer baseline:

  • Mamba/Mamba2: Selective State Space Models with input-dependent dynamics
  • RWKV: Linear attention with time-mixing and channel-mixing
  • S4/S4D: Structured State Space Models with diagonal state matrices
  • Transformer: Standard attention for comparison (O(N) per step)
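The O(1)-per-step property common to the SSM variants above comes from carrying a fixed-size recurrent state instead of attending over the whole history. As an illustrative sketch (not this crate's actual API; the struct and field names below are hypothetical), a diagonal S4D-style recurrence updates each state channel independently, so one step costs O(d) in the state size and is independent of sequence length:

```rust
/// Illustrative diagonal SSM with scalar input/output (not the crate's real types).
struct DiagonalSsm {
    a: Vec<f64>, // diagonal state-transition coefficients (|a_i| < 1 for stability)
    b: Vec<f64>, // input projection
    c: Vec<f64>, // output projection
    h: Vec<f64>, // hidden state, carried across steps
}

impl DiagonalSsm {
    fn new(a: Vec<f64>, b: Vec<f64>, c: Vec<f64>) -> Self {
        let n = a.len();
        Self { a, b, c, h: vec![0.0; n] }
    }

    /// One autoregressive step: h <- a ⊙ h + b·x, then y = c·h.
    /// The cost does not depend on how many steps came before.
    fn step(&mut self, x: f64) -> f64 {
        for i in 0..self.h.len() {
            self.h[i] = self.a[i] * self.h[i] + self.b[i] * x;
        }
        self.c.iter().zip(&self.h).map(|(c, h)| c * h).sum()
    }
}

fn main() {
    let mut ssm = DiagonalSsm::new(vec![0.9, 0.5], vec![1.0, 1.0], vec![0.5, 0.5]);
    // Feed a constant input; the output grows toward a fixed point as the state settles.
    let ys: Vec<f64> = (0..5).map(|_| ssm.step(1.0)).collect();
    println!("{:?}", ys);
}
```

Mamba-style selectivity makes `a` and `b` input-dependent per step, but the per-step cost stays the same; a Transformer, by contrast, must revisit all N cached keys/values each step.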

§COOLJAPAN Ecosystem

This crate follows KIZZASI_POLICY.md and uses scirs2-core for all array and numerical operations.

§Architecture Philosophy

As described in the AGSP concept, these models treat all signals (audio, video, sensors, actions) as equivalent tokenized sequences, enabling cross-modal prediction and world model construction.
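A minimal sketch of what "equivalent tokenized sequences" can mean in practice (illustrative only; the crate's actual tokenization lives elsewhere and the functions below are hypothetical): uniformly quantize any continuous channel into a shared token vocabulary, so an audio sample and a temperature reading become interchangeable inputs to the same predictor.

```rust
// Hypothetical uniform quantizer illustrating the AGSP idea: any continuous
// channel maps into the same fixed token vocabulary.
const VOCAB_SIZE: usize = 256;

/// Map a sample in [lo, hi] to a token id in 0..VOCAB_SIZE.
fn tokenize(sample: f64, lo: f64, hi: f64) -> usize {
    let t = ((sample - lo) / (hi - lo)).clamp(0.0, 1.0);
    (t * (VOCAB_SIZE - 1) as f64).round() as usize
}

/// Inverse map: token id back to a representative sample value.
fn detokenize(token: usize, lo: f64, hi: f64) -> f64 {
    lo + (token as f64 / (VOCAB_SIZE - 1) as f64) * (hi - lo)
}

fn main() {
    // Two different "modalities" land in the same token space.
    let audio = [-0.5_f64, 0.0, 0.25];   // normalized audio, range [-1, 1]
    let sensor = [18.0_f64, 21.5, 25.0]; // temperature in °C, range [0, 50]
    let audio_tokens: Vec<usize> = audio.iter().map(|&s| tokenize(s, -1.0, 1.0)).collect();
    let sensor_tokens: Vec<usize> = sensor.iter().map(|&s| tokenize(s, 0.0, 50.0)).collect();
    println!("audio {:?} sensor {:?}", audio_tokens, sensor_tokens);
}
```

Once both streams are token ids over the same vocabulary, a single autoregressive model can predict either one, or predict one from the other, which is the cross-modal property the paragraph above describes.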

Re-exports§

pub use arch_search::search_best_arch;
pub use arch_search::ArchCandidate;
pub use arch_search::ArchSearchConfig;
pub use arch_search::ArchSearchResult;
pub use arch_search::ArchSearchSpace;
pub use arch_search::EvolutionarySearcher;
pub use arch_search::GridSearcher;
pub use arch_search::RandomArchSearcher;
pub use backprop::layer_norm_backward;
pub use backprop::linear_backward;
pub use backprop::silu_backward;
pub use backprop::softmax_backward;
pub use backprop::GradAccumulator;
pub use backprop::GradientTape;
pub use backprop::SsmBackward;
pub use backprop::SsmGradients;
pub use backprop::Tensor;
pub use gguf::GgufFile;
pub use gguf::GgufInspection;
pub use gguf::GgufMetaValue;
pub use gguf::GgufQuantType;
pub use gguf::GgufTensorInfo;
pub use incremental_loader::GgufFileSource;
pub use incremental_loader::IncrementalModelLoader;
pub use incremental_loader::SafeTensorsSource;
pub use incremental_loader::WeightSource;
pub use loader::ModelLoader;
pub use loader::TensorInfo;
pub use loader::WeightLoader;
pub use lora::LoraAdapter;
pub use lora::LoraAdapterSummary;
pub use lora::LoraConfig;
pub use lora::LoraLinear;
pub use lora::QLoraLinear;
pub use multimodal::FusionStrategy;
pub use multimodal::Modality;
pub use multimodal::ModalityAligner;
pub use multimodal::MultiModalConfig;
pub use multimodal::MultiModalModel;
pub use neural_ode::AugmentedNeuralOde;
pub use neural_ode::NeuralOdeConfig;
pub use neural_ode::NeuralOdeModel;
pub use neural_ode::OdeIntegrator;
pub use neural_ode::OdeSolver;
pub use spiking::LifLayer;
pub use spiking::MembranePotential;
pub use spiking::ResetMode;
pub use spiking::SpikingConfig;
pub use spiking::SpikingNeuralNetwork;
pub use spiking::StdpConfig;
pub use temporal_multiscale::MultiScaleConfig;
pub use temporal_multiscale::MultiScaleModel;
pub use temporal_multiscale::ScaleFusion;
pub use training_loop::AdamOptimizer;
pub use training_loop::ArrayDataProvider;
pub use training_loop::ConstantScheduler;
pub use training_loop::DataProvider;
pub use training_loop::ExponentialScheduler;
pub use training_loop::LrScheduler;
pub use training_loop::Optimizer;
pub use training_loop::SgdOptimizer;
pub use training_loop::StepDecayScheduler;
pub use training_loop::TrainingCallback;
pub use training_loop::TrainingConfig;
pub use training_loop::TrainingLoop;
pub use training_loop::TrainingResult;
pub use distributed::average_gradients;
pub use distributed::partition_indices;
pub use distributed::run_parallel_workers;
pub use distributed::sgd_step;
pub use distributed::CommBackend;
pub use distributed::DataParallelModel;
pub use distributed::DistributedConfig;
pub use distributed::GradientBuffer;
pub use distributed::GradientStrategy;
pub use distributed::GradientSync;
pub use distributed::LocalGradientSync;
pub use distributed::SharedGradientStore;
pub use distributed::ThreadedGradientSync;
pub use curriculum::CurriculumDataProvider;
pub use curriculum::CurriculumScheduler;
pub use curriculum::CurriculumStrategy;
pub use gradient_checkpoint::ActivationCheckpointer;
pub use gradient_checkpoint::CheckpointConfig;
pub use speculative::SpeculativeDecoder;
pub use speculative::SpeculativeResult;
pub use early_exit::AdaptiveComputation;
pub use early_exit::EarlyExitConfig;
pub use early_exit::ExitCriterion;
pub use early_exit::ExitStats;
pub use blas_ops::axpy;
pub use blas_ops::batch_matmul_vec;
pub use blas_ops::dot;
pub use blas_ops::matmul_mat;
pub use blas_ops::matmul_vec;
pub use blas_ops::norm_frobenius;
pub use blas_ops::norm_l2;
pub use blas_ops::transpose;
pub use blas_ops::BlasConfig;
pub use profiling::BottleneckInfo;
pub use profiling::BottleneckSeverity;
pub use profiling::ComprehensiveComparison;
pub use profiling::ComprehensiveProfiler;
pub use profiling::ModelBottleneckAnalysis;
pub use interpretability::ActivationStats;
pub use interpretability::CompressionAnalysis;
pub use interpretability::GatingAnalysis;
pub use interpretability::InterpretabilityReport;
pub use interpretability::LayerProbe;
pub use interpretability::SensitivityAnalyzer;
pub use interpretability::StateTrajectory;
pub use visualization::matrix_to_csv;
pub use visualization::signal_to_svg_sparkline;
pub use visualization::ActivationHistogram;
pub use visualization::GatingPatternRecorder;
pub use visualization::PhasePortrait;
pub use compression::CompressionReport;
pub use compression::LowRankApprox;
pub use compression::MagnitudePruner;
pub use compression::StructuredPruner;
pub use state_io::decode_f32_slice;
pub use state_io::encode_f32_slice;
pub use state_io::ModelSnapshot;
pub use rwkv5::Rwkv5Config;
pub use rwkv5::Rwkv5Model;
pub use rwkv5::Rwkv5State;
pub use rwkv7::Rwkv7Config;
pub use rwkv7::Rwkv7Model;
pub use rwkv7::Rwkv7State;
pub use rwkv7::Rwkv7TimeMixing;

Modules§

arch_search
Neural Architecture Search (NAS) for SSM Hyperparameters
backprop
Backward pass and gradient computation infrastructure for kizzasi-model.
backprop_ssm
SSM-specific backward pass types and free functions.
batch
Batched Inference Support
blas_ops
BLAS-Accelerated Operations
cache_friendly
Cache-Friendly Memory Layouts
checkpoint
Checkpointing and Training Utilities
compression
Model Compression Utilities
curriculum
Curriculum Learning for kizzasi-model
distributed
Distributed Training Support for kizzasi-model
dynamic_quantization
Dynamic Quantization for On-the-Fly Model Compression
early_exit
Adaptive Early Exit for kizzasi-model
factory
Model Factory for Instantiating Models from Loaded Weights
flash_linear_attn
Flash Linear Attention — chunk-wise O(n) memory linear attention
gguf
GGUF Format Support
gradient_checkpoint
Gradient Checkpointing for Memory-Efficient Training
h3
H3: Hungry Hungry Hippos
huggingface
HuggingFace Hub Integration
huggingface_loader
HuggingFace Model Loading and Weight Conversion
hybrid
Hybrid Mamba+Attention Model
incremental_loader
Incremental / streaming weight loading to support large models (7B+).
interpretability
Model Interpretability Tools
loader
Weight loading from safetensors format
lora
LoRA (Low-Rank Adaptation) for efficient fine-tuning
mamba
Mamba: Selective State Space Model
mamba2
Mamba2: Enhanced Selective State Space Model with State Space Duality (SSD)
mixed_precision
Mixed Precision Support (FP16/BF16)
moe
Mixture of Experts (MoE)
multimodal
Multi-Modal Input Fusion
neural_ode
Neural ODE / Continuous-Time Models
onnx_export
ONNX model export for kizzasi-model
parallel_multihead
Parallel Multi-Head Computation
profiling
Model Profiling and Benchmarking Utilities
prune
Model Weight Pruning for kizzasi-model
pytorch_compat
PyTorch Checkpoint Compatibility
quantization
Weight Quantization for Efficient Inference
quantize
Post-Training Quantization (PTQ) for kizzasi-model
rwkv
RWKV v6: Receptance Weighted Key Value
rwkv5
RWKV v5: Multi-Head WKV Compatibility Layer
rwkv7
RWKV v7: Next Generation Receptance Weighted Key Value
s4
S4 and S4D: Structured State Space Models
s5
S5: Simplified State Space Model
simd_ops
SIMD-Optimized Operations for Model Inference
speculative
Speculative Decoding for kizzasi-model
spiking
Neuromorphic Spiking Neural Networks (SNN)
state_io
Full model state I/O: saves both weights AND runtime SSM state to disk.
temporal_multiscale
Multi-Scale Temporal Modeling
training
Training Infrastructure for kizzasi-model
training_loop
High-Level Training Loop for kizzasi-model
transformer
Transformer: Standard Multi-Head Attention Baseline
visualization
State Visualization and Attention Pattern Analysis

Macros§

time_op
Time an expression and record the elapsed duration into a registry.

Structs§

HiddenState
Represents the hidden state of the SSM

Enums§

ModelError
Errors that can occur in model operations
ModelType
Enumeration of supported model architectures

Traits§

AutoregressiveModel
Trait for model architectures that support autoregressive prediction
SignalPredictor
Core trait for autoregressive signal prediction

Type Aliases§

Array1
One-dimensional array
Array2
Two-dimensional array
CoreResult
Result type alias for core operations
ModelResult
Result type alias for model operations