§kizzasi-model
Model architectures for Kizzasi AGSP (Autoregressive General-Purpose Signal Predictor).
This crate implements various State Space Model architectures optimized for continuous signal prediction with O(1) inference step complexity:
- Mamba/Mamba2: Selective State Space Models with input-dependent dynamics
- RWKV: Linear attention with time-mixing and channel-mixing
- S4/S4D: Structured State Space Models with diagonal state matrices
- Transformer: Standard attention for comparison (O(N) per step)
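The O(1) claim follows from the recurrent form of an SSM: each step updates a fixed-size hidden state rather than attending over a growing history. A minimal sketch of a diagonal SSM recurrence (h_t = a ⊙ h_{t-1} + b·u_t, y_t = c·h_t) illustrates this; the struct and field names here are illustrative and are not the crate's actual API.

```rust
/// Minimal diagonal SSM recurrence. Each `step` touches only the
/// fixed-size state vector, so the per-step cost is O(state_dim),
/// independent of how many steps have been processed.
struct DiagonalSsm {
    a: Vec<f32>, // diagonal state-transition coefficients
    b: Vec<f32>, // input projection
    c: Vec<f32>, // output projection
    h: Vec<f32>, // hidden state, carried across steps
}

impl DiagonalSsm {
    fn step(&mut self, u: f32) -> f32 {
        let mut y = 0.0;
        for i in 0..self.h.len() {
            // h_t = a ⊙ h_{t-1} + b * u_t
            self.h[i] = self.a[i] * self.h[i] + self.b[i] * u;
            // y_t = c · h_t
            y += self.c[i] * self.h[i];
        }
        y
    }
}
```

A Transformer, by contrast, must attend over all N previous tokens at each step, giving the O(N)-per-step cost noted above.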
§COOLJAPAN Ecosystem
This crate follows KIZZASI_POLICY.md and uses scirs2-core for all
array and numerical operations.
§Architecture Philosophy
As described in the AGSP concept, these models treat all signals (audio, video, sensors, actions) as equivalent tokenized sequences, enabling cross-modal prediction and world model construction.
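One simple way to put heterogeneous continuous signals on a shared footing is uniform scalar quantization into a common token vocabulary. The helper below is a hypothetical sketch of that idea, not part of the crate's public API.

```rust
/// Uniformly quantize a continuous signal in [-1.0, 1.0] into discrete
/// token ids, so audio samples, sensor readings, etc. can share one
/// vocabulary. Illustrative only; real tokenizers (e.g. learned codebooks)
/// are more sophisticated.
fn tokenize(signal: &[f32], vocab_size: usize) -> Vec<usize> {
    signal
        .iter()
        .map(|&x| {
            let clamped = x.clamp(-1.0, 1.0);
            // map [-1, 1] onto [0, vocab_size - 1]
            let idx = ((clamped + 1.0) / 2.0 * (vocab_size as f32 - 1.0)).round() as usize;
            idx.min(vocab_size - 1)
        })
        .collect()
}
```

Once every modality is a sequence over the same id space, a single autoregressive model can predict across modalities.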
§Re-exports
pub use arch_search::{search_best_arch, ArchCandidate, ArchSearchConfig, ArchSearchResult, ArchSearchSpace, EvolutionarySearcher, GridSearcher, RandomArchSearcher};
pub use backprop::{layer_norm_backward, linear_backward, silu_backward, softmax_backward, GradAccumulator, GradientTape, SsmBackward, SsmGradients, Tensor};
pub use gguf::{GgufFile, GgufInspection, GgufMetaValue, GgufQuantType, GgufTensorInfo};
pub use incremental_loader::{GgufFileSource, IncrementalModelLoader, SafeTensorsSource, WeightSource};
pub use loader::{ModelLoader, TensorInfo, WeightLoader};
pub use lora::{LoraAdapter, LoraAdapterSummary, LoraConfig, LoraLinear, QLoraLinear};
pub use multimodal::{FusionStrategy, Modality, ModalityAligner, MultiModalConfig, MultiModalModel};
pub use neural_ode::{AugmentedNeuralOde, NeuralOdeConfig, NeuralOdeModel, OdeIntegrator, OdeSolver};
pub use spiking::{LifLayer, MembranePotential, ResetMode, SpikingConfig, SpikingNeuralNetwork, StdpConfig};
pub use temporal_multiscale::{MultiScaleConfig, MultiScaleModel, ScaleFusion};
pub use training_loop::{AdamOptimizer, ArrayDataProvider, ConstantScheduler, DataProvider, ExponentialScheduler, LrScheduler, Optimizer, SgdOptimizer, StepDecayScheduler, TrainingCallback, TrainingConfig, TrainingLoop, TrainingResult};
pub use distributed::{average_gradients, partition_indices, run_parallel_workers, sgd_step, CommBackend, DataParallelModel, DistributedConfig, GradientBuffer, GradientStrategy, GradientSync, LocalGradientSync, ThreadedGradientSync};
pub use curriculum::{CurriculumDataProvider, CurriculumScheduler, CurriculumStrategy};
pub use gradient_checkpoint::{ActivationCheckpointer, CheckpointConfig};
pub use speculative::{SpeculativeDecoder, SpeculativeResult};
pub use early_exit::{AdaptiveComputation, EarlyExitConfig, ExitCriterion, ExitStats};
pub use blas_ops::{axpy, batch_matmul_vec, dot, matmul_mat, matmul_vec, norm_frobenius, norm_l2, transpose, BlasConfig};
pub use profiling::{BottleneckInfo, BottleneckSeverity, ComprehensiveComparison, ComprehensiveProfiler, ModelBottleneckAnalysis};
pub use interpretability::{ActivationStats, CompressionAnalysis, GatingAnalysis, InterpretabilityReport, LayerProbe, SensitivityAnalyzer, StateTrajectory};
pub use visualization::{matrix_to_csv, signal_to_svg_sparkline, ActivationHistogram, GatingPatternRecorder, PhasePortrait};
pub use compression::{CompressionReport, LowRankApprox, MagnitudePruner, StructuredPruner};
pub use state_io::{decode_f32_slice, encode_f32_slice, ModelSnapshot};
pub use rwkv5::{Rwkv5Config, Rwkv5Model, Rwkv5State};
pub use rwkv7::{Rwkv7Config, Rwkv7Model, Rwkv7State, Rwkv7TimeMixing};
§Modules
- arch_search - Neural Architecture Search (NAS) for SSM Hyperparameters
- backprop - Backward pass and gradient computation infrastructure for kizzasi-model.
- backprop_ssm - SSM-specific backward pass types and free functions.
- batch - Batched Inference Support
- blas_ops - BLAS-Accelerated Operations
- cache_friendly - Cache-Friendly Memory Layouts
- checkpoint - Checkpointing and Training Utilities
- compression - Model Compression Utilities
- curriculum - Curriculum Learning for kizzasi-model
- distributed - Distributed Training Support for kizzasi-model
- dynamic_quantization - Dynamic Quantization for On-the-Fly Model Compression
- early_exit - Adaptive Early Exit for kizzasi-model
- factory - Model Factory for Instantiating Models from Loaded Weights
- flash_linear_attn - Flash Linear Attention: chunk-wise O(n)-memory linear attention
- gguf - GGUF Format Support
- gradient_checkpoint - Gradient Checkpointing for Memory-Efficient Training
- h3 - H3: Hungry Hungry Hippos
- huggingface - HuggingFace Hub Integration
- huggingface_loader - HuggingFace Model Loading and Weight Conversion
- hybrid - Hybrid Mamba+Attention Model
- incremental_loader - Incremental / streaming weight loading for large models (7B+).
- interpretability - Model Interpretability Tools
- loader - Weight loading from safetensors format
- lora - LoRA (Low-Rank Adaptation) for efficient fine-tuning
- mamba - Mamba: Selective State Space Model
- mamba2 - Mamba2: Enhanced Selective State Space Model with State Space Duality (SSD)
- mixed_precision - Mixed Precision Support (FP16/BF16)
- moe - Mixture of Experts (MoE)
- multimodal - Multi-Modal Input Fusion
- neural_ode - Neural ODE / Continuous-Time Models
- onnx_export - ONNX model export for kizzasi-model
- parallel_multihead - Parallel Multi-Head Computation
- profiling - Model Profiling and Benchmarking Utilities
- prune - Model Weight Pruning for kizzasi-model
- pytorch_compat - PyTorch Checkpoint Compatibility
- quantization - Weight Quantization for Efficient Inference
- quantize - Post-Training Quantization (PTQ) for kizzasi-model
- rwkv - RWKV v6: Receptance Weighted Key Value
- rwkv5 - RWKV v5: Multi-Head WKV Compatibility Layer
- rwkv7 - RWKV v7: Next-Generation Receptance Weighted Key Value
- s4 - S4 and S4D: Structured State Space Models
- s5 - S5: Simplified State Space Model
- simd_ops - SIMD-Optimized Operations for Model Inference
- speculative - Speculative Decoding for kizzasi-model
- spiking - Neuromorphic Spiking Neural Networks (SNN)
- state_io - Full model state I/O: saves both weights AND runtime SSM state to disk.
- temporal_multiscale - Multi-Scale Temporal Modeling
- training - Training Infrastructure for kizzasi-model
- training_loop - High-Level Training Loop for kizzasi-model
- transformer - Transformer: Standard Multi-Head Attention Baseline
- visualization - State Visualization and Attention Pattern Analysis
§Macros
- time_op - Time an expression and record the elapsed duration into a registry.
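For readers unfamiliar with the timing-macro pattern, here is a minimal sketch of how such a macro can be built. This is an illustration of the general technique only; the real `time_op` macro's signature and registry type may differ.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy registry mapping operation names to accumulated durations
/// (assumed shape; the crate's actual registry may differ).
type TimingRegistry = HashMap<&'static str, Duration>;

/// Evaluate the expression, accumulate its elapsed time under `name`
/// in the registry, and yield the expression's value.
macro_rules! time_op {
    ($registry:expr, $name:expr, $e:expr) => {{
        let start = Instant::now();
        let value = $e;
        *$registry.entry($name).or_insert(Duration::ZERO) += start.elapsed();
        value
    }};
}
```

Usage: `let sum: u64 = time_op!(reg, "sum", (0..1000u64).sum());` leaves the result in `sum` and the accumulated duration in `reg["sum"]`.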
§Structs
- HiddenState - Represents the hidden state of the SSM
§Enums
- ModelError - Errors that can occur in model operations
- ModelType - Enumeration of supported model architectures
§Traits
- AutoregressiveModel - Trait for model architectures that support autoregressive prediction
- SignalPredictor - Core trait for autoregressive signal prediction
§Type Aliases
- Array1 - One-dimensional array
- Array2 - Two-dimensional array
- CoreResult - Result type alias for core operations
- ModelResult - Result type alias for model operations