Crate kizzasi_core

§kizzasi-core

Core SSM (State Space Model) engine for Kizzasi AGSP.

Implements linear-time State Space Models (Mamba/S4/RWKV) for efficient processing of continuous signal streams, with O(1) complexity per inference step.

§COOLJAPAN Ecosystem

This crate follows KIZZASI_POLICY.md and uses scirs2-core for all array and numerical operations.
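§Example

A minimal, self-contained sketch (not this crate's API) of the diagonal state-space recurrence that gives SSMs their O(1) per-step inference cost: the hidden state is updated in place, so the work per incoming sample is independent of how much of the stream has already been processed. The type, the `step` method, and the fixed dimensions below are illustrative stand-ins; see SelectiveSSM and the model modules (mamba2, s4d, s5, rwkv7) for the actual implementations.

```rust
// Illustrative diagonal SSM recurrence (stand-in types, not kizzasi_core's API).
struct DiagonalSsm {
    a: Vec<f32>, // diagonal state transition
    b: Vec<f32>, // input projection
    c: Vec<f32>, // output projection
}

impl DiagonalSsm {
    // One inference step: h_t = A * h_{t-1} + B * x_t, y_t = C . h_t.
    // Cost is O(state_dim), independent of how many samples came before.
    fn step(&self, h: &mut [f32], x: f32) -> f32 {
        let mut y = 0.0;
        for i in 0..h.len() {
            h[i] = self.a[i] * h[i] + self.b[i] * x;
            y += self.c[i] * h[i];
        }
        y
    }
}

fn main() {
    let n = 4;
    let ssm = DiagonalSsm {
        a: vec![0.9; n],
        b: vec![0.1; n],
        c: vec![1.0; n],
    };
    let mut hidden = vec![0.0f32; n];
    // Stream samples one at a time; per-step cost stays constant.
    for x in [0.5f32, -0.25, 1.0, 0.0] {
        let y = ssm.step(&mut hidden, x);
        println!("y = {y:.4}");
    }
}
```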

Re-exports§

pub use attention::GatedLinearAttention;
pub use attention::MultiHeadSSMAttention;
pub use attention::MultiHeadSSMConfig;
pub use conv::CausalConv1d;
pub use conv::DepthwiseCausalConv1d;
pub use conv::DilatedCausalConv1d;
pub use conv::DilatedStack;
pub use conv::ShortConv;
pub use dataloader::BatchIterator;
pub use dataloader::DataLoaderConfig;
pub use dataloader::TimeSeriesAugmentation;
pub use dataloader::TimeSeriesDataLoader;
pub use device::get_best_device;
pub use device::is_cuda_available;
pub use device::is_metal_available;
pub use device::list_devices;
pub use device::DeviceConfig;
pub use device::DeviceInfo;
pub use device::DeviceType;
pub use efficient_attention::EfficientAttentionConfig;
pub use efficient_attention::EfficientMultiHeadAttention;
pub use efficient_attention::FusedAttentionKernel;
pub use embedded_alloc::BumpAllocator;
pub use embedded_alloc::EmbeddedAllocator;
pub use embedded_alloc::FixedPool;
pub use embedded_alloc::StackAllocator;
pub use embedded_alloc::StackGuard;
pub use flash_attention::flash_attention_fused;
pub use flash_attention::FlashAttention;
pub use flash_attention::FlashAttentionConfig;
pub use gpu_utils::GPUMemoryPool;
pub use gpu_utils::MemoryStats;
pub use gpu_utils::TensorPrefetch;
pub use gpu_utils::TensorTransfer;
pub use gpu_utils::TransferBatch;
pub use h3::DiagonalSSM;
pub use h3::H3Config;
pub use h3::H3Layer;
pub use h3::H3Model;
pub use h3::ShiftSSM;
pub use kernel_fusion::fused_ffn_gelu;
pub use kernel_fusion::fused_layernorm_gelu;
pub use kernel_fusion::fused_layernorm_silu;
pub use kernel_fusion::fused_linear_activation;
pub use kernel_fusion::fused_multihead_output;
pub use kernel_fusion::fused_qkv_projection;
pub use kernel_fusion::fused_quantize_dequantize;
pub use kernel_fusion::fused_softmax_attend;
pub use kernel_fusion::fused_ssm_step;
pub use lora::LoRAAdapter;
pub use lora::LoRAConfig;
pub use lora::LoRALayer;
pub use mamba2::Mamba2Config;
pub use mamba2::Mamba2Layer;
pub use mamba2::Mamba2Model;
pub use metrics::MetricsLogger;
pub use metrics::MetricsSummary;
pub use metrics::TrainingMetrics;
pub use nn::gelu;
pub use nn::gelu_fast;
pub use nn::layer_norm;
pub use nn::leaky_relu;
pub use nn::log_softmax;
pub use nn::relu;
pub use nn::rms_norm;
pub use nn::sigmoid;
pub use nn::silu;
pub use nn::softmax;
pub use nn::tanh;
pub use nn::Activation;
pub use nn::ActivationType;
pub use nn::GatedLinearUnit;
pub use nn::LayerNorm;
pub use nn::NormType;
pub use optimizations::acquire_workspace;
pub use optimizations::ilp;
pub use optimizations::prefetch;
pub use optimizations::release_workspace;
pub use optimizations::CacheAligned;
pub use optimizations::DiscretizationCache;
pub use optimizations::SSMWorkspace;
pub use optimizations::WorkspaceGuard;
pub use parallel::BatchProcessor;
pub use parallel::ParallelConfig;
pub use pool::ArrayPool;
pub use pool::MultiArrayPool;
pub use pool::PoolStats;
pub use pool::PooledArray;
pub use profiling::CounterStats;
pub use profiling::MemoryProfiler;
pub use profiling::PerfCounter;
pub use profiling::ProfilerMemoryStats;
pub use profiling::ProfilingSession;
pub use profiling::ScopeTimer;
pub use profiling::Timer;
pub use pruning::GradientPruner;
pub use pruning::PruningConfig;
pub use pruning::PruningGranularity;
pub use pruning::PruningMask;
pub use pruning::PruningStrategy;
pub use pruning::StructuredPruner;
pub use pytorch_compat::detect_checkpoint_architecture;
pub use pytorch_compat::load_pytorch_checkpoint;
pub use pytorch_compat::PyTorchCheckpoint;
pub use pytorch_compat::PyTorchConverter;
pub use pytorch_compat::WeightMapping;
pub use quantization::DynamicQuantizer;
pub use quantization::QuantizationParams;
pub use quantization::QuantizationScheme;
pub use quantization::QuantizationType;
pub use quantization::QuantizedTensor;
pub use retnet::MultiScaleRetention;
pub use retnet::RetNetConfig;
pub use retnet::RetNetLayer;
pub use retnet::RetNetModel;
pub use rwkv7::ChannelMixing;
pub use rwkv7::RWKV7Config;
pub use rwkv7::RWKV7Layer;
pub use rwkv7::RWKV7Model;
pub use rwkv7::TimeMixing;
pub use s4d::S4DConfig;
pub use s4d::S4DLayer;
pub use s4d::S4DModel;
pub use s5::S5Config;
pub use s5::S5Layer;
pub use s5::S5Model;
pub use scan::parallel_scan;
pub use scan::parallel_ssm_batch;
pub use scan::parallel_ssm_scan;
pub use scan::segmented_scan;
pub use scan::AssociativeOp;
pub use scan::SSMElement;
pub use scan::SSMScanOp;
pub use scheduler::ConstantScheduler;
pub use scheduler::CosineScheduler;
pub use scheduler::ExponentialScheduler;
pub use scheduler::LRScheduler;
pub use scheduler::LinearScheduler;
pub use scheduler::OneCycleScheduler;
pub use scheduler::PolynomialScheduler;
pub use scheduler::StepScheduler;
pub use sequences::apply_mask;
pub use sequences::masked_mean;
pub use sequences::masked_sum;
pub use sequences::pad_sequences;
pub use sequences::PackedSequence;
pub use sequences::PaddingStrategy;
pub use sequences::SequenceMask;
pub use training::CheckpointMetadata;
pub use training::ConstraintLoss;
pub use training::Loss;
pub use training::MixedPrecision;
pub use training::SchedulerType;
pub use training::TrainableSSM;
pub use training::Trainer;
pub use training::TrainingConfig;
pub use weights::WeightFormat;
pub use weights::WeightLoadConfig;
pub use weights::WeightLoader;
pub use weights::WeightPruner;

Modules§

attention
Multi-head SSM Attention mechanisms
conv
Causal convolution implementations for SSM architectures
dataloader
DataLoader for time-series training
device
Device selection and GPU acceleration utilities
efficient_attention
Memory-efficient attention implementations
embedded_alloc
Embedded-friendly allocator for no_std environments
fixed_point
Fixed-Point Arithmetic for Embedded Systems
flash_attention
Flash-Attention-2 Implementation
gpu_utils
GPU memory management and tensor transfer utilities
h3
H3 (Hungry Hungry Hippos) Architecture
kernel_fusion
Fused Kernel Optimizations
lora
LoRA (Low-Rank Adaptation) Support
mamba2
Mamba-2 SSD (State Space Duality)
metrics
Training metrics and logging utilities
nn
Neural network building blocks: normalization and activation functions
numerics
Numerical stability utilities for SSM computations
optimizations
Performance optimizations for kizzasi-core
parallel
Parallel computation utilities for multi-layer SSM processing
pool
Memory pooling for allocation reuse
profiling
Performance profiling utilities for kizzasi-core
pruning
Structured Pruning
pytorch_compat
PyTorch Compatibility Layer
quantization
Dynamic Quantization
retnet
RetNet: Retentive Networks for Multi-Scale Sequence Modeling
rwkv7
RWKV-7 Architecture
s4d
S4D: Diagonal Structured State Space Model
s5
S5 (Simplified State Space Layers)
scan
Parallel Scan Algorithms for SSMs
scheduler
Learning rate schedulers for training
sequences
Variable-length Sequence Handling
simd
SIMD-optimized operations for high-performance matrix computations
simd_avx512
AVX-512 SIMD Optimizations
simd_neon
ARM NEON SIMD Optimizations
training
Training infrastructure for SSM models
weights
Weight management for SSM models

Macros§

profile_memory
Macro to profile memory usage of a block
time_block
Macro to time a block of code

Structs§

ContinuousEmbedding
Continuous embedding layer for signal values
HiddenState
Represents the hidden state of the SSM
KizzasiConfig
Configuration for the Kizzasi AGSP engine
SelectiveSSM
Selective State Space Model (Mamba-style)

Enums§

CoreError
Errors that can occur in the core SSM engine
ModelType
Type of state space model to use

Traits§

SignalPredictor
Core trait for autoregressive signal prediction
StateSpaceModel
Trait for state space model implementations

Type Aliases§

Array1
One-dimensional array
CoreResult
Result type alias for core operations
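A short sketch of how these aliases are conventionally combined, assuming CoreResult<T> is Result<T, CoreError>. The CoreError variant, the element type of Array1, and the normalize helper below are hypothetical stand-ins for illustration, not items exported by this crate.

```rust
// Stand-in definitions; the real crate's CoreError variants and
// Array1 element type may differ.
#[derive(Debug)]
enum CoreError {
    InvalidInput(String), // hypothetical variant
}

type CoreResult<T> = Result<T, CoreError>;
type Array1 = Vec<f32>; // stand-in for the crate's 1-D array alias

// Hypothetical helper: fails fast on empty input, otherwise rescales
// the signal so its largest absolute value is 1.
fn normalize(signal: Array1) -> CoreResult<Array1> {
    if signal.is_empty() {
        return Err(CoreError::InvalidInput("empty signal".into()));
    }
    let max = signal.iter().fold(f32::EPSILON, |m, &v| m.max(v.abs()));
    Ok(signal.iter().map(|v| v / max).collect())
}

fn main() {
    match normalize(vec![0.5, -2.0, 1.0]) {
        Ok(out) => println!("{out:?}"),
        Err(e) => eprintln!("error: {e:?}"),
    }
}
```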