Crate cortex_rust

Cortex Rust Engine

Core implementation of the Bit-Llama model with Test-Time Training (TTT) support. Exposes a native Rust API as well as Python and WebAssembly bindings.
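To make "test-time training" concrete, here is a minimal sketch of a TTT-Linear style update as described in the TTT literature: the layer's hidden state is itself a weight matrix that takes one gradient step per token on a self-supervised reconstruction loss, and the output is read out with the updated weights. This is not the crate's `TTTLayer` API. `TttLinearState`, its methods, and the fixed learning rate are hypothetical. A real TTT layer also has learned key/value/query projections, mini-batching and normalization, which are omitted here, so `k`, `v` and `q` are assumed to be already projected.

```rust
/// Hypothetical TTT-Linear sketch (NOT this crate's `TTTLayer` API).
/// Hidden state: a d x d matrix `w`, updated per token by one gradient
/// step on the self-supervised loss 0.5 * ||w·k - v||^2.
struct TttLinearState {
    d: usize,
    w: Vec<f32>, // row-major d x d weight matrix (the "fast weights")
    lr: f32,     // inner-loop learning rate (assumed fixed)
}

impl TttLinearState {
    fn new(d: usize, lr: f32) -> Self {
        Self { d, w: vec![0.0; d * d], lr }
    }

    /// Computes w · x.
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        (0..self.d)
            .map(|i| (0..self.d).map(|j| self.w[i * self.d + j] * x[j]).sum::<f32>())
            .collect()
    }

    /// One token: train on (k, v), then read out with q.
    fn step(&mut self, k: &[f32], v: &[f32], q: &[f32]) -> Vec<f32> {
        let pred = self.matvec(k);
        // Gradient of 0.5 * ||w k - v||^2 w.r.t. w is (w k - v) k^T.
        for i in 0..self.d {
            let err = pred[i] - v[i];
            for j in 0..self.d {
                self.w[i * self.d + j] -= self.lr * err * k[j];
            }
        }
        self.matvec(q)
    }
}

fn main() {
    let mut state = TttLinearState::new(2, 0.5);
    let k = [1.0f32, 0.0];
    let v = [2.0f32, -1.0];
    // Repeatedly seeing the same (k, v) pair drives w·k toward v.
    for _ in 0..20 {
        state.step(&k, &v, &k);
    }
    println!("{:?}", state.matvec(&k));
}
```

Because the update happens inside the forward pass, the state keeps adapting to the sequence being processed, which is the property the "TTT support" above refers to.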

Re-exports

pub use eval::compute_perplexity;
pub use eval::PerplexityResult;
pub use layers::BitLinear;
pub use layers::Linear4Bit;
pub use layers::RMSNorm;
pub use layers::SwiGLU;
pub use layers::TTTLayer;
pub use model::Llama;
pub use model::defaults;
pub use model::ModelConfig;
pub use model::ActivationType;
pub use model::BitLlama;
pub use model::BitLlamaBlock;
pub use model::BitLlamaConfig;
pub use model::GgufLoader;
pub use model::GgufModel;
pub use model::GgufTensorInfo;
pub use model::LayerDispatch;
pub use model::ModelArch;
pub use model::ModelType;
pub use model::UnifiedModel;
pub use model::TTTLayer as CandleTTTLayer;
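`compute_perplexity` and `PerplexityResult` are re-exported from `eval`, but their signatures are not listed on this page. The sketch below is therefore a standalone, hypothetical `perplexity` function that only illustrates the standard definition: perplexity = exp(mean negative log-likelihood of the target tokens), using a numerically stable log-softmax. It assumes `logits[i]` is the model's distribution for `targets[i]`, i.e. the caller has already shifted targets by one position for next-token prediction.

```rust
/// Hypothetical helper (NOT the crate's `compute_perplexity` signature).
/// Returns log p(idx) under softmax(logits), computed stably.
fn log_softmax_at(logits: &[f32], idx: usize) -> f32 {
    // Subtract the max before exponentiating to avoid overflow.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&x| (x - max).exp()).sum::<f32>().ln() + max;
    logits[idx] - log_sum_exp
}

/// Perplexity = exp(mean negative log-likelihood over all targets).
/// Assumes logits[i] is the distribution that should predict targets[i].
fn perplexity(logits: &[Vec<f32>], targets: &[usize]) -> f64 {
    assert_eq!(logits.len(), targets.len());
    assert!(!targets.is_empty());
    let nll: f64 = logits
        .iter()
        .zip(targets)
        .map(|(row, &t)| -(log_softmax_at(row, t) as f64))
        .sum();
    (nll / targets.len() as f64).exp()
}

fn main() {
    // Uniform logits over a 4-token vocabulary: every target has p = 1/4.
    let logits = vec![vec![0.0f32; 4]; 3];
    let targets = [0usize, 1, 2];
    println!("{:.3}", perplexity(&logits, &targets));
}
```

With uniform logits over four tokens the perplexity is 4 (the model is as uncertain as a fair four-way choice); a model that puts almost all probability on the correct token approaches 1.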

Modules

device_utils: Multi-GPU Detection and Management Utilities
download: Fast Parallel Downloader with Resume Support
error: Unified Error Types for Bit-TTT-Engine
eval: Evaluation utilities for language models.
kernels
layers: Layers Module - Core neural network layers
model: Model Module - BitLlama model architecture
optim
paged_attention: PagedAttention - Efficient KV Cache Management
python: Python Bindings for BitLlama (PyO3)
scheduler: Scheduler - Continuous Batching Request Scheduler
speculative: Speculative Decoding - Accelerated Token Generation
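The `paged_attention` module is described as efficient KV cache management. The core idea of PagedAttention is to store each sequence's KV cache in fixed-size physical blocks addressed through a per-sequence block table, so memory is allocated on demand and returned to a shared pool when a sequence finishes. The sketch below shows only that bookkeeping, under the assumption that this module follows the same general scheme. `BlockAllocator`, `append_token` and `free_seq` are hypothetical names, not this crate's API, and the attention kernel that reads keys and values through the block table is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical paged KV-cache block allocator (NOT this crate's API).
/// KV entries live in fixed-size physical blocks; each sequence keeps a
/// block table mapping logical block i -> physical block id.
struct BlockAllocator {
    block_size: usize,
    free: Vec<usize>,                 // free physical block ids
    tables: HashMap<u64, Vec<usize>>, // sequence id -> block table
    lens: HashMap<u64, usize>,        // sequence id -> tokens stored
}

impl BlockAllocator {
    fn new(num_blocks: usize, block_size: usize) -> Self {
        Self {
            block_size,
            // Reversed so that pop() hands out block 0 first.
            free: (0..num_blocks).rev().collect(),
            tables: HashMap::new(),
            lens: HashMap::new(),
        }
    }

    /// Reserves a KV slot for one more token of `seq`.
    /// Returns (physical block, offset in block), or None if the pool is empty.
    fn append_token(&mut self, seq: u64) -> Option<(usize, usize)> {
        let len = *self.lens.get(&seq).unwrap_or(&0);
        let table = self.tables.entry(seq).or_default();
        if len % self.block_size == 0 {
            // Last block is full (or the sequence has none yet): take a new one.
            table.push(self.free.pop()?);
        }
        self.lens.insert(seq, len + 1);
        Some((*table.last().unwrap(), len % self.block_size))
    }

    /// Returns all blocks of a finished sequence to the free pool.
    fn free_seq(&mut self, seq: u64) {
        if let Some(table) = self.tables.remove(&seq) {
            self.free.extend(table);
        }
        self.lens.remove(&seq);
    }
}

fn main() {
    let mut alloc = BlockAllocator::new(2, 4); // 2 physical blocks x 4 tokens
    for _ in 0..5 {
        println!("{:?}", alloc.append_token(1));
    }
    // Sequence 1 now holds both blocks, so sequence 2 cannot start yet.
    assert!(alloc.append_token(2).is_none());
    alloc.free_seq(1);
    assert!(alloc.append_token(2).is_some());
}
```

Returning `None` on exhaustion instead of panicking leaves the policy to the caller; a continuous-batching scheduler, for example, could delay or preempt a request until blocks are freed.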

Functions

infer