Crate cortex_rust

Cortex Rust Engine

Core implementation of the Bit-Llama model with TTT (Test-Time Training) support. Provides native Rust, Python, and WebAssembly bindings.

Re-exports

pub use eval::compute_perplexity;
pub use eval::PerplexityResult;
pub use layers::BitLinear;
pub use layers::Linear4Bit;
pub use layers::RMSNorm;
pub use layers::SwiGLU;
pub use layers::TTTLayer;
pub use model::Llama;
pub use model::defaults;
pub use model::ModelConfig;
pub use model::ActivationType;
pub use model::BitLlama;
pub use model::BitLlamaBlock;
pub use model::BitLlamaConfig;
pub use model::GgufLoader;
pub use model::GgufModel;
pub use model::GgufTensorInfo;
pub use model::LayerDispatch;
pub use model::ModelArch;
pub use model::ModelType;
pub use model::UnifiedModel;
pub use model::TTTLayer as CandleTTTLayer;

Modules

device_utils
Multi-GPU Detection and Management Utilities
download
Fast Parallel Downloader with Resume Support
error
Unified Error Types for Bit-TTT-Engine
eval
Evaluation utilities for language models.
kernels
layers
Layers Module - Core neural network layers
model
Model Module - BitLlama model architecture
optim
paged_attention
PagedAttention - Efficient KV Cache Management
python
Python Bindings for BitLlama (PyO3)
scheduler
Scheduler - Continuous Batching Request Scheduler
speculative
Speculative Decoding - Accelerated Token Generation
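
The eval module re-exports compute_perplexity and PerplexityResult, but this page does not show their signatures. As a reminder of what the metric measures, here is a minimal standalone sketch that computes perplexity as the exponential of the mean negative log-likelihood. The `perplexity` helper and its input format (per-token natural-log probabilities) are assumptions made for illustration. They are not the crate's API.

```rust
/// Hypothetical helper (not part of cortex_rust): perplexity from
/// per-token natural-log probabilities, i.e. exp(mean negative log-likelihood).
/// Returns None for an empty sequence, where the mean is undefined.
fn perplexity(token_log_probs: &[f64]) -> Option<f64> {
    if token_log_probs.is_empty() {
        return None;
    }
    let total: f64 = token_log_probs.iter().sum();
    let mean_nll = -total / token_log_probs.len() as f64;
    Some(mean_nll.exp())
}

fn main() {
    // A model that assigns probability 0.25 to every token is as uncertain
    // as a uniform choice among 4 options, so its perplexity is 4.
    let log_probs = vec![0.25f64.ln(); 8];
    let ppl = perplexity(&log_probs).expect("non-empty input");
    println!("perplexity = {ppl:.3}");
}
```

Lower values mean the model assigns higher probability to the observed tokens. See the eval module documentation for the inputs and result type the crate actually uses.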

Functions

infer