Cortex Rust Engine
Core implementation of the Bit-Llama model with TTT (Test-Time Training) support. Provides native Rust, Python, and WebAssembly bindings.
Re-exports§
- pub use eval::compute_perplexity;
- pub use eval::PerplexityResult;
- pub use layers::BitLinear;
- pub use layers::Linear4Bit;
- pub use layers::RMSNorm;
- pub use layers::SwiGLU;
- pub use layers::TTTLayer;
- pub use model::Llama;
- pub use model::defaults;
- pub use model::ModelConfig;
- pub use model::ActivationType;
- pub use model::BitLlama;
- pub use model::BitLlamaBlock;
- pub use model::BitLlamaConfig;
- pub use model::GgufLoader;
- pub use model::GgufModel;
- pub use model::GgufTensorInfo;
- pub use model::LayerDispatch;
- pub use model::ModelArch;
- pub use model::ModelType;
- pub use model::UnifiedModel;
- pub use model::TTTLayer as CandleTTTLayer;
Modules§
- device_utils
- Multi-GPU Detection and Management Utilities
- download
- Fast Parallel Downloader with Resume Support
- error
- Unified Error Types for Bit-TTT-Engine
- eval
- Evaluation utilities for language models.
- kernels
- layers
- Layers Module - Core neural network layers
- model
- Model Module - BitLlama model architecture
- optim
- paged_attention
- PagedAttention - Efficient KV Cache Management
- python
- Python Bindings for BitLlama (PyO3)
- scheduler
- Scheduler - Continuous Batching Request Scheduler
- speculative
- Speculative Decoding - Accelerated Token Generation