§boostr
ML framework built on numr — attention, quantization, model architectures.
boostr extends numr’s foundational numerical computing with ML-specific operations, quantized tensor support, and model building blocks. It uses numr’s runtime, tensors, and ops directly — no reimplementation, no wrappers.
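The "no wrappers" point refers to Rust's extension-trait pattern: new methods are attached to an existing type from outside its defining crate, so boostr's ML ops can be called directly on numr's clients. The sketch below illustrates the pattern only; `SoftmaxExt` and the use of `Vec<f32>` as a stand-in tensor are hypothetical, not boostr's or numr's actual API.

```rust
// Extension-trait pattern: add an ML op to an existing type without
// wrapping it. `Vec<f32>` stands in for a backend tensor type;
// `SoftmaxExt` is an illustrative name, not boostr's API.
trait SoftmaxExt {
    fn softmax(&self) -> Vec<f32>;
}

impl SoftmaxExt for Vec<f32> {
    fn softmax(&self) -> Vec<f32> {
        // Subtract the max for numerical stability, then normalize.
        let max = self.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = self.iter().map(|x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|e| e / sum).collect()
    }
}

fn main() {
    let logits = vec![1.0_f32, 2.0, 3.0];
    // The extension method is called directly on the existing type.
    let probs = logits.softmax();
    assert!((probs.iter().sum::<f32>() - 1.0).abs() < 1e-6);
}
```

Bringing the trait into scope (`use boostr::AttentionOps;`-style) is all a caller needs; the underlying type is untouched.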
§Relationship to numr
┌─────────────────────────────────────────────────────────┐
│ boostr                                 ◄── YOU ARE HERE │
│ (attention, RoPE, MoE, quantization, model loaders)     │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────┐
│ numr                                                    │
│ (tensors, ops, runtime, autograd, linalg, FFT)          │
└─────────────────────────────────────────────────────────┘
§Design
- Extension traits: ML ops (AttentionOps, RoPEOps) implemented on numr’s clients
- QuantTensor: Separate type for block-quantized data (GGUF formats)
- impl_generic: Composite ops composed from numr primitives, same on all backends
- Custom kernels: Dequant, quantized matmul, fused attention (SIMD/PTX/WGSL)
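To make the QuantTensor bullet concrete, here is a hedged sketch of block quantization in the spirit of GGUF's Q8_0 layout: values are split into 32-element blocks, each stored as int8 codes plus one per-block scale. All names (`BlockQ8`, `quantize`, `dequantize`) are illustrative, not the actual QuantTensor API, and real GGUF blocks store the scale as f16.

```rust
// Q8_0-style block quantization: 32 values per block, one scale,
// int8 codes. Illustrative only; not boostr's QuantTensor layout.
const BLOCK: usize = 32;

struct BlockQ8 {
    scale: f32,        // per-block scale: max(|x|) / 127
    codes: [i8; BLOCK],
}

fn quantize(block: &[f32; BLOCK]) -> BlockQ8 {
    let amax = block.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0.0 { 0.0 } else { amax / 127.0 };
    let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
    let mut codes = [0i8; BLOCK];
    for (c, &x) in codes.iter_mut().zip(block) {
        *c = (x * inv).round().clamp(-127.0, 127.0) as i8;
    }
    BlockQ8 { scale, codes }
}

fn dequantize(b: &BlockQ8) -> [f32; BLOCK] {
    let mut out = [0.0_f32; BLOCK];
    for (o, &c) in out.iter_mut().zip(&b.codes) {
        *o = c as f32 * b.scale;
    }
    out
}

fn main() {
    let mut x = [0.0_f32; BLOCK];
    for (i, v) in x.iter_mut().enumerate() {
        *v = (i as f32 - 16.0) / 4.0;
    }
    let y = dequantize(&quantize(&x));
    // Round-trip error is bounded by half a quantization step.
    let scale = quantize(&x).scale;
    assert!(x.iter().zip(&y).all(|(a, b)| (a - b).abs() <= scale * 0.5 + 1e-6));
}
```

A dequant kernel (SIMD/PTX/WGSL, per the bullet above) is just `dequantize` vectorized over many blocks; quantized matmul fuses the `code * scale` expansion into the dot product instead of materializing f32.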
Re-exports§
pub use nn::Init;
pub use nn::VarBuilder;
pub use nn::VarMap;
pub use nn::Weight;
pub use ops::AttentionOps;
pub use ops::DeviceGrammarDfa;
pub use ops::FlashAttentionOps;
pub use ops::FusedFp8TrainingOps;
pub use ops::FusedOptimizerOps;
pub use ops::FusedQkvOps;
pub use ops::GrammarDfaOps;
pub use ops::KvCacheOps;
pub use ops::MlaOps;
pub use ops::PagedAttentionOps;
pub use ops::RoPEOps;
pub use ops::SamplingOps;
pub use ops::var_flash_attention;
pub use quant::DecomposedQuantLinear;
pub use quant::DecomposedQuantMethod;
pub use quant::DecomposedQuantTensor;
pub use quant::DequantOps;
pub use quant::FusedQuantOps;
pub use quant::QuantFormat;
pub use quant::QuantMatmulOps;
pub use quant::QuantTensor;
pub use model::ExpertWeights;
pub use format::GgufTokenizer;
pub use model::encoder::EmbeddingPipeline;
pub use model::encoder::Encoder;
pub use model::encoder::EncoderClient;
pub use model::encoder::EncoderConfig;
pub use model::encoder::Pooling;
Modules§
- autograd - Automatic differentiation (autograd)
- data
- distributed
- error - boostr error types
- format
- inference
- model
- nn
- ops
- optimizer
- quant
- runtime - Runtime backends for tensor computation
- tensor - Tensor types and operations
- trainer
Structs§
- CpuClient - CPU client for operation dispatch
- CpuDevice - CPU device (there’s only one: the host CPU)
- CpuRuntime - CPU compute runtime
- Tensor - N-dimensional array stored on a compute device
Traits§
- ActivationOps - Activation operations
- BinaryOps - Element-wise binary operations on tensors.
- ConvOps - Convolution operations.
- IndexingOps - Indexing operations
- NormalizationOps - Normalization operations
- Runtime - Core trait for compute backends
- RuntimeClient - Trait for runtime clients that handle operation dispatch
- ScalarOps - Scalar operations trait for tensor-scalar operations
- TensorOps - Core tensor operations trait
- TypeConversionOps - Type conversion operations
- UnaryOps - Unary operations
Type Aliases§
- NumrResult - Result type alias using numr’s Error