gllm: Pure Rust embedding and reranking library built on Burn.
§Features
- Embedding: Text-to-vector conversion using BERT-based models
- Reranking: Cross-encoder based document reranking
- Runtime Fallback: Automatic GPU→CPU fallback on memory errors
- Multi-backend: GPU (Wgpu), CPU+BLAS (Candle), Pure Rust (NdArray)
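The runtime fallback feature listed above can be pictured with a small, library-independent sketch. Everything here is an assumption made for illustration: the `Backend` trait, the `InferError` enum, and the two stand-in backends are hypothetical and are not gllm's actual types. The sketch only shows the general pattern of retrying on a second backend when the first reports an out-of-memory error.

```rust
/// Hypothetical error type for this sketch; gllm's real `Error` enum may differ.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum InferError {
    OutOfMemory,
    Other(String),
}

/// Hypothetical backend abstraction, used only for illustration.
trait Backend {
    fn embed(&self, text: &str) -> Result<Vec<f32>, InferError>;
}

/// Try the primary (e.g. GPU) backend first. Retry on the fallback
/// (e.g. CPU) backend only when the failure is an out-of-memory error;
/// any other result, success or failure, is returned unchanged.
fn embed_with_fallback(
    primary: &dyn Backend,
    fallback: &dyn Backend,
    text: &str,
) -> Result<Vec<f32>, InferError> {
    match primary.embed(text) {
        Err(InferError::OutOfMemory) => fallback.embed(text),
        other => other,
    }
}

/// Stand-in for a GPU backend that always runs out of memory.
struct AlwaysOom;

impl Backend for AlwaysOom {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, InferError> {
        Err(InferError::OutOfMemory)
    }
}

/// Stand-in for a CPU backend: returns a one-element vector
/// holding the byte length of the input.
struct ByteLen;

impl Backend for ByteLen {
    fn embed(&self, text: &str) -> Result<Vec<f32>, InferError> {
        Ok(vec![text.len() as f32])
    }
}
```

Calling `embed_with_fallback(&AlwaysOom, &ByteLen, "hello")` exercises the fallback path, while swapping the two arguments exercises the path where the primary backend succeeds and the fallback is never consulted.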
§Quick Start
use gllm::{FallbackEmbedder, Device};
// Create embedder with automatic GPU→CPU fallback
let embedder = FallbackEmbedder::new("bge-small-en").await?;
// Embed text - automatically falls back to CPU if GPU OOMs
let vector = embedder.embed("Hello world").await?;
Re-exports§
- pub use generation::FinishReason;
- pub use generation::GenerationBuilder;
- pub use generation::GenerationConfig;
- pub use generation::GenerationOutput;
Modules§
- attention: Attention mechanisms for gllm.
- causal_attention
- decoder_layer
- decoder_model
- distributed: Distributed KV cache and multi-GPU support for ultra-long contexts.
- flash_attention
- generation
- generator_model
- kv_cache
- moe_decoder_layer
- moe_generator_model
- moe_layer
- rms_norm
- sampler
- weight_loader: Weight loading module for SafeTensors files.
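Several of the modules above carry no description on this page. As one illustration, the name `rms_norm` suggests RMS normalization, which is commonly defined as `y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]`. The function below is a minimal sketch of that standard formula on plain slices. It is not taken from gllm, whose implementation presumably operates on Burn tensors, and the signature is an assumption.

```rust
/// RMS normalization of a single vector (standard textbook form):
/// y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Panics if `x` and `weight` differ in length.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "x and weight must have equal length");
    if x.is_empty() {
        return Vec::new();
    }
    // Mean of squared elements.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal of the root mean square, stabilized by eps.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    // Scale every element and apply the learned per-element weight.
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

With unit weights and `eps = 0.0`, the output has a mean square of 1, which is a quick sanity check for any implementation of this formula.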
Structs§
- Client: Client for embeddings, reranking, and generation.
- ClientConfig: Client configuration options.
- EmbedderHandle
- Embedding: Single embedding vector.
- EmbeddingResponse: Embedding response containing all vectors and usage stats.
- EmbeddingsBuilder: Embeddings request builder.
- FallbackEmbedder: Embedder with automatic runtime fallback from GPU to CPU.
- GpuCapabilities: GPU capabilities detected from the system.
- GraphCodeInput: Input for GraphCodeBERT containing source code and data flow graph info.
- ModelInfo: Metadata describing a model.
- ModelRegistry: Registry of built-in model aliases.
- RerankBuilder: Rerank request builder.
- RerankResponse: Rerank response containing sorted results.
- RerankResult: Single rerank result entry.
- RerankerHandle
- Usage: Token usage statistics.
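Embedding vectors such as those described above are typically compared with cosine similarity. The sketch below works on plain `f32` slices because the fields of `Embedding` and `EmbeddingResponse` are not shown on this page; the function names are hypothetical. Note that this is similarity ranking over precomputed embeddings, which is a different technique from the cross-encoder reranking the crate provides.

```rust
/// Cosine similarity between two vectors.
/// Returns `None` for empty input, mismatched lengths, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rank documents by descending cosine similarity to the query.
/// Returns `(document index, score)` pairs; documents whose similarity
/// is undefined are skipped.
fn rank_by_similarity(query: &[f32], docs: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .filter_map(|(i, d)| cosine_similarity(query, d).map(|s| (i, s)))
        .collect();
    // `total_cmp` gives a total order over f32, so the sort cannot panic on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Returning `Option` instead of a sentinel value keeps the undefined cases (zero vectors, mismatched dimensions) explicit at the call site.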
Enums§
- Architecture: Architecture of a model.
- Device: Device selection for inference.
- Error: gllm error type.
- GpuType: GPU device type classification.
- ModelType: Model type supported by the library.
- Quantization: Quantization type for models.
Type Aliases§
- Result: Result alias for gllm operations.
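A crate-level `Result` alias usually fixes the error type so that function signatures can be written as `Result<T>`. The sketch below shows that common Rust pattern under stated assumptions: the `Error` enum here has a single hypothetical variant, `InvalidInput`, and the helper `non_empty` is invented for illustration. Neither is taken from gllm, whose `Error` variants are not listed on this page.

```rust
use std::fmt;

/// Hypothetical error enum for this sketch; not gllm's actual `Error`.
#[derive(Debug)]
enum Error {
    InvalidInput(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
        }
    }
}

impl std::error::Error for Error {}

/// Alias that fixes the error type, so signatures read `Result<T>`.
type Result<T> = std::result::Result<T, Error>;

/// Hypothetical helper: rejects empty or whitespace-only text.
fn non_empty(text: &str) -> Result<&str> {
    if text.trim().is_empty() {
        Err(Error::InvalidInput("text must not be empty".to_string()))
    } else {
        Ok(text)
    }
}
```

Because the alias shadows the prelude `Result` within its scope, code that needs a different error type there must spell out `std::result::Result<T, E>` in full.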