Crate gllm


gllm: A pure Rust embedding and reranking library built on Burn.

Features

Embedding: Text-to-vector conversion using BERT-based models
Reranking: Cross-encoder-based document reranking
Runtime Fallback: Automatic GPU→CPU fallback on memory errors
Multi-backend: GPU (Wgpu), CPU+BLAS (Candle), Pure Rust (NdArray)
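
The runtime fallback idea can be illustrated with a minimal, self-contained sketch. It does not use gllm's actual types: `Backend`, `BackendError`, `AlwaysOom`, `ToyCpu`, and `embed_with_fallback` are hypothetical names introduced only for this example, and the sketch assumes that fallback is triggered by out-of-memory errors alone.

```rust
/// Hypothetical error type for this sketch (not gllm's `Error`).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum BackendError {
    OutOfMemory,
    Other(String),
}

/// Hypothetical backend abstraction: anything that can embed text.
trait Backend {
    fn embed(&self, text: &str) -> Result<Vec<f32>, BackendError>;
}

/// Tries `primary` first and retries on `fallback` only when the
/// primary backend reports an out-of-memory error. Any other result,
/// success or failure, is returned unchanged.
fn embed_with_fallback(
    primary: &dyn Backend,
    fallback: &dyn Backend,
    text: &str,
) -> Result<Vec<f32>, BackendError> {
    match primary.embed(text) {
        Err(BackendError::OutOfMemory) => fallback.embed(text),
        other => other,
    }
}

/// Stand-in for a GPU backend that always runs out of memory.
struct AlwaysOom;

impl Backend for AlwaysOom {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, BackendError> {
        Err(BackendError::OutOfMemory)
    }
}

/// Stand-in for a CPU backend; returns a one-element toy "embedding"
/// holding the byte length of the input.
struct ToyCpu;

impl Backend for ToyCpu {
    fn embed(&self, text: &str) -> Result<Vec<f32>, BackendError> {
        Ok(vec![text.len() as f32])
    }
}

fn main() {
    // The "GPU" fails with OOM, so the "CPU" result is returned.
    let vector = embed_with_fallback(&AlwaysOom, &ToyCpu, "Hello world").unwrap();
    println!("{:?}", vector);
}
```

Restricting the retry to the out-of-memory case keeps unrelated failures (for example, a missing model) visible instead of masking them behind a second attempt.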

Quick Start

use gllm::FallbackEmbedder;

// Note: `.await` and `?` require an enclosing async function that returns a Result.

// Create embedder with automatic GPU→CPU fallback
let embedder = FallbackEmbedder::new("bge-small-en").await?;

// Embed text; this automatically falls back to CPU if the GPU runs out of memory
let vector = embedder.embed("Hello world").await?;
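
Embedding vectors are typically compared with cosine similarity. The sketch below is independent of gllm: `cosine_similarity` is a hypothetical helper, and it assumes the embeddings are available as plain `&[f32]` slices, which this page does not confirm.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns `None` for empty input, mismatched lengths, or a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for real embeddings.
    let query = [1.0_f32, 0.0];
    let same = [1.0_f32, 0.0];
    let orthogonal = [0.0_f32, 1.0];
    println!("{:?}", cosine_similarity(&query, &same));
    println!("{:?}", cosine_similarity(&query, &orthogonal));
}
```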

Re-exports

pub use generation::FinishReason;
pub use generation::GenerationBuilder;
pub use generation::GenerationConfig;
pub use generation::GenerationOutput;

Modules

attention: Attention mechanisms for gllm
causal_attention
decoder_layer
decoder_model
distributed: Distributed KV cache and multi-GPU support for ultra-long contexts.
flash_attention
generation
generator_model
kv_cache
moe_decoder_layer
moe_generator_model
moe_layer
rms_norm
sampler
weight_loader: Weight loading module for SafeTensors files.

Structs

Client: Client for embeddings, reranking, and generation.
ClientConfig: Client configuration options.
EmbedderHandle
Embedding: Single embedding vector.
EmbeddingResponse: Embedding response containing all vectors and usage stats.
EmbeddingsBuilder: Embeddings request builder.
FallbackEmbedder: Embedder with automatic runtime fallback from GPU to CPU.
GpuCapabilities: GPU capabilities detected from the system.
GraphCodeInput: Input for GraphCodeBERT containing source code and data flow graph info.
ModelInfo: Metadata describing a model.
ModelRegistry: Registry of built-in model aliases.
RerankBuilder: Rerank request builder.
RerankResponse: Rerank response containing sorted results.
RerankResult: Single rerank result entry.
RerankerHandle
Usage: Token usage statistics.

Enums

Architecture: Architecture of a model.
Device: Device selection for inference.
Error: gllm error type.
GpuType: GPU device type classification.
ModelType: Model type supported by the library.
Quantization: Quantization type for models.

Type Aliases

Result: Result alias for gllm operations.
