Crate gllm


gllm: A pure-Rust embedding and reranking library built on Burn.

§Features

  • Embedding: Text-to-vector conversion using BERT-based models
  • Reranking: Cross-encoder based document reranking
  • Runtime Fallback: Automatic GPU→CPU fallback on memory errors
  • Multi-backend: GPU (Wgpu), CPU+BLAS (Candle), Pure Rust (NdArray)
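The runtime fallback described above can be pictured as a wrapper that retries on a second backend only when the first one reports a memory error. The following is a minimal synchronous sketch of that pattern, not gllm's implementation: the `Backend` trait, the `EmbedError` enum, and the `Fallback` and `Stub` types are all hypothetical names introduced here for illustration.

```rust
/// Hypothetical error type for this sketch (not gllm's `Error`).
#[derive(Debug, Clone, PartialEq)]
enum EmbedError {
    OutOfMemory,
    Other(String),
}

/// Hypothetical abstraction over one compute backend (e.g. GPU or CPU).
trait Backend {
    fn embed(&self, text: &str) -> Result<Vec<f32>, EmbedError>;
}

/// Tries `primary` first and retries on `secondary` only after a memory error.
struct Fallback<P, S> {
    primary: P,
    secondary: S,
}

impl<P: Backend, S: Backend> Fallback<P, S> {
    fn embed(&self, text: &str) -> Result<Vec<f32>, EmbedError> {
        match self.primary.embed(text) {
            // Memory errors are considered recoverable: retry on the secondary backend.
            Err(EmbedError::OutOfMemory) => self.secondary.embed(text),
            // Successes and all other errors are passed through unchanged.
            other => other,
        }
    }
}

/// Stub backend that always returns a fixed result, standing in for a real device.
struct Stub(Result<Vec<f32>, EmbedError>);

impl Backend for Stub {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, EmbedError> {
        self.0.clone()
    }
}
```

Restricting the retry to the out-of-memory case keeps unrelated failures, such as invalid input, visible to the caller instead of hiding them behind a second attempt.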

§Quick Start

use gllm::{FallbackEmbedder, Device};

// Note: `.await?` must be used inside an async function that returns a Result.

// Create an embedder with automatic GPU→CPU fallback
let embedder = FallbackEmbedder::new("bge-small-en").await?;

// Embed text; falls back to the CPU automatically if the GPU runs out of memory
let vector = embedder.embed("Hello world").await?;
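Embedding vectors like the one produced above are commonly compared with cosine similarity. The helper below is a standalone sketch and not part of gllm; it assumes the embedding can be viewed as a `&[f32]` slice, which this page does not confirm, and `cosine_similarity` is a name chosen here for illustration.

```rust
/// Cosine similarity between two vectors.
/// Returns `None` if the lengths differ, the input is empty,
/// or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Dot product of the two vectors.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    // Euclidean norms of each vector.
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `Option` rather than panicking lets the caller decide how to handle mismatched dimensions, for example when vectors come from two different models.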

Re-exports§

pub use generation::FinishReason;
pub use generation::GenerationBuilder;
pub use generation::GenerationConfig;
pub use generation::GenerationOutput;

Modules§

  • attention: Attention mechanisms for gllm
  • causal_attention
  • decoder_layer
  • decoder_model
  • distributed: Distributed KV cache and multi-GPU support for ultra-long contexts.
  • flash_attention
  • generation
  • generator_model
  • kv_cache
  • moe_decoder_layer
  • moe_generator_model
  • moe_layer
  • rms_norm
  • sampler
  • weight_loader: Weight loading module for SafeTensors files.

Structs§

  • Client: Client for embeddings, reranking, and generation.
  • ClientConfig: Client configuration options.
  • EmbedderHandle
  • Embedding: Single embedding vector.
  • EmbeddingResponse: Embedding response containing all vectors and usage stats.
  • EmbeddingsBuilder: Embeddings request builder.
  • FallbackEmbedder: Embedder with automatic runtime fallback from GPU to CPU.
  • GpuCapabilities: GPU capabilities detected from the system.
  • GraphCodeInput: Input for GraphCodeBERT containing source code and data flow graph info.
  • ModelInfo: Metadata describing a model.
  • ModelRegistry: Registry of built-in model aliases.
  • RerankBuilder: Rerank request builder.
  • RerankResponse: Rerank response containing sorted results.
  • RerankResult: Single rerank result entry.
  • RerankerHandle
  • Usage: Token usage statistics.

Enums§

  • Architecture: Architecture of a model.
  • Device: Device selection for inference.
  • Error: gllm error type.
  • GpuType: GPU device type classification.
  • ModelType: Model type supported by the library.
  • Quantization: Quantization type for models.

Type Aliases§

  • Result: Result alias for gllm operations.