gllm 0.10.6

Pure Rust library for local embeddings, reranking, and text generation with MoE-optimized inference and aggressive performance tuning

Very little structured metadata is currently available to build this page from. Check the main library docs, the README, or Cargo.toml, where the author may have documented these features.

This version has 9 feature flags, 1 of them enabled by default.
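As a rough sketch of how these flags would be selected from a downstream project, the fragment below uses standard Cargo dependency syntax with the version and flag names listed on this page. It does not come from the crate's own documentation. Which flags can be combined, and whether the crate builds with the default `cpu` flag disabled, are assumptions that this page does not confirm.

```toml
[dependencies]
# Default build: the `cpu` flag is enabled by default, so no feature list is needed.
gllm = "0.10.6"

# Alternative: keep the default `cpu` flag and add another flag on top of it.
# gllm = { version = "0.10.6", features = ["tokio"] }

# Alternative (assumption: the crate builds without `cpu`): opt out of the
# default flag and pick a backend flag explicitly.
# gllm = { version = "0.10.6", default-features = false, features = ["wgpu"] }
```

Only one of the `gllm` entries can be active at a time, which is why the alternatives are commented out. What each flag actually enables should be confirmed against the crate's Cargo.toml.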

default

cpu (default)

cuda

flash-attention

This feature flag does not enable additional features.

gpu-quantized

paged-attention

This feature flag does not enable additional features.

quantized

This feature flag does not enable additional features.

tokio

wgpu

wgpu-detect