ruvllm 2.0.1

LLM serving runtime with Ruvector integration: paged attention, KV cache, and SONA learning

Feature flags

There is currently very little structured metadata available to build this page from. Check the main library docs, the README, or the crate's Cargo.toml to see whether the author documented the feature flags there.

This version has 30 feature flags, 9 of them enabled by default.
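
Feature flags are selected in a downstream crate's Cargo.toml. A minimal sketch, assuming only the crate name and version shown on this page; everything else is standard Cargo syntax:

```toml
[dependencies]
# Pulls in ruvllm together with the 9 default features listed below.
ruvllm = "2.0.1"
```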

default
async-runtime (default)
candle (default)
tokio (default)
tokio-stream (default)
candle-core (default)
candle-nn (default)
candle-transformers (default)
hf-hub (default)
tokenizers (default)
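
To opt out of the default set and re-enable only part of it, a sketch; which combinations of these features actually work together is not documented here, so the selection below is illustrative only:

```toml
[dependencies]
ruvllm = { version = "2.0.1", default-features = false, features = [
    "candle",     # model backend (the candle-* flags are listed separately)
    "tokio",      # async runtime support
    "tokenizers", # tokenizer support
] }
```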

accelerate (enables no additional features)
attention
coreml
cuda
gguf-mmap
gnn
graph
hybrid-ane
inference-cuda
inference-metal
inference-metal-native
metal
metal-compute
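
Several of the flags above (accelerate, coreml, cuda, metal, metal-compute, and the inference-* variants) appear to select hardware backends. That mapping is inferred from the flag names alone; this page does not describe them, so verify against the crate's Cargo.toml before relying on it. A sketch:

```toml
[dependencies]
# NVIDIA GPUs, inferred from the flag name:
ruvllm = { version = "2.0.1", features = ["cuda"] }

# On Apple hardware, "metal" looks like the counterpart:
# ruvllm = { version = "2.0.1", features = ["metal"] }
```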

minimal
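
What minimal bundles is not documented on this page; if it is a trimmed-down build, the usual Cargo pattern would be the following sketch:

```toml
[dependencies]
ruvllm = { version = "2.0.1", default-features = false, features = ["minimal"] }
# equivalently, from the command line:
# cargo add ruvllm --no-default-features --features minimal
```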

mmap
parallel
rlm-core (enables no additional features)
rlm-full
rlm-wasm
ruvector-full
wasm (enables no additional features)
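
The wasm and rlm-wasm flags suggest WebAssembly support. The page gives no details, and the target triple below is the standard Rust one rather than anything ruvllm documents, so treat this as an assumption:

```toml
[dependencies]
ruvllm = { version = "2.0.1", default-features = false, features = ["wasm"] }
# then build with: cargo build --target wasm32-unknown-unknown
```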