forgellm-runtime 0.1.0

Minimal runtime for ForgeLLM (KV cache, sampling, tokenizer, API server)

Forge Runtime — minimal inference runtime.

Provides KV cache management, token sampling, tokenizer integration, and an OpenAI-compatible API server.
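As an illustration of the token sampling step, the following is a minimal sketch that uses only the standard library. The function names (`softmax_with_temperature`, `sample_token`, `greedy_token`) are hypothetical and are not taken from the forgellm-runtime API. It assumes a non-empty logits slice, a temperature greater than zero, and a uniform random draw `u` in [0, 1) supplied by the caller.

```rust
/// Convert raw logits into probabilities with temperature scaling.
/// Hypothetical helper for illustration; not part of the forgellm-runtime API.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(!logits.is_empty(), "logits must not be empty");
    assert!(temperature > 0.0, "temperature must be positive");
    // Subtract the maximum logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Pick a token id by walking the cumulative distribution.
/// `u` is a uniform draw in [0, 1) provided by the caller, which keeps
/// this sketch free of any random-number-generator dependency.
fn sample_token(logits: &[f32], temperature: f32, u: f32) -> usize {
    let probs = softmax_with_temperature(logits, temperature);
    let mut cumulative = 0.0_f32;
    for (id, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return id;
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.0,
    // so fall back to the last token id.
    probs.len() - 1
}

/// Greedy decoding: return the index of the largest logit
/// (the first one wins on ties).
fn greedy_token(logits: &[f32]) -> usize {
    assert!(!logits.is_empty(), "logits must not be empty");
    let mut best = 0;
    for (id, &l) in logits.iter().enumerate() {
        if l > logits[best] {
            best = id;
        }
    }
    best
}
```

Lower temperatures concentrate probability on the highest logit, so `sample_token` behaves more like `greedy_token` as the temperature approaches zero. Passing `u` in explicitly also makes the sampler deterministic in tests.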