GLM-5 FP8 provider for Vast.ai serverless deployments
This provider connects to a self-hosted vLLM endpoint running GLM-5 FP8 on Vast.ai serverless infrastructure. The endpoint exposes an OpenAI-compatible chat completions API (vLLM default).
§Model: GLM-5-FP8
- Architecture: 744B parameter Mixture-of-Experts (MoE)
- Active Parameters: 40B per forward pass
- Quantization: FP8 for efficient inference
- Hardware: 8x A100 SXM4 80GB
- Features: MTP speculative decoding enabled
§Configuration (Vault)
Store under secret/data/codetether/providers/glm5:
{
  "api_key": "<vast-endpoint-api-key>",
  "base_url": "https://route.vast.ai/<endpoint-id>/<api-key>/v1",
  "extra": {
    "model_name": "glm-5-fp8"
  }
}
§Model reference format
Use glm5/glm-5-fp8, glm5/glm-5, or just glm-5-fp8 as the model string.
§Environment variable fallback
- GLM5_API_KEY - API key for the Vast.ai endpoint
- GLM5_BASE_URL - Base URL of the vLLM endpoint (required)
- GLM5_MODEL - Model name override (default: glm-5-fp8)
§Structs
- Glm5Provider - GLM-5 FP8 provider targeting a Vast.ai vLLM serverless endpoint.
§Constants
- DEFAULT_MODEL - Default model name served by the Vast.ai vLLM endpoint.