apr-cli
CLI tool for APR model inspection, debugging, and operations.
Installation
This installs the apr binary.
Features
- Model Inspection: View APR model structure, metadata, and weights
- Debugging: Hex dumps, tree visualization, flow analysis
- Operations: List, compare, and validate APR models
- TUI Mode: Interactive terminal interface for model exploration
Usage
# Show help
# Inspect a model
# List models in directory
# Interactive TUI mode
# Compare two models
Chat Interface
Interactive chat with language models (supports APR, GGUF, SafeTensors):
# Chat with a GGUF model (GPU acceleration by default)
# Force CPU inference
# Explicitly request GPU acceleration
# Adjust generation parameters
Optional Features
Inference Server
Enable the inference feature to serve models via HTTP:
The server provides an OpenAI-compatible API:
# Health check
# Chat completions
# Streaming
Debugging with Tracing
Use the X-Trace-Level header to enable inference tracing for debugging:
# Brick-level tracing (token operations)
# Step-level tracing (forward pass steps)
# Layer-level tracing (per-layer timing)
Trace levels:
brick: Token-by-token operation timingstep: Forward pass steps (embed, attention, mlp, lm_head)layer: Per-layer timing breakdown (24+ layers)
CUDA GPU Acceleration
Enable CUDA support for NVIDIA GPUs:
Examples
# Run the tracing example
License
MIT