# lmonade-runtime
High-performance actor-based runtime for LLM inference orchestration.
## Overview
This crate provides the core runtime infrastructure for Lmonade:
- Actor-based model management using Tokio
- Efficient request routing and scheduling
- Resource orchestration (GPU/CPU allocation)
- Model lifecycle management (loading, caching, eviction)
- Concurrent inference with automatic batching (a toy sketch follows this list)
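The batching behavior can be pictured with a toy scheduler: queued requests are flushed either when the batch fills or when a deadline expires. This is an illustrative sketch, not the crate's actual scheduler; all names here are hypothetical.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

/// Collect up to `max_batch` queued requests, flushing early if no new
/// request arrives within `max_wait`. (Illustrative only: real schedulers
/// typically bound total latency rather than per-request gaps.)
async fn batch_requests(
    rx: &mut mpsc::Receiver<String>,
    max_batch: usize,
    max_wait: Duration,
) -> Vec<String> {
    let mut batch = Vec::new();
    // Wait for the first request so we don't spin on an empty queue.
    if let Some(first) = rx.recv().await {
        batch.push(first);
    }
    while batch.len() < max_batch {
        match timeout(max_wait, rx.recv()).await {
            Ok(Some(req)) => batch.push(req),
            _ => break, // deadline expired or channel closed
        }
    }
    batch
}
```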
## Key Components
- Actor System: Erlang-inspired actor model for fault tolerance (`src/actor/`)
- Model Hub: Central orchestrator for multi-model serving (`src/actor/model_hub.rs`)
- Resource Management: GPU allocation and memory management (`src/resources/`)
- LLM Engine: Core inference engine integration (`src/llm_engine.rs`)
- Model Runner: Execution layer for model inference (`src/model_runner.rs`)
## Architecture
The runtime uses an actor-based architecture where:
- Each model runs in its own actor with isolated state
- The ModelHub orchestrates routing and resource allocation
- Supervisors provide fault tolerance and recovery
- Message passing ensures thread-safe communication (see the sketch below)
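A minimal sketch of the Tokio actor pattern this describes. It is not the crate's actual implementation; the message type and function name are hypothetical.

```rust
use tokio::sync::{mpsc, oneshot};

// Hypothetical message type; the crate's real messages in src/actor/
// are richer than this.
enum ModelMsg {
    Generate {
        prompt: String,
        reply: oneshot::Sender<String>,
    },
}

// Each model actor owns its state and drains its mailbox sequentially,
// so the model itself never needs a lock.
fn spawn_model_actor() -> mpsc::Sender<ModelMsg> {
    let (tx, mut rx) = mpsc::channel::<ModelMsg>(32);
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                ModelMsg::Generate { prompt, reply } => {
                    // Stand-in for real inference.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
            }
        }
        // Channel closed: the actor exits; a supervisor could respawn it here.
    });
    tx
}
```

Callers hold only the `mpsc::Sender`, and each request carries a `oneshot` channel for its reply, which is what makes the communication thread-safe without shared mutable state.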
## Usage
A minimal example. The original snippet elided a second import and all arguments, so the values below are illustrative rather than the exact API:

```rust
use lmonade_runtime::ModelHub;

// Create and start the model hub
// (constructor arguments elided in the original docs)
let hub = ModelHub::new().await?;

// Load a model (model identifier is illustrative)
hub.load_model("tinyllama").await?;

// Generate text (request shape is illustrative)
let response = hub.generate("Write a haiku about lemons.").await?;
```
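Since the runtime is Tokio-based, the snippet above must run inside an async context, for example a `#[tokio::main]` entry point:

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... snippet from above ...
    Ok(())
}
```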
## Documentation

For detailed documentation, see the API docs for the modules listed under Key Components.
## Status
The runtime is under active development with basic TinyLlama inference working. Production features like distributed serving and advanced scheduling are in progress.