EngineCoreActor — local-GPU per-replica orchestrator. Doc §4, §5.1.
Wraps a Box<dyn ModelRunner> whose transport_kind() == LocalGpu.
The continuous-batch scheduler and KV-cache manager are per-runtime
modules (vLLM has them; TensorRT/ORT batch by stacking inputs), so
they do not live here. This actor only owns the runner, dispatches
ExecuteBatch requests through it, and pumps the resulting chunk
stream into the per-request output channel.
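The ownership and dispatch flow described above can be sketched roughly as follows. This is a minimal, synchronous illustration using only the standard library, not the crate's actual code. The trait shape of ModelRunner, the fields of ExecuteBatch and Chunk, the Remote variant, the EchoRunner stub, and the register/handle method names are all assumptions made for the example. Only the names EngineCoreActor, ModelRunner, transport_kind(), LocalGpu, and ExecuteBatch come from the description. Rejecting a non-LocalGpu runner at construction is likewise an assumed way of enforcing the invariant.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical transport tag; only `LocalGpu` is named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind {
    LocalGpu,
    Remote,
}

/// Hypothetical batch request: (request id, prompt) pairs.
pub struct ExecuteBatch {
    pub requests: Vec<(u64, String)>,
}

/// Hypothetical output chunk, tagged with the request it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub request_id: u64,
    pub text: String,
}

/// Assumed shape of the runner trait; the real signatures may differ.
pub trait ModelRunner {
    fn transport_kind(&self) -> TransportKind;
    fn execute_batch(&mut self, batch: ExecuteBatch) -> Box<dyn Iterator<Item = Chunk>>;
}

/// Stand-in runner that "generates" by upper-casing each prompt.
pub struct EchoRunner;

impl ModelRunner for EchoRunner {
    fn transport_kind(&self) -> TransportKind {
        TransportKind::LocalGpu
    }

    fn execute_batch(&mut self, batch: ExecuteBatch) -> Box<dyn Iterator<Item = Chunk>> {
        Box::new(batch.requests.into_iter().map(|(request_id, prompt)| Chunk {
            request_id,
            text: prompt.to_uppercase(),
        }))
    }
}

/// Owns the runner and one output channel per in-flight request.
pub struct EngineCoreActor {
    runner: Box<dyn ModelRunner>,
    outputs: HashMap<u64, Sender<Chunk>>,
}

impl EngineCoreActor {
    /// Assumed invariant check: refuse any runner that is not local-GPU.
    pub fn new(runner: Box<dyn ModelRunner>) -> Result<Self, String> {
        if runner.transport_kind() != TransportKind::LocalGpu {
            return Err("EngineCoreActor requires a LocalGpu runner".to_string());
        }
        Ok(Self { runner, outputs: HashMap::new() })
    }

    /// Create the per-request output channel and hand back its receiver.
    pub fn register(&mut self, request_id: u64) -> Receiver<Chunk> {
        let (tx, rx) = mpsc::channel();
        self.outputs.insert(request_id, tx);
        rx
    }

    /// Dispatch one batch and pump each chunk to its request's channel.
    pub fn handle(&mut self, batch: ExecuteBatch) {
        for chunk in self.runner.execute_batch(batch) {
            if let Some(tx) = self.outputs.get(&chunk.request_id) {
                // A dropped receiver means the caller went away; ignore it.
                let _ = tx.send(chunk);
            }
        }
    }
}
```

Note that the sketch contains no scheduling or KV-cache logic: the actor forwards whatever batch it is given, which mirrors the division of labour described above. A real actor would presumably run on an async runtime and stream chunks incrementally rather than draining an iterator.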
RemoteEngineCoreActor (in inference-remote-core) is the
network-shaped sibling.