§inference-pipeline
atomr-streams integration for inference graphs (doc §9), plus a
re-export shim over atomr-accel-patterns so callers get the
upstream universal-GPU blueprints (batching, cascade, replica pool,
fair-share scheduler, hot-swap, MoE router, speculative decoder)
without taking a second dependency.
The patterns are runtime-agnostic: they accept user-supplied
closures / trait impls as the backend, so an inference deployment
plugs in by handing them a closure that calls into a
Box<dyn ModelRunner>. That avoids reimplementing any of the
patterns locally; they are the building blocks that doc §9 names.
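As a rough illustration of the closure-as-backend idea, here is a minimal sketch. It does not use the real atomr-accel-patterns API: the ModelRunner trait body, EchoRunner, Batcher, and batch_with_runner are all hypothetical stand-ins, assumed only for this example.

```rust
// Hypothetical stand-in for the crate's ModelRunner trait; the real
// trait's methods are not shown on this page.
trait ModelRunner {
    fn run(&self, prompt: &str) -> String;
}

// Toy backend used only for illustration.
struct EchoRunner;

impl ModelRunner for EchoRunner {
    fn run(&self, prompt: &str) -> String {
        format!("echo:{}", prompt)
    }
}

// Hypothetical pattern: a batcher that knows nothing about the runtime.
// It only holds a closure mapping a batch of inputs to a batch of outputs.
struct Batcher<F>
where
    F: Fn(&[String]) -> Vec<String>,
{
    max_batch: usize,
    backend: F,
}

impl<F> Batcher<F>
where
    F: Fn(&[String]) -> Vec<String>,
{
    // Split the inputs into chunks of at most `max_batch` and call the
    // backend once per chunk, preserving input order.
    fn process(&self, inputs: &[String]) -> Vec<String> {
        inputs
            .chunks(self.max_batch)
            .flat_map(|chunk| (self.backend)(chunk))
            .collect()
    }
}

// Plug a boxed runner into the pattern by wrapping it in a closure.
fn batch_with_runner(inputs: &[String]) -> Vec<String> {
    let runner: Box<dyn ModelRunner> = Box::new(EchoRunner);
    let batcher = Batcher {
        max_batch: 2,
        backend: move |batch: &[String]| -> Vec<String> {
            batch.iter().map(|p| runner.run(p)).collect()
        },
    };
    batcher.process(inputs)
}
```

The pattern type is generic over the closure, so swapping the backend means swapping only the closure body, not the batching logic.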
Re-exports are gated behind the cuda-patterns feature so
inference --features remote-only builds don’t pull cudarc.
Structs§
- HybridGraph - Reference hybrid-graph descriptor. Pure metadata; the instantiation lives in caller code (the examples/remote_only_demo crate exercises one path). When the cuda-patterns feature is on, callers turn the descriptor into an InferenceCascade by handing each deployment name to a CascadeStage whose closure looks the ActorRef up in the cluster.
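The descriptor-to-stages step might look roughly like the sketch below. It deliberately avoids the real HybridGraph, CascadeStage, InferenceCascade, and ActorRef types, whose fields and signatures are not shown here. GraphDescriptor, Handle, Stage, build_stages, and run_cascade are hypothetical, and the "first stage that resolves wins" policy is an assumption made only for the example.

```rust
use std::collections::HashMap;

// Hypothetical descriptor: just an ordered list of deployment names.
struct GraphDescriptor {
    deployments: Vec<String>,
}

// Hypothetical stand-in for an actor handle found in the cluster.
struct Handle {
    name: String,
}

impl Handle {
    fn ask(&self, input: &str) -> String {
        format!("{}({})", self.name, input)
    }
}

// Hypothetical stage: a boxed closure that resolves its deployment in the
// registry at call time and returns None when the lookup fails.
type Stage<'a> = Box<dyn Fn(&str) -> Option<String> + 'a>;

// Turn each deployment name into one stage closure, in descriptor order.
fn build_stages<'a>(
    desc: &GraphDescriptor,
    registry: &'a HashMap<String, Handle>,
) -> Vec<Stage<'a>> {
    desc.deployments
        .iter()
        .cloned()
        .map(|name| {
            let stage: Stage<'a> = Box::new(move |input: &str| {
                registry.get(&name).map(|handle| handle.ask(input))
            });
            stage
        })
        .collect()
}

// Assumed policy: try the stages in order and return the first answer
// from a deployment that could be resolved.
fn run_cascade(stages: &[Stage<'_>], input: &str) -> Option<String> {
    stages.iter().find_map(|stage| stage(input))
}
```

Because the lookup happens inside the closure, the descriptor itself stays pure metadata and carries no handles.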
Functions§
- request_source - Adapter: accept a tokio::mpsc receiver and emit it as a stream Source. The caller owns the sender and is responsible for closing it to terminate the stream.
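The ownership contract (the stream ends when the caller closes the sender) can be sketched without any dependencies. The example below uses std::sync::mpsc as a stand-in for the tokio channel and a plain iterator as a stand-in for the stream Source. request_source_sketch and collect_requests are hypothetical names, not the crate's API.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical adapter: expose a channel receiver as a pull-based source.
// The iterator yields items until every Sender has been dropped.
fn request_source_sketch<T>(rx: Receiver<T>) -> impl Iterator<Item = T> {
    rx.into_iter()
}

// The caller owns the sender; dropping it is what terminates the source.
fn collect_requests() -> Vec<u32> {
    let (tx, rx) = channel();
    let producer = thread::spawn(move || {
        for id in 0..3u32 {
            tx.send(id).expect("receiver is still alive");
        }
        // `tx` is dropped when this closure returns, closing the channel.
    });
    let seen: Vec<u32> = request_source_sketch(rx).collect();
    producer.join().expect("producer thread panicked");
    seen
}
```

If the sender were kept alive, the collect call would block forever, which is why the closing responsibility sits with the caller.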