Crate atomr_infer_pipeline

Expand description

§inference-pipeline

atomr-streams integration for inference graphs (doc §9), plus a re-export shim over atomr-accel-patterns so callers get the upstream universal-GPU blueprints (batching, cascade, replica pool, fair-share scheduler, hot-swap, MoE router, speculative decoder) without taking a second dependency.

The patterns are runtime-agnostic: they accept user-supplied closures / trait impls as the backend, so an inference deployment plugs in by handing them a closure that calls into a Box<dyn ModelRunner>. That avoids reimplementing any of the patterns locally — they’re the §9 building blocks the doc names.

Re-exports are gated behind the cuda-patterns feature so inference --features remote-only builds don’t pull cudarc.

Structs§

HybridGraph: Reference hybrid-graph descriptor. Pure metadata; the instantiation lives in caller code (the examples/remote_only_demo crate exercises one path). When the cuda-patterns feature is on, callers turn the descriptor into an InferenceCascade by handing each deployment name to a CascadeStage whose closure looks the ActorRef up in the cluster.

Functions§

request_source: Adapter — accept a tokio::mpsc receiver and emit it as a stream Source. The caller owns the sender and is responsible for closing it to terminate the stream.