gRPC clients for vLLM, TensorRT-LLM, MLX, TokenSpeed, and SGLang backends.
This crate provides gRPC client implementations for communicating with the vLLM engine, TensorRT-LLM engine, MLX engine, TokenSpeed scheduler, and SGLang scheduler backends.
Re-exports§
pub use abort_on_drop::AbortOnDropClient;
pub use abort_on_drop::AbortOnDropStream;
pub use channel::connect_channel;
pub use channel::normalize_grpc_endpoint;
pub use mlx_engine::proto as mlx_proto;
pub use mlx_engine::MlxEngineClient;
pub use sglang_scheduler::proto as sglang_proto;
pub use sglang_scheduler::SglangSchedulerClient;
pub use tokenspeed_scheduler::tokenspeed_proto;
pub use tokenspeed_scheduler::TokenSpeedSchedulerClient;
pub use trtllm_service::proto as trtllm_proto;
pub use trtllm_service::TrtllmServiceClient;
pub use vllm_engine::proto as vllm_proto;
pub use vllm_engine::VllmEngineClient;
Modules§
- abort_on_drop - Generic abort-on-drop wrapper for engine streaming responses.
- channel - Shared tonic::Channel builder for SMG gRPC clients.
- common_proto
- mlx_engine
- sglang_scheduler
- tokenizer_bundle - Tokenizer bundle streaming, validation, and extraction.
- tokenspeed_scheduler - gRPC client for the TokenSpeed scheduler service.
- trtllm_service
- vllm_engine
Structs§
- NoopTraceInjector - A no-op trace injector that does nothing.
Constants§
- FLUSH_RPC_DEADLINE_MARGIN - Extra local-deadline margin for flush_cache on top of the timeout forwarded to the backend. The servicer bounds its own scheduler round-trip at max(30, timeout_s + 10) seconds, so the margin must cover that budget plus transport overhead.
- PROFILE_RPC_DEADLINE - Local deadline for profile start/stop RPCs. Stopping a profile can take a long time while the backend serializes large traces.
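The deadline arithmetic described for flush_cache can be sketched as follows. This is a minimal illustration, not the crate's code: the function names servicer_bound and local_deadline are hypothetical, and the 35-second margin is an assumed placeholder, since the actual value of FLUSH_RPC_DEADLINE_MARGIN is not shown on this page. Only the max(30, timeout_s + 10) formula comes from the description above.

```rust
use std::time::Duration;

/// Servicer-side bound on the scheduler round-trip, as described above:
/// max(30, timeout_s + 10) seconds.
fn servicer_bound(timeout_s: u64) -> Duration {
    Duration::from_secs(std::cmp::max(30, timeout_s.saturating_add(10)))
}

/// Client-side deadline: the timeout forwarded to the backend plus a margin.
fn local_deadline(timeout_s: u64, margin: Duration) -> Duration {
    Duration::from_secs(timeout_s) + margin
}

fn main() {
    // ASSUMPTION: placeholder margin, not the real FLUSH_RPC_DEADLINE_MARGIN.
    let margin = Duration::from_secs(35);

    for timeout_s in [0u64, 5, 20, 60] {
        let bound = servicer_bound(timeout_s);
        let deadline = local_deadline(timeout_s, margin);
        println!(
            "timeout={timeout_s}s servicer_bound={}s local_deadline={}s",
            bound.as_secs(),
            deadline.as_secs()
        );
        // The local deadline must outlast the servicer's own budget,
        // leaving the difference for transport overhead.
        assert!(deadline > bound);
    }
}
```

Under this reading, short timeouts are the binding case: with timeout_s = 0 the servicer may still take up to 30 seconds, so any margin at or below 30 seconds would let the client give up before the servicer does.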
Traits§
- TraceInjector - Trait for injecting trace context into gRPC metadata.
Type Aliases§
- BoxedTraceInjector - Type alias for a boxed trace injector.
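The trait, no-op struct, and boxed alias listed above follow a common pattern, which the sketch below illustrates using only the standard library. Everything in it is an assumption made for illustration: the real TraceInjector signature is not shown on this page, so the sketch uses differently named stand-ins (SketchTraceInjector, SketchNoopInjector, SketchBoxedInjector) and a plain HashMap in place of gRPC metadata. The "traceparent" key is the W3C Trace Context header name.

```rust
use std::collections::HashMap;

// ASSUMPTION: a HashMap stands in for gRPC request metadata.
type Metadata = HashMap<String, String>;

// ASSUMPTION: hypothetical trait shape; the crate's TraceInjector may differ.
trait SketchTraceInjector: Send + Sync {
    fn inject(&self, metadata: &mut Metadata);
}

// No-op injector: leaves the metadata untouched.
struct SketchNoopInjector;

impl SketchTraceInjector for SketchNoopInjector {
    fn inject(&self, _metadata: &mut Metadata) {}
}

// Injector that writes a fixed W3C traceparent value.
struct StaticTraceparent(String);

impl SketchTraceInjector for StaticTraceparent {
    fn inject(&self, metadata: &mut Metadata) {
        metadata.insert("traceparent".to_string(), self.0.clone());
    }
}

// Boxed alias so callers can hold any injector behind one type.
type SketchBoxedInjector = Box<dyn SketchTraceInjector>;

// Builds request metadata and lets the injector add trace context to it.
fn build_metadata(injector: &dyn SketchTraceInjector) -> Metadata {
    let mut metadata = Metadata::new();
    injector.inject(&mut metadata);
    metadata
}

fn main() {
    let noop: SketchBoxedInjector = Box::new(SketchNoopInjector);
    assert!(build_metadata(&*noop).is_empty());

    let traced: SketchBoxedInjector = Box::new(StaticTraceparent(
        "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01".to_string(),
    ));
    let metadata = build_metadata(&*traced);
    println!("{:?}", metadata.get("traceparent"));
}
```

A no-op implementation lets a client accept an injector unconditionally, so callers without tracing configured do not need a separate code path.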