Crate smg_grpc_client

gRPC clients for vLLM, TensorRT-LLM, MLX, TokenSpeed, and SGLang backends.

This crate provides gRPC client implementations for communicating with the vLLM engine, TensorRT-LLM engine, MLX engine, TokenSpeed scheduler, and SGLang scheduler backends.

Re-exports

pub use abort_on_drop::AbortOnDropClient;
pub use abort_on_drop::AbortOnDropStream;
pub use channel::connect_channel;
pub use channel::normalize_grpc_endpoint;
pub use mlx_engine::proto as mlx_proto;
pub use mlx_engine::MlxEngineClient;
pub use sglang_scheduler::proto as sglang_proto;
pub use sglang_scheduler::SglangSchedulerClient;
pub use tokenspeed_scheduler::tokenspeed_proto;
pub use tokenspeed_scheduler::TokenSpeedSchedulerClient;
pub use trtllm_service::proto as trtllm_proto;
pub use trtllm_service::TrtllmServiceClient;
pub use vllm_engine::proto as vllm_proto;
pub use vllm_engine::VllmEngineClient;

Modules

abort_on_drop
Generic abort-on-drop wrapper for engine streaming responses.
channel
Shared tonic::Channel builder for SMG gRPC clients.
common_proto
mlx_engine
sglang_scheduler
tokenizer_bundle
Tokenizer bundle streaming, validation, and extraction.
tokenspeed_scheduler
gRPC client for the TokenSpeed scheduler service.
trtllm_service
vllm_engine
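
The abort-on-drop idea named in the abort_on_drop module can be illustrated with a small, std-only sketch. This is not the crate's implementation: the type `AbortOnDrop`, its `abort` callback, and the helper `abort_fired` are hypothetical names, and a plain iterator stands in for a streaming gRPC response. The assumption is that the wrapper should send an abort only when the consumer drops the stream before it finished.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper: `inner` stands in for a response stream and
/// `abort` for whatever call would cancel the request on the backend.
struct AbortOnDrop<I, F: FnMut()> {
    inner: I,
    abort: F,
    finished: bool,
}

impl<I: Iterator, F: FnMut()> AbortOnDrop<I, F> {
    fn new(inner: I, abort: F) -> Self {
        Self { inner, abort, finished: false }
    }
}

impl<I: Iterator, F: FnMut()> Iterator for AbortOnDrop<I, F> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        // Once the underlying stream is exhausted there is nothing to abort.
        if item.is_none() {
            self.finished = true;
        }
        item
    }
}

impl<I, F: FnMut()> Drop for AbortOnDrop<I, F> {
    fn drop(&mut self) {
        // Dropped before completion: tell the backend to stop working.
        if !self.finished {
            (self.abort)();
        }
    }
}

/// Returns whether the abort callback ran for a three-item stream.
fn abort_fired(consume_all: bool) -> bool {
    let aborted = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&aborted);
    let mut stream = AbortOnDrop::new(vec![1, 2, 3].into_iter(), move || {
        flag.store(true, Ordering::SeqCst)
    });
    if consume_all {
        while stream.next().is_some() {}
    } else {
        let _ = stream.next();
    }
    drop(stream);
    aborted.load(Ordering::SeqCst)
}
```

In this sketch, `abort_fired(false)` drops the stream after one item and so triggers the callback, while `abort_fired(true)` drains the stream first and does not. Tying cancellation to `Drop` means a caller that disconnects or returns early cannot forget to release backend work.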

Structs

NoopTraceInjector
A no-op trace injector.

Constants

FLUSH_RPC_DEADLINE_MARGIN
Extra local-deadline margin for flush_cache on top of the timeout forwarded to the backend. The servicer bounds its own scheduler round-trip at max(30, timeout_s + 10) seconds, so the margin must cover that budget plus transport overhead.
PROFILE_RPC_DEADLINE
Local deadline for profile start/stop RPCs. Stopping a profile can take a long time while the backend serializes large traces.
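
The deadline arithmetic described for FLUSH_RPC_DEADLINE_MARGIN can be worked through in a short sketch. Only the servicer bound `max(30, timeout_s + 10)` comes from the documentation above; the 35-second `ASSUMED_MARGIN` and the function names `servicer_budget` and `local_deadline` are assumptions chosen for illustration, not the crate's actual values or API.

```rust
use std::time::Duration;

/// Hypothetical margin for this sketch only; the real value is whatever
/// FLUSH_RPC_DEADLINE_MARGIN is defined as in the crate.
const ASSUMED_MARGIN: Duration = Duration::from_secs(35);

/// Time the servicer allows its own scheduler round-trip:
/// max(30, timeout_s + 10) seconds.
fn servicer_budget(timeout: Duration) -> Duration {
    Duration::from_secs(30).max(timeout + Duration::from_secs(10))
}

/// Local client deadline: the timeout forwarded to the backend plus the margin.
fn local_deadline(timeout: Duration) -> Duration {
    timeout + ASSUMED_MARGIN
}
```

With a forwarded timeout of 5 seconds the servicer budget is 30 seconds and the local deadline in this sketch is 40 seconds; with 60 seconds they are 70 and 95 seconds. The worst case for the margin is a zero timeout, where the servicer still takes up to 30 seconds, so any margin has to exceed 30 seconds by at least the expected transport overhead.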

Traits

TraceInjector
Trait for injecting trace context into gRPC metadata.

Type Aliases

BoxedTraceInjector
Type alias for a boxed trace injector.
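
How a trace-injector trait, a no-op implementation, and a boxed alias might fit together is sketched below. The real signatures of TraceInjector, NoopTraceInjector, and BoxedTraceInjector are not shown on this page, so every name here (`InjectTrace`, `Noop`, `StaticTraceParent`, `BoxedInjector`, `build_metadata`) is hypothetical, and a `HashMap` stands in for gRPC metadata. The `traceparent` key follows the W3C Trace Context header name.

```rust
use std::collections::HashMap;

/// Stand-in for gRPC request metadata (hypothetical; not a tonic type).
type Metadata = HashMap<String, String>;

/// Hypothetical trait shape: write trace context into outgoing metadata.
trait InjectTrace: Send + Sync {
    fn inject(&self, metadata: &mut Metadata);
}

/// No-op injector: leaves the metadata untouched.
struct Noop;

impl InjectTrace for Noop {
    fn inject(&self, _metadata: &mut Metadata) {}
}

/// Injector that always writes one fixed `traceparent` value.
struct StaticTraceParent(String);

impl InjectTrace for StaticTraceParent {
    fn inject(&self, metadata: &mut Metadata) {
        metadata.insert("traceparent".to_string(), self.0.clone());
    }
}

/// Boxed alias so callers can store any injector behind one type.
type BoxedInjector = Box<dyn InjectTrace>;

/// Builds fresh metadata and lets the injector add its entries.
fn build_metadata(injector: &BoxedInjector) -> Metadata {
    let mut metadata = Metadata::new();
    injector.inject(&mut metadata);
    metadata
}
```

Passing a boxed `Noop` to `build_metadata` yields empty metadata, which is the kind of default a client could use when tracing is disabled; swapping in another implementation changes behavior without touching the call site.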