Rust client for the inferd local-inference daemon.
Wire protocol is NDJSON over Unix socket / Windows named pipe /
loopback TCP. Spec is frozen as protocol v1; see the inferd
repository’s docs/protocol-v1.md.
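Because the wire format is NDJSON, each frame is one JSON value terminated by a single newline. The sketch below illustrates only that framing layer on top of the standard `std::io` traits. `write_frame` and `read_frames` are hypothetical helper names, not part of inferd-client, and the payloads are treated as opaque, already-serialized JSON strings.

```rust
use std::io::{self, BufRead, Write};

/// Write one pre-serialized JSON value as a single NDJSON frame.
/// Hypothetical helper: it rejects payloads containing a raw newline,
/// because that would split one frame into two on the wire.
pub fn write_frame<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "NDJSON frame must not contain a raw newline",
        ));
    }
    out.write_all(json.as_bytes())?;
    out.write_all(b"\n")?;
    out.flush()
}

/// Read every newline-terminated frame from `input`, skipping blank lines.
/// The line terminator is stripped from each returned frame.
pub fn read_frames<R: BufRead>(input: R) -> io::Result<Vec<String>> {
    let mut frames = Vec::new();
    for line in input.lines() {
        let line = line?;
        if !line.trim().is_empty() {
            frames.push(line);
        }
    }
    Ok(frames)
}
```

The same two functions work over a `TcpStream`, a Unix socket, or an in-memory buffer, since they only rely on `Write` and `BufRead`.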
Two patterns for waiting on the daemon to come up; pick based on whether you need progress UX:
- Pattern A (passive): dial_and_wait_ready retries connect against the inference transport with exponential backoff. A successful connect is the ready signal because the daemon’s inference socket only exists when the backend is ready (THREAT_MODEL F-13 in the upstream repo). This is the standard Postgres/Redis/etcd client shape.
- Pattern B (active): AdminClient subscribes to the admin socket and yields lifecycle events (starting / loading_model / ready / restarting / draining). Use this for installer GUIs, dashboards, or middleware that wants to display download progress during first-boot bootstrap.
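The passive pattern boils down to a retry loop with exponential backoff and an overall deadline. The following is a minimal synchronous sketch of that idea using only the standard library. It is not the crate's implementation: `retry_with_backoff`, its delay parameters, and the blocking `sleep` are assumptions for illustration, and the real `dial_and_wait_ready` is async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Retry `dial` until it succeeds or `timeout` elapses, doubling the delay
/// between attempts up to `max_delay`. On timeout the last error is returned.
/// Hypothetical helper: this sketch retries on every error, without
/// distinguishing transient from permanent failures.
pub fn retry_with_backoff<T, E>(
    timeout: Duration,
    initial_delay: Duration,
    max_delay: Duration,
    mut dial: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    let mut delay = initial_delay;
    loop {
        match dial() {
            // A successful connect is treated as the ready signal.
            Ok(conn) => return Ok(conn),
            Err(err) => {
                let now = Instant::now();
                if now >= deadline {
                    return Err(err);
                }
                // Never sleep past the deadline.
                sleep(delay.min(deadline - now));
                delay = (delay * 2).min(max_delay);
            }
        }
    }
}
```

A caller could pass a closure such as `|| std::net::TcpStream::connect("127.0.0.1:47321")` as `dial`; capping the delay keeps the worst-case latency between the daemon becoming ready and the next attempt bounded.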
§Quickstart (v1)
use inferd_client::{Client, Request, Message, Role, Response};
use tokio_stream::StreamExt;

// Runs inside an async context whose error type accepts `?`.
// Retry the connect for up to 30 seconds (Pattern A).
let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || Client::dial_tcp("127.0.0.1:47321"),
)
.await?;

// Send a single user message; every other request field keeps its default.
let mut stream = client.generate(Request {
    id: "demo-1".into(),
    messages: vec![Message {
        role: Role::User,
        content: "hello".into(),
    }],
    ..Default::default()
})
.await?;

// Print tokens as they arrive, then report how the generation ended.
while let Some(frame) = stream.next().await {
    match frame? {
        Response::Token { content, .. } => print!("{content}"),
        Response::Done { stop_reason, backend, .. } => {
            println!("\n[done; backend={backend}, stop={stop_reason:?}]");
        }
        Response::Error { code, message, .. } => {
            eprintln!("[error {code:?}: {message}]");
        }
        // Status frames are ignored in this example.
        Response::Status { .. } => {}
    }
}

§Quickstart (v2: typed content blocks, attachments, tools)
v2 lives on a separate socket from v1 per ADR 0015. Use
ClientV2 with dial_v2_* instead of dial_tcp/dial_uds and
the v2 wire types (RequestV2, ContentBlock, …).
use inferd_client::{ClientV2, RequestV2, MessageV2, RoleV2, ContentBlock, ResponseV2, ResponseBlock};
use tokio_stream::StreamExt;

// Runs inside an async context whose error type accepts `?`.
// Retry the connect for up to 30 seconds (Pattern A).
let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || ClientV2::dial_tcp("127.0.0.1:47322"),
)
.await?;

// v2 message content is a list of typed blocks rather than a plain string.
let mut stream = client.generate(RequestV2 {
    id: "demo-1".into(),
    messages: vec![MessageV2 {
        role: RoleV2::User,
        content: vec![ContentBlock::Text { text: "hello".into() }],
    }],
    ..Default::default()
})
.await?;

while let Some(frame) = stream.next().await {
    match frame? {
        ResponseV2::Frame { block: ResponseBlock::Text { delta }, .. } => print!("{delta}"),
        // Thinking blocks are ignored in this example.
        ResponseV2::Frame { block: ResponseBlock::Thinking { .. }, .. } => {}
        ResponseV2::Frame { block: ResponseBlock::ToolUse { name, .. }, .. } => {
            println!("\n[tool_use: {name}]");
        }
        ResponseV2::Done { stop_reason, backend, .. } => {
            println!("\n[done; backend={backend}, stop={stop_reason:?}]");
        }
        ResponseV2::Error { code, message, .. } => {
            eprintln!("[error {code:?}: {message}]");
        }
    }
}

§Quickstart (embed: single-frame request/response)
Embed lives on a third socket separate from v1 and v2 per ADR
0017. Use EmbedClient with dial_embed_* and the embed wire
types (EmbedRequest, EmbedResponse, EmbedTask, …). The call
is a single round-trip — no streaming, since an embedding is a
complete vector.
use inferd_client::{EmbedClient, EmbedRequest, EmbedResponse, EmbedTask};

// Runs inside an async context whose error type accepts `?`.
// Retry the connect for up to 30 seconds (Pattern A).
let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || EmbedClient::dial_tcp("127.0.0.1:47323"),
)
.await?;

// One request in, one response frame out; there is no stream to drain.
let resp = client.embed(EmbedRequest {
    id: "demo-1".into(),
    input: vec!["the quick brown fox".into()],
    dimensions: Some(256),
    task: Some(EmbedTask::RetrievalDocument),
})
.await?;

match resp {
    EmbedResponse::Embeddings { embeddings, dimensions, .. } => {
        println!("got {} vectors of dim {dimensions}", embeddings.len());
    }
    EmbedResponse::Error { code, message, .. } => {
        eprintln!("[embed error {code:?}: {message}]");
    }
}

Structs§
- AdminClient: Subscriber for the inferd admin socket.
- AdminEvent: One frame off the admin socket. Fields not relevant to the current status/phase are absent (or default) per the spec’s flattened wire shape.
- Client: Inference-socket client.
- ClientV2: v2 inference-socket client.
- EmbedClient: Embed-socket client.
- EmbedRequest: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. The embed request envelope sent by clients.
- EmbedResolved: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. EmbedRequest with semantic validation completed.
- EmbedUsage: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. Token-count usage report carried on embeddings frames.
- ImageTokenBudget: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Image-token budget; one of VALID_IMAGE_TOKEN_BUDGETS. Wraps a u32 so constructors can enforce the enum at the type level.
- Message: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. One conversation turn carried in Request::messages.
- MessageV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One message in the v2 conversation history.
- Request: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. The inference request envelope sent by clients.
- RequestV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. The v2 request envelope sent by clients.
- Resolved: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Request with all defaults applied and validation completed. Backends receive this; they never see the optional-shaped wire form.
- ResolvedV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. RequestV2 with semantic validation completed.
- Tool: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One tool definition in the request’s top-level tools[] table.
- ToolCallId: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. Strong type around the string id that pairs an assistant-emitted tool_use block with the matching tool_result block in the consumer’s follow-up request. Wrapping it lets the daemon ensure the round-trip uses the same id and lets middleware authors avoid passing raw String for ids.
- Usage: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Token-count usage report carried on done frames.
- UsageV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. Token-count usage report carried on v2 done frames.
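The ImageTokenBudget entry describes a validated-newtype pattern: a u32 wrapper whose constructor only accepts values from a fixed set. A minimal sketch of that pattern is shown below. `ExampleBudget` and `EXAMPLE_BUDGETS` are hypothetical names, and the numbers are placeholders rather than the crate's real `VALID_IMAGE_TOKEN_BUDGETS` values.

```rust
/// Placeholder set of accepted values. These numbers are illustrative only;
/// the real list is the crate's `VALID_IMAGE_TOKEN_BUDGETS` constant.
pub const EXAMPLE_BUDGETS: [u32; 3] = [64, 128, 256];

/// Newtype whose only constructor validates against the allowed set, so any
/// value built through `new` is guaranteed to be one of `EXAMPLE_BUDGETS`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExampleBudget(u32);

impl ExampleBudget {
    /// Accept `value` only if it appears in the allowed set.
    pub fn new(value: u32) -> Result<Self, String> {
        if EXAMPLE_BUDGETS.contains(&value) {
            Ok(Self(value))
        } else {
            Err(format!("{} is not one of {:?}", value, EXAMPLE_BUDGETS))
        }
    }

    /// Return the validated inner value.
    pub fn get(self) -> u32 {
        self.0
    }
}
```

Keeping the inner field private to its module means downstream code cannot construct an out-of-range budget, so the check happens once at construction instead of at every use site.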
Enums§
- Attachment: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One binary attachment in the request’s top-level attachments[] table.
- ClientError: Errors produced by the inference client.
- ContentBlock: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One element of a MessageV2::content array.
- EmbedErrorCode: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. Embed-specific error-code taxonomy.
- EmbedResponse: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. One frame on the embed response stream.
- EmbedTask: Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. Task-prefix hint for embedding models trained with task-aware prefixes (e.g. EmbeddingGemma). Backends that don’t recognise the task ignore the field; the daemon applies the engine-specific prefix on behalf of the consumer per ADR 0013.
- ErrorCode: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Machine-readable error classification carried on error response frames.
- ErrorCodeV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. v2 error-code taxonomy. Superset of v1’s ErrorCode (kept independent so the v1 enum stays frozen per ADR 0008).
- ProtoError: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Errors produced by the proto crate while parsing or validating frames.
- Response: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. One frame on the response NDJSON stream.
- ResponseBlock: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One streaming-output payload carried inside a frame response.
- ResponseV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One frame on the v2 response NDJSON stream.
- Role: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Conversation role attached to each message.
- RoleV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. Conversation role on a v2 message.
- StopReason: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Why a generation ended. Carried on done frames.
- StopReasonV2: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. Why a v2 generation ended. Carried on done frames.
- WaitError: Errors produced by dial_and_wait_ready.
Constants§
- MAX_FRAME_BYTES: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. Hard cap on a single NDJSON frame in bytes (64 MiB).
- VALID_IMAGE_TOKEN_BUDGETS: Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility: inferd-client 0.2 always uses inferd-proto 0.2. The set of image_token_budget values accepted by the daemon. Any other value is rejected with ErrorCode::InvalidRequest.
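A hard per-frame cap matters most on the reading side, where an unterminated line could otherwise grow a buffer without bound. The sketch below shows one way to enforce such a cap with the standard library. `read_frame_capped` and `FRAME_CAP` are hypothetical names, and counting the cap against the payload without its trailing newline is an assumption, not something the protocol text above specifies.

```rust
use std::io::{self, BufRead, Read};

/// 64 MiB, matching the documented value of `MAX_FRAME_BYTES`.
pub const FRAME_CAP: usize = 64 * 1024 * 1024;

/// Read one newline-terminated frame, refusing to buffer more than `cap`
/// payload bytes. Returns `Ok(None)` on a clean end of stream. A final frame
/// without a trailing newline is returned as-is.
pub fn read_frame_capped<R: BufRead>(
    input: &mut R,
    cap: usize,
) -> io::Result<Option<Vec<u8>>> {
    let mut frame = Vec::new();
    // `take` limits how much `read_until` may pull in: `cap` payload bytes
    // plus one byte for the newline terminator.
    let read = input
        .by_ref()
        .take(cap as u64 + 1)
        .read_until(b'\n', &mut frame)?;
    if read == 0 {
        return Ok(None);
    }
    if frame.last() == Some(&b'\n') {
        frame.pop();
    }
    if frame.len() > cap {
        // The stream is now positioned mid-frame; callers would typically
        // drop the connection rather than try to resynchronize.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame exceeds the configured size cap",
        ));
    }
    Ok(Some(frame))
}
```

Passing a small `cap` makes the behaviour easy to exercise in tests; production code would pass `FRAME_CAP`.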
Functions§
- default_admin_addr: Default admin endpoint path per platform. Mirrors the daemon’s endpoint::default_admin_addr so clients can reach the spec’d default without hard-coding it.
- default_embed_addr: Default embed inference endpoint path, mirroring the daemon’s endpoint::default_embed_addr. Returned as a PathBuf on Unix and as a pipe-path string on Windows; callers pick by cfg.
- default_v2_addr: Default v2 admin / inference endpoint paths, mirroring the daemon’s endpoint::default_v2_addr. Returned as PathBuf on Unix and as a pipe-path string on Windows; callers pick by cfg.
- dial_and_wait_ready: Pattern A passive readiness: retry connect against the inference transport until success or timeout elapses. Successful connect is the ready signal: the daemon’s inference socket only exists when the backend is ready per THREAT_MODEL F-13.
- is_transient_dial_error: Returns true if err is the kind of transient connect failure that the daemon’s F-13 ready-gating produces during bring-up (the inference socket doesn’t exist yet). Permanent errors (permission denied, malformed addr) return false and bubble up immediately rather than spamming retries.
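To make the transient-versus-permanent split concrete, here is a small sketch of how such a classifier could look over `std::io::Error`. It is not the crate's `is_transient_dial_error`: `looks_transient` is a hypothetical name, and the specific `ErrorKind` values treated as transient are an assumption about how a not-yet-created socket surfaces on each transport.

```rust
use std::io::{Error, ErrorKind};

/// Sketch of a transient-vs-permanent split for connect errors.
/// Assumption: "the socket is not there yet" shows up as `NotFound`
/// (missing Unix socket path or named pipe) or `ConnectionRefused`
/// (nothing listening on the TCP port). Everything else, including
/// `PermissionDenied` and `InvalidInput`, is treated as permanent.
pub fn looks_transient(err: &Error) -> bool {
    matches!(
        err.kind(),
        ErrorKind::NotFound | ErrorKind::ConnectionRefused
    )
}
```

A retry loop would consult a predicate like this after each failed dial and return immediately when it yields `false`, so misconfiguration is reported at once instead of after the full timeout.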
Type Aliases§
- FrameStream: Stream of Response frames yielded by Client::generate.
- FrameStreamV2: Stream of ResponseV2 frames yielded by ClientV2::generate.
- ToolUseInput: Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. Free-form JSON object representing a tool’s invocation arguments.