Crate inferd_client

Rust client for the inferd local-inference daemon.

Wire protocol is NDJSON over Unix socket / Windows named pipe / loopback TCP. Spec is frozen as protocol v1; see the inferd repository’s docs/protocol-v1.md.
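To make the NDJSON framing concrete, here is a minimal std-only sketch of reading one newline-delimited frame at a time while enforcing a size cap. The helper name `read_frame` and its error choices are assumptions made for illustration and are not the crate's API; the 64 MiB figure is the cap documented for MAX_FRAME_BYTES further down this page.

```rust
use std::io::{self, BufRead, Read};

/// 64 MiB, matching the cap documented for MAX_FRAME_BYTES on this page.
const MAX_FRAME_BYTES: usize = 64 * 1024 * 1024;

/// Hypothetical helper (not part of inferd_client): read one
/// newline-terminated frame. Returns Ok(None) on a clean end of stream.
fn read_frame<R: BufRead>(reader: &mut R, cap: usize) -> io::Result<Option<String>> {
    let mut buf = Vec::new();
    // Allow `cap` payload bytes plus the newline; `take` keeps an oversized
    // frame from growing the buffer without bound.
    let limit = (cap as u64).saturating_add(1);
    let n = reader.by_ref().take(limit).read_until(b'\n', &mut buf)?;
    if n == 0 {
        return Ok(None);
    }
    if buf.last() == Some(&b'\n') {
        buf.pop(); // strip the frame terminator
    } else if buf.len() > cap {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds cap"));
    } else {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "unterminated frame"));
    }
    String::from_utf8(buf)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Each returned string would then be handed to a JSON parser; that step is omitted here to keep the sketch free of external crates.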

There are two patterns for waiting on the daemon to come up; pick one based on whether you need progress UX:

  • Pattern A (passive): dial_and_wait_ready retries connect against the inference transport with exponential backoff. A successful connect is the ready signal because the daemon’s inference socket only exists when the backend is ready (THREAT_MODEL F-13 in the upstream repo). This is the standard Postgres/Redis/etcd client shape.
  • Pattern B (active): AdminClient subscribes to the admin socket and yields lifecycle events (starting/loading_model/ready/restarting/draining). Use this for installer GUIs, dashboards, or middleware that wants to display download progress during first-boot bootstrap.
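Pattern B can be pictured as a loop that consumes lifecycle statuses and surfaces each one until ready arrives. The sketch below uses a local `Lifecycle` enum and a hypothetical `wait_for_ready` helper over a plain iterator; neither is the crate's AdminClient or AdminEvent, and only the five status names are taken from the text above.

```rust
use std::str::FromStr;

/// Local stand-in for the lifecycle statuses listed above.
/// This is NOT the crate's AdminEvent type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Starting,
    LoadingModel,
    Ready,
    Restarting,
    Draining,
}

impl FromStr for Lifecycle {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "starting" => Ok(Lifecycle::Starting),
            "loading_model" => Ok(Lifecycle::LoadingModel),
            "ready" => Ok(Lifecycle::Ready),
            "restarting" => Ok(Lifecycle::Restarting),
            "draining" => Ok(Lifecycle::Draining),
            other => Err(format!("unknown status: {other}")),
        }
    }
}

/// Hypothetical helper: record one progress line per event and stop at
/// `Ready`. Returns None if the events end before the daemon is ready.
fn wait_for_ready<I>(events: I) -> Option<Vec<String>>
where
    I: IntoIterator<Item = Lifecycle>,
{
    let mut progress = Vec::new();
    for event in events {
        progress.push(format!("daemon: {event:?}"));
        if event == Lifecycle::Ready {
            return Some(progress);
        }
    }
    None
}
```

A GUI would replace the collected strings with progress-bar updates; the point is that the caller sees every intermediate status, which Pattern A never exposes.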

§Quickstart (v1)

use inferd_client::{Client, Request, Message, Role, Response};
use tokio_stream::StreamExt;

let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || Client::dial_tcp("127.0.0.1:47321"),
)
.await?;

let mut stream = client.generate(Request {
    id: "demo-1".into(),
    messages: vec![Message {
        role: Role::User,
        content: "hello".into(),
    }],
    ..Default::default()
})
.await?;

while let Some(frame) = stream.next().await {
    match frame? {
        Response::Token { content, .. } => print!("{content}"),
        Response::Done { stop_reason, backend, .. } => {
            println!("\n[done; backend={backend}, stop={stop_reason:?}]");
        }
        Response::Error { code, message, .. } => {
            eprintln!("[error {code:?}: {message}]");
        }
        Response::Status { .. } => {}
    }
}

§Quickstart (v2 — typed content blocks, attachments, tools)

v2 lives on a separate socket from v1 per ADR 0015. Use ClientV2 with dial_v2_* instead of dial_tcp/dial_uds and the v2 wire types (RequestV2, ContentBlock, …).

use inferd_client::{ClientV2, RequestV2, MessageV2, RoleV2, ContentBlock, ResponseV2, ResponseBlock};
use tokio_stream::StreamExt;

let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || ClientV2::dial_tcp("127.0.0.1:47322"),
)
.await?;

let mut stream = client.generate(RequestV2 {
    id: "demo-1".into(),
    messages: vec![MessageV2 {
        role: RoleV2::User,
        content: vec![ContentBlock::Text { text: "hello".into() }],
    }],
    ..Default::default()
})
.await?;

while let Some(frame) = stream.next().await {
    match frame? {
        ResponseV2::Frame { block: ResponseBlock::Text { delta }, .. } => print!("{delta}"),
        ResponseV2::Frame { block: ResponseBlock::Thinking { .. }, .. } => {}
        ResponseV2::Frame { block: ResponseBlock::ToolUse { name, .. }, .. } => {
            println!("\n[tool_use: {name}]");
        }
        ResponseV2::Done { stop_reason, backend, .. } => {
            println!("\n[done; backend={backend}, stop={stop_reason:?}]");
        }
        ResponseV2::Error { code, message, .. } => {
            eprintln!("[error {code:?}: {message}]");
        }
    }
}
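The loop above prints frames as they arrive. Callers that want the whole reply in one value can fold the same frames into an accumulator, as in the sketch below. `Block`, `Frame`, `Transcript` and `collect_transcript` are local stand-ins that mirror only the variants and fields visible in the quickstart; they are not the crate's ResponseV2 or ResponseBlock definitions, and the stream is modelled as a plain iterator.

```rust
/// Local stand-ins mirroring only what the quickstart shows.
/// The real wire types carry more fields.
enum Block {
    Text { delta: String },
    Thinking,
    ToolUse { name: String },
}

enum Frame {
    Block(Block),
    Done,
    Error { message: String },
}

/// Hypothetical accumulator for one generation.
#[derive(Debug, Default, PartialEq)]
struct Transcript {
    text: String,
    tools: Vec<String>,
    finished: bool,
}

/// Fold frames into a Transcript: text deltas are concatenated in order,
/// tool names are recorded, and an error frame aborts the fold.
fn collect_transcript(
    frames: impl IntoIterator<Item = Result<Frame, String>>,
) -> Result<Transcript, String> {
    let mut out = Transcript::default();
    for frame in frames {
        match frame? {
            Frame::Block(Block::Text { delta }) => out.text.push_str(&delta),
            Frame::Block(Block::Thinking) => {} // ignored, as in the quickstart
            Frame::Block(Block::ToolUse { name }) => out.tools.push(name),
            Frame::Done => {
                out.finished = true;
                break;
            }
            Frame::Error { message } => return Err(message),
        }
    }
    Ok(out)
}
```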

§Quickstart (embed — single-frame request/response)

Embed lives on a third socket separate from v1 and v2 per ADR 0017. Use EmbedClient with dial_embed_* and the embed wire types (EmbedRequest, EmbedResponse, EmbedTask, …). The call is a single round-trip — no streaming, since an embedding is a complete vector.

use inferd_client::{EmbedClient, EmbedRequest, EmbedResponse, EmbedTask};

let mut client = inferd_client::dial_and_wait_ready(
    std::time::Duration::from_secs(30),
    || EmbedClient::dial_tcp("127.0.0.1:47323"),
)
.await?;

let resp = client.embed(EmbedRequest {
    id: "demo-1".into(),
    input: vec!["the quick brown fox".into()],
    dimensions: Some(256),
    task: Some(EmbedTask::RetrievalDocument),
})
.await?;

match resp {
    EmbedResponse::Embeddings { embeddings, dimensions, .. } => {
        println!("got {} vectors of dim {dimensions}", embeddings.len());
    }
    EmbedResponse::Error { code, message, .. } => {
        eprintln!("[embed error {code:?}: {message}]");
    }
}

Structs§

AdminClient
Subscriber for the inferd admin socket.
AdminEvent
One frame off the admin socket. Fields not relevant to the current status/phase are absent (or default) per the spec’s flattened wire shape.
Client
Inference-socket client.
ClientV2
v2 inference-socket client.
EmbedClient
Embed-socket client.
EmbedRequest
Re-exports of the embed wire types per ADR 0017. Embed lives on the third inferd socket (separate from v1 and v2); the proto types are re-exported here so consumers don’t need a separate inferd-proto dep. The embed request envelope sent by clients.
EmbedResolved
EmbedRequest with semantic validation completed.
EmbedUsage
Token-count usage report carried on embeddings frames.
ImageTokenBudget
Re-exports from inferd-proto so consumers don’t need a separate inferd-proto dep for the wire types. The proto crate IS the version-pin contract for protocol compatibility — inferd-client 0.2 always uses inferd-proto 0.2. Image-token budget; one of VALID_IMAGE_TOKEN_BUDGETS. Wraps a u32 so constructors can enforce the enum at the type level.
Message
One conversation turn carried in Request::messages.
MessageV2
Re-exports of the v2 wire types per ADR 0015. v2 is shipped as part of inferd-client 0.2 so consumers building against v2 can reach the proto types without a separate inferd-proto dep. One message in the v2 conversation history.
Request
The inference request envelope sent by clients.
RequestV2
The v2 request envelope sent by clients.
Resolved
Request with all defaults applied and validation completed. Backends receive this; they never see the optional-shaped wire form.
ResolvedV2
RequestV2 with semantic validation completed.
Tool
One tool definition in the request’s top-level tools[] table.
ToolCallId
Strong type around the string id that pairs an assistant-emitted tool_use block with the matching tool_result block in the consumer’s follow-up request. Wrapping it lets the daemon ensure the round-trip uses the same id and lets middleware authors avoid passing raw String for ids.
Usage
Token-count usage report carried on done frames.
UsageV2
Token-count usage report carried on v2 done frames.
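The ToolCallId entry above describes a newtype around a string id. A rough sketch of why that helps: once the id has its own type, a map of pending tool calls cannot be keyed by an arbitrary string by accident. `CallId` and `PendingCalls` below are hypothetical illustrations, not the crate's definitions.

```rust
use std::collections::HashMap;

/// Hypothetical newtype in the spirit of ToolCallId.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CallId(String);

/// Hypothetical bookkeeping for tool_use blocks awaiting a tool_result.
struct PendingCalls {
    by_id: HashMap<CallId, String>,
}

impl PendingCalls {
    fn new() -> Self {
        PendingCalls { by_id: HashMap::new() }
    }

    /// Remember which tool an assistant-emitted tool_use referred to.
    fn record_tool_use(&mut self, id: CallId, tool_name: &str) {
        self.by_id.insert(id, tool_name.to_string());
    }

    /// Pair a tool_result with its tool_use; each id resolves at most once.
    fn resolve(&mut self, id: &CallId) -> Option<String> {
        self.by_id.remove(id)
    }
}
```

Passing a bare `String` or a tool name where a `CallId` is expected is a compile error, which is the property the newtype buys.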

Enums§

Attachment
One binary attachment in the request’s top-level attachments[] table.
ClientError
Errors produced by the inference client.
ContentBlock
One element of a MessageV2::content array.
EmbedErrorCode
Embed-specific error-code taxonomy.
EmbedResponse
One frame on the embed response stream.
EmbedTask
Task-prefix hint for embedding models trained with task-aware prefixes (e.g. EmbeddingGemma). Backends that don’t recognise the task ignore the field; the daemon applies the engine-specific prefix on behalf of the consumer per ADR 0013.
ErrorCode
Machine-readable error classification carried on error response frames.
ErrorCodeV2
v2 error-code taxonomy. Superset of v1’s ErrorCode (kept independent so the v1 enum stays frozen per ADR 0008).
ProtoError
Errors produced by the proto crate while parsing or validating frames.
Response
One frame on the response NDJSON stream.
ResponseBlock
One streaming-output payload carried inside a frame response.
ResponseV2
One frame on the v2 response NDJSON stream.
Role
Conversation role attached to each message.
RoleV2
Conversation role on a v2 message.
StopReason
Why a generation ended. Carried on done frames.
StopReasonV2
Why a v2 generation ended. Carried on done frames.
WaitError
Errors produced by dial_and_wait_ready.

Constants§

MAX_FRAME_BYTES
Hard cap on a single NDJSON frame in bytes (64 MiB).
VALID_IMAGE_TOKEN_BUDGETS
The set of image_token_budget values accepted by the daemon. Any other value is rejected with ErrorCode::InvalidRequest.
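The relationship between VALID_IMAGE_TOKEN_BUDGETS and the ImageTokenBudget wrapper can be sketched as a validating constructor around a u32. The values in `EXAMPLE_VALID_BUDGETS` are placeholders invented for this illustration; the real set is whatever the crate's constant contains, and `Budget` is a stand-in rather than the crate's type.

```rust
/// PLACEHOLDER values for illustration only. They are NOT the daemon's
/// accepted set; consult VALID_IMAGE_TOKEN_BUDGETS for the real one.
const EXAMPLE_VALID_BUDGETS: [u32; 3] = [64, 128, 256];

/// Hypothetical stand-in for ImageTokenBudget: the private field means
/// every value must pass through `new`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Budget(u32);

impl Budget {
    /// Accept only values from the allowed set.
    fn new(value: u32) -> Result<Self, String> {
        if EXAMPLE_VALID_BUDGETS.contains(&value) {
            Ok(Budget(value))
        } else {
            Err(format!("invalid image token budget: {value}"))
        }
    }

    /// Read back the wrapped value.
    fn get(self) -> u32 {
        self.0
    }
}
```

Validating at construction means an out-of-set value is caught on the client before a request is sent, rather than coming back from the daemon as ErrorCode::InvalidRequest.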

Functions§

default_admin_addr
Default admin endpoint path per platform. Mirrors the daemon’s endpoint::default_admin_addr so clients can reach the spec’d default without hard-coding it.
default_embed_addr
Default embed inference endpoint path, mirroring the daemon’s endpoint::default_embed_addr. Returned as a PathBuf on Unix and as a pipe-path string on Windows; callers pick by cfg.
default_v2_addr
Default v2 admin / inference endpoint paths, mirroring the daemon’s endpoint::default_v2_addr. Returned as PathBuf on Unix and as a pipe-path string on Windows; callers pick by cfg.
dial_and_wait_ready
Pattern A passive readiness: retry connect against the inference transport until success or timeout elapses. Successful connect is the ready signal — the daemon’s inference socket only exists when the backend is ready per THREAT_MODEL F-13.
is_transient_dial_error
Returns true if err is the kind of transient connect failure that the daemon’s F-13 ready-gating produces during bring-up (the inference socket doesn’t exist yet). Permanent errors (permission denied, malformed addr) return false and bubble up immediately rather than spamming retries.
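The interplay of dial_and_wait_ready and is_transient_dial_error can be sketched with the standard library alone. The version below is synchronous for brevity, whereas the crate's function is awaited in the quickstarts. `looks_transient`, `dial_with_backoff`, the choice of error kinds, and the delay constants are all assumptions made for illustration, not the crate's actual behaviour.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// ASSUMED classification: a missing socket path or a refused connection
/// is treated as "daemon not up yet". The crate's real list may differ.
fn looks_transient(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        io::ErrorKind::NotFound | io::ErrorKind::ConnectionRefused
    )
}

/// Hypothetical helper: retry `dial` with doubling delays until it
/// succeeds, hits a permanent error, or `timeout` elapses.
fn dial_with_backoff<T>(
    timeout: Duration,
    mut dial: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let deadline = Instant::now() + timeout;
    let mut delay = Duration::from_millis(10); // assumed initial delay
    let max_delay = Duration::from_secs(1); // assumed ceiling
    loop {
        match dial() {
            Ok(conn) => return Ok(conn),
            // Permanent errors bubble up immediately instead of retrying.
            Err(err) if !looks_transient(&err) => return Err(err),
            Err(err) => {
                let now = Instant::now();
                if now >= deadline {
                    return Err(io::Error::new(io::ErrorKind::TimedOut, err));
                }
                // Never sleep past the deadline.
                thread::sleep(delay.min(deadline - now));
                delay = (delay * 2).min(max_delay);
            }
        }
    }
}
```

A caller would pass a closure such as `|| std::net::TcpStream::connect("127.0.0.1:47321")`, reusing the loopback address from the v1 quickstart.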

Type Aliases§

FrameStream
Stream of Response frames yielded by Client::generate.
FrameStreamV2
Stream of ResponseV2 frames yielded by ClientV2::generate.
ToolUseInput
Free-form JSON object representing a tool’s invocation arguments.