pub struct Cli {
pub backend: BackendKind,
pub lock: PathBuf,
pub tcp: Option<String>,
pub uds: Option<PathBuf>,
pub pipe: Option<String>,
pub group: Option<String>,
pub active_permits: usize,
pub queue_depth: usize,
pub ready_timeout_secs: u64,
pub model_path: Option<PathBuf>,
pub model_sha256: Option<String>,
pub n_ctx: u32,
pub n_gpu_layers: i32,
pub openai_base_url: Option<String>,
pub openai_api_key: Option<String>,
pub openai_model: Option<String>,
pub openai_timeout_secs: u64,
pub bedrock_region: Option<String>,
pub bedrock_model_id: Option<String>,
pub bedrock_bearer_token: Option<String>,
pub bedrock_endpoint: Option<String>,
pub bedrock_timeout_secs: u64,
pub api_key: Option<String>,
pub config: Option<PathBuf>,
pub admin_addr: Option<PathBuf>,
pub v2: bool,
pub v2_addr: Option<PathBuf>,
pub v2_tcp: Option<String>,
pub embed: bool,
pub embed_addr: Option<PathBuf>,
pub embed_tcp: Option<String>,
}
Top-level CLI for inferd-daemon.
Fields
backend: BackendKind
Backend to load at startup.
lock: PathBuf
Path to the single-instance lock file. The lock is held for the lifetime of the daemon process.
tcp: Option<String>
Loopback TCP bind address. Mutually exclusive with --uds and --pipe.
uds: Option<PathBuf>
Unix domain socket path. Mutually exclusive with --tcp and --pipe. Unix only.
pipe: Option<String>
Windows named pipe path (e.g. \\.\pipe\inferd-infer). Mutually exclusive with --tcp and --uds. Windows only.
group: Option<String>
Group name for the UDS (Unix only). Ignored on other transports.
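The mutual-exclusion rule between --tcp, --uds, and --pipe can be sketched as a small resolver. This is an illustration only: `Transport` and `resolve_transport` are hypothetical names, and what the daemon does when no transport flag is given is not stated here, so the sketch simply returns `Ok(None)` in that case. A clap-derived struct can also declare such rules with the `conflicts_with` attribute; whether inferd-daemon does so is not shown in these docs.

```rust
use std::path::PathBuf;

/// Hypothetical enum; the daemon's real transport type is not shown in these docs.
#[derive(Debug, PartialEq)]
pub enum Transport {
    Tcp(String),
    Uds(PathBuf),
    Pipe(String),
}

/// Returns the single selected transport, `Ok(None)` if no flag was given,
/// or an error if more than one flag was supplied.
pub fn resolve_transport(
    tcp: Option<String>,
    uds: Option<PathBuf>,
    pipe: Option<String>,
) -> Result<Option<Transport>, String> {
    match (tcp, uds, pipe) {
        (None, None, None) => Ok(None),
        (Some(addr), None, None) => Ok(Some(Transport::Tcp(addr))),
        (None, Some(path), None) => Ok(Some(Transport::Uds(path))),
        (None, None, Some(name)) => Ok(Some(Transport::Pipe(name))),
        // Any combination of two or three flags is rejected.
        _ => Err("--tcp, --uds, and --pipe are mutually exclusive".to_string()),
    }
}
```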
active_permits: usize
Number of generations served concurrently. The v0.1 invariant is 1; values above 1 are reserved for v0.2 continuous-batching backends.
queue_depth: usize
Maximum depth of the waiting queue. Submits beyond this depth return code: queue_full immediately.
ready_timeout_secs: u64
Seconds to wait for the backend to report ready before failing startup.
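How active_permits and queue_depth interact can be sketched with a plain counter. This is a minimal model under stated assumptions, not the daemon's scheduler: `Admission`, `Admit`, `submit`, and `finish` are hypothetical names, and the real implementation presumably uses async primitives rather than bare integers.

```rust
/// Hypothetical admission counter; not the daemon's actual scheduler.
#[derive(Debug)]
pub struct Admission {
    active_permits: usize,
    queue_depth: usize,
    active: usize,
    waiting: usize,
}

/// Outcome of a submit in this sketch.
#[derive(Debug, PartialEq)]
pub enum Admit {
    Run,
    Queued,
    /// Corresponds to the documented `code: queue_full` response.
    QueueFull,
}

impl Admission {
    pub fn new(active_permits: usize, queue_depth: usize) -> Self {
        Self { active_permits, queue_depth, active: 0, waiting: 0 }
    }

    /// Runs immediately if a permit is free, queues if there is room,
    /// and otherwise rejects without waiting.
    pub fn submit(&mut self) -> Admit {
        if self.active < self.active_permits {
            self.active += 1;
            Admit::Run
        } else if self.waiting < self.queue_depth {
            self.waiting += 1;
            Admit::Queued
        } else {
            Admit::QueueFull
        }
    }

    /// Called when a generation finishes. If someone is waiting, that waiter
    /// takes over the freed permit, so `active` stays the same.
    pub fn finish(&mut self) {
        if self.waiting > 0 {
            self.waiting -= 1;
        } else if self.active > 0 {
            self.active -= 1;
        }
    }
}
```

With active_permits = 1 and queue_depth = 2, the first submit runs, the next two queue, and the fourth is rejected immediately.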
model_path: Option<PathBuf>
Path to the GGUF model file. Required when --backend llamacpp.
model_sha256: Option<String>
Optional expected SHA-256 of the model file as a hex string (64 chars). When present, the daemon verifies the file before loading, comparing digests via subtle::ConstantTimeEq (THREAT_MODEL F-5).
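The shape of that check can be sketched with the standard library alone. The assumptions are loud ones: hashing the file is left out (the caller is assumed to already hold the computed 32-byte digest), `parse_sha256_hex` and `digests_match` are hypothetical helpers, and the XOR-fold loop is only a stand-in for subtle::ConstantTimeEq, which is what the docs say the daemon uses and which is hardened against compiler optimisations in ways this loop is not.

```rust
/// Decodes a 64-character hex string into 32 bytes.
/// Returns None if the length is wrong or a non-hex character appears.
pub fn parse_sha256_hex(hex: &str) -> Option<[u8; 32]> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, pair) in bytes.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        out[i] = ((hi << 4) | lo) as u8;
    }
    Some(out)
}

/// Compares two digests without exiting early on the first mismatch.
/// Illustrative stand-in for subtle::ConstantTimeEq.
pub fn digests_match(actual: &[u8; 32], expected: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (a, b) in actual.iter().zip(expected.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}
```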
n_ctx: u32
Llama.cpp context window in tokens. Default 8192.
n_gpu_layers: i32
Llama.cpp GPU layer offload count. 0 = CPU-only. GPU support requires the cuda/metal/vulkan/rocm cargo feature at build time.
openai_base_url: Option<String>
Base URL of the upstream OpenAI-compat endpoint, with no trailing slash and no path (the adapter appends /v1/chat/completions). Required when --backend openai-compat. Examples: https://api.openai.com, http://localhost:11434, https://openrouter.ai.
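As a small illustration of how the base URL and the appended path combine, consider the sketch below. `chat_completions_url` is a hypothetical helper, and trimming a trailing slash is an added leniency: the docs require the value to be supplied without one and do not say the adapter tolerates it.

```rust
/// Joins a base URL with the path the docs say the adapter appends.
/// Trailing slashes are trimmed here as an assumption, not documented behaviour.
pub fn chat_completions_url(base_url: &str) -> String {
    format!("{}/v1/chat/completions", base_url.trim_end_matches('/'))
}
```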
openai_api_key: Option<String>
Bearer token for the OpenAI-compat upstream. Sent as Authorization: Bearer <value>. Pass an empty string to skip the header entirely for self-hosted endpoints. Resolves from --openai-api-key, then INFERD_OPENAI_API_KEY, then OPENAI_API_KEY (the de facto env name that most providers' SDKs already use).
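The resolution order and the empty-string rule can be sketched as two small functions. Both names are hypothetical, and the environment lookup is injected as a closure so the logic can be exercised without touching the process environment. One assumption to flag: the docs do not say what happens when no key resolves at all, so the sketch omits the header in that case too.

```rust
/// Resolves the key in the documented order:
/// the flag value, then INFERD_OPENAI_API_KEY, then OPENAI_API_KEY.
/// An explicitly passed empty string is kept, so it can suppress the header.
pub fn resolve_openai_key<F>(flag: Option<String>, env: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    flag.or_else(|| env("INFERD_OPENAI_API_KEY"))
        .or_else(|| env("OPENAI_API_KEY"))
}

/// Returns the Authorization header value, or None when the key is
/// empty (documented) or absent (assumed).
pub fn authorization_header(key: Option<&str>) -> Option<String> {
    match key {
        Some(k) if !k.is_empty() => Some(format!("Bearer {k}")),
        _ => None,
    }
}
```

In a real binary the closure would typically be `|name| std::env::var(name).ok()`.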
openai_model: Option<String>
Upstream model identifier echoed in the request model field. Provider-specific (e.g. gpt-4o-mini, llama3.1:8b, meta-llama/Meta-Llama-3-70B-Instruct). Required when --backend openai-compat.
openai_timeout_secs: u64
Total request timeout for OpenAI-compat calls, in seconds. Default 300 (5 minutes): long enough for a slow first token from a cold cloud model, short enough to surface stuck requests rather than hang forever.
bedrock_region: Option<String>
AWS region the Bedrock endpoint lives in, e.g. us-east-1, eu-central-1. Required when --backend bedrock-invoke. Used for both the endpoint host and the SigV4 signing scope.
bedrock_model_id: Option<String>
Bedrock model id (URL-encoded by the adapter), e.g. anthropic.claude-3-5-sonnet-20241022-v2:0. Required when --backend bedrock-invoke.
bedrock_bearer_token: Option<String>
Pre-issued Bedrock bearer token (AWS_BEARER_TOKEN_BEDROCK shape; AWS rolled this out in 2025-06). When set, the adapter sends Authorization: Bearer <value> and skips SigV4. When unset, the adapter falls back to the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (+ optional AWS_SESSION_TOKEN) chain via SigV4 signing.
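The bearer-versus-SigV4 decision can be sketched as follows. `BedrockAuth` and `select_bedrock_auth` are hypothetical names, the signing itself is out of scope, and two behaviours are assumptions rather than documented facts: any supplied token (even an empty one) counts as "set", and missing access keys produce an error.

```rust
/// Hypothetical result of the credential decision.
#[derive(Debug, PartialEq)]
pub enum BedrockAuth {
    /// Send `Authorization: Bearer <token>` and skip SigV4.
    Bearer(String),
    /// Sign the request with SigV4 using the standard AWS env credentials.
    SigV4 {
        access_key_id: String,
        secret_access_key: String,
        session_token: Option<String>,
    },
}

/// Picks the bearer token when one is supplied, otherwise falls back to
/// the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (+ AWS_SESSION_TOKEN) chain.
pub fn select_bedrock_auth<F>(
    bearer_token: Option<String>,
    env: F,
) -> Result<BedrockAuth, String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(token) = bearer_token {
        return Ok(BedrockAuth::Bearer(token));
    }
    let access_key_id = env("AWS_ACCESS_KEY_ID").ok_or("AWS_ACCESS_KEY_ID is not set")?;
    let secret_access_key =
        env("AWS_SECRET_ACCESS_KEY").ok_or("AWS_SECRET_ACCESS_KEY is not set")?;
    Ok(BedrockAuth::SigV4 {
        access_key_id,
        secret_access_key,
        session_token: env("AWS_SESSION_TOKEN"),
    })
}
```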
bedrock_endpoint: Option<String>
Override the Bedrock endpoint host. When empty or absent, defaults to bedrock-runtime.<region>.amazonaws.com. Useful for VPC endpoints and integration tests.
bedrock_timeout_secs: u64
Total request timeout for Bedrock calls, in seconds. Default 300 (5 minutes).
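Putting the region, endpoint override, and URL-encoded model id together, a request URL might be assembled roughly like this. The helper names are hypothetical. The `/model/{modelId}/invoke` path follows the public Bedrock InvokeModel API; the `https://` scheme and the exact set of characters the adapter percent-encodes are assumptions.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
/// The adapter's actual encoding set is not documented; this is a conservative guess.
pub fn percent_encode(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for b in segment.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Builds the invoke URL, using the override host when it is present and
/// non-empty, and the documented regional default otherwise.
pub fn bedrock_invoke_url(region: &str, endpoint: Option<&str>, model_id: &str) -> String {
    let host = match endpoint {
        Some(h) if !h.is_empty() => h.to_string(),
        _ => format!("bedrock-runtime.{region}.amazonaws.com"),
    };
    format!("https://{host}/model/{}/invoke", percent_encode(model_id))
}
```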
api_key: Option<String>
Optional pre-shared API key. When set, TCP clients MUST send {"type":"auth","key":"<this value>"} as their first NDJSON frame on the connection, or the daemon closes the connection. UDS and named-pipe transports ignore this key; kernel-attested peer credentials (F-7) do the work there. Comparison is constant-time. THREAT_MODEL F-8.
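The first-frame rule can be sketched as a pure decision function. JSON parsing is deliberately left out: `FirstFrame` is a hypothetical, already-parsed view of the first NDJSON frame, `TransportKind` and `connection_allowed` are likewise invented for illustration, and `ct_eq` is a simple non-early-exit comparison standing in for whatever constant-time primitive the daemon uses. Peer-credential checks on UDS and named pipes (F-7) are outside this sketch.

```rust
/// Hypothetical transport tag for an accepted connection.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransportKind {
    Tcp,
    Uds,
    Pipe,
}

/// Hypothetical parsed view of the first NDJSON frame on a connection.
pub enum FirstFrame<'a> {
    Auth { key: &'a str },
    Other,
}

/// Byte-wise comparison with no early exit on mismatching content.
/// (The length check itself is not constant-time.)
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true if the API-key rule lets the connection proceed.
pub fn connection_allowed(
    transport: TransportKind,
    api_key: Option<&str>,
    first: &FirstFrame<'_>,
) -> bool {
    match (transport, api_key) {
        // UDS and named pipes ignore the API key entirely.
        (TransportKind::Uds | TransportKind::Pipe, _) => true,
        // No key configured: nothing to check on TCP.
        (TransportKind::Tcp, None) => true,
        // Key configured: the first frame must be a matching auth frame.
        (TransportKind::Tcp, Some(expected)) => match first {
            FirstFrame::Auth { key } => ct_eq(key.as_bytes(), expected.as_bytes()),
            FirstFrame::Other => false,
        },
    }
}
```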
config: Option<PathBuf>
Path to the operator JSON config file. Default ~/.inferd/config.json. When present, fetch + auto-pull are driven from it; CLI flags (--model-path, --model-sha256, --n-ctx, --n-gpu-layers) override config-file values when both are supplied. When absent, the daemon falls back to CLI-flag-only operation (dev mode).
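The override rule for the four listed flags can be sketched as a field-by-field merge. `ModelSettings` and `merge` are hypothetical, and the sketch models every value as an `Option` so that "not supplied" is representable. In the real struct n_ctx and n_gpu_layers are plain integers with defaults, and how the daemon tells an explicit flag from a default is not shown in these docs.

```rust
use std::path::PathBuf;

/// Hypothetical bundle of the four settings that CLI flags may override.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ModelSettings {
    pub model_path: Option<PathBuf>,
    pub model_sha256: Option<String>,
    pub n_ctx: Option<u32>,
    pub n_gpu_layers: Option<i32>,
}

/// For each field, the CLI value wins when present;
/// otherwise the config-file value is used.
pub fn merge(cli: ModelSettings, file: ModelSettings) -> ModelSettings {
    ModelSettings {
        model_path: cli.model_path.or(file.model_path),
        model_sha256: cli.model_sha256.or(file.model_sha256),
        n_ctx: cli.n_ctx.or(file.n_ctx),
        n_gpu_layers: cli.n_gpu_layers.or(file.n_gpu_layers),
    }
}
```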
admin_addr: Option<PathBuf>
Admin endpoint path. Defaults per-platform to the path documented in docs/protocol-v1.md §“Admin endpoint”, e.g. /run/inferd/admin.sock on Linux, \\.\pipe\inferd-admin on Windows. Override for tests / non-default deployments.
v2: bool
Enable the v2 inference endpoint per ADR 0015. v2 binds on a separate socket from v1: infer.v2.sock on Unix / \\.\pipe\inferd-infer-v2 on Windows. v1 stays on its own socket and is unaffected.
Phase 1B: the v2 endpoint accepts and validates v2 requests but returns Error{code:internal, message:"v2 generation not implemented"} because the Backend trait does not yet expose generate_v2. Use this to integration-test middleware that will speak v2 once Phase 2A lands.
v2_addr: Option<PathBuf>
Override the default v2 inference endpoint path. Mirrors --uds / --pipe for v2; on Linux/macOS this is a UDS path, on Windows a named-pipe path. Has no effect unless --v2 is also set.
v2_tcp: Option<String>
Loopback TCP bind address for the v2 endpoint. Mutually exclusive with --v2-addr. Useful for tests that don't want the platform default (UDS / named pipe). Has no effect unless --v2 is also set.
embed: bool
Enable the embed inference endpoint per ADR 0017. The embed endpoint binds on a separate socket from v1/v2: infer.embed.sock on Unix / \\.\pipe\inferd-infer-embed on Windows. Has no effect unless the active backend's capabilities().embed is true (capability-driven binding).
embed_addr: Option<PathBuf>
Override the default embed inference endpoint path. Mirrors --uds / --pipe for embed; on Linux/macOS this is a UDS path, on Windows a named-pipe path. Has no effect unless --embed is also set.
embed_tcp: Option<String>
Loopback TCP bind address for the embed endpoint. Mutually exclusive with --embed-addr. Has no effect unless --embed is also set.
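The capability-driven binding for the embed endpoint can be sketched as one resolver. `EmbedBind` and `resolve_embed_bind` are hypothetical, and `backend_supports_embed` stands in for the backend's `capabilities().embed` value. One ordering choice is an assumption: the sketch checks the enable conditions first, so conflicting overrides are only rejected when the endpoint would actually bind. A real CLI may reject the conflict at parse time instead.

```rust
use std::path::PathBuf;

/// Hypothetical outcome of resolving the embed endpoint binding.
#[derive(Debug, PartialEq)]
pub enum EmbedBind {
    /// --embed not set, or the backend does not support embeddings.
    Disabled,
    /// Bind on the platform default (infer.embed.sock or the named pipe).
    PlatformDefault,
    /// Bind on the path given by --embed-addr.
    Path(PathBuf),
    /// Bind on the loopback address given by --embed-tcp.
    Tcp(String),
}

pub fn resolve_embed_bind(
    embed: bool,
    backend_supports_embed: bool,
    embed_addr: Option<PathBuf>,
    embed_tcp: Option<String>,
) -> Result<EmbedBind, String> {
    // Both the flag and the backend capability must be true to bind at all.
    if !embed || !backend_supports_embed {
        return Ok(EmbedBind::Disabled);
    }
    match (embed_addr, embed_tcp) {
        (Some(_), Some(_)) => {
            Err("--embed-addr and --embed-tcp are mutually exclusive".to_string())
        }
        (Some(path), None) => Ok(EmbedBind::Path(path)),
        (None, Some(addr)) => Ok(EmbedBind::Tcp(addr)),
        (None, None) => Ok(EmbedBind::PlatformDefault),
    }
}
```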
Trait Implementations
impl Args for Cli
fn augment_args<'b>(__clap_app: Command) -> Command
fn augment_args_for_update<'b>(__clap_app: Command) -> Command
Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut.
impl CommandFactory for Cli
impl FromArgMatches for Cli
fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>
fn from_arg_matches_mut(__clap_arg_matches: &mut ArgMatches) -> Result<Self, Error>
fn update_from_arg_matches(&mut self, __clap_arg_matches: &ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.
fn update_from_arg_matches_mut(&mut self, __clap_arg_matches: &mut ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.