localharness 0.5.0

A Rust-native agent SDK for Gemini. Streaming, custom tools, safety policies, background triggers — zero external binaries.
Documentation

localharness

A Rust-native agent SDK for Gemini. Build production agents with streaming text, custom tools, safety policies, and background triggers — all from a single cargo add. Zero external binaries.

crates.io docs.rs license CI

use localharness::{Agent, GeminiAgentConfig};

#[tokio::main]
async fn main() -> localharness::Result<()> {
    let agent = Agent::start_gemini(
        GeminiAgentConfig::new(std::env::var("GEMINI_API_KEY").unwrap())
            .with_system_instructions("You are a concise code reviewer."),
    ).await?;

    let response = agent.chat("Review: fn add(a: i32, b: i32) -> i32 { a - b }").await?;
    println!("{}", response.text().await?);

    agent.shutdown().await?;
    Ok(())
}

Status: 0.5.x · stable Rust-native runtime · 11/11 built-in tools · MCP bridge · context-window compaction · wasm32 target + browser demo.

Try it in your browser: antig-compusophys-projects.vercel.app — the same Agent loop, compiled to wasm and driven from a single index.html.


Contents


Install

[dependencies]
localharness = "0.5"
tokio        = { version = "1", features = ["macros", "rt-multi-thread"] }
export GEMINI_API_KEY="your_api_key_here"

No Python install, no Go binary, no harness process — cargo build and you have an agent. Get an API key from Google AI Studio.


Concepts

Layer Type Use when
1 Agent One-shot or short-running scripts. Batteries included.
2 Conversation / ChatResponse Long-lived sessions, history introspection, custom turn shapes.
3 Connection Embed the SDK in your own runtime, swap the transport.

Examples

use futures_util::StreamExt;

let response = agent.chat("Write a haiku about Rust.").await?;
let mut tokens = response.text_stream();
while let Some(chunk) = tokens.next().await {
    print!("{}", chunk?);
}

Every cursor (text_stream, thoughts, tool_calls) replays from chunk zero and advances independently — safe to consume concurrently from multiple tasks.

use futures_util::StreamExt;

let response = agent.chat("What time is it in Tokyo?").await?;

let thoughts = async {
    let mut t = response.thoughts();
    while let Some(text) = t.next().await { eprint!("{}", text?); }
    Ok::<_, localharness::Error>(())
};
let calls = async {
    let mut c = response.tool_calls();
    while let Some(call) = c.next().await { println!("{}", call?.name); }
    Ok::<_, localharness::Error>(())
};

let (a, b) = tokio::join!(thoughts, calls);
a?; b?;
use localharness::{allow_all, Agent, ClosureTool, GeminiAgentConfig};
use serde_json::json;

let weather = ClosureTool::new(
    "get_weather",
    "Return the weather for a city.",
    json!({ "type": "object", "properties": { "city": { "type": "string" } } }),
    |args, _ctx| async move {
        let city = args["city"].as_str().unwrap_or("?");
        Ok(json!({ "weather": format!("sunny in {city}") }))
    },
);

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key)
        .with_tool(weather)
        .with_policies(vec![allow_all()]),
).await?;
use localharness::{Agent, CapabilitiesConfig, GeminiAgentConfig};

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key)
        .with_capabilities(CapabilitiesConfig::unrestricted())
        .with_workspace("/home/me/project"),
).await?;

let response = agent.chat("List the Rust files under src/ and show the first 50 lines of lib.rs.").await?;
println!("{}", response.text().await?);

workspace_only(...) policies are auto-installed when with_workspace is set; every file tool's path is canonicalized and rejected if it escapes the workspace.

use localharness::{deny_all, Policy};
use std::sync::Arc;

let policies = vec![
    deny_all(),                                // start from nothing
    Policy::allow("view_file"),                // safe reads ok
    Policy::ask("run_command", Arc::new(|call| {
        // Pop your own UI. Return true to approve.
        eprintln!("approve `{}`? {:?}", call.name, call.args);
        true
    })),
];

Precedence: specific deny ≻ specific ask ≻ specific allow ≻ wildcard deny ≻ wildcard ask ≻ wildcard allow. Matches the Python SDK rule.

let schema = serde_json::json!({
    "type": "object",
    "properties": {
        "summary":  { "type": "string" },
        "severity": { "type": "string", "enum": ["low", "medium", "high"] }
    },
    "required": ["summary", "severity"]
});

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key)
        .with_response_schema(schema.to_string()),
).await?;

let response = agent.chat("Triage this bug report: ...").await?;
let _ = response.text().await?; // drain
let out = agent.conversation().last_structured_output().unwrap();
println!("{out}");

The model calls the built-in finish(output) tool when it's done; the agent extracts output into last_structured_output().

use std::time::Duration;
use localharness::every;

let watchdog = every(Duration::from_secs(60), "deploy_watch", |ctx| async move {
    ctx.send_when_idle("Check the deployment status.").await
});

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key).with_trigger(watchdog),
).await?;
use localharness::{Content, Media, Part};

let chart = Media::from_path("./diagram.png")?
    .with_description("system architecture diagram");

let spec = Media::from_path("./spec.pdf")?;

let prompt: Content = vec![
    Part::from("List three vulnerabilities, citing the diagram and spec."),
    Part::from(chart),
    Part::from(spec),
].into();

let response = agent.chat(prompt).await?;

Media is stored as Bytes — cloning into multiple stream frames is refcounted, so a 30 MB PDF is never copied.

Connect to an external Model Context Protocol server and expose its tools to the agent. The bridge spawns the server, fetches its tool catalog, and registers each as a regular Tool — the model can't tell the difference between an in-process tool and an MCP-served one.

use localharness::{Agent, GeminiAgentConfig};
use localharness::types::McpServerConfig;

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key)
        .with_mcp_server(McpServerConfig::Stdio {
            command: "uvx".into(),
            args: vec!["mcp-server-fetch".into()],
        }),
).await?;

Today: stdio transport only; SSE/HTTP variants return an error. Tools surface only — prompts, resources, sampling, and subscriptions are out of scope.

When prompt tokens for a turn exceed CapabilitiesConfig::compaction_threshold, the agent summarizes the oldest history entries via a separate Gemini call and replaces them with one synthetic user-role turn tagged [compacted prior context]. The most-recent 6 user/model pairs are kept verbatim; function-call / response pairs are kept together.

use localharness::{Agent, CapabilitiesConfig, GeminiAgentConfig};

let mut caps = CapabilitiesConfig::unrestricted();
caps.compaction_threshold = Some(60_000);

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key).with_capabilities(caps),
).await?;

Disabled by default (set to None). Typical values: 60-80% of your model's max context window. Summarization failures fall back to a drop-oldest strategy with the same tag.

let agent = Agent::start_gemini(
    GeminiAgentConfig::new(api_key).resume("conv-abc123"),
).await?;

Built-in tools

The Gemini backend ships 10 of 11 tools enabled by BuiltinTool, auto-registered into the ToolRunner per CapabilitiesConfig. The default CapabilitiesConfig exposes the read-only safety subset; call CapabilitiesConfig::unrestricted() to enable everything.

Tool Read/Write Description
list_directory R Sorted children with name, kind, size.
view_file R UTF-8 lossy read with optional 1-indexed line range; 256 KiB cap.
find_file R Glob-matched recursive name search; 1000-match cap.
search_directory R Regex content search with optional file glob; 500-match cap.
finish term Terminate turn + capture structured output.
create_file W Atomic write via tempfile + rename; refuses to overwrite.
edit_file W Exact-once substring replace (or replace_all); atomic write.
run_command W Shell exec with timeout (default 30s / max 600s), 256 KiB output cap.
generate_image W Call the image model; returns base64 + MIME.
ask_question I/O Default no-op (returns skipped: true); register a custom ask_question tool for interactive UI.
start_subagent spawn One-shot text-only subagent with isolated context. Returns { final_response, finish_reason }.

Custom tools registered with the same name as a built-in win — overrides are intentional.


Architecture

   ┌──────────────────────────────────────────────────────┐
   │  L1   Agent           start · chat · shutdown        │
   ├──────────────────────────────────────────────────────┤
   │  L2   Conversation    history · usage · streams      │
   │       ChatResponse    text · thoughts · tool_calls   │
   ├──────────────────────────────────────────────────────┤
   │  L3   Connection      transport abstraction          │
   │       GeminiConnection  reqwest + SSE + tool loop    │
   └──────────────────────────────────────────────────────┘
                              │
                              │  HTTPS (rustls)
                              ▼
                          Gemini API

Inside the Gemini agent loop:

   user prompt ───►│                                         ▲
                   │ build GenerateContentRequest            │  emit Step
                   │ ───────► Gemini SSE ──────────► chunks  │  (text,
                   │             │                           │   thought,
                   │             ▼                           │   tool_call)
                   │   functionCall parts?  ────► dispatch ──┘
                   │             │              hooks→policy→tool_runner
                   │             ▼
                   │   append functionResponse ──► loop ─────┐
                   │                                         │
                   │   no more calls / finish ──► terminal Step

A single broadcast channel fans Steps out to every cursor (ChatResponse::chunks, text_stream, thoughts, tool_calls). The tool dispatch loop is inline inside the turn — no out-of-band round-trip through a sidecar process.


Run in the browser

localharness compiles to wasm32-unknown-unknown. The same Agent loop that drives a CLI runs inside a browser tab — no server, no backend, key stays in the page.

Live demo: antig-compusophys-projects.vercel.app

Drop the default native feature when targeting wasm:

[target.'cfg(target_arch = "wasm32")'.dependencies]
localharness = { version = "0.5", default-features = false }

Run the demo locally:

git clone https://github.com/compusophy/localharness
cd localharness
./scripts/build-web.sh        # wasm-pack build → web/pkg/
python -m http.server 8765 -d web

What works on wasm: the full AgentConversationConnectionToolRunner chain, plus the 4 portable built-in tools (ask_question, finish, generate_image, start_subagent). The Gemini SSE stream parses the same way; everything streams through the same code path the native CLI uses.

What doesn't (yet): the 6 filesystem builtins (list_directory, view_file, find_file, search_directory, create_file, edit_file), run_command, and the MCP stdio bridge all need OS primitives the browser sandbox doesn't expose. They stay gated behind the native cargo feature. An OPFS-backed Filesystem trait lands in a later release.

The localharness-web/ cdylib crate in this repo is the reference wasm-bindgen wrapper; it stores one Agent per tab in a thread_local and exposes start_session, chat, and reset_session to JavaScript.


Design notes (performance & safety)

  • Lock-free idle polling. Connection::is_idle() reads an AtomicBool. Trigger handlers can hot-loop without contention.
  • Broadcast fan-out for steps. Cursors subscribe without blocking the producer; replay buffer is bounded; slow consumers fail fast.
  • Bounded backpressure everywhere. Step broadcast cap 256. Function-call dispatch capped at 16 rounds per turn (MAX_TOOL_ROUNDS).
  • Atomic file writes. create_file and edit_file write through a tempfile::NamedTempFile in the same directory and rename into place — a crash mid-write never leaves a partially written file.
  • Bounded subprocess output. run_command caps each stream at 256 KiB and kills the child on timeout with kill_on_drop.
  • Component-wise path containment. workspace_only() defeats prefix tricks (/foo/bar-evil vs /foo/bar).
  • Lock-free tool-context swap. arc_swap::ArcSwapOption replaces the runtime context atomically across concurrent tool calls.
  • Typed errors. Flat thiserror enum; io::Error, serde_json::Error, reqwest::Error fold via #[from].
  • API key redaction. Debug for GeminiClient prints <redacted> for the key.
  • Zero-copy media. Media::data is bytes::Bytes. Cloning a part into multiple frames is a refcount bump.

FAQ

Does this need a server? No. The crate uses reqwest to call the Gemini REST API directly. No localhost daemon, no Go binary, no Python.

How do I get a GEMINI_API_KEY? From Google AI Studio. Free tier is sufficient for development.

Which model does it use? Default gemini-3.5-flash for chat, gemini-3.1-flash-image-preview for generate_image. Override with GeminiBackendConfig::with_model(...).

Why does write-tool access require a policy? Enabling tools that write to disk or run commands without a policy is almost always a bug. Add with_policies(vec![allow_all()]) to opt in, or with_workspace(...) to scope.

MSRV? Rust 1.85 (edition 2024).

Async runtime? Tokio.

How do I get tracing logs?

tracing_subscriber::fmt().with_env_filter("localharness=debug").init();

What about the 0.1.x start_local / Go binary? Still works in 0.2.x but marked #[deprecated]; removed in 0.3.0. Migrate to start_gemini. See UPSTREAM.md and DESIGN.md for the historical context.


License

Apache-2.0.