Hoosh — AI inference gateway for Rust.
Multi-provider LLM routing, local model serving, speech-to-text, and token budget management. OpenAI-compatible HTTP API.
Name: Hoosh (Persian: هوش) — "intelligence", the Persian word used for AI.
§Architecture
Clients (tarang, daimon, agnoshi, consumer apps)
│
▼
Router (provider selection, load balancing, fallback)
│
├──▶ Local backends (Ollama, llama.cpp, Synapse, whisper.cpp)
│
└──▶ Remote APIs (OpenAI, Anthropic, DeepSeek, Mistral, Groq, ...)
│
▼
Cache ◀── Rate Limiter ◀── Token Budget

§Quick start
use hoosh::{InferenceRequest, HooshClient};

let client = HooshClient::new("http://localhost:8088");
let response = client.infer(&InferenceRequest {
    model: "llama3".into(),
    prompt: "Explain Rust ownership in one sentence.".into(),
    ..Default::default()
}).await?;
println!("{}", response.text);

Re-exports§
pub use budget::TokenBudget;
pub use budget::TokenPool;
pub use cache::ResponseCache;
pub use client::HooshClient;
pub use error::HooshError;
pub use inference::InferenceRequest;
pub use inference::InferenceResponse;
pub use inference::ModelInfo;
pub use provider::LlmProvider;
pub use provider::ProviderRegistry;
pub use provider::ProviderType;
pub use router::Router;
pub use tools::ToolCall;
pub use tools::ToolChoice;
pub use tools::ToolDefinition;
pub use tools::ToolResult;
Modules§
- audit
- Cryptographic audit log — HMAC-SHA256 linked chain for tamper-proof request/response logging.
- budget
- Token budget management: per-agent and per-pool token accounting.
- cache
- Response caching for inference results.
- client
- HTTP client for talking to a hoosh server.
- config
- Configuration file loading for hoosh.
- context
- Context management — token counting, context compaction, prompt compression.
- cost
- Per-provider cost tracking — pricing table and cost accumulation.
- error
- Error types for hoosh.
- events
- Provider event bus — pub/sub for internal provider events.
- hardware
- Hardware-aware model placement using ai-hwaccel.
- health
- Background health checker — periodic provider health monitoring with automatic failover.
- inference
- Core inference types: requests, responses, model metadata.
- metrics
- Prometheus metrics for the hoosh inference gateway.
- middleware
- provider
- LLM provider trait and type registry.
- queue
- Priority request queue — queue inference requests when providers are busy.
- router
- Request routing: provider selection, load balancing, fallback.
- server
- Axum HTTP server exposing the OpenAI-compatible API.
- tools
- Tool use & function calling — unified abstraction across LLM providers, with optional MCP integration via bote + szal.
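The per-agent token accounting described for the budget module can be sketched in plain Rust. The names below (`SketchPool`, `try_spend`) are illustrative only and are not the crate's actual `TokenBudget`/`TokenPool` API; they show the idea of a shared pool with per-agent usage tracking.

```rust
use std::collections::HashMap;

/// Illustrative token pool: one shared budget, usage tracked per agent.
/// (Hypothetical sketch — not the actual hoosh `TokenPool` API.)
struct SketchPool {
    capacity: u64,
    spent_by_agent: HashMap<String, u64>,
}

impl SketchPool {
    fn new(capacity: u64) -> Self {
        Self { capacity, spent_by_agent: HashMap::new() }
    }

    /// Total tokens consumed across all agents in this pool.
    fn total_spent(&self) -> u64 {
        self.spent_by_agent.values().sum()
    }

    /// Deduct `tokens` for `agent` if the pool still has headroom;
    /// otherwise leave the accounting untouched and report failure.
    fn try_spend(&mut self, agent: &str, tokens: u64) -> bool {
        if self.total_spent() + tokens > self.capacity {
            return false; // budget exhausted: caller can queue or reject
        }
        *self.spent_by_agent.entry(agent.to_string()).or_insert(0) += tokens;
        true
    }
}

fn main() {
    let mut pool = SketchPool::new(1_000);
    assert!(pool.try_spend("tarang", 600));
    assert!(!pool.try_spend("daimon", 500)); // 600 + 500 would exceed the pool
    assert!(pool.try_spend("daimon", 400));
    println!("spent: {}", pool.total_spent());
}
```

A real gateway would also need atomicity under concurrent requests (e.g. a mutex or atomic counters) and periodic replenishment; this sketch only covers the accounting check itself.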