Crate hoosh

Hoosh — AI inference gateway for Rust.

Multi-provider LLM routing, local model serving, speech-to-text, and token budget management, exposed through an OpenAI-compatible HTTP API.

Name: Hoosh (Persian: هوش), meaning "intelligence"; it is the Persian word used for AI.

§Architecture

Clients (tarang, daimon, agnoshi, consumer apps)
    │
    ▼
Router (provider selection, load balancing, fallback)
    │
    ├──▶ Local backends (Ollama, llama.cpp, Synapse, whisper.cpp)
    │
    └──▶ Remote APIs (OpenAI, Anthropic, DeepSeek, Mistral, Groq, ...)
          │
          ▼
    Cache ◀── Rate Limiter ◀── Token Budget
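
The fallback arm of the diagram can be illustrated with a minimal standalone sketch: try providers in priority order and return the first success. The types and names below are illustrative only, not hoosh's actual API.

```rust
// Self-contained illustration of priority-ordered fallback, as in the
// Router box above. `Provider`, `route`, and `RouteError` are
// hypothetical names for this sketch, not hoosh types.
#[derive(Debug, PartialEq)]
enum RouteError {
    AllProvidersFailed,
}

struct Provider {
    name: &'static str,
    healthy: bool,
}

impl Provider {
    // Stand-in for a real inference call: fails when the provider is down.
    fn infer(&self, prompt: &str) -> Result<String, ()> {
        if self.healthy {
            Ok(format!("[{}] {}", self.name, prompt))
        } else {
            Err(())
        }
    }
}

// Try each provider in order; fall back to the next on failure.
fn route(providers: &[Provider], prompt: &str) -> Result<String, RouteError> {
    for p in providers {
        if let Ok(text) = p.infer(prompt) {
            return Ok(text);
        }
    }
    Err(RouteError::AllProvidersFailed)
}

fn main() {
    let providers = [
        Provider { name: "ollama", healthy: false }, // local backend down
        Provider { name: "openai", healthy: true },  // remote fallback
    ];
    let reply = route(&providers, "hello").unwrap();
    println!("{reply}");
}
```

A real router would also weigh load, cost, and health-check state when ordering the list, but the first-success loop is the core of the fallback path.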

§Quick start

use hoosh::{HooshClient, InferenceRequest};

// The client is async; a Tokio main is shown here. A hoosh server
// must be listening on the given address.
#[tokio::main]
async fn main() -> Result<(), hoosh::HooshError> {
    let client = HooshClient::new("http://localhost:8088");
    let response = client.infer(&InferenceRequest {
        model: "llama3".into(),
        prompt: "Explain Rust ownership in one sentence.".into(),
        ..Default::default()
    }).await?;
    println!("{}", response.text);
    Ok(())
}

Re-exports§

pub use budget::TokenBudget;
pub use budget::TokenPool;
pub use cache::ResponseCache;
pub use client::HooshClient;
pub use error::HooshError;
pub use inference::InferenceRequest;
pub use inference::InferenceResponse;
pub use inference::ModelInfo;
pub use provider::LlmProvider;
pub use provider::ProviderRegistry;
pub use provider::ProviderType;
pub use router::Router;
pub use tools::ToolCall;
pub use tools::ToolChoice;
pub use tools::ToolDefinition;
pub use tools::ToolResult;

Modules§

audit
Cryptographic audit log — HMAC-SHA256 linked chain for tamper-proof request/response logging.
budget
Token budget management: per-agent and per-pool token accounting.
cache
Response caching for inference results.
client
HTTP client for talking to a hoosh server.
config
Configuration file loading for hoosh.
context
Context management — token counting, context compaction, prompt compression.
cost
Per-provider cost tracking — pricing table and cost accumulation.
error
Error types for hoosh.
events
Provider event bus — pub/sub for internal provider events.
hardware
Hardware-aware model placement using ai-hwaccel.
health
Background health checker — periodic provider health monitoring with automatic failover.
inference
Core inference types: requests, responses, model metadata.
metrics
Prometheus metrics for the hoosh inference gateway.
middleware
provider
LLM provider trait and type registry.
queue
Priority request queue — queue inference requests when providers are busy.
router
Request routing: provider selection, load balancing, fallback.
server
Axum HTTP server exposing the OpenAI-compatible API.
tools
Tool use & function calling — unified abstraction across LLM providers, with optional MCP integration via bote + szal.
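
As a rough illustration of what per-agent token accounting (the budget module's job) involves, here is a standalone sketch. The struct and method names are hypothetical for this example and are not hoosh's actual `TokenBudget`/`TokenPool` API.

```rust
use std::collections::HashMap;

// Illustrative per-agent token accounting, in the spirit of the budget
// module. Names here (`TokenLedger`, `try_spend`) are hypothetical.
struct TokenLedger {
    limit: u64,                  // per-agent token cap
    spent: HashMap<String, u64>, // tokens consumed per agent
}

impl TokenLedger {
    fn new(limit: u64) -> Self {
        Self { limit, spent: HashMap::new() }
    }

    // Record usage only if the agent stays within its cap.
    fn try_spend(&mut self, agent: &str, tokens: u64) -> bool {
        let used = self.spent.entry(agent.to_string()).or_insert(0);
        if *used + tokens > self.limit {
            return false; // over budget: caller should reject or queue
        }
        *used += tokens;
        true
    }

    fn remaining(&self, agent: &str) -> u64 {
        self.limit - self.spent.get(agent).copied().unwrap_or(0)
    }
}

fn main() {
    let mut ledger = TokenLedger::new(1_000);
    assert!(ledger.try_spend("tarang", 600));
    assert!(!ledger.try_spend("tarang", 600)); // would exceed the 1_000 cap
    println!("remaining: {}", ledger.remaining("tarang"));
}
```

In the gateway itself, this kind of check would sit behind the rate limiter in the diagram above, with a pool-level variant on top of the per-agent one.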