Hoosh — AI inference gateway for Rust.
Multi-provider LLM routing, local model serving, speech-to-text, and token budget management. OpenAI-compatible HTTP API.
Name: Hoosh (Persian: هوش) — "intelligence", the Persian word used for AI.
§Architecture
Clients (tarang, daimon, agnoshi, consumer apps)
│
▼
Router (provider selection, load balancing, fallback)
│
├──▶ Local backends (Ollama, llama.cpp, Synapse, whisper.cpp)
│
└──▶ Remote APIs (OpenAI, Anthropic, DeepSeek, Mistral, Groq, ...)
│
▼
Cache ◀── Rate Limiter ◀── Token Budget

§Quick start
use hoosh::{InferenceRequest, HooshClient};

let client = HooshClient::new("http://localhost:8088");
let response = client.infer(&InferenceRequest {
    model: "llama3".into(),
    prompt: "Explain Rust ownership in one sentence.".into(),
    ..Default::default()
}).await?;
println!("{}", response.text);

Re-exports§
pub use budget::TokenBudget;
pub use budget::TokenPool;
pub use cache::ResponseCache;
pub use client::HooshClient;
pub use error::HooshError;
pub use inference::InferenceRequest;
pub use inference::InferenceResponse;
pub use inference::ModelInfo;
pub use provider::LlmProvider;
pub use provider::ProviderRegistry;
pub use provider::ProviderType;
pub use router::Router;
pub use tools::ToolCall;
pub use tools::ToolChoice;
pub use tools::ToolDefinition;
pub use tools::ToolResult;
Modules§
- audit
- Cryptographic audit log — HMAC-SHA256 linked chain for tamper-proof request/response logging.
- budget
- Token budget management: per-agent and per-pool token accounting.
- cache
- Response caching for inference results.
- client
- HTTP client for talking to a hoosh server.
- config
- Configuration file loading for hoosh.
- context
- Context management — token counting, context compaction, prompt compression.
- cost
- Per-provider cost tracking — pricing table and cost accumulation.
- error
- Error types for hoosh.
- events
- Provider event bus — pub/sub for internal provider events.
- hardware
- Hardware-aware model placement using ai-hwaccel.
- health
- Background health checker — periodic provider health monitoring with automatic failover.
- inference
- Core inference types: requests, responses, model metadata.
- metrics
- Prometheus metrics for the hoosh inference gateway.
- middleware
- provider
- LLM provider trait and type registry.
- queue
- Priority request queue — queue inference requests when providers are busy.
- router
- Request routing: provider selection, load balancing, fallback.
- server
- Axum HTTP server exposing the OpenAI-compatible API.
- tools
- Tool use & function calling — unified abstraction across LLM providers, with optional MCP integration via bote + szal.
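The per-agent token accounting described for the budget module can be sketched in plain Rust. The names below (`SketchPool`, `try_spend`) are illustrative only and are not the crate's actual `TokenBudget`/`TokenPool` API; they show the idea of a shared pool with per-agent usage tracking.

```rust
use std::collections::HashMap;

/// Illustrative token pool: one shared budget, usage tracked per agent.
/// (Hypothetical sketch — not the actual hoosh `TokenPool` API.)
struct SketchPool {
    capacity: u64,
    spent_by_agent: HashMap<String, u64>,
}

impl SketchPool {
    fn new(capacity: u64) -> Self {
        Self { capacity, spent_by_agent: HashMap::new() }
    }

    /// Total tokens consumed across all agents in this pool.
    fn total_spent(&self) -> u64 {
        self.spent_by_agent.values().sum()
    }

    /// Deduct `tokens` for `agent` if the pool still has headroom;
    /// otherwise leave the accounting untouched and report failure.
    fn try_spend(&mut self, agent: &str, tokens: u64) -> bool {
        if self.total_spent() + tokens > self.capacity {
            return false; // budget exhausted: caller can queue or reject
        }
        *self.spent_by_agent.entry(agent.to_string()).or_insert(0) += tokens;
        true
    }
}

fn main() {
    let mut pool = SketchPool::new(1_000);
    assert!(pool.try_spend("tarang", 600));
    assert!(!pool.try_spend("daimon", 500)); // 600 + 500 would exceed the pool
    assert!(pool.try_spend("daimon", 400));
    println!("spent: {}", pool.total_spent());
}
```

A real gateway would also need atomicity under concurrent requests (e.g. a mutex or atomic counters) and periodic replenishment; this sketch only covers the accounting check itself.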