# api_huggingface

Comprehensive Rust client for the HuggingFace Inference API, with Router API support for Pro models.
## 🎯 Architecture: Stateless HTTP Client
This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:
- Direct HTTP calls to the HuggingFace Inference API
- In-memory operation state only (resets on restart)
- No external storage dependencies (databases, files, caches)
- No configuration persistence beyond environment variables
This ensures lightweight, containerized deployments and eliminates operational complexity.
## 🏛️ Governing Principle: "Thin Client, Rich API"
Expose all server-side functionality transparently while maintaining zero client-side intelligence or automatic behaviors.
Key principles:
- API Transparency: One-to-one mapping with HuggingFace API endpoints
- Zero Automatic Behavior: No implicit decision-making or magic thresholds
- Explicit Control: Developer decides when, how, and why operations occur
- Configurable Reliability: Enterprise features available through explicit configuration
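The "explicit control, zero automatic behavior" idea can be illustrated with a standalone sketch. The `ClientConfig` type below is hypothetical (not the crate's actual config struct); it only shows the principle that reliability features stay off until the developer turns them on:

```rust
// Illustrative only: a hypothetical config showing "explicit control" —
// nothing retries, throttles, or trips a breaker unless the caller asks for it.
#[derive(Debug, Clone)]
struct ClientConfig {
  max_retries: u32,                // 0 = no automatic retries
  circuit_breaker: bool,           // failure detection is opt-in
  rate_limit_per_sec: Option<u32>, // None = no client-side throttling
}

impl Default for ClientConfig {
  fn default() -> Self {
    // Zero automatic behavior: every default is the "do nothing implicit" value.
    Self { max_retries: 0, circuit_breaker: false, rate_limit_per_sec: None }
  }
}

fn main() {
  let defaults = ClientConfig::default();
  assert_eq!(defaults.max_retries, 0);
  assert!(!defaults.circuit_breaker);

  // The developer opts in explicitly, field by field:
  let cfg = ClientConfig { max_retries: 3, circuit_breaker: true, ..ClientConfig::default() };
  println!("{cfg:?}");
}
```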
## Scope

### In Scope
- Text generation via Router API (Llama-3, Mistral, Kimi-K2)
- Embeddings generation with similarity calculations
- Model discovery and status checking
- Streaming responses (SSE)
- Vision APIs (classification, detection, captioning)
- Audio APIs (ASR, TTS, classification, transformation)
- Enterprise reliability (circuit breaker, rate limiting, failover, health checks)
- Synchronous API wrapper
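Streaming responses arrive as server-sent events (SSE). Independent of this crate's actual streaming types, the wire format can be sketched with a minimal parser for `data:` lines, using the `[DONE]` terminator of the OpenAI-compatible Router format:

```rust
// Minimal SSE chunk parser — a sketch of the wire format only,
// not the crate's real streaming implementation.
fn parse_sse_data(chunk: &str) -> Vec<String> {
  chunk
    .lines()
    .filter_map(|line| line.strip_prefix("data: "))
    .filter(|payload| *payload != "[DONE]") // OpenAI-compatible stream terminator
    .map(str::to_owned)
    .collect()
}

fn main() {
  let chunk = "data: {\"token\":\"Hel\"}\n\ndata: {\"token\":\"lo\"}\n\ndata: [DONE]\n";
  let events = parse_sse_data(chunk);
  assert_eq!(events.len(), 2);
  println!("{events:?}");
}
```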
### Out of Scope
- Model training (inference only)
- File upload/download (text-based API interactions only)
- Custom model hosting (HuggingFace hosted models only)
- GraphQL support (REST API only)
## Features

**Core Capabilities:**
- Router API for Pro plan models (OpenAI-compatible format)
- Text generation with streaming support
- Embeddings with similarity calculations
- Model availability checking
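The similarity calculation bundled with embeddings is the standard cosine similarity. Written standalone below (this is the textbook formula, not the crate's actual helper function):

```rust
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), with 0.0 for zero-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
  assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
  let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
  let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
  let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
  if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

fn main() {
  let a = [1.0, 0.0, 1.0];
  let b = [1.0, 0.0, 1.0];
  assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical → 1.0
  let c = [0.0, 1.0, 0.0];
  assert!(cosine_similarity(&a, &c).abs() < 1e-6); // orthogonal → 0.0
}
```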
**Multimodal Features:**
- Vision: Image classification, object detection, captioning
- Audio: Speech recognition, text-to-speech, classification
**Enterprise Reliability:**
- Circuit breaker pattern for failure detection
- Rate limiting with token bucket algorithm
- Multi-endpoint failover (4 strategies)
- Background health checks
- Dynamic configuration with runtime updates
- Performance metrics tracking
- LRU caching with TTL
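The token bucket algorithm behind the rate limiter can be sketched in a few lines. This is the generic algorithm, standalone; the crate's real rate limiter will differ in shape:

```rust
use std::time::Instant;

// A minimal token-bucket limiter: tokens refill continuously up to a burst
// capacity; each request consumes one token or is rejected.
struct TokenBucket {
  capacity: f64,
  tokens: f64,
  refill_per_sec: f64,
  last: Instant,
}

impl TokenBucket {
  fn new(capacity: f64, refill_per_sec: f64) -> Self {
    Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
  }

  /// Returns true if a request may proceed, consuming one token.
  fn try_acquire(&mut self) -> bool {
    let now = Instant::now();
    let elapsed = now.duration_since(self.last).as_secs_f64();
    self.last = now;
    self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
    if self.tokens >= 1.0 {
      self.tokens -= 1.0;
      true
    } else {
      false
    }
  }
}

fn main() {
  let mut bucket = TokenBucket::new(2.0, 1.0); // burst of 2, refills 1 token/sec
  assert!(bucket.try_acquire());
  assert!(bucket.try_acquire());
  println!("bucket drained: {}", !bucket.try_acquire());
}
```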
## Installation

Add to your Cargo.toml:

```toml
[dependencies]
api_huggingface = "0.2.0"
```
## Quick Start

### Basic Usage

A minimal sketch, not the crate's confirmed API — the type and method names (`Client`, `from_env`, `text_generation`) are illustrative assumptions:

```rust
use api_huggingface::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  // Reads the HuggingFace API key from the environment (see Authentication below).
  let client = Client::from_env()?;
  let reply = client
    .text_generation("meta-llama/Llama-3.1-8B-Instruct", "Explain ownership in Rust in one sentence.")
    .await?;
  println!("{reply}");
  Ok(())
}
```
### Embeddings with Similarity

Again a hedged sketch — `embeddings` and `cosine_similarity` are illustrative names, not verified signatures:

```rust
use api_huggingface::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let client = Client::from_env()?;
  let model = "sentence-transformers/all-MiniLM-L6-v2";
  let a = client.embeddings(model, "The cat sat on the mat.").await?;
  let b = client.embeddings(model, "A cat was sitting on a mat.").await?;
  println!("similarity: {:.3}", api_huggingface::cosine_similarity(&a, &b));
  Ok(())
}
```
## Authentication

### Option 1: Workspace Secret (Recommended)

Create secret/-secrets.sh in your workspace root:

```bash
#!/bin/bash
# export your HuggingFace API token here
```
### Option 2: Environment Variable
Get your API key from huggingface.co/settings/tokens.
## Feature Flags

### Core Features

- `default` - Core async inference and embeddings
- `inference` - Text generation API
- `embeddings` - Embeddings generation
- `models` - Model discovery and status

### Streaming and Processing

- `inference-streaming` - SSE streaming support
- `embeddings-similarity` - Similarity calculations
- `embeddings-batch` - Batch processing

### Enterprise Reliability

- `circuit-breaker` - Failure detection and recovery
- `rate-limiting` - Token bucket rate limiting
- `failover` - Multi-endpoint failover
- `health-checks` - Background health monitoring
- `dynamic-config` - Runtime configuration

### Client Enhancements

- `sync` - Synchronous API wrapper
- `caching` - LRU caching with TTL
- `performance-metrics` - Request tracking

### Presets

- `full` - All features enabled
- `integration` - Integration tests with real API
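Features are combined in Cargo.toml as usual; a hypothetical selection (pick only the flags you need, named as listed above):

```toml
[dependencies]
api_huggingface = { version = "0.2.0", features = [ "inference-streaming", "circuit-breaker", "rate-limiting" ] }
```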
## Testing

### Test Coverage
- Comprehensive unit and integration tests
- No-mock policy: all integration tests run against the real HuggingFace API
## Supported Models

### Router API (Pro Plan)
| Model | Provider | Capabilities |
|---|---|---|
| moonshotai/Kimi-K2-Instruct-0905 | groq | Chat completions |
| meta-llama/Llama-3.1-8B-Instruct | various | Chat completions |
| mistralai/Mistral-7B-Instruct | various | Chat completions |
| codellama/CodeLlama-34b-Instruct | various | Code generation |
### Legacy Inference API (Free Tier)
| Model | Task |
|---|---|
| facebook/bart-large-cnn | Summarization |
| gpt2 | Text generation |
| sentence-transformers/all-MiniLM-L6-v2 | Embeddings |
## Documentation
- Features Overview - Feature list and cargo features
- API Reference - Comprehensive API documentation
- Examples - Working code examples
- Specification - Detailed technical specification
## Dependencies
- reqwest: HTTP client with async support
- tokio: Async runtime
- serde: Serialization/deserialization
- workspace_tools: Secret management
- error_tools: Unified error handling
- secrecy: Secure credential handling
All dependencies workspace-managed for consistency.
## Contributing
- Follow established patterns in existing code
- Use 2-space indentation consistently
- Add tests for new functionality
- Update documentation for public APIs
- Ensure zero clippy warnings: `cargo clippy -- -D warnings`
- Follow the zero-tolerance mock policy (real API integration only)
- Follow the "Thin Client, Rich API" principle
## License
MIT
## Links
- HuggingFace Hub - Model discovery
- API Tokens - Get your API key
- Inference API Docs - Official documentation
- Specification - Technical specification