autoagents-onnx 0.2.4

Minimal edge inference runtime for LLMs
# Onnx-ort - High-Performance Edge Inference Runtime

---

### ⚠️ Experimental Feature Notice

**Disclaimer:** This project is currently in an **experimental** phase and is under active development. Features, APIs,
and internal logic are subject to change without notice. Stability and performance are not yet guaranteed.

We welcome feedback and contributions, but please use with caution in production environments. Expect breaking changes
and incomplete functionality as we iterate and improve the inference engine.

---

Onnx-ort is an inference runtime designed specifically for edge computing environments. It provides
high-performance LLM inference with support for multiple backends, comprehensive tokenization, optimized
memory management, and AutoAgents LLM Provider integration.

### Model Directory Structure

```
models/my-model/
├── model.onnx              # ONNX model file
├── tokenizer.json          # HuggingFace tokenizer
├── config.json             # Model configuration
├── tokenizer_config.json   # Tokenizer configuration
├── special_tokens_map.json # Special tokens mapping
└── chat_template.jinja     # Chat template (optional)
```
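As a quick sanity check before loading a model, you can verify that a directory follows this layout. The sketch below is illustrative and not part of the crate's API; the file names come from the structure shown above, and `chat_template.jinja` is treated as optional:

```python
from pathlib import Path

# Files the runtime expects in a model directory (per the layout above).
# chat_template.jinja is optional and intentionally excluded.
REQUIRED_FILES = [
    "model.onnx",
    "tokenizer.json",
    "config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_files("models/my-model")` returns an empty list when the directory is complete, or the names of any missing files otherwise.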