# Onde Inference — AGENTS.md
> **AI agent reference for the `onde` repository and the `onde-swift` Swift SDK.**
> Keep this file accurate as the codebase evolves.
---
## What Is Onde?
Onde is an **on-device LLM inference SDK family** built around a shared Rust core. It targets Apple silicon (iOS, tvOS, macOS) plus Android, Windows, and Linux, and ships SDKs for Rust, Swift, Kotlin Multiplatform, Flutter, and React Native. It wraps [mistral.rs](https://github.com/EricLBuehler/mistral.rs) behind platform-friendly APIs with automatic model selection, HuggingFace Hub downloads, cache management, and Metal acceleration where available.
- **Website:** https://ondeinference.com
- **Rust crate:** https://crates.io/crates/onde
- **Swift package:** https://github.com/ondeinference/onde-swift
- **Maven Central:** https://central.sonatype.com/artifact/com.ondeinference/onde-inference
- **Flutter package:** https://pub.dev/packages/onde_inference
- **React Native package:** https://www.npmjs.com/package/@ondeinference/react-native
- **In production:** [Splitfire AB apps on the Apple AppStore](https://apps.apple.com/se/developer/splitfire-ab/id1831430993)
---
## Repository Layout
```
onde/
├── src/
│ ├── lib.rs # Crate root — uniffi::setup_scaffolding!()
│ ├── hf_cache.rs # HuggingFace Hub cache: list, download, delete, diagnose, repair
│ └── inference/
│ ├── mod.rs # Module exports + top-level re-exports
│ ├── engine.rs # ChatEngine — Rust-native API (generics, mpsc, tool calling)
│ ├── ffi.rs # OndeChatEngine — UniFFI Object (FFI-safe, Arc-wrapped)
│ ├── models.rs # Model ID constants + SupportedModelInfo metadata
│ ├── token.rs # HF token resolution: build-time literal vs cache file
│ └── types.rs # Shared types: ChatMessage, SamplingConfig, InferenceResult, ToolCallInfo, etc.
├── sdk/
│ ├── dart/ # Flutter/Dart package + example app + FRB Rust bridge
│ ├── gem/ # Ruby native extension (Magnus)
│ ├── kotlin/ # Kotlin Multiplatform package (Android + JVM)
│ ├── react-native/ # Expo module wrapping the Rust core for iOS/Android
│ └── python/ # Python bindings (maturin + uniffi)
├── generated/ # UniFFI-generated headers and Swift glue (git-ignored)
├── uniffi-bindgen/ # Standalone bindgen binary crate (pinned uniffi =0.31.0)
├── .github/
│ ├── workflows/
│ │ ├── release-sdk-swift.yml # CI: tag push → build XCFramework → GitHub Release → update onde-swift
│ │ ├── release-sdk-kotlin.yml # CI: tag push → build Android/JVM artifacts → publish to Maven Central
│ │ ├── release-sdk-dart.yml # CI: tag push → publish Flutter package to pub.dev
│ │ ├── release-sdk-npm.yml # CI: tag push → publish React Native package to npm
│ │ └── release-sdk-rust.yml # CI: tag push → publish Rust crate to crates.io
│ └── scripts/
│ └── build-swift-xcframework.sh # Local/CI XCFramework assembly script
├── .cargo/config.toml # Target-specific rustflags (fp16, linker overrides)
├── Cargo.toml # Platform-conditional mistralrs deps
├── build.rs # tvOS ___chkstk_darwin assembly stub
├── scripts/
│ └── tvos_chkstk.s # No-op arm64 stub for missing tvOS symbol
├── uniffi.toml # UniFFI binding config
└── docs/
├── dev.md # Developer guide (build, architecture, platform table)
├── swift-package.md # Swift API reference + XCFramework build steps
├── distribution.md # Release process for all registries
└── ruby-gem.md # Ruby gem API reference
```
```
onde-swift/ # Swift Package Manager wrapper repo
├── Package.swift # Declares OndeFramework.xcframework binary target
└── Sources/
└── Onde/
└── onde.swift # UniFFI-generated Swift glue (do NOT edit manually)
```
---
## Architecture
### Layer Diagram
```
┌──────────────────────────────────────────────────────────┐
│ Swift (iOS / tvOS / macOS) Rust app / CLI / server │
│ import Onde use onde::inference::* │
└──────────┬───────────────────────────────┬───────────────┘
│ UniFFI FFI bindings │ Direct Rust API
▼ ▼
┌──────────────────────────┐ ┌─────────────────────────────┐
│ OndeChatEngine │ │ ChatEngine │
│ (ffi.rs — uniffi::Object│──│ (engine.rs — Rust-native) │
│ FFI-safe, Arc<Self>) │ │ generics, mpsc channels │
└──────────┬───────────────┘ └─────────────┬───────────────┘
│ │
└────────────────┬───────────────┘
▼
┌────────────────────────┐
│ mistralrs::Model │
│ GgufModelBuilder │
│ Metal / CUDA / CPU │
└────────────────────────┘
```
### Key Design Rules
1. **`ChatEngine`** (`engine.rs`) owns all Rust-idiomatic logic: `impl Into<String>`, `tokio::sync::mpsc::Receiver`, etc. Never add UniFFI annotations here.
2. **`OndeChatEngine`** (`ffi.rs`) is a thin `Arc`-wrapped UniFFI `Object` with concrete, FFI-safe signatures. It delegates every call to the inner `ChatEngine`.
3. **`StreamChunkListener`** is a `#[uniffi::export(callback_interface)]` trait — it must be passed as a parameter to **free functions** (not Object methods) because UniFFI 0.31 does not support callback_interface on Object methods.
4. **Free functions** in `ffi.rs` (e.g. `stream_chat_message`, `default_model_config`, `user_message`) are exported with `#[uniffi::export]` and generate top-level Swift functions.
5. **`uniffi::setup_scaffolding!()`** lives in `lib.rs`. Never call `uniffi::generate_scaffolding!()` from `build.rs` for the proc-macro path.
---
## Platform Support
| macOS | `macos` | Metal | Qwen 2.5 Coder 3B (~1.93 GB) | `["metal"]` |
| iOS | `ios` | Metal | Qwen 2.5 Coder 1.5B (~941 MB) | `["metal"]` |
| tvOS | `tvos` | Metal | Qwen 2.5 Coder 1.5B (~941 MB) | `["metal"]` |
| Android | `android` | CPU (candle) | Qwen 2.5 Coder 1.5B (~941 MB) | `[]` + hf-hub |
| Windows | `windows` | CPU (CUDA in CI) | Qwen 2.5 Coder 3B (~1.93 GB) | `[]` |
| Linux | `linux` | CPU (CUDA in CI) | Qwen 2.5 Coder 3B (~1.93 GB) | `[]` |
- **Current default routing:** `platform_default()` now prefers the Coder variants, 1.5B on mobile and 3B on desktop.
- **iOS / tvOS memory constraint:** iOS gives apps about 2–3 GB. The 3B model (~1.93 GB) can still cause OOM on constrained devices, so mobile defaults stay on the 1.5B model.
- **tvOS tier-3 target:** requires `cargo +nightly -Z build-std`. Stable toolchain cannot build tvOS targets.
- **Android `hf_hub`:** `dirs::home_dir()` panics in the Android sandbox. `hf-hub` is added as an explicit dependency on Android so `HF_HOME` can be seeded programmatically via `hf_hub::api::tokio::ApiBuilder`.
---
## Supported Models
All model constants live in `src/inference/models.rs`. When adding a new model:
1. Add `pub const` entries for the HF repo ID, GGUF filename, and, on Android, `TOK_MODEL_ID` when needed.
2. Add the repo ID to `SUPPORTED_MODELS` so `list_local_hf_models` filters it.
3. Add a `SupportedModelInfo` entry to `SUPPORTED_MODEL_INFO` with accurate `expected_size_bytes` from the HF API `siblings[].size`.
4. Add a constructor to `GgufModelConfig` in `engine.rs`.
5. Export a free function in `ffi.rs` for UniFFI consumers when the model should be reachable from Swift and Kotlin.
6. If the GGUF does not ship with a built-in chat template, set `chat_template` explicitly. DeepSeek Coder is the current example.
### Current Models
| Qwen 2.5 1.5B Instruct (GGUF Q4_K_M) | `bartowski/Qwen2.5-1.5B-Instruct-GGUF` | `Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` | ~941 MB | All platforms |
| Qwen 2.5 3B Instruct (GGUF Q4_K_M) | `bartowski/Qwen2.5-3B-Instruct-GGUF` | `Qwen2.5-3B-Instruct-Q4_K_M.gguf` | ~1.93 GB | All platforms |
| Qwen 2.5 Coder 1.5B Instruct (GGUF Q4_K_M) | `bartowski/Qwen2.5-Coder-1.5B-Instruct-GGUF` | `Qwen2.5-Coder-1.5B-Instruct-Q4_K_M.gguf` | ~941 MB | All platforms (mobile default) |
| Qwen 2.5 Coder 3B Instruct (GGUF Q4_K_M) | `bartowski/Qwen2.5-Coder-3B-Instruct-GGUF` | `Qwen2.5-Coder-3B-Instruct-Q4_K_M.gguf` | ~1.93 GB | All platforms (desktop default) |
| Qwen 2.5 Coder 7B Instruct (GGUF Q4_K_M) | `bartowski/Qwen2.5-Coder-7B-Instruct-GGUF` | `Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf` | ~4.4 GB | Higher-memory devices |
| Qwen 3 1.7B (GGUF Q4_K_M) | `bartowski/Qwen3-1.7B-GGUF` | `Qwen3-1.7B-Q4_K_M.gguf` | ~1.3 GB | All platforms |
| Qwen 3 4B (GGUF Q4_K_M) | `bartowski/Qwen3-4B-GGUF` | `Qwen3-4B-Q4_K_M.gguf` | ~2.7 GB | All platforms |
| Qwen 3 8B (GGUF Q4_K_M) | `bartowski/Qwen3-8B-GGUF` | `Qwen3-8B-Q4_K_M.gguf` | ~5 GB | Higher-memory devices |
| Qwen 3 14B (GGUF Q4_K_M) | `bartowski/Qwen3-14B-GGUF` | `Qwen3-14B-Q4_K_M.gguf` | ~8.4 GB | Higher-memory devices |
| DeepSeek Coder 6.7B Instruct (GGUF Q4_K_M) | `bartowski/deepseek-coder-6.7b-instruct-GGUF` | `deepseek-coder-6.7b-instruct-Q4_K_M.gguf` | ~3.8 GB | Higher-memory devices, custom chat template |
| Qwen 2.5 Coder 7B Instruct (ISQ) | `Qwen/Qwen2.5-Coder-7B-Instruct` | safetensors (ISQ in-situ) | ~8 GB | macOS (ISQ pipeline) |
---
## Key Types (Rust ↔ Swift)
| `ChatRole` | `enum ChatRole` | `.system`, `.user`, `.assistant` |
| `ChatMessage` | `struct ChatMessage` | `role: ChatRole`, `content: String` |
| `SamplingConfig` | `struct SamplingConfig` | All fields `Optional` |
| `GgufModelConfig` | `struct GgufModelConfig` | `modelId`, `files`, `tokModelId?`, `displayName`, `approxMemory`, `chatTemplate?` |
| `IsqModelConfig` | `struct IsqModelConfig` | `modelId`, `isqBits: UInt8`, `displayName`, `approxMemory` |
| `InferenceResult` | `struct InferenceResult` | `text`, `durationSecs`, `durationDisplay`, `finishReason`, `toolCalls` |
| `ToolCallInfo` | `struct ToolCallInfo` | Structured tool call request emitted by the model |
| `StreamChunk` | `struct StreamChunk` | `delta`, `done`, `finishReason?` |
| `EngineStatus` | `enum EngineStatus` | `.unloaded`, `.loading`, `.ready`, `.generating`, `.error` |
| `EngineInfo` | `struct EngineInfo` | `status`, `modelName?`, `approxMemory?`, `historyLength: UInt64` |
| `InferenceError` | `enum InferenceError: Error` | `noModelLoaded`, `alreadyLoaded`, `modelBuild`, `inference`, `cancelled`, `other` |
| `OndeChatEngine` | `class OndeChatEngine` | Thread-safe; `Arc`-backed; constructed with `OndeChatEngine()` |
| `StreamChunkListener` | `protocol StreamChunkListener` | Implement `onChunk(chunk:) -> Bool` |
---
## HuggingFace Token
`src/inference/token.rs` resolves the HF token in priority order:
1. **Build-time literal** (`HF_TOKEN` env var baked in via `option_env!`) — required for iOS/tvOS (no filesystem token possible).
2. **On-disk cache** (`~/.cache/huggingface/token`) — works on macOS after `mistralrs login`.
Set `HF_TOKEN` before building:
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
cargo build --release
```
On sandboxed platforms, your app's setup function must also configure `HF_HOME`, `HF_HUB_CACHE`, and `TMPDIR` to point inside the app container before any `OndeChatEngine` method is called. See `docs/swift-package.md` for the full `setupInferenceEnvironment()` Swift snippet.
---
## Building (Rust)
### Prerequisites
```bash
# Stable toolchain (macOS, iOS, Android, Windows, Linux)
rustup toolchain install stable
# Nightly toolchain (tvOS tier-3 targets only)
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
# Apple targets (stable)
rustup target add aarch64-apple-ios aarch64-apple-ios-sim
rustup target add aarch64-apple-darwin x86_64-apple-darwin
```
### Commands
```bash
# Verify compilation (macOS host)
cargo check
# Run all tests
cargo test
# Run inference module tests only
cargo test inference::
# Run FFI wrapper tests only
cargo test inference::ffi::
# Clippy lint
cargo clippy
# Format
cargo fmt
```
### tvOS
tvOS targets require nightly and `-Z build-std`:
```bash
cargo +nightly rustc -Z build-std \
--target aarch64-apple-tvos --release --lib --crate-type staticlib
```
---
## Building the Swift XCFramework
The XCFramework bundles staticlibs for iOS device, iOS simulator, tvOS device, tvOS simulator, and macOS into a single distributable that `Package.swift` references.
### Quick build (local or CI)
```bash
.github/scripts/build-swift-xcframework.sh
```
Output lands in `dist/swift/`:
- `OndeFramework.xcframework/` — the framework tree
- `OndeFramework.xcframework.zip` — zipped for SPM remote binary
- `OndeFramework.checksum.txt` — SHA-256 for `Package.swift`
- `version.txt` — semver from `Cargo.toml`
- `Package/Sources/Onde/onde.swift` — generated UniFFI Swift glue
### Manual steps
```bash
# 1. Build the uniffi-bindgen binary (pinned to uniffi =0.31.0)
cargo build --manifest-path uniffi-bindgen/Cargo.toml --release
BINDGEN=uniffi-bindgen/target/release/uniffi-bindgen
# 2. Compile staticlibs per target (use +1.92.0 or current stable)
cargo +stable rustc --target aarch64-apple-ios --release --lib --crate-type staticlib
cargo +stable rustc --target aarch64-apple-ios-sim --release --lib --crate-type staticlib
cargo +stable rustc --target aarch64-apple-darwin --release --lib --crate-type staticlib
cargo +nightly rustc -Z build-std --target aarch64-apple-tvos --release --lib --crate-type staticlib
cargo +nightly rustc -Z build-std --target aarch64-apple-tvos-sim --release --lib --crate-type staticlib
# 3. Generate Swift bindings from the iOS arm64 slice
$BINDGEN generate target/aarch64-apple-ios/release/libonde.a \
--language swift \
--out-dir sdk/Onde/Sources/Onde \
--config uniffi.toml
# 4. Assemble the XCFramework (see build-swift-xcframework.sh for full xcodebuild invocation)
```
### Updating `onde-swift` after a new release
This is fully automated by CI. Publishing a GitHub Release on `onde` triggers the workflow, which:
1. Builds the XCFramework and attaches it to the release.
2. Rewrites `onde-swift/Package.swift` with the release `url:` + `checksum:`.
3. Copies the freshly generated `onde.swift` into `onde-swift/Sources/Onde/`.
4. Commits, tags, and pushes `onde-swift` — Swift Package Index picks it up automatically.
Manual intervention is only needed if `ONDE_SWIFT_PAT` has expired or the `onde-swift` push fails.
---
## Swift SDK (`onde-swift`)
### Package.swift structure
```
onde-swift/
├── Package.swift # Declares OndeFramework binary target + Onde wrapper target
└── Sources/Onde/
└── onde.swift # UniFFI-generated glue (NEVER edit manually)
```
`Package.swift` has two targets:
- **`OndeFramework`** — `.binaryTarget` using `url:` + `checksum:` for distribution. For local development, swap to the `path:` form documented in the `Package.swift` header comment and run `build-swift-xcframework.sh` first.
- **`Onde`** — `.target` depending on `OndeFramework`, used as the public import name in Swift.
### Swift API Quick Reference
```swift
import Onde
// Create engine
let engine = OndeChatEngine()
// Load the platform default model
let elapsed = try await engine.loadDefaultModel(
systemPrompt: "You are a helpful assistant.",
sampling: nil
)
// Or load the model assigned to your app in the Onde dashboard
let assignedElapsed = try await engine.loadAssignedModel(
appId: "your-app-id",
appSecret: "your-app-secret",
systemPrompt: "You are a helpful assistant.",
sampling: nil
)
// Multi-turn chat
let result = try await engine.sendMessage(message: "Hello!")
print(result.text)
print(result.toolCalls)
// Streaming (free function — callback_interface constraint in UniFFI 0.31)
class Handler: StreamChunkListener {
func onChunk(chunk: StreamChunk) -> Bool {
print(chunk.delta, terminator: "")
return !chunk.done
}
}
try await streamChatMessage(engine: engine, message: "Tell me a story.", listener: Handler())
// One-shot (does NOT modify conversation history)
let enhanced = try await engine.generate(
messages: [userMessage(content: "Expand: a cat in space")],
sampling: deterministicSamplingConfig()
)
// Status
let info = await engine.info() // EngineInfo
// History management
let history = await engine.history()
let removed = await engine.clearHistory()
await engine.pushHistory(message: userMessage(content: "..."))
// Cleanup
await engine.unloadModel()
```
### Free Functions
| `defaultModelConfig()` | `GgufModelConfig` | Platform-aware Coder default (1.5B on iOS/tvOS/Android, 3B on desktop) |
| `qwen251_5bConfig()` | `GgufModelConfig` | Forces Qwen 2.5 1.5B regardless of platform |
| `qwen253bConfig()` | `GgufModelConfig` | Forces Qwen 2.5 3B regardless of platform |
| `defaultSamplingConfig()` | `SamplingConfig` | temp=0.7, top_p=0.95, max_tokens=512 |
| `deterministicSamplingConfig()` | `SamplingConfig` | temp=0.0, greedy |
| `mobileSamplingConfig()` | `SamplingConfig` | temp=0.7, max_tokens=128 |
| `systemMessage(content:)` | `ChatMessage` | `.system` role |
| `userMessage(content:)` | `ChatMessage` | `.user` role |
| `assistantMessage(content:)` | `ChatMessage` | `.assistant` role |
| `streamChatMessage(engine:message:listener:)` | `async throws` | Streaming via callback |
---
## UniFFI Conventions
- **Version pin:** `uniffi = "=0.31.0"` everywhere — the `onde` crate, `uniffi-bindgen/`, and `[build-dependencies]` must all use the **same** version. Mixing versions causes bindgen panics.
- **`uniffi::setup_scaffolding!()`** in `lib.rs` — proc-macro approach, no UDL file needed.
- **Object methods** use `#[uniffi::export]` on the `impl OndeChatEngine` block.
- **Callback interfaces** (`StreamChunkListener`) must be parameters of **free functions** only, not Object methods.
- **`Arc<Self>`** is the return type for `#[uniffi::constructor]`. UniFFI automatically handles this.
- **Async:** all async exported methods use `tokio` runtime (enabled via `uniffi = { features = ["tokio"] }`).
- **`uniffi.toml`**: lives at the crate root. Adjust renaming or namespace settings there before regenerating.
---
## HuggingFace Cache (`hf_cache.rs`)
The `hf_cache` module manages the on-device model cache, exposed via UniFFI to Swift / Kotlin as-needed.
Key public functions:
| `list_local_hf_models()` | List downloaded models that Onde supports |
| `list_supported_hf_models()` | All supported models with download status |
| `download_model(model_id, progress_callback)` | Download a model with progress reporting |
| `delete_local_hf_model(model_id)` | Remove a model from the local cache |
| `diagnose_hf_cache()` | Inspect the cache for corruption |
| `repair_hf_cache_symlinks()` | Fix broken symlinks in the HF cache layout |
| `model_cache_path(model_id)` | Resolve the filesystem path for a model |
| `clean_stale_lock_files()` | Remove leftover `.lock` files from interrupted downloads |
`ModelDownloadProgress` carries `downloaded_bytes`, `total_bytes`, `progress` (0.0–1.0), and `done`.
---
## Sampling Presets
| `SamplingConfig::default()` | 0.7 | 0.95 | 512 | General creative chat |
| `SamplingConfig::deterministic()` | 0.0 | — | 512 | Reproducible / coding |
| `SamplingConfig::mobile()` | 0.7 | 0.95 | 128 | Memory/latency constrained |
| `SamplingConfig::coding()` | 0.0 | — | 512 | Code generation |
| `SamplingConfig::coding_mobile()` | 0.0 | — | 128 | Code on mobile |
---
## Testing
```bash
# All tests (requires a macOS host with Metal)
cargo test
# Unit tests only (no model downloads)
cargo test inference::
cargo test inference::ffi::
cargo test inference::types::
# hf_cache module
cargo test hf_cache::
# Clippy (treat warnings as errors)
cargo clippy -- -D warnings
```
Tests that require model downloads are integration tests and not run by default in CI. Unit tests verify:
- Type constructors and `Display` implementations
- `SamplingConfig` presets
- `GgufModelConfig` constructors and `platform_default()` routing
- `OndeChatEngine` lifecycle: `new()` starts unloaded, `send_message` without model returns `InferenceError::NoModelLoaded`, `clear_history` on empty returns 0, `unload_model` on empty returns `nil`.
---
## Known Issues
### `___chkstk_darwin` linker error on tvOS
`aws-lc-sys` (transitive via `reqwest → rustls → aws-lc-rs`) references `___chkstk_darwin`, a stack probing symbol that tvOS does not export. `build.rs` compiles `scripts/tvos_chkstk.s` (a no-op `ret` stub) via the `cc` crate to satisfy the linker. **Do not delete `scripts/tvos_chkstk.s`.**
Affects: tvOS only. macOS, iOS, Android, Windows, Linux are unaffected.
### Metal Toolchain missing (Xcode 26+)
On Xcode 26+, the Metal compiler is a separate download. If you see empty `.metallib` files (92 bytes) or `"Error while loading function: fused_glu_float"` at runtime:
```bash
xcodebuild -downloadComponent MetalToolchain
cargo clean -p mistralrs-quant
cargo check
```
### Android `home_dir` sandbox panic
`dirs::home_dir()` panics in the Android sandbox. The crate-level `Cargo.toml` adds `hf-hub` as an explicit Android dependency so `HF_HOME` can be seeded via `ApiBuilder` before any hub request. Never call `home::home_dir()` or `dirs::home_dir()` on Android.
---
## Code Conventions
### Rust
- **Error handling:** `anyhow` for application-level errors inside `engine.rs`; `thiserror` for the `InferenceError` enum (exported via UniFFI). Never `.unwrap()` or `.expect()` in non-test code.
- **Async:** `tokio` runtime. All async functions in `OndeChatEngine` are `pub async fn`.
- **Logging:** `log` crate macros (`log::debug!`, `log::info!`, `log::warn!`, `log::error!`). No `println!` in library code.
- **Platform gating:** use `#[cfg(target_os = "...")]` blocks. Match `Cargo.toml`'s target-conditional dependency sections.
- **Re-exports:** `mistralrs`, `hf_hub`, and `mistralrs_core` are re-exported from `lib.rs` for downstream Rust consumers. Keep these re-exports in sync with what's actually available per platform.
- **No `mod.rs`:** use named files (`inference/engine.rs`) not `inference/mod.rs` — except that `inference/mod.rs` exists and is the intentional module root for the `inference` module.
### Swift / SDK
- **Never manually edit `onde-swift/Sources/Onde/onde.swift`** — it is generated by `uniffi-bindgen`. Regenerate by running the build script.
- **`Package.swift` binary target** uses `url:` + `checksum:` in the committed form. For local development, swap to `path:` (instructions are in the file header). Never commit the `path:` form — CI overwrites it on every release via `release-sdk-swift.yml`.
- **iOS/tvOS sandbox:** always call `setupInferenceEnvironment()` at app launch before any `OndeChatEngine` call.
---
## CI / Release Workflow
### `release-sdk-swift.yml`
Triggered on **semver tag push** or `workflow_dispatch`. Runs on `macos-15`.
Steps:
1. Install stable Rust with iOS/macOS targets.
2. Install nightly Rust with `rust-src` for tvOS `-Z build-std`.
3. Run `build-swift-xcframework.sh`.
4. Read `version.txt` and `OndeFramework.checksum.txt` into step outputs.
5. Validate that the tag matches `Cargo.toml`.
6. Upload CI artifacts: zip, checksum, version, generated `onde.swift`.
7. On tag push, create a GitHub Release with the zip and checksum attached.
8. On tag push, check out `ondeinference/onde-swift` using `ONDE_SWIFT_PAT`.
9. On tag push, rewrite the `.binaryTarget` in `onde-swift/Package.swift` with the release URL and checksum.
10. On tag push, copy the generated `onde.swift` into `onde-swift/Sources/Onde/`.
11. On tag push, commit both files, tag the commit with the version, and push to `ondeinference/onde-swift`.
#### Required secret
`ONDE_SWIFT_PAT` — a GitHub Personal Access Token with **`contents: write`** scope on the `ondeinference/onde-swift` repository.
### `release-sdk-kotlin.yml`
Triggered on **semver tag push** or `workflow_dispatch`. Runs on `macos-15` because it builds both Android artifacts and the macOS Apple Silicon JVM native library.
Steps:
1. Install stable Rust and the `aarch64-apple-darwin` target.
2. Install Android SDK / NDK and auto-discover the latest installed 29.x NDK for `cargo-ndk`.
3. Build Android JNI libraries for all supported ABIs.
4. Build the macOS Apple Silicon `libonde.dylib` used by the JVM target.
5. Run Gradle publication tasks for the Kotlin Multiplatform package.
6. Validate that the tag matches `sdk/kotlin/gradle.properties` `VERSION_NAME`.
7. On tag push, publish `com.ondeinference:onde-inference` to Maven Central.
#### Required secrets
- `ORG_GRADLE_PROJECT_MAVENCENTRALUSERNAME`
- `ORG_GRADLE_PROJECT_MAVENCENTRALPASSWORD`
- `ORG_GRADLE_PROJECT_SIGNINGKEYID`
- `ORG_GRADLE_PROJECT_SIGNINGKEY`
- `ORG_GRADLE_PROJECT_SIGNINGPASSWORD`
### `release-sdk-dart.yml`
Triggered on **semver tag push** or `workflow_dispatch`.
- Publishes `onde_inference` to pub.dev.
- Version must match the Dart package version.
- The example app is part of release validation.
### `release-sdk-npm.yml`
Triggered on **semver tag push** or `workflow_dispatch`.
- Publishes `@ondeinference/react-native` to npm.
- Builds the Expo module package and validates the published version.
### `release-sdk-rust.yml`
Triggered on **semver tag push** or `workflow_dispatch`.
- Publishes `onde` to crates.io.
- Uses registry dependencies, not git refs, because `cargo publish` strips git patches.
### Release Process (end-to-end)
1. Bump versions in every affected package manifest:
- `Cargo.toml`
- `sdk/kotlin/gradle.properties`
- `sdk/dart/pubspec.yaml`
- `sdk/react-native/package.json`
2. Merge release branches with `--no-ff`. Do not fast-forward multi-commit feature branches.
3. Commit, tag, and push the release.
4. CI publishes all five SDKs:
- crates.io (`onde`)
- Swift Package Index / GitHub Releases (`onde-swift`)
- Maven Central (`com.ondeinference:onde-inference`)
- pub.dev (`onde_inference`)
- npm (`@ondeinference/react-native`)
5. Verify the registries after publish, especially Maven Central indexing and the generated Swift release assets.
> **Note:** `onde-swift/Package.swift` is updated automatically by CI. For local development only, switch it to the `path:` form documented in the file header, then switch it back before committing.
---
## Distribution Registry Summary
| crates.io | `onde` | `onde = "1.x"` |
| Swift Package Index | `onde-swift` (org: `ondeinference`) | `import Onde` |
| Maven Central | `com.ondeinference:onde-inference` | Gradle `implementation("com.ondeinference:onde-inference:<version>")` |
| pub.dev | `onde_inference` | `import 'package:onde_inference/onde_inference.dart';` |
| npm | `@ondeinference/react-native` | `import { OndeChatEngine } from "@ondeinference/react-native"` |
| PyPI | `onde-inference` | `import onde_inference` |
| RubyGems | `onde-inference` | `require 'onde'` |
---
*This file is for AI agent and human developer reference. Update it when the architecture, API surface, supported models, or toolchain requirements change.*
```