oris-runtime 0.8.0

An agentic workflow runtime and programmable AI execution system in Rust: stateful graphs, agents, tools, and multi-step execution.
# Oris

**Oris is an execution runtime for software that reasons before it acts.**

[![Latest Version](https://img.shields.io/crates/v/oris-runtime.svg)](https://crates.io/crates/oris-runtime)
[![docs.rs](https://img.shields.io/docsrs/oris-runtime)](https://docs.rs/oris-runtime)

Modern AI systems are no longer single API calls.

They are long-running processes involving:

* planning
* tool execution
* memory updates
* retries
* human approval
* continuation across failures

Today, this logic lives in fragile background jobs, queues, and ad-hoc orchestration code.

**Oris turns reasoning into a first-class executable system.**

---

## What Oris Is

Oris is **not**:

* a prompt framework
* an agent SDK
* a chat orchestration library

Oris is closer to:

> **Temporal or Ray — but designed for reasoning workloads.**

It provides a durable execution environment where AI processes can:

* persist state
* pause and resume safely
* survive crashes or deployments
* replay execution deterministically
* coordinate tools and humans

---

## Core Idea

If:

* databases manage **data**
* message queues manage **communication**

then:

> **Oris manages reasoning processes.**

---

## Why Oris Exists

LLMs fundamentally changed backend architecture.

We are moving from:

```
request → response
```

to:

```
goal → reasoning → decisions → actions → memory → continuation
```

This is no longer an API problem.

It is an **execution problem**.

Oris introduces an execution kernel purpose-built for reasoning systems.

---

## Positioning

Oris aims to become:

> **The execution OS for reasoning-driven software systems.**

Where traditional workflow engines orchestrate tasks,
Oris orchestrates **decision-making processes**.

See [Oris 2.0 Strategy & Evolution Blueprint](docs/ORIS_2.0_STRATEGY.md) for architecture, axioms, and roadmap.

---

## Comparison

| | Oris | Temporal | LangGraph |
|---|------|----------|-----------|
| **Domain** | Reasoning processes | Task workflows | Agent graphs |
| **First-class** | Decision-making, LLM state | Tasks, activities | Chat, messages |
| **Replay** | Deterministic (reasoning) | Deterministic (tasks) | Limited |
| **Interrupt** | Human-in-the-loop native | External | Via nodes |

LangGraph users will understand it. Temporal users will respect it. Rust users will try it.

---

## What You Can Build

* autonomous coding systems
* long-running research agents
* human-approval workflows
* operational copilots
* AI backend pipelines
* durable agent infrastructure

---

## Design Principles

* Durable by default
* Interruptible execution
* Deterministic replay
* Stateful reasoning
* Tooling as system actions
* Execution over prompting

---

## Mental Model

```
Application Logic
Reasoning Graph
Oris Runtime
LLMs / Tools / Memory / Humans
```

---

## Status

Early but functional.
The runtime, graph execution, and agent loop are implemented and usable today.

---

## Quick start (30 seconds)

Add the crate and set your API key:

```bash
cargo add oris-runtime
export OPENAI_API_KEY="your-key"
```

Minimal LLM call:

```rust
use oris_runtime::{language_models::llm::LLM, llm::openai::OpenAI};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let llm = OpenAI::default();
    let response = llm.invoke("What is Rust?").await?;
    println!("{}", response);
    Ok(())
}
```

Hello-world state graph (no API key needed):

```rust
use oris_runtime::graph::{function_node, MessagesState, StateGraph, END, START};
use oris_runtime::schemas::messages::Message;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mock_llm = function_node("mock_llm", |_state: &MessagesState| async move {
        use std::collections::HashMap;
        let mut update = HashMap::new();
        update.insert(
            "messages".to_string(),
            serde_json::to_value(vec![Message::new_ai_message("hello world")])?,
        );
        Ok(update)
    });

    let mut graph = StateGraph::<MessagesState>::new();
    graph.add_node("mock_llm", mock_llm)?;
    graph.add_edge(START, "mock_llm");
    graph.add_edge("mock_llm", END);

    let compiled = graph.compile()?;
    let initial_state = MessagesState::with_messages(vec![Message::new_human_message("hi!")]);
    let _final_state = compiled.invoke(initial_state).await?;
    Ok(())
}
```

## Architecture

```mermaid
flowchart TB
  User[User Request]
  Runtime[Runtime: Graph or Agent]
  Tools[Tools]
  LLM[LLM Provider]
  Memory[Memory or State]
  User --> Runtime
  Runtime --> Tools
  Runtime --> LLM
  Runtime --> Memory
  Tools --> Runtime
  LLM --> Runtime
  Memory --> Runtime
```

## Key concepts

- **State graphs** — Define workflows as directed graphs; run, stream, and optionally persist state (e.g. SQLite or in-memory).
- **Agents and tools** — Give agents tools (search, filesystem, custom); use multi-agent routers and subagents.
- **Persistence and interrupts** — Checkpoint state, resume runs, and pause for human approval or review.

See the [examples](crates/oris-runtime/examples/) directory for runnable code.

## Public API (stable)

The following modules are the **stable surface** for building on Oris. Prefer these entry points; other modules may change in 0.1.x.

| Entry | Purpose |
|-------|---------|
| `oris_runtime::graph` | State graphs, execution, persistence, interrupts, trace (`StateGraph`, `MessagesState`, checkpointer, `NodePluginRegistry`, `interrupt`/resume, `InvokeResult.trace`, `TraceEvent`) |
| `oris_runtime::agent` | Agent loop, tools, Deep Agent (planning, skills) |
| `oris_runtime::tools` | Tool trait and built-in tools |

State types (e.g. `graph::MessagesState`, `graph::State`) are part of the stable graph API. [Full API docs](https://docs.rs/oris-runtime).

For human-in-the-loop checkpoint persistence in async runtimes, `oris_runtime::agent::AgentCheckpointer`
now exposes async-compatible `put_async` / `get_async` helpers while keeping the existing synchronous
`put` / `get` methods for backward compatibility.

For runtime-extensible graphs, `oris_runtime::graph::NodePluginRegistry` and `typed_node_plugin`
allow you to register custom node factories and add them to a `StateGraph` from validated JSON config.

## Install and config

```bash
cargo add oris-runtime
# With a vector store (e.g. PostgreSQL):
cargo add oris-runtime --features postgres
# With SQLite persistence (production-ready checkpoints):
cargo add oris-runtime --features sqlite-persistence
# With Ollama (local):
cargo add oris-runtime --features ollama
```

For durable execution across process restarts, use the `sqlite-persistence` feature and see the [durable_agent_job_sqlite](crates/oris-runtime/examples/durable_agent_job_sqlite.rs) example.

Common environment variables:

| Provider   | Variable           |
|-----------|--------------------|
| OpenAI    | `OPENAI_API_KEY`   |
| Anthropic | `ANTHROPIC_API_KEY` |
| Ollama    | `OLLAMA_HOST` (optional, default `http://localhost:11434`) |
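
The table above can be read in application code roughly as follows. This is a minimal sketch using only the standard library; `resolve_ollama_host` is a hypothetical helper, not part of `oris-runtime`, and treating an empty value as "unset" is an assumption.

```rust
use std::env;

/// Default Ollama endpoint, as listed in the table above.
const DEFAULT_OLLAMA_HOST: &str = "http://localhost:11434";

/// Hypothetical helper (not an oris-runtime API): pick the configured host,
/// falling back to the default when the value is missing or blank.
fn resolve_ollama_host(raw: Option<String>) -> String {
    match raw {
        Some(v) if !v.trim().is_empty() => v.trim().to_string(),
        _ => DEFAULT_OLLAMA_HOST.to_string(),
    }
}

fn main() {
    let host = resolve_ollama_host(env::var("OLLAMA_HOST").ok());
    println!("Ollama host: {host}");

    // Report only whether the key is present; never print the secret itself.
    let has_openai_key = env::var("OPENAI_API_KEY").is_ok();
    println!("OPENAI_API_KEY set: {has_openai_key}");
}
```

Keeping the resolution logic in a function that takes `Option<String>` makes it testable without mutating the process environment.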

## Examples and docs

- [Hello World graph](crates/oris-runtime/examples/graph_hello_world.rs)
- [Custom node plugins](crates/oris-runtime/examples/custom_node_plugins.rs) — register a typed runtime plugin and add a node from JSON config.
- [Plugin authoring (0.1.x)](docs/plugin-authoring.md) — contract, compatibility, and safety for third-party plugins; [plugin_reference](examples/plugin_reference/README.md) is a packaged example layout.
- [Starter service project (Axum)](examples/oris_starter_axum/README.md) — standalone workspace example for integrating Oris into a Rust backend.
- [Standalone worker (Tokio)](examples/oris_worker_tokio/README.md) — concrete `poll/heartbeat/ack` worker process when the execution server already exists.
- [Operator CLI](examples/oris_operator_cli/README.md) — concrete terminal client for `run/list/inspect/resume/replay/cancel`.
- [Template matrix (service/worker/operator)](examples/templates/README.md): `cargo-generate`-ready skeletons for external users.

Scaffold one of the starter archetypes directly:

```bash
cargo install cargo-generate
cargo generate --git https://github.com/Colin4k1024/Oris.git --subfolder examples/templates/axum_service --name my-oris-service
```
- [Durable agent job](crates/oris-runtime/examples/durable_agent_job.rs) — interrupt, restart, resume with same `thread_id`; state is checkpointed so it survives process restarts.
- [Durable agent job with SQLite](crates/oris-runtime/examples/durable_agent_job_sqlite.rs) — same flow with SQLite persistence (run with `--features sqlite-persistence`).
- [CLI durable job](crates/oris-runtime/examples/cli_durable_job.rs) — minimal operator CLI: `run`, `list`, `inspect`, `resume`, `replay`, `cancel` (requires `--features sqlite-persistence`).
- [Execution server API](crates/oris-runtime/examples/execution_server.rs) — runtime-bin HTTP API for `run/list/inspect/resume/replay/cancel` (run with `--features "sqlite-persistence,execution-server"`).
- [Agent with tools](crates/oris-runtime/examples/agent.rs)
- [Streaming](crates/oris-runtime/examples/graph_streaming.rs)
- [Persistence](crates/oris-runtime/examples/graph_persistence_basic.rs)
- [Deep agent (planning + filesystem)](crates/oris-runtime/examples/deep_agent_basic.rs)
- [Oris v1 OS architecture (single-tenant)](docs/oris-v1-os-architecture.md)
- [Rust ecosystem integration guide](docs/rust-ecosystem-integration.md)
- [Production operations guide](docs/production-operations-guide.md)
- [Incident response runbook](docs/incident-response-runbook.md)
- [Runtime schema migration workflow](docs/runtime-schema-migrations.md)
- [Scheduler stress baseline](docs/scheduler-stress-baseline.md)
- [PostgreSQL backup and restore runbook](docs/postgres-backup-restore-runbook.md)
- [Open source onboarding guide (ZH)](docs/open-source-onboarding-zh.md)
- [Observability assets (Grafana + alerts)](docs/observability/)

Execution runtime namespaces:

- `oris_runtime::execution_runtime` — graph-agnostic control-plane types, repositories, scheduler, and API contract models.
- `oris_runtime::execution_server` — graph-aware HTTP server and benchmark helpers such as `build_router` and `ExecutionApiState`.
- `oris-execution-server` — package-level facade for the graph-aware execution server surface; use this crate if you want a dedicated dependency for the HTTP layer.
- Legacy graph-aware re-exports from `oris_runtime::execution_runtime` and `oris_runtime::kernel` still compile, but they are deprecated compatibility shims.

Start the execution server:

```bash
cargo run -p oris-runtime --example execution_server --features "sqlite-persistence,execution-server"
```

- Default address: `127.0.0.1:8080` (override with `ORIS_SERVER_ADDR`).
- Default SQLite database path: `oris_execution_server.db` (override with `ORIS_SQLITE_DB`).
- Runtime backend selector: `ORIS_RUNTIME_BACKEND` (`sqlite` by default; `postgres` requires the `kernel-postgres` feature).
- Postgres DSN and schema: `ORIS_POSTGRES_DSN` (or `ORIS_RUNTIME_DSN`) and `ORIS_POSTGRES_SCHEMA` (default `public`).
- Postgres schema strictness: `ORIS_POSTGRES_REQUIRE_SCHEMA` (default `true`; startup fails if the schema is missing).
- Optional auth secrets: `ORIS_API_AUTH_BEARER_TOKEN`, `ORIS_API_AUTH_API_KEY`.
- Optional keyed API key id: `ORIS_API_AUTH_API_KEY_ID` (use with `ORIS_API_AUTH_API_KEY`).
- A bad backend configuration or a failed backend health check now fails startup with an actionable error and a non-zero exit code.
- When `ORIS_API_AUTH_API_KEY_ID` is set with SQLite persistence, the key record is persisted in `runtime_api_keys`.
- RBAC baseline: `admin` can access all APIs; `operator` can access `/v1/jobs*`, `/v1/interrupts*`, `/v1/dlq*`, `GET /v1/audit/logs`, and `GET /v1/attempts/:attempt_id/retries`; `worker` can access `/v1/workers*`.
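
The RBAC baseline can be pictured as a small method-and-path check. The sketch below is illustrative only: `Role`, `is_allowed`, and `is_attempt_retries` are hypothetical names, the path is assumed to arrive without its query string, and the server's actual enforcement may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Admin,
    Operator,
    Worker,
}

/// Hypothetical check mirroring the documented baseline (not the server's code).
/// `path` is assumed to exclude any query string.
fn is_allowed(role: Role, method: &str, path: &str) -> bool {
    match role {
        Role::Admin => true,
        Role::Worker => path.starts_with("/v1/workers"),
        Role::Operator => {
            // Prefix rules apply to every HTTP method.
            let prefix_ok = ["/v1/jobs", "/v1/interrupts", "/v1/dlq"]
                .iter()
                .any(|p| path.starts_with(*p));
            // The audit and retry-history routes are read-only for operators.
            let is_get = method.eq_ignore_ascii_case("GET");
            let audit_ok = is_get && path == "/v1/audit/logs";
            let retries_ok = is_get && is_attempt_retries(path);
            prefix_ok || audit_ok || retries_ok
        }
    }
}

/// Matches `/v1/attempts/<attempt_id>/retries` with a non-empty id segment.
fn is_attempt_retries(path: &str) -> bool {
    let parts: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    matches!(parts.as_slice(), ["v1", "attempts", id, "retries"] if !id.is_empty())
}

fn main() {
    for (role, method, path) in [
        (Role::Operator, "POST", "/v1/jobs/run"),
        (Role::Operator, "POST", "/v1/workers/poll"),
        (Role::Worker, "POST", "/v1/workers/poll"),
    ] {
        println!("{role:?} {method} {path} -> {}", is_allowed(role, method, path));
    }
}
```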

Audit API:

- `GET /v1/audit/logs` — list control-plane audit logs (query: `request_id`, `action`, `from_ms`, `to_ms`, `limit`)

Attempt retry API:

- `GET /v1/attempts/:attempt_id/retries` — inspect retry scheduling history for an attempt

Dead-letter queue API:

- `GET /v1/dlq` — list dead-lettered attempts (query: `status`, `limit`)
- `GET /v1/dlq/:attempt_id` — inspect a dead-lettered attempt
- `POST /v1/dlq/:attempt_id/replay` — requeue a dead-lettered attempt for another dispatch cycle

Execution server endpoints (v1 runtime-bin):

- Canonical machine-readable contract: [docs/runtime-api-contract.json](docs/runtime-api-contract.json)
- Regenerate after API changes: `bash scripts/update_runtime_api_contract.sh`
- Benchmark policy and baseline: [docs/runtime-benchmark-policy.md](docs/runtime-benchmark-policy.md)
- `GET /metrics` — Prometheus scrape endpoint for runtime metrics (`queue_depth`, `dispatch_latency_ms`, `lease_conflict_rate`, `recovery_latency_ms`)
- `POST /v1/jobs/run`
  Optional request fields: `timeout_policy` with `{ "timeout_ms": <positive>, "on_timeout_status": "failed"|"cancelled" }`, `priority` (`0..100`, higher dispatches first), and `tenant_id` (stable throttling key). Optional header: `traceparent` (`00-<trace_id>-<span_id>-<flags>`) to continue an upstream W3C/OpenTelemetry trace; responses return `data.trace`.
- `GET /v1/jobs` — list jobs (query: `status`, `limit`, `offset`)
- `GET /v1/jobs/:thread_id`
- `GET /v1/jobs/:thread_id/detail` — run drill-down (status, attempts, checkpoint, pending interrupt)
- `GET /v1/jobs/:thread_id/timeline/export` — export timeline as JSON for audit
- `GET /v1/jobs/:thread_id/history`
- `GET /v1/jobs/:thread_id/timeline`
- `GET /v1/jobs/:thread_id/checkpoints/:checkpoint_id`
- `POST /v1/jobs/:thread_id/resume`
- `POST /v1/jobs/:thread_id/replay` — with `sqlite-persistence`, replay requests are fingerprinted by thread + replay target (`checkpoint_id` when present, otherwise current state fingerprint) and duplicate replays return the stored response instead of re-executing side effects
- `POST /v1/jobs/:thread_id/cancel`
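
The optional `traceparent` header accepted by `POST /v1/jobs/run` follows the W3C Trace Context layout `00-<trace_id>-<span_id>-<flags>`. As a rough client-side sketch (the `TraceParent` type and `parse_traceparent` function are hypothetical, not Oris APIs), a caller could validate the value before sending it. Only version `00` is accepted here because that is the form the endpoint documents.

```rust
/// Parsed `traceparent` header. Names here are illustrative only.
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,
    span_id: String,
    flags: u8,
}

/// True when `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse `00-<trace_id>-<span_id>-<flags>`; returns `None` for malformed input.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    let [version, trace_id, span_id, flags] = parts.as_slice() else {
        return None;
    };
    if *version != "00" {
        return None;
    }
    // W3C Trace Context: 32 hex chars, 16 hex chars, 2 hex chars.
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace or span ids are invalid per the specification.
    if trace_id.bytes().all(|b| b == b'0') || span_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        span_id: span_id.to_string(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}

fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    println!("{:?}", parse_traceparent(header));
}
```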

Interrupt API (Phase 4):

- `GET /v1/interrupts` — list pending interrupts (query: `status`, `run_id`, `limit`)
- `GET /v1/interrupts/:interrupt_id` — get interrupt detail
- `POST /v1/interrupts/:interrupt_id/resume` — resume with value (delegates to job resume)
- `POST /v1/interrupts/:interrupt_id/reject` — reject/cancel interrupt (marks run cancelled)

Worker endpoints (Phase 3 baseline):

- `POST /v1/workers/poll`
  Optional request field: `tenant_max_active_leases` to cap concurrent active leases per tenant during dispatch; traced attempts return `data.trace`.
- `POST /v1/workers/:worker_id/heartbeat` — returns `data.trace` when the lease belongs to a traced attempt
- `POST /v1/workers/:worker_id/extend-lease`
- `POST /v1/workers/:worker_id/report-step` — returns `data.trace` when the attempt has trace context
- `POST /v1/workers/:worker_id/ack` — accepts optional `retry_policy` (`fixed` or `exponential`) on failed ack to schedule bounded retries, and returns `data.trace` when the attempt has trace context
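
To make the difference between the `fixed` and `exponential` retry policies concrete, here is a minimal sketch of how a bounded backoff delay could be computed. The field names (`delay_ms`, `base_ms`, `max_delay_ms`, `max_retries`) and the doubling factor are assumptions for illustration; consult the runtime API contract for the actual `retry_policy` schema.

```rust
/// Illustrative retry policy shapes; field names are assumptions.
#[derive(Debug, Clone, Copy)]
enum RetryPolicy {
    Fixed { delay_ms: u64, max_retries: u32 },
    Exponential { base_ms: u64, max_delay_ms: u64, max_retries: u32 },
}

/// Delay before retry number `attempt` (1-based), or `None` once the
/// retry bound is exhausted.
fn next_delay_ms(policy: RetryPolicy, attempt: u32) -> Option<u64> {
    match policy {
        RetryPolicy::Fixed { delay_ms, max_retries } => {
            (attempt >= 1 && attempt <= max_retries).then_some(delay_ms)
        }
        RetryPolicy::Exponential { base_ms, max_delay_ms, max_retries } => {
            if attempt == 0 || attempt > max_retries {
                return None;
            }
            // base * 2^(attempt - 1), saturating instead of overflowing,
            // then capped at max_delay_ms.
            let factor = 1u64.checked_shl(attempt - 1).unwrap_or(u64::MAX);
            Some(base_ms.saturating_mul(factor).min(max_delay_ms))
        }
    }
}

fn main() {
    let policy = RetryPolicy::Exponential { base_ms: 100, max_delay_ms: 1_000, max_retries: 5 };
    for attempt in 1..=6 {
        println!("retry {attempt}: {:?}", next_delay_ms(policy, attempt));
    }
}
```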

Lease/failover/backpressure baseline behavior:

- `poll` first runs a lease-expiry tick (`expire_leases_and_requeue`) before dispatching.
- The same tick also transitions attempts that exceeded `started_at + timeout_ms` into their configured terminal status (`failed` or `cancelled`) before any requeue/dispatch.
- Under mixed queues, dispatch prefers higher `priority` before falling back to attempt order.
- `poll` enforces both per-worker and per-tenant active lease limits, returning `decision=backpressure` with `reason` and active-limit counters when throttled.
- `poll` enforces per-worker active-lease guardrail via `max_active_leases` (request) or server default.
- `poll` returns `decision` as `dispatched`, `noop`, or `backpressure`.
- `heartbeat` / `extend-lease` enforce lease ownership (`worker_id` must match lease owner), otherwise `409 conflict`.
- Expired leases are requeued automatically and become dispatchable again on subsequent polls.
- `ack` marks terminal attempt status (`completed` / `failed` / `cancelled`); failed ack can optionally schedule retry backoff and returns `retry_scheduled` with `next_retry_at`.
- Final failed attempts (including timeout-to-`failed`) are persisted into the DLQ and can be replayed through `/v1/dlq/:attempt_id/replay`.
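
The dispatch rules above can be modeled in a few lines. This is a simplified in-memory sketch, not the scheduler's implementation: `Attempt`, `Decision`, `poll`, and `is_timed_out` are hypothetical names, attempt order is approximated by the lowest `id`, and checking the lease limit before looking at the queue is an assumption.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, PartialEq)]
struct Attempt {
    id: u64,
    priority: u8, // 0..100, higher dispatches first
}

#[derive(Debug, PartialEq)]
enum Decision {
    Dispatched(u64),
    Noop,
    Backpressure,
}

/// Pick the next attempt for a worker, honoring its active-lease limit.
fn poll(queue: &mut Vec<Attempt>, active_leases: usize, max_active_leases: usize) -> Decision {
    if active_leases >= max_active_leases {
        return Decision::Backpressure;
    }
    // Highest priority wins; ties fall back to the lowest id.
    let best = queue
        .iter()
        .enumerate()
        .min_by_key(|(_, a)| (Reverse(a.priority), a.id))
        .map(|(i, _)| i);
    match best {
        Some(i) => Decision::Dispatched(queue.remove(i).id),
        None => Decision::Noop,
    }
}

/// An attempt times out once `now_ms` exceeds `started_at_ms + timeout_ms`.
fn is_timed_out(started_at_ms: u64, timeout_ms: u64, now_ms: u64) -> bool {
    now_ms > started_at_ms.saturating_add(timeout_ms)
}

fn main() {
    let mut queue = vec![
        Attempt { id: 1, priority: 10 },
        Attempt { id: 2, priority: 50 },
    ];
    println!("{:?}", poll(&mut queue, 0, 2));
    println!("{:?}", poll(&mut queue, 2, 2));
    println!("timed out: {}", is_timed_out(1_000, 500, 1_501));
}
```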

Run idempotency contract (`POST /v1/jobs/run`):

- Send optional `idempotency_key`.
- Same `idempotency_key` + same payload returns the stored semantic result with `data.idempotent_replay=true`.
- Same replay target (`thread_id` + explicit `checkpoint_id`, or `thread_id` + current state fingerprint) is also deduplicated under `sqlite-persistence`; repeated replay calls return the stored response with `data.idempotent_replay=true`.
- Same `idempotency_key` + different payload returns `409 conflict`.
- Trace metadata is observational only and does not participate in idempotency matching.
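
The key-and-payload rules of this contract can be sketched with an in-memory map. This is only a model of the documented behavior: `IdempotencyStore` and `RunOutcome` are hypothetical names, payloads are compared as raw strings (a real server would more likely compare a normalized form or hash), and nothing here is persisted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum RunOutcome {
    /// First time this key was seen; side effects ran.
    Executed { result: String },
    /// Same key and payload; stored result returned (`data.idempotent_replay=true`).
    IdempotentReplay { result: String },
    /// Same key, different payload; maps to `409 conflict`.
    Conflict,
}

/// Illustrative store: idempotency key -> (payload, stored result).
#[derive(Default)]
struct IdempotencyStore {
    entries: HashMap<String, (String, String)>,
}

impl IdempotencyStore {
    fn run(
        &mut self,
        key: &str,
        payload: &str,
        execute: impl FnOnce(&str) -> String,
    ) -> RunOutcome {
        if let Some((stored_payload, stored_result)) = self.entries.get(key) {
            return if stored_payload.as_str() == payload {
                RunOutcome::IdempotentReplay { result: stored_result.clone() }
            } else {
                RunOutcome::Conflict
            };
        }
        // Unknown key: execute once and remember both payload and result.
        let result = execute(payload);
        self.entries
            .insert(key.to_string(), (payload.to_string(), result.clone()));
        RunOutcome::Executed { result }
    }
}

fn main() {
    let mut store = IdempotencyStore::default();
    let payload = r#"{"thread_id":"t1"}"#;
    println!("{:?}", store.run("key-1", payload, |_| "run-started".to_string()));
    println!("{:?}", store.run("key-1", payload, |_| "not-executed".to_string()));
    println!("{:?}", store.run("key-1", r#"{"thread_id":"t2"}"#, |_| "not-executed".to_string()));
}
```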

Prometheus metrics contract:

- `oris_runtime_queue_depth` — current dispatchable queue depth gauge
- `oris_runtime_dispatch_latency_ms` — dispatch latency histogram
- `oris_runtime_lease_operations_total` / `oris_runtime_lease_conflicts_total` — lease operation and conflict counters
- `oris_runtime_lease_conflict_rate` — derived conflict-rate gauge
- `oris_runtime_backpressure_total{reason="worker_limit|tenant_limit"}` — backpressure counter by cause
- `oris_runtime_terminal_acks_total{status="completed|failed|cancelled"}` — terminal worker ack counters
- `oris_runtime_terminal_error_rate` — derived terminal error-rate gauge
- `oris_runtime_recovery_latency_ms` — failover recovery latency histogram

Prebuilt observability assets:

- Grafana dashboard: `docs/observability/runtime-dashboard.json`
- Prometheus alert rules: `docs/observability/prometheus-alert-rules.yml`
- Sample validation scrape: `docs/observability/sample-runtime-workload.prom`

Execution API error contract:

- Error shape:
  - `request_id`: correlation id (propagates `x-request-id` when provided)
  - `error.code`: stable machine code (`invalid_argument`, `unauthorized`, `forbidden`, `not_found`, `conflict`, `internal`)
  - `error.message`: human-readable summary
  - `error.details`: optional structured context

Example:

```json
{
  "request_id": "req-123",
  "error": {
    "code": "invalid_argument",
    "message": "thread_id must not be empty",
    "details": null
  }
}
```
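
On the client side, the stable `error.code` values can be mapped onto an enum so that handling stays exhaustive. The sketch below is an assumption-laden illustration: `ErrorCode` is a hypothetical client type, not something exported by `oris-runtime`, and unknown codes are surfaced as `None` so callers can fall back to `error.message`.

```rust
/// Client-side mirror of the documented `error.code` values (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorCode {
    InvalidArgument,
    Unauthorized,
    Forbidden,
    NotFound,
    Conflict,
    Internal,
}

impl ErrorCode {
    /// Parse the wire value; returns `None` for codes this client does not know.
    fn parse(code: &str) -> Option<Self> {
        Some(match code {
            "invalid_argument" => Self::InvalidArgument,
            "unauthorized" => Self::Unauthorized,
            "forbidden" => Self::Forbidden,
            "not_found" => Self::NotFound,
            "conflict" => Self::Conflict,
            "internal" => Self::Internal,
            _ => return None,
        })
    }

    /// The wire value for this code.
    fn as_str(self) -> &'static str {
        match self {
            Self::InvalidArgument => "invalid_argument",
            Self::Unauthorized => "unauthorized",
            Self::Forbidden => "forbidden",
            Self::NotFound => "not_found",
            Self::Conflict => "conflict",
            Self::Internal => "internal",
        }
    }
}

fn main() {
    for raw in ["invalid_argument", "conflict", "something_new"] {
        match ErrorCode::parse(raw) {
            Some(code) => println!("{raw} -> {code:?} ({})", code.as_str()),
            None => println!("{raw} -> unknown code, fall back to error.message"),
        }
    }
}
```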

Compatibility notes:

- Existing `request_id` and `data` fields in successful responses are preserved.
- Success envelopes now include `meta` (`status`, `api_version`) as additive fields.

[API documentation](https://docs.rs/oris-runtime) · [Examples directory](crates/oris-runtime/examples/)

## License and attribution

MIT. This project includes code derived from [langchain-rust](https://github.com/langchain-ai/langchain-rust); see [LICENSE](LICENSE).

## Community and policies

- Contribution guide: [CONTRIBUTING.md](CONTRIBUTING.md)
- Code of conduct: [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
- Security policy: [SECURITY.md](SECURITY.md)
- Privacy notice: [PRIVACY.md](PRIVACY.md)
- Support guide: [SUPPORT.md](SUPPORT.md)
- Governance: [GOVERNANCE.md](GOVERNANCE.md)

## Links

- [Crates.io](https://crates.io/crates/oris-runtime)
- [GitHub](https://github.com/Colin4k1024/Oris)
- [docs.rs](https://docs.rs/oris-runtime)