Oris
Oris is an execution runtime for software that reasons before it acts.
Modern AI systems are no longer single API calls.
They are long-running processes involving:
- planning
- tool execution
- memory updates
- retries
- human approval
- continuation across failures
Today, this logic lives in fragile background jobs, queues, and ad-hoc orchestration code.
Oris turns reasoning into a first-class executable system.
What Oris Is
Oris is not:
- a prompt framework
- an agent SDK
- a chat orchestration library
Oris is closer to:
Temporal or Ray — but designed for reasoning workloads.
It provides a durable execution environment where AI processes can:
- persist state
- pause and resume safely
- survive crashes or deployments
- replay execution deterministically
- coordinate tools and humans
Core Idea
If:
- databases manage data
- message queues manage communication
then:
Oris manages reasoning processes.
Why Oris Exists
LLMs fundamentally changed backend architecture.
We are moving from:
request → response
to:
goal → reasoning → decisions → actions → memory → continuation
This is no longer an API problem.
It is an execution problem.
Oris introduces an execution kernel purpose-built for reasoning systems.
Positioning
Oris aims to become:
The execution OS for reasoning-driven software systems.
Where traditional workflow engines orchestrate tasks, Oris orchestrates decision-making processes.
See Oris 2.0 Strategy & Evolution Blueprint for architecture, axioms, and roadmap.
Comparison
| | Oris | Temporal | LangGraph |
|---|---|---|---|
| Domain | Reasoning processes | Task workflows | Agent graphs |
| First-class | Decision-making, LLM state | Tasks, activities | Chat, messages |
| Replay | Deterministic (reasoning) | Deterministic (tasks) | Limited |
| Interrupt | Human-in-the-loop native | External | Via nodes |
LangGraph users will understand it. Temporal users will respect it. Rust users will try it.
What You Can Build
- autonomous coding systems
- long-running research agents
- human-approval workflows
- operational copilots
- AI backend pipelines
- durable agent infrastructure
Design Principles
- Durable by default
- Interruptible execution
- Deterministic replay
- Stateful reasoning
- Tooling as system actions
- Execution over prompting
Mental Model
```
Application Logic
    ↓
Reasoning Graph
    ↓
Oris Runtime
    ↓
LLMs / Tools / Memory / Humans
```
Status
Early but functional. The runtime, graph execution, and agent loop are implemented and usable today.
Quick start (30 seconds)
Add the crate and set your API key:
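The original install commands were stripped from this page; a hedged sketch of the dependency setup follows. The crate name and version below are assumptions inferred from this README's `oris_runtime` module paths — check crates.io for the published name and version.

```toml
# Cargo.toml — crate name and version are assumptions, verify before use
[dependencies]
oris_runtime = "0.1"
tokio = { version = "1", features = ["full"] }
```

Then export a provider key, for example `OPENAI_API_KEY` (see the environment variable table under "Install and config").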
Minimal LLM call: see the examples directory for the runnable snippet.
Hello-world state graph (no API key needed): see the examples directory for the runnable snippet.
Architecture
```mermaid
flowchart TB
    User[User Request]
    Runtime[Runtime: Graph or Agent]
    Tools[Tools]
    LLM[LLM Provider]
    Memory[Memory or State]
    User --> Runtime
    Runtime --> Tools
    Runtime --> LLM
    Runtime --> Memory
    Tools --> Runtime
    LLM --> Runtime
    Memory --> Runtime
```
Key concepts
- State graphs — Define workflows as directed graphs; run, stream, and optionally persist state (e.g. SQLite or in-memory).
- Agents and tools — Give agents tools (search, filesystem, custom); use multi-agent routers and subagents.
- Persistence and interrupts — Checkpoint state, resume runs, and pause for human approval or review.
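To make the state-graph idea concrete without depending on the crate's API, here is a self-contained sketch of a workflow as a directed graph of state-transforming nodes. All names and types here are illustrative assumptions, not `oris_runtime` types.

```rust
use std::collections::HashMap;

// Illustration only: NOT the oris_runtime API, just the "workflow as a
// directed graph of state-transforming nodes" idea in miniature.
type State = Vec<String>; // e.g. an accumulating message list

struct Graph {
    nodes: HashMap<&'static str, fn(State) -> State>,
    edges: HashMap<&'static str, &'static str>, // node -> next node
}

impl Graph {
    fn run(&self, start: &str, mut state: State) -> State {
        let mut current = start;
        loop {
            state = (self.nodes[current])(state); // execute the node
            match self.edges.get(current) {
                Some(next) => current = *next, // follow the outgoing edge
                None => return state,          // no outgoing edge: terminal node
            }
        }
    }
}

fn plan(mut s: State) -> State { s.push("plan".into()); s }
fn act(mut s: State) -> State { s.push("act".into()); s }

fn build_demo_graph() -> Graph {
    let mut nodes: HashMap<&'static str, fn(State) -> State> = HashMap::new();
    nodes.insert("plan", plan);
    nodes.insert("act", act);
    let mut edges = HashMap::new();
    edges.insert("plan", "act");
    Graph { nodes, edges }
}

fn main() {
    let out = build_demo_graph().run("plan", vec!["goal".into()]);
    assert_eq!(out.join(","), "goal,plan,act");
    println!("final state: {:?}", out);
}
```

The real runtime adds what this sketch omits: persistence of `State` between nodes, interrupt points, and deterministic replay.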
See the examples directory for runnable code.
Public API (stable)
The following modules are the stable surface for building on Oris. Prefer these entry points; other modules may change in 0.1.x.
| Entry | Purpose |
|---|---|
| `oris_runtime::graph` | State graphs, execution, persistence, interrupts, trace (`StateGraph`, `MessagesState`, checkpointer, `NodePluginRegistry`, interrupt/resume, `InvokeResult.trace`, `TraceEvent`) |
| `oris_runtime::agent` | Agent loop, tools, Deep Agent (planning, skills) |
| `oris_runtime::tools` | `Tool` trait and built-in tools |
State types (e.g. `graph::MessagesState`, `graph::State`) are part of the stable graph API. Full API docs.
For human-in-the-loop checkpoint persistence in async runtimes, `oris_runtime::agent::AgentCheckpointer` now exposes async-compatible `put_async` / `get_async` helpers while keeping the existing synchronous `put` / `get` methods for backward compatibility.
For runtime-extensible graphs, `oris_runtime::graph::NodePluginRegistry` and `typed_node_plugin` allow you to register custom node factories and add them to a `StateGraph` from validated JSON config.
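The registry-of-factories pattern behind a plugin registry like this can be illustrated with a self-contained sketch. The types and names below are assumptions for illustration only, not the crate's actual API.

```rust
use std::collections::HashMap;

// Illustration only: a minimal registry-of-factories pattern, NOT the
// oris_runtime::graph::NodePluginRegistry API itself.
type Node = Box<dyn Fn(String) -> String>;
type Factory = Box<dyn Fn(&str) -> Node>; // builds a node from (validated) config

struct Registry {
    factories: HashMap<String, Factory>,
}

impl Registry {
    fn new() -> Self {
        Self { factories: HashMap::new() }
    }

    // Register a factory under a node "kind" name.
    fn register(&mut self, kind: &str, factory: Factory) {
        self.factories.insert(kind.to_string(), factory);
    }

    // Build a node from a config entry; None if the kind is unknown.
    fn build(&self, kind: &str, config: &str) -> Option<Node> {
        self.factories.get(kind).map(|f| f(config))
    }
}

fn main() {
    let mut registry = Registry::new();
    // Register a factory for an "uppercase" node kind; the config is unused here.
    registry.register("uppercase", Box::new(|_cfg: &str| -> Node {
        Box::new(|input: String| input.to_uppercase())
    }));
    let node = registry.build("uppercase", "{}").expect("registered kind");
    assert_eq!(node("hello".to_string()), "HELLO");
    assert!(registry.build("unknown", "{}").is_none());
}
```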
Install and config
- With a vector store (e.g. PostgreSQL)
- With SQLite persistence (production-ready checkpoints)
- With the checked-in Evo experimental surface
- With Ollama (local)
For durable execution across process restarts, use the `sqlite-persistence` feature and see the `durable_agent_job_sqlite` example.
Common environment variables:
| Provider | Variable |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Ollama | OLLAMA_HOST (optional, default http://localhost:11434) |
Examples and docs
- Hello World graph
- Custom node plugins — register a typed runtime plugin and add a node from JSON config.
- Plugin authoring (0.1.x) — contract, compatibility, and safety for third-party plugins; plugin_reference is a packaged example layout.
- Starter service project (Axum) — standalone workspace example for integrating Oris into a Rust backend.
- Standalone worker (Tokio) — concrete `poll`/`heartbeat`/`ack` worker process when the execution server already exists.
- Operator CLI — concrete terminal client for `run`/`list`/`inspect`/`resume`/`replay`/`cancel`.
- Evolution example suite — scenario-based Evo programs (capture/replay, supervised devloop, bootstrap, network, economics, metrics).
- Template matrix (service/worker/operator) — `cargo-generate`-ready skeletons for external users.
Scaffold one of the starter archetypes directly:
Experimental EvoKernel (current repo slice)
The self-evolution stack is still experimental and intentionally feature-gated.
- Use `evolution-experimental` when you only need `oris_runtime::evolution`.
- Use `full-evolution-experimental` when you want the end-to-end facade used by the checked-in example (`evolution`, `governor`, `evolution_network`, `economics`, `spec_contract`, and `agent_contract`).
The current repository-backed path is:
```
AgentTask
  -> MutationProposal
  -> capture_from_proposal
  -> feedback_for_agent
  -> replay_or_fallback_for_run
```
Run the canonical example, scenario bins, and smoke test:
OPENAI_COMPAT_API_KEY=...
Run the orchestrator validation gate locally:
Run the orchestrator A2A-to-GitHub closed-loop example:
RUNTIME_BASE_URL=http://127.0.0.1:8080 \
GITHUB_OWNER=Colin4k1024 \
GITHUB_REPO=Oris \
GITHUB_TOKEN=ghp_xxx \
ISSUE_ID=issue-123 \
What exists today: proposal-driven capture, sandboxed validation, JSONL evolution storage, and replay-first reuse.
Use `replay_or_fallback_for_run` when you want an explicit replay audit id; it records `CapsuleReused.replay_run_id` while preserving the capsule's original `run_id`. `replay_or_fallback` still works and auto-generates one.
What is still design-target only: always-on autonomous dev loops, issue intake, and automatic branch/release orchestration.
More runtime examples and ops docs
- Durable agent job — interrupt, restart, resume with the same `thread_id`; state is checkpointed so it survives process restarts.
- Durable agent job with SQLite — same flow with SQLite persistence (run with `--features sqlite-persistence`).
- CLI durable job — minimal operator CLI: `run`, `list`, `inspect`, `resume`, `replay`, `cancel` (requires `--features sqlite-persistence`).
- Execution server API — runtime-bin HTTP API for `run`/`list`/`inspect`/`resume`/`replay`/`cancel` (run with `--features "sqlite-persistence,execution-server"`).
- Agent with tools
- Streaming
- Persistence
- Deep agent (planning + filesystem)
- Oris v1 OS architecture (single-tenant)
- Rust ecosystem integration guide
- Production operations guide
- Incident response runbook
- Runtime schema migration workflow
- Scheduler stress baseline
- PostgreSQL backup and restore runbook
- Open source onboarding guide (ZH)
- Observability assets (Grafana + alerts)
Execution runtime namespaces:
- `oris_runtime::execution_runtime` — graph-agnostic control-plane types, repositories, scheduler, and API contract models.
- `oris_runtime::execution_server` — graph-aware HTTP server and benchmark helpers such as `build_router` and `ExecutionApiState`.
- `oris-execution-server` — package-level facade for the graph-aware execution server surface; use this crate if you want a dedicated dependency for the HTTP layer.
- Legacy graph-aware re-exports from `oris_runtime::execution_runtime` and `oris_runtime::kernel` still compile, but they are deprecated compatibility shims.
Start the execution server:
- Default address: `127.0.0.1:8080` (`ORIS_SERVER_ADDR` to override)
- Default SQLite db path: `oris_execution_server.db` (`ORIS_SQLITE_DB` to override)
- Runtime backend selector: `ORIS_RUNTIME_BACKEND` (`sqlite` default; `postgres` requires the `kernel-postgres` feature)
- Postgres DSN/schema: `ORIS_POSTGRES_DSN` (or `ORIS_RUNTIME_DSN`), `ORIS_POSTGRES_SCHEMA` (default `public`)
- Postgres schema strictness: `ORIS_POSTGRES_REQUIRE_SCHEMA` (default `true`; startup fails if the schema is missing)
- Optional auth secrets: `ORIS_API_AUTH_BEARER_TOKEN`, `ORIS_API_AUTH_API_KEY`
- Optional keyed API key id: `ORIS_API_AUTH_API_KEY_ID` (use with `ORIS_API_AUTH_API_KEY`)
- Bad backend config or a failed health check now fails startup with an actionable error and a non-zero exit.
- When `ORIS_API_AUTH_API_KEY_ID` is set with SQLite persistence, the key record is persisted in `runtime_api_keys`.
RBAC baseline: `admin` can access all APIs; `operator` can access `/v1/jobs*`, `/v1/interrupts*`, `/v1/dlq*`, `GET /v1/audit/logs`, and `GET /v1/attempts/:attempt_id/retries`; `worker` can access `/v1/workers*`.
Audit API:
- `GET /v1/audit/logs` — list control-plane audit logs (query: `request_id`, `action`, `from_ms`, `to_ms`, `limit`)
Attempt retry API:
- `GET /v1/attempts/:attempt_id/retries` — inspect retry scheduling history for an attempt
Dead-letter queue API:
- `GET /v1/dlq` — list dead-lettered attempts (query: `status`, `limit`)
- `GET /v1/dlq/:attempt_id` — inspect a dead-lettered attempt
- `POST /v1/dlq/:attempt_id/replay` — requeue a dead-lettered attempt for another dispatch cycle
Execution server endpoints (v1 runtime-bin):
- Canonical machine-readable contract: `docs/runtime-api-contract.json`
- Regenerate after API changes: `bash scripts/update_runtime_api_contract.sh`
- Benchmark policy and baseline: `docs/runtime-benchmark-policy.md`
- `GET /metrics` — Prometheus scrape endpoint for runtime metrics (`queue_depth`, `dispatch_latency_ms`, `lease_conflict_rate`, `recovery_latency_ms`)
- `POST /v1/jobs/run` — optional request fields: `timeout_policy` with `{ "timeout_ms": <positive>, "on_timeout_status": "failed"|"cancelled" }`, `priority` (0..100, higher dispatches first), and `tenant_id` (stable throttling key). Optional header: `traceparent` (`00-<trace_id>-<span_id>-<flags>`) to continue an upstream W3C/OpenTelemetry trace; responses return `data.trace`.
- `GET /v1/jobs` — list jobs (query: `status`, `limit`, `offset`)
- `GET /v1/jobs/:thread_id`
- `GET /v1/jobs/:thread_id/detail` — run drill-down (status, attempts, checkpoint, pending interrupt)
- `GET /v1/jobs/:thread_id/timeline/export` — export timeline as JSON for audit
- `GET /v1/jobs/:thread_id/history`
- `GET /v1/jobs/:thread_id/timeline`
- `GET /v1/jobs/:thread_id/checkpoints/:checkpoint_id`
- `POST /v1/jobs/:thread_id/resume`
- `POST /v1/jobs/:thread_id/replay` — with `sqlite-persistence`, replay requests are fingerprinted by thread + replay target (`checkpoint_id` when present, otherwise the current state fingerprint) and duplicate replays return the stored response instead of re-executing side effects
- `POST /v1/jobs/:thread_id/cancel`
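The `traceparent` header format accepted on job submission follows the W3C Trace Context layout. A hedged parsing sketch (the function is illustrative, not part of the Oris API; field widths follow the W3C spec):

```rust
// Parse a W3C traceparent header: "00-<trace_id>-<span_id>-<flags>".
// trace_id is 32 hex chars, span_id 16, flags 2. Illustrative helper only.
fn parse_traceparent(header: &str) -> Option<(String, String, String)> {
    let parts: Vec<&str> = header.split('-').collect();
    // Expect exactly: version - trace_id - span_id - flags, version "00".
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
    if trace_id.len() != 32 || span_id.len() != 16 || flags.len() != 2 {
        return None;
    }
    let hex = |s: &str| s.chars().all(|c| c.is_ascii_hexdigit());
    if !(hex(trace_id) && hex(span_id) && hex(flags)) {
        return None;
    }
    Some((trace_id.into(), span_id.into(), flags.into()))
}

fn main() {
    let tp = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
    let parsed = parse_traceparent(tp).expect("valid traceparent");
    assert_eq!(parsed.1, "b7ad6b7169203331"); // span_id
    assert!(parse_traceparent("garbage").is_none());
}
```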
Interrupt API (Phase 4):
- `GET /v1/interrupts` — list pending interrupts (query: `status`, `run_id`, `limit`)
- `GET /v1/interrupts/:interrupt_id` — get interrupt detail
- `POST /v1/interrupts/:interrupt_id/resume` — resume with value (delegates to job resume)
- `POST /v1/interrupts/:interrupt_id/reject` — reject/cancel interrupt (marks the run cancelled)
Worker endpoints (Phase 3 baseline):
- `POST /v1/workers/poll` — optional request field: `tenant_max_active_leases` to cap concurrent active leases per tenant during dispatch; traced attempts return `data.trace`.
- `POST /v1/workers/:worker_id/heartbeat` — returns `data.trace` when the lease belongs to a traced attempt
- `POST /v1/workers/:worker_id/extend-lease`
- `POST /v1/workers/:worker_id/report-step` — returns `data.trace` when the attempt has trace context
- `POST /v1/workers/:worker_id/ack` — accepts optional `retry_policy` (`fixed` or `exponential`) on failed ack to schedule bounded retries, and returns `data.trace` when the attempt has trace context
Lease/failover/backpressure baseline behavior:
- `poll` first runs a lease-expiry tick (`expire_leases_and_requeue`) before dispatching.
- The same tick also transitions attempts that exceeded `started_at + timeout_ms` into their configured terminal status (`failed` or `cancelled`) before any requeue/dispatch.
- Under mixed queues, dispatch prefers higher `priority` before falling back to attempt order.
- `poll` enforces both per-worker and per-tenant active lease limits, returning `decision=backpressure` with `reason` and active-limit counters when throttled.
- `poll` enforces a per-worker active-lease guardrail via `max_active_leases` (request) or the server default.
- `poll` returns `decision` as `dispatched`, `noop`, or `backpressure`.
- `heartbeat`/`extend-lease` enforce lease ownership (`worker_id` must match the lease owner), otherwise `409 conflict`.
- Expired leases are requeued automatically and become dispatchable again on subsequent polls.
- `ack` marks terminal attempt status (`completed`/`failed`/`cancelled`); a failed ack can optionally schedule retry backoff and returns `retry_scheduled` with `next_retry_at`.
- Final failed attempts (including timeout-to-`failed`) are persisted into the DLQ and can be replayed through `/v1/dlq/:attempt_id/replay`.
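The `fixed`/`exponential` retry shapes accepted on a failed ack can be sketched as follows. The type names, parameters, and the cap are illustrative assumptions, not the crate's actual types.

```rust
// Illustrative retry policy shapes; NOT the oris_runtime types.
enum RetryPolicy {
    Fixed { delay_ms: u64 },
    Exponential { base_ms: u64, max_ms: u64 },
}

// Compute the delay before the next retry for a given attempt number (1-based).
fn next_retry_delay_ms(policy: &RetryPolicy, attempt: u32) -> u64 {
    match policy {
        RetryPolicy::Fixed { delay_ms } => *delay_ms,
        RetryPolicy::Exponential { base_ms, max_ms } => {
            // base * 2^(attempt-1), bounded by max_ms so backoff stays finite.
            let factor = 1u64 << (attempt.saturating_sub(1)).min(32);
            base_ms.saturating_mul(factor).min(*max_ms)
        }
    }
}

fn main() {
    let exp = RetryPolicy::Exponential { base_ms: 100, max_ms: 5_000 };
    assert_eq!(next_retry_delay_ms(&exp, 1), 100);   // 100 * 2^0
    assert_eq!(next_retry_delay_ms(&exp, 3), 400);   // 100 * 2^2
    assert_eq!(next_retry_delay_ms(&exp, 10), 5_000); // capped at max_ms
    let fixed = RetryPolicy::Fixed { delay_ms: 250 };
    assert_eq!(next_retry_delay_ms(&fixed, 7), 250);
}
```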
Run idempotency contract (POST /v1/jobs/run):
- Send an optional `idempotency_key`.
- Same `idempotency_key` + same payload returns the stored semantic result with `data.idempotent_replay=true`.
- Same replay target (`thread_id` + explicit `checkpoint_id`, or `thread_id` + current state fingerprint) is also deduplicated under `sqlite-persistence`; repeated replay calls return the stored response with `data.idempotent_replay=true`.
- Same `idempotency_key` + different payload returns `409 conflict`.
- Trace metadata is observational only and does not participate in idempotency matching.
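The dedup rules above can be sketched with a tiny in-memory store. This is purely illustrative; the runtime's actual storage, hashing, and response types differ.

```rust
use std::collections::HashMap;

// Illustrative idempotency store; NOT the runtime's implementation.
struct IdempotencyStore {
    // idempotency_key -> (payload, stored response)
    entries: HashMap<String, (String, String)>,
}

enum RunOutcome {
    Executed(String), // first time this key is seen: run and store
    Replayed(String), // same key + same payload: data.idempotent_replay=true
    Conflict,         // same key + different payload: 409 conflict
}

impl IdempotencyStore {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn run(&mut self, key: &str, payload: &str) -> RunOutcome {
        match self.entries.get(key) {
            Some((stored_payload, response)) if stored_payload == payload => {
                RunOutcome::Replayed(response.clone())
            }
            Some(_) => RunOutcome::Conflict,
            None => {
                let response = format!("result-for:{payload}"); // stand-in for execution
                self.entries
                    .insert(key.to_string(), (payload.to_string(), response.clone()));
                RunOutcome::Executed(response)
            }
        }
    }
}

fn main() {
    let mut store = IdempotencyStore::new();
    assert!(matches!(store.run("k1", "payload-a"), RunOutcome::Executed(_)));
    assert!(matches!(store.run("k1", "payload-a"), RunOutcome::Replayed(_)));
    assert!(matches!(store.run("k1", "payload-b"), RunOutcome::Conflict));
}
```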
Prometheus metrics contract:
- `oris_runtime_queue_depth` — current dispatchable queue depth gauge
- `oris_runtime_dispatch_latency_ms` — dispatch latency histogram
- `oris_runtime_lease_operations_total` / `oris_runtime_lease_conflicts_total` — lease operation and conflict counters
- `oris_runtime_lease_conflict_rate` — derived conflict-rate gauge
- `oris_runtime_backpressure_total{reason="worker_limit|tenant_limit"}` — backpressure counter by cause
- `oris_runtime_terminal_acks_total{status="completed|failed|cancelled"}` — terminal worker ack counters
- `oris_runtime_terminal_error_rate` — derived terminal error-rate gauge
- `oris_runtime_recovery_latency_ms` — failover recovery latency histogram
Prebuilt observability assets:
- Grafana dashboard: `docs/observability/runtime-dashboard.json`
- Prometheus alert rules: `docs/observability/prometheus-alert-rules.yml`
- Sample validation scrape: `docs/observability/sample-runtime-workload.prom`
Execution API error contract:
- Error shape:
  - `request_id`: correlation id (propagates `x-request-id` when provided)
  - `error.code`: stable machine code (`invalid_argument`, `unauthorized`, `forbidden`, `not_found`, `conflict`, `internal`)
  - `error.message`: human-readable summary
  - `error.details`: optional structured context
Example:
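A hedged reconstruction of the error envelope from the fields listed above (all values are illustrative):

```json
{
  "request_id": "req-123",
  "error": {
    "code": "not_found",
    "message": "thread not found",
    "details": { "thread_id": "thread-42" }
  }
}
```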
Compatibility notes:
- Existing `request_id` and `data` fields in successful responses are preserved.
- Success envelopes now include `meta` (`status`, `api_version`) as additive fields.
API documentation · Examples directory
License and attribution
MIT. This project includes code derived from langchain-rust; see LICENSE.
Community and policies
- Contribution guide: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
- Security policy: SECURITY.md
- Privacy notice: PRIVACY.md
- Support guide: SUPPORT.md
- Governance: GOVERNANCE.md