Globe-scale Agentic Alignment
Orign makes it simple to train and deploy robust AI agents that can learn from human feedback. It further provides mechanisms for agents to learn interactively and autonomously.
Built on the Nebulous runtime, Orign components can run on any cloud and easily connect across clouds and regions.
Ships as a single binary; performant and lightweight thanks to Rust :crab:
It takes a team to align models; we connect them globally :earth_americas:
> [!WARNING]
> Orign is in alpha, things may break.
Installation
Python
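Assuming the Python client is published to PyPI under the project name:

```bash
pip install orign
```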
CLI
Usage
Start an Orign server
Or optionally run on Kubernetes with our Helm chart.
Replay Buffer
Create a replay buffer, which stores the agent's experience and launches training jobs.
In this example, once the buffer has 50 examples, it will randomly sample 100 examples and launch a TRL training job on RunPod with a single A100 GPU.
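A sketch of what that configuration might look like with the Python client. The class and parameter names (`ReplayBuffer`, `train_every`, `sample_n`, the job spec fields) and the training image are illustrative assumptions, so check the API reference for the real signatures:

```python
from orign import ReplayBuffer  # assumed import path

# Illustrative only: the parameter names and job spec fields are assumptions.
buffer = ReplayBuffer(
    name="agent-experience",
    train_every=50,    # launch a training job every 50 new examples
    sample_n=100,      # randomly sample 100 examples per job
    train_job={
        "image": "huggingface/trl-latest-gpu",  # hypothetical TRL training image
        "platform": "runpod",                   # run the job on RunPod
        "accelerators": ["1:A100"],             # one A100 GPU
    },
)
```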
Orign sets the following env vars in your container when it launches, based on the buffer config:
- `DATASET_URI`
- `DATASET_PATH`
- `NUM_EPOCHS`
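For example, a training script inside the container might read them like this (the variable names come from the list above; what each one holds is an assumption):

```python
import os

# Injected by Orign when the training container launches (meanings assumed here).
dataset_uri = os.environ["DATASET_URI"]      # where the sampled dataset lives
dataset_path = os.environ["DATASET_PATH"]    # local path to read the data from
num_epochs = int(os.environ["NUM_EPOCHS"])   # how many epochs to train for

print(f"training on {dataset_path} ({dataset_uri}) for {num_epochs} epochs")
```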
For simplicity, Orign also supplies high-level, framework-specific training containers.
Send data to the replay buffer
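A sketch of sending a chat example to the buffer; the `send()` method and payload shape are assumptions based on typical chat-message schemas:

```python
# Illustrative: the method name and payload shape are assumptions.
buffer.send(
    {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
        ]
    }
)
```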
See a list of all replay buffers
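Listing might look like this; the `get()` accessor is a guess at the client API:

```python
# Illustrative: assumes a class-level accessor that returns all buffers when called with no arguments.
for buf in ReplayBuffer.get():
    print(buf.name)
```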
Online LLM
Create an online LLM, which is capable of both training and inference.
In this example, the actor will use a vLLM server running on EC2 with two H100 GPUs, along with the buffer we created earlier.
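A sketch of that actor; `OnlineLLM` and its parameters are illustrative assumptions about the client API, and the model id is just an example:

```python
from orign import OnlineLLM  # assumed import path

# Illustrative only: parameter names are assumptions.
actor = OnlineLLM(
    name="actor",
    model="Qwen/Qwen2.5-7B-Instruct",        # example Hugging Face model id
    server={
        "image": "vllm/vllm-openai:latest",  # vLLM's OpenAI-compatible server image
        "platform": "ec2",                   # serve on EC2
        "accelerators": ["2:H100"],          # two H100 GPUs
    },
    buffer=buffer,                           # the replay buffer created above
)
```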
For simplicity, Orign also supplies high-level, framework-specific serving containers.
Use the LLM to generate responses.
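Generation might look like this; the `chat()` method and the response shape are assumptions, so treat it as a sketch:

```python
# Illustrative: method name and return shape are assumptions.
response = actor.chat(
    messages=[{"role": "user", "content": "Plan a three-day trip to Kyoto."}]
)
print(response)
```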
Send the LLM training examples
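Training examples can be sent back through the same object; `learn()` is an assumed method name:

```python
# Illustrative: assumes a learn() method that forwards examples to the attached buffer.
actor.learn(
    {
        "messages": [
            {"role": "user", "content": "Plan a three-day trip to Kyoto."},
            {"role": "assistant", "content": "Day 1: Fushimi Inari, then Gion in the evening..."},
        ]
    }
)
```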
Replay buffers will automatically launch training jobs when they hit the `train_every` threshold. However, you can also launch them manually.
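A manual launch might look like this; `train()` is an assumed method name:

```python
# Illustrative: assumes a train() method that samples the buffer and starts a job immediately.
buffer.train()
```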
Orign also supplies high-level objects for common online LLMs.
It's also easy to create your own online LLM wrapper.
Human
Connect to a human, who can provide feedback to the agent.
In this example, we collect feedback from humans in a Slack channel.
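A sketch of the connection; `Human` and its parameters are assumptions about the client API:

```python
from orign import Human  # assumed import path

# Illustrative only: parameter names are assumptions.
reviewer = Human(
    name="reviewer",
    medium="slack",              # collect feedback over Slack
    channel="#agent-feedback",   # hypothetical channel
)
```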
Use the human to provide feedback to the agent.
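Requesting feedback might look like this; `request()` and the returned shape are assumptions:

```python
# Illustrative: assumes a request() method that posts the exchange to Slack for review.
feedback = reviewer.request(
    messages=[
        {"role": "user", "content": "Plan a three-day trip to Kyoto."},
        {"role": "assistant", "content": "Day 1: Fushimi Inari, then Gion in the evening..."},
    ]
)
print(feedback)
```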
Register a callback to run a container when the human provides feedback.
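Registration might look something like this; `on_feedback()` and the container spec are assumptions:

```python
# Illustrative: assumes the human object can run a container on each feedback event.
reviewer.on_feedback(
    container={
        "image": "myorg/feedback-handler:latest",   # hypothetical image
        "command": "python handle_feedback.py",     # hypothetical entrypoint
    }
)
```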
Verifiers and Autonomous Learning
As a more complex example, use the feedback to train both the agent and a verifier, enabling autonomous learning.
First, let's create a verifier using an online LLM.
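A sketch of a verifier built from the same primitives; everything here follows the assumed API from the earlier examples:

```python
# Illustrative: a second online LLM with its own replay buffer, used as a judge.
verifier_buffer = ReplayBuffer(name="verifier-experience", train_every=50, sample_n=100)

verifier = OnlineLLM(
    name="verifier",
    model="Qwen/Qwen2.5-7B-Instruct",   # example model id
    server={
        "image": "vllm/vllm-openai:latest",
        "platform": "ec2",
        "accelerators": ["1:H100"],
    },
    buffer=verifier_buffer,
)
```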
Now, let's create a container that will launch when the human provides feedback.
The callback will:
- Get the buffers we previously created for our actor and verifier.
- Teach the verifier to judge whether the assistant's response is correct.
- If the assistant's response is correct, train the actor.

Finally, register the callback with the human, as shown earlier.
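A sketch of that handler script, following the steps above; the payload format, lookup calls, and message shapes are all assumptions:

```python
import json
import os

from orign import ReplayBuffer  # assumed import path

# Illustrative: assumes the feedback payload arrives as JSON in an env var.
feedback = json.loads(os.environ["FEEDBACK"])

# Get the buffers we previously created for our actor and verifier.
actor_buffer = ReplayBuffer.get("agent-experience")        # assumed lookup by name
verifier_buffer = ReplayBuffer.get("verifier-experience")

# Teach the verifier to judge whether the assistant's response is correct.
verifier_buffer.send(
    {
        "messages": [
            {"role": "user", "content": json.dumps(feedback["messages"])},
            {"role": "assistant", "content": "correct" if feedback["approved"] else "incorrect"},
        ]
    }
)

# If the assistant's response is correct, train the actor on it.
if feedback["approved"]:
    actor_buffer.send({"messages": feedback["messages"]})
```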
Using the previous example, once the verifier is trained, we can use it to train the actor autonomously.
You will need two helper functions: one that runs the agent on a task (implement this however makes sense for you) and one that formats the chat history for the verifier. A sketch of the loop follows.
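A minimal sketch of that loop; the rollout and formatting helpers are placeholders you would implement yourself, and the client calls follow the assumed API from the earlier examples:

```python
# Illustrative autonomous-learning loop; all client calls are assumptions.

def run_agent_on_task(task: str) -> list[dict]:
    # Implement this function however makes sense for you;
    # here the actor simply answers the task in a single turn.
    prompt = {"role": "user", "content": task}
    reply = actor.chat(messages=[prompt])  # assumed to return the assistant's text
    return [prompt, {"role": "assistant", "content": str(reply)}]

def format_for_verifier(messages: list[dict]) -> str:
    # Implement this function to format the chat history for the verifier.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

tasks = ["Plan a three-day trip to Kyoto."]  # hypothetical task list

for task in tasks:
    messages = run_agent_on_task(task)

    # Ask the trained verifier whether the rollout was correct.
    verdict = verifier.chat(
        messages=[{"role": "user", "content": format_for_verifier(messages)}]
    )

    # If the verifier approves, feed the rollout back into the actor's training data.
    if str(verdict).strip().lower() == "correct":
        actor.learn({"messages": messages})
```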
Roadmap
- MCP support
- Metrics
- More human backends
Contributing
Please open an issue or submit a PR.
Inspiration
- OpenRLHF
- AlignAnything
- TRL
- Nebulous