# Orign 0.2.3
<p align="center">
  <img src="./static/orign_logo6_alpha.png" alt="Orign Logo" width="400">
</p>


__Globe-scale Agentic Alignment__
   
Orign makes it simple to train and deploy __robust__ AI agents that can learn from human feedback. It further provides mechanisms for agents to learn __interactively__ and __autonomously__.

Built on the [nebulous runtime](https://github.com/agentsea/nebulous), Orign components can run on __any__ cloud and easily connect across clouds and regions.

Ships as a single binary, performant and lightweight thanks to Rust :crab:   
   
It takes a team to align models, we connect them globally :earth_americas:

> [!WARNING]
> Orign is in __alpha__, things may break.

## Installation
Python
```sh
pip install orign
```

CLI
```sh
curl -fsSL -H "Cache-Control: no-cache" https://storage.googleapis.com/orign/releases/install.sh | bash
```

## Usage

Start an Orign server
```sh
orign serve --docker
```

Or run it on Kubernetes with our [Helm chart](./deploy/charts/orign/)   

### Replay Buffer

Create a replay buffer that stores agent experience and launches training jobs.   
   
In this example, every time the buffer accumulates 50 new examples, it randomly samples 100 examples from the buffer and launches a TRL training job on RunPod with 1 A100 GPU.

```python
from orign import ReplayBuffer, ContainerRequest

buffer = ReplayBuffer(
    name="sql-adapter",
    train_every=50,
    sample_n=100,
    sample_strategy="Random",
    train_job=ContainerRequest(
        image="huggingface/trl-latest-gpu:latest",
        command="trl sft --model_name_or_path $MODEL --dataset_name $DATASET_PATH ...",
        platform="runpod",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["1:A100"],
    )
)
```
When the container launches, Orign sets the following environment variables based on the buffer config:
- `DATASET_URI`
- `DATASET_PATH`
- `NUM_EPOCHS`
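Inside the training container these can be read directly; a minimal sketch (the fallback values below are only illustrative, not Orign defaults):

```python
import os

# Values injected by Orign when the training job launches; the
# fallbacks here are illustrative placeholders, not Orign behavior.
dataset_uri = os.environ.get("DATASET_URI", "s3://example-bucket/dataset.jsonl")
dataset_path = os.environ.get("DATASET_PATH", "/data/dataset.jsonl")
num_epochs = int(os.environ.get("NUM_EPOCHS", "1"))

print(f"Training on {dataset_path} for {num_epochs} epoch(s)")
```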

For simplicity, Orign also supplies high-level, framework-specific training containers.

```python
from orign import TRL

training_job = TRL(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="runpod",
    accelerators=["1:H200_SXM"],
)

buffer = ReplayBuffer(
    ...
    train_job=training_job,
)
```

Send data to the replay buffer
```python
buffer.send(data)
```

See a list of all replay buffers
```sh
orign get buffers
```

### Online LLM

Create an online LLM capable of both training and inference.    
   
In this example, the actor uses a vLLM server running on EC2 with 2 H100 GPUs, along with the buffer we created above.

```python
from orign import OnlineLLM, Container

actor = OnlineLLM(
    name="sql-actor",
    buffer=buffer,
    server=Container(
        image="vllm/vllm-openai:latest",
        command="python3 -m vllm.entrypoints.openai.api_server --model $MODEL ...",
        platform="ec2",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["2:H100_SXM"],
    )
)
```

For simplicity, Orign also supplies high-level, framework-specific serving containers.

```python
from orign import VLLM

server = VLLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["2:H100_SXM"],
)

actor = OnlineLLM(
    ...
    server=server,
)
```

Use the LLM to generate responses.

```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
]
response = actor.chat(messages)
print(response)
```

Send the LLM training examples

```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "SELECT * FROM users WHERE join_date > '2023-01-01';"},
]
actor.learn(messages)
```

Replay buffers automatically launch training jobs when they hit the `train_every` threshold. However, you can also launch one manually.

```python
actor.train()
```

Orign also supplies high-level objects for common online LLMs.

```python
from orign import Gemma3

actor = Gemma3(
    model="google/gemma3-3b-instruct",
    platform="ec2",
    accelerators=["1:A100_SXM"],
    lora=True,
)
```
It's also easy to [create your own online LLM wrapper](examples/llms/wrapper.py).

### Human

Connect to a human who can provide feedback to the agent.   
   
In this example, we collect feedback from humans in a Slack channel.

```python
from orign import Human

human = Human(
    name="sql-adapter-annotator",
    medium="slack",
    channel="#agent-training",
)
```

Use the human to provide feedback to the agent.

```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "SELECT * FROM users WHERE join_date > '2023-01-01';"},
]
human.feedback(messages)
```

Register a callback to run a container when the human provides feedback.

```python
from orign import container

@container(image="python:3.10")
def on_feedback(feedback):
    print(feedback)

human.on_feedback(on_feedback)
```

#### Verifiers and Autonomous Learning

As a more complex example, we can use the feedback to train both the agent and a verifier, enabling autonomous learning.
   
First, let's create a verifier using an online LLM.
```python
from orign import Qwen2_5

verifier = Qwen2_5(
    name="sql-adapter-verifier",
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["1:H100_SXM"],
)
```

Now, let's create a container that will launch when the human provides feedback.

```python
@container(image="agentsea/orign-py:latest")
def on_feedback(feedback):
    from orign import ReplayBuffer

    # Get the buffers we previously created for our actor and verifier.
    actor_buffer = ReplayBuffer.get("sql-adapter-actor")
    verifier_buffer = ReplayBuffer.get("sql-adapter-verifier")

    # Teach the verifier to judge whether the assistant's response is correct.
    verifier_messages = [
        {"role": "user", "content": f"Given the conversation {feedback.messages}, please judge whether the assistant's response is correct."},
        {"role": "assistant", "content": str(feedback.correct)},
    ]
    verifier_buffer.send(verifier_messages)

    # If the assistant's response is correct, train the actor.
    if feedback.correct:
        actor_buffer.send(feedback.messages)

# Register the callback
human.on_feedback(on_feedback)
```

Using the previous example, once the verifier is trained, we can use it to train the actor __autonomously__.

```python
while True:
    # implement this function however makes sense for you
    task = next_task()
    response = actor.chat(task)

    # implement this function to format the chat history for the verifier
    verifier_messages = get_verifier_messages(task, response)
    feedback = verifier.chat(verifier_messages)

    if feedback.correct:
        actor.learn(feedback.messages)
```
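One possible shape for the `get_verifier_messages` helper in the loop above (a hypothetical sketch; the prompt wording and the assumption that `task` is a chat message list and `response` is the actor's reply text are illustrative, not part of the Orign API):

```python
def get_verifier_messages(task, response):
    # Wrap the actor's turn in a judging prompt for the verifier.
    # `task` is assumed to be a list of chat message dicts and
    # `response` the actor's reply text; adapt to your own schema.
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in task)
    return [
        {
            "role": "user",
            "content": (
                f"Given the conversation:\n{transcript}\n"
                f"assistant: {response}\n"
                "Judge whether the assistant's response is correct."
            ),
        }
    ]
```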

## Roadmap

- [ ] MCP support
- [ ] Metrics
- [ ] More human backends

## Contributing

Please open an issue or submit a PR.

## Inspiration

- OpenRLHF
- AlignAnything
- TRL
- Nebulous