<p align="center">
<img src="./static/orign_logo6_alpha.png" alt="Orign Logo" width="400">
</p>
__Globe-scale Agentic Alignment__
Orign makes it simple to train and deploy __robust__ AI agents that can learn from human feedback. It further provides mechanisms for agents to learn __interactively__ and __autonomously__.
Built on the [nebulous runtime](https://github.com/agentsea/nebulous), Orign components can be run on __any__ cloud and easily connect across clouds and regions.

Ships as a single binary; performant and lightweight thanks to Rust :crab:

It takes a team to align models; we connect them globally :earth_americas:
> [!WARNING]
> Orign is in __alpha__, things may break.
## Installation
Python
```sh
pip install orign
```
CLI

```sh
```

## Usage

Start an Orign server:
```sh
orign serve --docker
```
Alternatively, run it on Kubernetes with our [Helm chart](./deploy/charts/orign/).
### Replay Buffer
Create a replay buffer, which stores agent experience and launches training jobs.

In this example, once the buffer has 50 examples, it will randomly sample 100 examples and launch a TRL training job on RunPod with one A100 GPU.
```python
from orign import ReplayBuffer, ContainerRequest

buffer = ReplayBuffer(
    name="sql-adapter",
    train_every=50,
    sample_n=100,
    sample_strategy="Random",
    train_job=ContainerRequest(
        image="huggingface/trl-latest-gpu:latest",
        command="trl sft --model_name_or_path $MODEL --dataset_name $DATASET_PATH ...",
        platform="runpod",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["1:A100"],
    ),
)
```
When the job launches, Orign sets the following environment variables in your container based on the buffer config:
- `DATASET_URI`
- `DATASET_PATH`
- `NUM_EPOCHS`
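For instance, a custom training entrypoint could consume them like this (a minimal sketch; the `NUM_EPOCHS` fallback below is an assumption, and the TRL container above reads the same values through its command line instead):

```python
import os

# Injected by Orign when the training job launches.
dataset_path = os.environ["DATASET_PATH"]  # local path to the sampled examples
dataset_uri = os.environ["DATASET_URI"]    # remote URI of the same dataset
num_epochs = int(os.environ.get("NUM_EPOCHS", "1"))  # default is an assumption

print(f"Training on {dataset_path} ({dataset_uri}) for {num_epochs} epoch(s)")
```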
For simplicity, Orign also supplies high-level, framework-specific training containers.
```python
from orign import TRL

training_job = TRL(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="runpod",
    accelerators=["1:H200_SXM"],
)

buffer = ReplayBuffer(
    ...
    train_job=training_job,
)
```
Send data to the replay buffer
```python
buffer.send(data)
```
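The exact record format depends on your training job; for chat-style fine-tuning like the TRL example above, a record is typically an OpenAI-style message list:

```python
buffer.send([
    {"role": "user", "content": "Write a SQL query to count orders per customer."},
    {"role": "assistant", "content": "SELECT customer_id, COUNT(*) AS n FROM orders GROUP BY customer_id;"},
])
```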
See a list of all replay buffers
```sh
orign get buffers
```
### Online LLM
Create an online LLM, which is capable of both training and inference.

In this example, the actor will use a vLLM server running on EC2 with two H100 GPUs, plus the buffer we created earlier.
```python
from orign import OnlineLLM, Container

actor = OnlineLLM(
    name="sql-actor",
    buffer=buffer,
    server=Container(
        image="vllm/vllm-openai:latest",
        command="python3 -m vllm.entrypoints.openai.api_server --model $MODEL ...",
        platform="ec2",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["2:H100_SXM"],
    ),
)
```
For simplicity, Orign also supplies high-level, framework-specific serving containers.
```python
from orign import VLLM

server = VLLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["2:H100_SXM"],
)

actor = OnlineLLM(
    ...
    server=server,
)
```
Use the LLM to generate responses.
```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
]

response = actor.chat(messages)
print(response)
```
Send the LLM training examples
```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "```sql\nSELECT * FROM users WHERE join_date > '2023-01-01';\n```"},
]

actor.learn(messages)
```
Replay buffers automatically launch training jobs when they hit the `train_every` threshold, but you can also launch one manually.
```python
actor.train()
```
Orign also supplies high-level objects for common online LLMs.
```python
from orign import Gemma3

actor = Gemma3(
    model="google/gemma-3-4b-it",
    platform="ec2",
    accelerators=["1:A100_SXM"],
    lora=True,
)
```
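Setting `lora=True` trains a lightweight LoRA adapter rather than the full model weights, which keeps frequent retraining cheap.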
It's also easy to [create your own online LLM wrapper](examples/llms/wrapper.py).
### Human
Connect to a human who can provide feedback to the agent.

In this example, we collect feedback from humans in a Slack channel.
```python
from orign import Human

human = Human(
    name="sql-adapter-annotator",
    medium="slack",
    channel="#agent-training",
)
```
Use the human to provide feedback to the agent.
```python
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "```sql\nSELECT * FROM users WHERE join_date > '2023-01-01';\n```"},
]

human.feedback(messages)
```
Register a callback to run a container when the human provides feedback.
```python
from orign import container

@container(image="python:3.10")
def on_feedback(feedback):
    print(feedback)

human.on_feedback(on_feedback)
```
#### Verifiers and Autonomous Learning
As a more complex example, use the feedback to train both the agent and a verifier, enabling autonomous learning.
First, let's create a verifier using an online LLM.
```python
from orign import Qwen2_5

verifier = Qwen2_5(
    name="sql-adapter-verifier",
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["1:H100_SXM"],
)
```
Now, let's create a container that will launch when the human provides feedback.
```python
@container(image="agentsea/orign-py:latest")
def on_feedback(feedback):
    from orign import ReplayBuffer

    # Get the buffers backing our actor and verifier.
    actor_buffer = ReplayBuffer.get("sql-adapter")
    verifier_buffer = ReplayBuffer.get("sql-adapter-verifier")

    # Teach the verifier to judge whether the assistant's response is correct.
    verifier_messages = [
        {"role": "user", "content": f"Given the conversation {feedback.messages}, please judge whether the assistant's response is correct."},
        {"role": "assistant", "content": str(feedback.correct)},  # message content must be a string
    ]
    verifier_buffer.send(verifier_messages)

    # If the assistant's response was correct, train the actor on it.
    if feedback.correct:
        actor_buffer.send(feedback.messages)

# Register the callback.
human.on_feedback(on_feedback)
```
Building on the previous example, once the verifier is trained we can use it to train the actor __autonomously__.
```python
while True:
    # Implement this function however makes sense for you.
    task = next_task()
    response = actor.chat(task)

    # Implement this function to format the chat history for the verifier.
    verifier_messages = get_verifier_messages(task, response)
    verdict = verifier.chat(verifier_messages)

    # Implement these helpers to parse the verifier's reply into a pass/fail
    # judgment and to format the task/response pair as training messages.
    if is_correct(verdict):
        actor.learn(get_training_messages(task, response))
```
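Here `is_correct` is a hypothetical helper, not part of the Orign API; a minimal sketch, assuming the verifier is prompted to begin its reply with "correct" or "incorrect":

```python
def is_correct(verdict: str) -> bool:
    # Hypothetical parser: assumes `verifier.chat` returns the reply text and
    # that the verifier was prompted to answer "correct" or "incorrect" first.
    return verdict.strip().lower().startswith("correct")
```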
## Roadmap
- [ ] MCP support
- [ ] Metrics
- [ ] More human backends
## Contributing
Please open an issue or submit a PR.
## Inspiration
- OpenRLHF
- AlignAnything
- TRL
- Nebulous