Globe-scale Agentic Alignment
Orign makes it simple to train and deploy robust AI agents that can learn from human feedback. It further provides mechanisms for agents to learn interactively and autonomously.
Built on the Nebulous runtime, Orign components can run on any cloud and easily connect across clouds and regions.
Ships as a single binary; performant and lightweight thanks to Rust :crab:
It takes a team to align models; we connect them globally :earth_americas:
> [!WARNING]
> Orign is in alpha, things may break.
Installation
Python
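Assuming the Python client is published to PyPI under the project name:

```bash
pip install orign
```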
CLI
Usage
Start an Orign server
Or optionally run on Kubernetes with our Helm chart.
Replay Buffer
Create a replay buffer, which stores the agent's experience and launches training jobs.
In this example, once the buffer has 50 examples, it will randomly sample 100 examples and launch a TRL training job on RunPod with a single A100 GPU.
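A sketch of what that configuration might look like with the Python client. The class and parameter names (`ReplayBuffer`, `train_every`, `sample_n`, the job spec fields) and the training image are illustrative assumptions, so check the API reference for the real signatures:

```python
from orign import ReplayBuffer  # assumed import path

# Illustrative only: the parameter names and job spec fields are assumptions.
buffer = ReplayBuffer(
    name="agent-experience",
    train_every=50,    # launch a training job every 50 new examples
    sample_n=100,      # randomly sample 100 examples per job
    train_job={
        "image": "huggingface/trl-latest-gpu",  # hypothetical TRL training image
        "platform": "runpod",                   # run the job on RunPod
        "accelerators": ["1:A100"],             # one A100 GPU
    },
)
```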
Orign sets the following env vars in your container when it launches, based on the buffer config:
- `DATASET_URI`
- `DATASET_PATH`
- `NUM_EPOCHS`
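For example, a training script inside the container might read them like this (the variable names come from the list above; what each one holds is an assumption):

```python
import os

# Injected by Orign when the training container launches (meanings assumed here).
dataset_uri = os.environ["DATASET_URI"]      # where the sampled dataset lives
dataset_path = os.environ["DATASET_PATH"]    # local path to read the data from
num_epochs = int(os.environ["NUM_EPOCHS"])   # how many epochs to train for

print(f"training on {dataset_path} ({dataset_uri}) for {num_epochs} epochs")
```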
For simplicity, Orign also supplies high-level, framework-specific training containers.
Send data to the replay buffer
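A sketch of sending a chat example to the buffer; the `send()` method and payload shape are assumptions based on typical chat-message schemas:

```python
# Illustrative: the method name and payload shape are assumptions.
buffer.send(
    {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
        ]
    }
)
```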
See a list of all replay buffers
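Listing might look like this; the `get()` accessor is a guess at the client API:

```python
# Illustrative: assumes a class-level accessor that returns all buffers when called with no arguments.
for buf in ReplayBuffer.get():
    print(buf.name)
```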
Online LLM
Create an online LLM, which is capable of both training and inference.
In this example, the actor will use a vLLM server running on EC2 with two H100 GPUs, along with the buffer we created earlier.
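A sketch of that actor; `OnlineLLM` and its parameters are illustrative assumptions about the client API, and the model id is just an example:

```python
from orign import OnlineLLM  # assumed import path

# Illustrative only: parameter names are assumptions.
actor = OnlineLLM(
    name="actor",
    model="Qwen/Qwen2.5-7B-Instruct",        # example Hugging Face model id
    server={
        "image": "vllm/vllm-openai:latest",  # vLLM's OpenAI-compatible server image
        "platform": "ec2",                   # serve on EC2
        "accelerators": ["2:H100"],          # two H100 GPUs
    },
    buffer=buffer,                           # the replay buffer created above
)
```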
For simplicity, Orign also supplies high-level, framework-specific serving containers.
Use the LLM to generate responses.
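Generation might look like this; the `chat()` method and the response shape are assumptions, so treat it as a sketch:

```python
# Illustrative: method name and return shape are assumptions.
response = actor.chat(
    messages=[{"role": "user", "content": "Plan a three-day trip to Kyoto."}]
)
print(response)
```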
Send the LLM training examples
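Training examples can be sent back through the same object; `learn()` is an assumed method name:

```python
# Illustrative: assumes a learn() method that forwards examples to the attached buffer.
actor.learn(
    {
        "messages": [
            {"role": "user", "content": "Plan a three-day trip to Kyoto."},
            {"role": "assistant", "content": "Day 1: Fushimi Inari, then Gion in the evening..."},
        ]
    }
)
```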
Replay buffers will automatically launch training jobs when they hit the `train_every` threshold. However, you can also launch them manually.
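A manual launch might look like this; `train()` is an assumed method name:

```python
# Illustrative: assumes a train() method that samples the buffer and starts a job immediately.
buffer.train()
```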
Orign also supplies high-level objects for common online LLMs.
It's also easy to create your own online LLM wrapper.
Human
Connect to a human, who can provide feedback to the agent.
In this example, we collect feedback from humans in a Slack channel.
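A sketch of the connection; `Human` and its parameters are assumptions about the client API:

```python
from orign import Human  # assumed import path

# Illustrative only: parameter names are assumptions.
reviewer = Human(
    name="reviewer",
    medium="slack",              # collect feedback over Slack
    channel="#agent-feedback",   # hypothetical channel
)
```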
Use the human to provide feedback to the agent.
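Requesting feedback might look like this; `request()` and the returned shape are assumptions:

```python
# Illustrative: assumes a request() method that posts the exchange to Slack for review.
feedback = reviewer.request(
    messages=[
        {"role": "user", "content": "Plan a three-day trip to Kyoto."},
        {"role": "assistant", "content": "Day 1: Fushimi Inari, then Gion in the evening..."},
    ]
)
print(feedback)
```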
Register a callback to run a container when the human provides feedback.
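Registration might look something like this; `on_feedback()` and the container spec are assumptions:

```python
# Illustrative: assumes the human object can run a container on each feedback event.
reviewer.on_feedback(
    container={
        "image": "myorg/feedback-handler:latest",   # hypothetical image
        "command": "python handle_feedback.py",     # hypothetical entrypoint
    }
)
```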
Verifiers and Autonomous Learning
As a more complex example, use the feedback to train both the agent and a verifier, enabling autonomous learning.
First, let's create a verifier using an online LLM.
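A sketch of a verifier built from the same primitives; everything here follows the assumed API from the earlier examples:

```python
# Illustrative: a second online LLM with its own replay buffer, used as a judge.
verifier_buffer = ReplayBuffer(name="verifier-experience", train_every=50, sample_n=100)

verifier = OnlineLLM(
    name="verifier",
    model="Qwen/Qwen2.5-7B-Instruct",   # example model id
    server={
        "image": "vllm/vllm-openai:latest",
        "platform": "ec2",
        "accelerators": ["1:H100"],
    },
    buffer=verifier_buffer,
)
```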
Now, let's create a container that will launch when the human provides feedback.
The callback will:
- Get the buffers we previously created for our actor and verifier.
- Teach the verifier to judge whether the assistant's response is correct.
- If the assistant's response is correct, train the actor.

Finally, register the callback with the human, as shown earlier.
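A sketch of that handler script, following the steps above; the payload format, lookup calls, and message shapes are all assumptions:

```python
import json
import os

from orign import ReplayBuffer  # assumed import path

# Illustrative: assumes the feedback payload arrives as JSON in an env var.
feedback = json.loads(os.environ["FEEDBACK"])

# Get the buffers we previously created for our actor and verifier.
actor_buffer = ReplayBuffer.get("agent-experience")        # assumed lookup by name
verifier_buffer = ReplayBuffer.get("verifier-experience")

# Teach the verifier to judge whether the assistant's response is correct.
verifier_buffer.send(
    {
        "messages": [
            {"role": "user", "content": json.dumps(feedback["messages"])},
            {"role": "assistant", "content": "correct" if feedback["approved"] else "incorrect"},
        ]
    }
)

# If the assistant's response is correct, train the actor on it.
if feedback["approved"]:
    actor_buffer.send({"messages": feedback["messages"]})
```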
Using the previous example, once the verifier is trained, we can use it to train the actor autonomously.
You will need two helper functions: one that runs the agent on a task (implement this however makes sense for you) and one that formats the chat history for the verifier. A sketch of the loop follows.
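A minimal sketch of that loop; the rollout and formatting helpers are placeholders you would implement yourself, and the client calls follow the assumed API from the earlier examples:

```python
# Illustrative autonomous-learning loop; all client calls are assumptions.

def run_agent_on_task(task: str) -> list[dict]:
    # Implement this function however makes sense for you;
    # here the actor simply answers the task in a single turn.
    prompt = {"role": "user", "content": task}
    reply = actor.chat(messages=[prompt])  # assumed to return the assistant's text
    return [prompt, {"role": "assistant", "content": str(reply)}]

def format_for_verifier(messages: list[dict]) -> str:
    # Implement this function to format the chat history for the verifier.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

tasks = ["Plan a three-day trip to Kyoto."]  # hypothetical task list

for task in tasks:
    messages = run_agent_on_task(task)

    # Ask the trained verifier whether the rollout was correct.
    verdict = verifier.chat(
        messages=[{"role": "user", "content": format_for_verifier(messages)}]
    )

    # If the verifier approves, feed the rollout back into the actor's training data.
    if str(verdict).strip().lower() == "correct":
        actor.learn({"messages": messages})
```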
Roadmap
- MCP support
- Metrics
- More human backends
Contributing
Please open an issue or submit a PR.
Inspiration
- OpenRLHF
- AlignAnything
- TRL
- Nebulous