A globally distributed container orchestrator
Think of it as a Kubernetes that can span clouds and regions with a focus on accelerated compute. Ships as a single binary, performant and lightweight via Rust :crab:
Why not Kubernetes? See why_not_kube.md
[!WARNING] Nebulous is in alpha, things may break.
Installation
[!NOTE] Only macOS and Linux arm64/amd64 are supported at this time.
Usage
Export the keys of your cloud providers.
Run a local API server on Docker
Or optionally run on Kubernetes with our Helm chart
Connect to the tailnet
See what cloud platforms are currently supported.
[!TIP] Prefer a Pythonic interface? Try nebulous-py. Prefer a higher-level LLM interface? Try orign.
Containers
Let's run our first container. We'll create a container on runpod with 2 A100 GPUs which trains a model using TRL.
First, let's find what accelerators are available.
Now let's create a container.
kind: Container
metadata:
  name: trl-job
  namespace: training
image: "huggingface/trl-latest-gpu:latest"
platform: runpod
command: |
  source activate trl && trl sft --model_name_or_path $MODEL \
      --dataset_name $DATASET \
      --output_dir /output \
      --torch_dtype bfloat16 \
      --use_peft true
env:
  - key: MODEL
    value: Qwen/Qwen2.5-7B
  - key: DATASET
    value: trl-lib/Capybara
volumes:
  - source: /output
    dest: s3://<my-bucket>/training-output
    driver: RCLONE_COPY
    continuous: true
accelerators:
  - "2:A100_SXM"
restart: Never
Replace <my-bucket> with a bucket name your AWS credentials have access to, and edit any other fields as needed.
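The `accelerators` entries above read as a count and an accelerator type separated by a colon. Here is a minimal sketch of splitting such a string, assuming the `<count>:<type>` layout holds for every entry. `parse_accelerator` is a hypothetical helper for illustration, not part of the nebu CLI or SDK.

```python
def parse_accelerator(spec: str) -> tuple[int, str]:
    """Split a spec such as "2:A100_SXM" into (count, accelerator type).

    Assumes the "<count>:<type>" layout seen in the examples above.
    """
    count_text, sep, kind = spec.partition(":")
    if not sep or not kind or not count_text.isdigit():
        raise ValueError(f"expected '<count>:<type>', got {spec!r}")
    count = int(count_text)
    if count < 1:
        raise ValueError("accelerator count must be at least 1")
    return count, kind


print(parse_accelerator("2:A100_SXM"))  # (2, 'A100_SXM')
```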
[!TIP] See our container examples for more.
List all containers
Get the container we just created.
Exec a command in a container
nebu exec trl-job -n training -c "echo hello"
Get logs from a container
Send an http request to a container
Queues
Containers can be assigned to a FIFO queue, which will block them from starting until the queue is free.
kind: Container
image: pytorch/pytorch:latest
queue: actor-critic-training
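The blocking behaviour can be pictured with a small toy model. This is a conceptual sketch only, assuming one container per queue runs at a time and the rest start in arrival order. `FifoQueue` is a hypothetical class, not the Nebulous scheduler.

```python
from collections import deque
from typing import Optional


class FifoQueue:
    """Toy model: one container runs at a time, the rest wait in arrival order."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.waiting = deque()

    def submit(self, name: str) -> bool:
        """Return True if the container starts now, False if it has to wait."""
        if self.running is None:
            self.running = name
            return True
        self.waiting.append(name)
        return False

    def finish(self) -> Optional[str]:
        """Mark the running container done and start the next one, if any."""
        self.running = self.waiting.popleft() if self.waiting else None
        return self.running


queue = FifoQueue()
queue.submit("job-a")   # starts immediately
queue.submit("job-b")   # blocked behind job-a
print(queue.finish())   # job-b
```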
Volumes
Volumes provide a means to persist and sync data across clouds. Nebulous uses rclone to sync data between clouds, backed by an object storage provider.
volumes:
  - source: s3://nebulous-rs/test
    dest: /test
    driver: RCLONE_SYNC
    continuous: true
Supported drivers are:
- RCLONE_SYNC
- RCLONE_COPY
- RCLONE_BISYNC
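The driver names appear to mirror rclone subcommands. In rclone, `sync` makes the destination match the source (removing extra files), `copy` only adds or updates files, and `bisync` reconciles changes in both directions. The sketch below builds the corresponding command line without running it. The mapping is inferred from the names, and `rclone_command` is a hypothetical helper. rclone itself addresses storage through configured remotes, so how Nebulous translates `s3://` paths is not shown here.

```python
# Assumed mapping from driver name to rclone subcommand, inferred from the names.
RCLONE_SUBCOMMANDS = {
    "RCLONE_SYNC": "sync",
    "RCLONE_COPY": "copy",
    "RCLONE_BISYNC": "bisync",
}


def rclone_command(driver: str, source: str, dest: str) -> list[str]:
    """Build (but do not run) an rclone command line for a volume entry."""
    try:
        subcommand = RCLONE_SUBCOMMANDS[driver]
    except KeyError:
        raise ValueError(f"unsupported driver: {driver}") from None
    # Paths are passed through unchanged for illustration only.
    return ["rclone", subcommand, source, dest]


print(rclone_command("RCLONE_SYNC", "s3://nebulous-rs/test", "/test"))
```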
Organizations
Nebulous is multi-tenant from the ground up. Here is an example of creating a container under the agentsea organization.
The authorization hierarchy is
orgs -> namespaces -> resources
Meters
Metered billing is supported through OpenMeter using the meters field.
meters:
  - cost: 0.1
    unit: second
    currency: USD
    metric: runtime
Cost-plus pricing is supported through the costp field.
meters:
  - costp: 10
    unit: second
    currency: USD
    metric: runtime
This configuration will add 10% to the cost of the container.
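The arithmetic behind the two meter styles can be sketched as follows. This is an illustration under stated assumptions: `cost` is treated as a price per unit, `costp` as a percentage markup on an underlying cost, and `Decimal` is used to avoid float rounding. How Nebulous and OpenMeter actually round or aggregate is not described here, and both function names are hypothetical.

```python
from decimal import Decimal


def metered_cost(cost_per_unit: str, units: int) -> Decimal:
    """Fixed-rate meter: price per unit times the number of units used."""
    return Decimal(cost_per_unit) * units


def cost_plus(base_cost: str, costp: int) -> Decimal:
    """Cost-plus meter: the underlying cost with a percentage markup added."""
    return Decimal(base_cost) * (Decimal(100) + costp) / Decimal(100)


# 60 seconds of runtime at 0.1 USD per second.
print(metered_cost("0.1", 60))
# A 2.00 USD underlying cost with a 10% markup.
print(cost_plus("2.00", 10))
```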
Authz
Authz is supported through the container proxy.
To enable the proxy for a container, set the proxy_port field to the container port you want to proxy.
proxy_port: 8080
Then your service can be accessed at http://proxy.<nebu-host> with the header x-resource: <name>.<namespace>.<kind>.
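A minimal sketch of a client request that carries this header, using only the Python standard library. The host `example.com` and the kind value `Container` are placeholders chosen for illustration, and `build_proxy_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_proxy_request(nebu_host: str, name: str, namespace: str,
                        kind: str, path: str = "/") -> Request:
    """Build a request for the container proxy with the x-resource header set."""
    url = f"http://proxy.{nebu_host}{path}"
    return Request(url, headers={"x-resource": f"{name}.{namespace}.{kind}"})


req = build_proxy_request("example.com", "trl-job", "training", "Container")
print(req.full_url)
print(req.header_items())
```

Passing the result to `urllib.request.urlopen` would send it, assuming the proxy is reachable from your machine.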
With the proxy enabled, you can also configure authz rules.
authz:
  rules:
    # Match on email
    - name: email-match
      field_match:
        - field: "owner"
          pattern: "${email}"
      allow: true
    # Path-based matching for organization resources
    - name: org-path-access
      path_match:
        - pattern: "/api/v1/orgs/${org_id}/**"
        - pattern: "/api/v1/organizations/${org_id}/**"
        - pattern: "/api/v1/models/${org_id}/**"
      allow: true
Variables are interpolated from the user's auth profile.
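To make the interpolation step concrete, here is a small sketch of how a rule pattern could be expanded and matched. It assumes `${name}` placeholders are replaced with values from the profile, that `field_match` is an exact comparison, and that `/**` behaves like a glob matching any remaining path. These semantics are guesses for illustration, and the helper names are hypothetical rather than the proxy's implementation.

```python
from fnmatch import fnmatchcase
from string import Template


def interpolate(pattern: str, profile: dict) -> str:
    """Replace ${name} placeholders with values from the user's auth profile."""
    return Template(pattern).substitute(profile)


def field_matches(resource: dict, field: str, pattern: str, profile: dict) -> bool:
    """Assumed semantics: the resource field must equal the interpolated pattern."""
    return resource.get(field) == interpolate(pattern, profile)


def path_matches(pattern: str, path: str, profile: dict) -> bool:
    """Assumed semantics: glob match of the request path against the pattern."""
    return fnmatchcase(path, interpolate(pattern, profile))


profile = {"email": "ada@example.com", "org_id": "acme"}
print(field_matches({"owner": "ada@example.com"}, "owner", "${email}", profile))
print(path_matches("/api/v1/orgs/${org_id}/**", "/api/v1/orgs/acme/models/1", profile))
```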
[!TIP] See container examples for more.
Secrets
Secrets are used to store sensitive information such as API keys and credentials. Secrets are AES-256 encrypted and stored in the database.
Create a secret
Get all secrets
Get a secret
Delete a secret
Secrets can be used in container environment variables.
kind: Container
metadata:
  name: my-container
  namespace: my-app
env:
  - key: MY_SECRET
    secret_name: my-secret
Namespaces
Namespaces provide a means to segment groups of resources across clouds.
kind: Container
metadata:
  name: llama-factory-server
  namespace: my-app
Resources within a given namespace are network-isolated using Tailnet and can be reached at http://{kind}-{id}, e.g. http://container-12345:8000.
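The address pattern is simple enough to capture in a one-line helper. This sketch follows the `http://{kind}-{id}` form and the port from the example above. `resource_url` is a hypothetical name, and the resulting address only resolves from a machine connected to the same tailnet.

```python
def resource_url(kind: str, resource_id: str, port: int, path: str = "") -> str:
    """Build the in-tailnet address for a resource, e.g. http://container-12345:8000."""
    return f"http://{kind}-{resource_id}:{port}{path}"


print(resource_url("container", "12345", 8000))  # http://container-12345:8000
```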
Nebulous cloud provides a free hosted HeadScale instance to connect your resources, or you can bring your own by simply setting the TAILSCALE_URL environment variable.
Services [in progress]
Services provide a means to expose containers on a stable IP address, and to balance traffic across multiple containers. Services auto-scale up and down as needed.
kind: Service
metadata:
  name: vllm-qwen
  namespace: inference
container:
  image: vllm/vllm-openai:latest
  command: |
    python -m vllm.entrypoints.api_server \
      --model Qwen/Qwen2-7B-Instruct \
      --tensor-parallel-size 1 \
      --port 8000
  accelerators:
    - "1:A100"
  platform: gce
min_containers: 1
max_containers: 5
scale:
  up:
    above_latency: 100ms
    duration: 10s
  down:
    below_latency: 10ms
    duration: 5m
  zero:
    below_latency: 10ms
    duration: 10m
The IP will be returned in the status field.
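One way to read the `scale` rules in the Service spec above is "act when latency stays past the threshold for the whole duration". The sketch below implements that reading over a list of `(timestamp_ms, latency_ms)` samples. The interpretation, the sample format, and the function names are all assumptions made for illustration, and Services are still in progress, so the real behaviour may differ.

```python
import re

_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def parse_duration_ms(text: str) -> int:
    """Convert strings such as "100ms", "10s" or "5m" to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def sustained(samples, now_ms, window_ms, predicate):
    """True if observations span the whole window and every latency in it passes."""
    start = now_ms - window_ms
    spans_window = any(t <= start for t, _ in samples)
    in_window = [latency for t, latency in samples if start <= t <= now_ms]
    return spans_window and bool(in_window) and all(predicate(v) for v in in_window)


def decide(samples, now_ms, scale):
    """Return "up", "down" or "hold" for (timestamp_ms, latency_ms) samples."""
    up, down = scale["up"], scale["down"]
    above = parse_duration_ms(up["above_latency"])
    below = parse_duration_ms(down["below_latency"])
    if sustained(samples, now_ms, parse_duration_ms(up["duration"]), lambda v: v > above):
        return "up"
    if sustained(samples, now_ms, parse_duration_ms(down["duration"]), lambda v: v < below):
        return "down"
    return "hold"


rules = {
    "up": {"above_latency": "100ms", "duration": "10s"},
    "down": {"below_latency": "10ms", "duration": "5m"},
}
print(decide([(0, 150), (5_000, 160), (10_000, 170)], 10_000, rules))  # up
```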
Services can be buffered, which will queue requests until a container is available.
buffered: true
Services can also scale to zero.
min_containers: 0
Services can also enforce schemas.
schema:
  - name: prompt
    type: string
    required: true
Or use a common schema.
common_schema: OPENAI_CHAT
Services can record all requests and responses.
record: true
Services can perform metered billing, such as counting the number of tokens in the response.
meters:
  - cost: 0.001
    unit: token
    currency: USD
    response_json_value: "$.usage.prompt_tokens"
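As a rough illustration of what this meter computes, the sketch below pulls a value out of a JSON response body with a simple dotted path and multiplies it by the per-unit cost. It handles only paths of the form `$.a.b`, which is an assumption about the subset of JSONPath needed here. The helper names and the sample response body are hypothetical.

```python
import json
from decimal import Decimal


def json_value(document: dict, path: str):
    """Resolve a dotted path such as "$.usage.prompt_tokens" (no arrays or filters)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value


def response_charge(body: str, path: str, cost: str) -> Decimal:
    """Per-unit cost times the number found at `path` in the response body."""
    units = json_value(json.loads(body), path)
    return Decimal(cost) * Decimal(str(units))


body = '{"usage": {"prompt_tokens": 12, "completion_tokens": 30}}'
print(response_charge(body, "$.usage.prompt_tokens", "0.001"))  # 0.012
```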
Services also work with clusters.
kind: Service
metadata:
  name: vllm-qwen
  namespace: inference
cluster:
  container:
    image: vllm/vllm-openai:latest
    command: |
      python -m vllm.entrypoints.api_server \
        --model Qwen/Qwen2-72B-Instruct \
        --tensor-parallel-size 1 \
        --port 8000
    accelerators:
      - "8:A100"
  num_nodes: 2
[!TIP] See service examples for more.
Clusters [in progress]
Clusters provide a means of multi-node training and inference.
kind: Cluster
metadata:
  name: pytorch-test
  namespace: foo
container:
  image: pytorch/pytorch:latest
  command: "echo $NODES && torchrun ..."
  platform: ec2
  env:
    - key: HELLO
      value: world
  volumes:
    - source: s3://nebulous-rs/test
      dest: /test
      driver: RCLONE_SYNC
      continuous: true
  accelerators:
    - "8:B200"
num_nodes: 4
Each container will get a $NODES env var which contains the IP addresses of the nodes in the cluster.
Clusters always aim to schedule nodes as close to each other as possible, with the fastest networking available.
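Inside a container, the `$NODES` variable could be read along these lines. The exact format of the variable is not documented here, so this sketch assumes a comma-separated list of IP addresses. Check the real value (for example with `echo $NODES`, as in the command above) before relying on it. `node_ips` is a hypothetical helper.

```python
import ipaddress
import os
from typing import Optional


def node_ips(raw: Optional[str] = None) -> list[str]:
    """Parse node addresses, ASSUMING a comma-separated list in $NODES."""
    if raw is None:
        raw = os.environ.get("NODES", "")
    ips = [part.strip() for part in raw.split(",") if part.strip()]
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return ips


print(node_ips("10.0.0.1,10.0.0.2"))  # ['10.0.0.1', '10.0.0.2']
```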
[!TIP] See cluster examples for more.
Processors [in progress]
Processors are containers that work off real-time data streams and are autoscaled based on back-pressure. Streams are provided by Redis Streams.
Processors are best used for bursty async jobs, or low latency stream processing.
kind: Processor
metadata:
  name: translator
  namespace: my-app
stream: my-app:workers:translator
container:
  image: corge/translator:latest
  command: "redis-cli XREAD COUNT 10 STREAMS my-app:workers:translator"
  platform: gce
  accelerators:
    - "1:A40"
min_workers: 1
max_workers: 10
scale:
  up:
    above_pressure: 100
    duration: 10s
  down:
    below_pressure: 10
    duration: 5m
  zero:
    duration: 10m
Processors can also scale to zero.
min_workers: 0
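The pressure thresholds and worker bounds can be combined into a simple decision function. This sketch assumes pressure is a single backlog number, that workers change one at a time, and it ignores the `duration` windows for brevity. None of this is confirmed by the spec, and `next_worker_count` is a hypothetical name.

```python
def next_worker_count(current: int, pressure: int, *, above_pressure: int,
                      below_pressure: int, min_workers: int, max_workers: int) -> int:
    """Step the worker count by one based on pressure, clamped to the bounds."""
    if pressure > above_pressure:
        target = current + 1
    elif pressure < below_pressure:
        target = current - 1
    else:
        target = current
    return max(min_workers, min(max_workers, target))


# With min_workers set to 0, low pressure lets the last worker go away.
print(next_worker_count(1, 5, above_pressure=100, below_pressure=10,
                        min_workers=0, max_workers=10))  # 0
```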
Processors can enforce schemas.
schema:
  - name: text_to_translate
    type: string
    required: true
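Schema enforcement of this kind amounts to checking each message against the declared fields. Here is a minimal sketch, assuming a schema is a list of `name`/`type`/`required` entries as shown above and covering only the `string` type that appears in the examples. `validate` and `TYPE_MAP` are hypothetical names, not the Nebulous implementation.

```python
# Only the type that appears in the examples is mapped; others are not checked.
TYPE_MAP = {"string": str}


def validate(message: dict, schema: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    errors = []
    for field in schema:
        name = field["name"]
        if name not in message:
            if field.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        expected = TYPE_MAP.get(field["type"])
        if expected is not None and not isinstance(message[name], expected):
            errors.append(f"field {name} should be of type {field['type']}")
    return errors


schema = [{"name": "text_to_translate", "type": "string", "required": True}]
print(validate({"text_to_translate": "hola"}, schema))  # []
print(validate({}, schema))
```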
Send data to a processor stream
Read data from a processor stream
nebu read processor translator --num 10
List all processors
Processors can use containers across different platforms. [in progress]
container:
  image: corge/translator:latest
  command: "redis-cli XREAD COUNT 10 STREAMS my-app:workers:translator"
  platforms:
    - gce
    - runpod
  accelerators:
    - "1:A40"
[!TIP] See processor examples for more.
SDK
:snake: Python https://github.com/agentsea/nebulous-py
:crab: Rust https://crates.io/crates/nebulous/versions
Roadmap
- Support non-gpu containers
- Services
- Clusters
- Processors
- Support for AWS EC2
- Support for GCE
- Support for Azure
- Support for Kubernetes
Contributing
Please open an issue or submit a PR.
Developing
Add all the environment variables shown in the .env_ file to your environment.
Run a postgres and redis instance locally. This can be done easily with docker.
To configure the secrets store you will need an encryption key. This can be generated with the following command.
Then set this to the NEBU_ENCRYPTION_KEY environment variable.
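For orientation, an AES-256 key is 32 random bytes. The sketch below generates one with Python's `secrets` module and base64-encodes it. The encoding expected by NEBU_ENCRYPTION_KEY is an assumption here, so prefer the project's own key generation command where it is available. `generate_key` is a hypothetical helper.

```python
import base64
import secrets


def generate_key() -> str:
    """Return 32 cryptographically random bytes (256 bits), base64-encoded.

    The base64 encoding is an assumption about what the server expects.
    """
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


print(generate_key())
```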
To optionally use OpenMeter for metered billing, you will need to open an account with their cloud service or run their open-source version, then set the OPENMETER_API_KEY and OPENMETER_URL environment variables.
To optionally use Tailnet, you will need to open an account with Tailscale or run your own HeadScale instance and set the TAILSCALE_API_KEY and TAILSCALE_TAILNET environment variables.
Install locally
make install
Run the server
nebu serve
Log in to the auth server. When you do, set the server to http://localhost:3000.
nebu login
Now you can create resources
When you make changes, simply run make install and nebu serve again.