Nebulous
A cross-cloud container orchestrator
Think of it as a Kubernetes that can span clouds, with a focus on accelerated compute and AI workloads. It is written in Rust to stay performant and lightweight.
Installation
Usage
Log in to an API server
Create a container on runpod with 4 A100 GPUs
kind: Container
metadata:
  name: pytorch-test
  namespace: foo
image: pytorch/pytorch:latest
command: nvidia-smi
platform: runpod
env_vars:
  - key: HELLO
    value: world
volumes:
  - source: s3://foo/bar
    dest: /quz/baz
    bidirectional: true
    continuous: true
accelerators:
  - "4:A100"
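The accelerators entry in the spec above appears to follow a count:type pattern: the heading asks for 4 A100 GPUs and the entry reads "4:A100". The following is a minimal Python sketch of reading such a string, assuming that pattern holds for every entry. `Accelerator` and `parse_accelerator` are hypothetical names used for illustration and are not part of Nebulous.

```python
from typing import NamedTuple


class Accelerator(NamedTuple):
    count: int
    kind: str


def parse_accelerator(spec: str) -> Accelerator:
    """Split a "<count>:<type>" string such as "4:A100" (assumed format)."""
    count_text, sep, kind = spec.partition(":")
    if not sep or not count_text.isdigit() or not kind:
        raise ValueError(f"expected '<count>:<type>', got {spec!r}")
    count = int(count_text)
    if count < 1:
        raise ValueError("accelerator count must be at least 1")
    return Accelerator(count, kind)


print(parse_accelerator("4:A100"))
```

A check like this can catch a malformed entry before a spec is submitted, but the server remains the authority on which accelerator types each platform accepts.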
Create a container on EC2 with 1 L40s GPU
List all containers
Get one container
Delete a container
List available accelerators
List available platforms
Get the IP address of a container [in progress]
SSH into a container [in progress]
Volumes
Volumes provide a means to persist data across clouds. Nebulous uses rclone to sync data between clouds, backed by an object storage provider.
volumes:
  - source: s3://foo/bar
    dest: /quz/baz
    bidirectional: true
    continuous: true
Organizations
Nebulous is multi-tenant from the ground up. Here is an example of creating a container under the Agentsea organization.
Meters
Metered billing is supported through OpenMeter using the meters field.
meters:
  - cost: 0.1
    unit: second
    currency: USD
    metric: runtime
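As a worked example of the arithmetic the meter above implies, here is a short Python sketch. It assumes that cost is a flat price per unit of the metric, so 0.1 USD is charged for each second of runtime. That reading is an interpretation of the field names rather than documented behaviour, and `runtime_charge` is a hypothetical helper that does not call OpenMeter.

```python
from decimal import Decimal

# Values copied from the meters example above; Decimal avoids float rounding.
meter = {
    "cost": Decimal("0.1"),
    "unit": "second",
    "currency": "USD",
    "metric": "runtime",
}


def runtime_charge(meter: dict, seconds: int) -> Decimal:
    """Charge for `seconds` of runtime, assuming a flat per-second price."""
    if meter["unit"] != "second":
        raise ValueError("this sketch only handles per-second meters")
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    return meter["cost"] * seconds


charge = runtime_charge(meter, 3600)  # one hour of runtime
print(f"{charge} {meter['currency']}")
```

Under that assumption, one hour of runtime (3600 seconds) comes to 360 USD with the example values.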
Clusters [in progress]
Clusters provide a means of multi-node training and inference.
kind: Cluster
metadata:
  name: pytorch-test
  namespace: foo
container:
  image: pytorch/pytorch:latest
  command: "echo $NODES && torchrun ..."
  platform: runpod
  env_vars:
    - key: HELLO
      value: world
  volumes:
    - source: s3://foo/bar
      dest: /quz/baz
      bidirectional: true
      continuous: true
  accelerators:
    - "8:A100"
num_nodes: 4
Each container gets a $NODES environment variable containing the IP addresses of the nodes in the cluster.
Clusters always aim to schedule nodes as close to each other as possible, on the fastest networking available.
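A training script would typically need to turn $NODES into a list of addresses. The sketch below shows one way to do that in Python, assuming the variable holds a comma-separated list of IP addresses. The delimiter is a guess, since the format is not specified here, and `cluster_nodes` is a hypothetical helper.

```python
import ipaddress


def cluster_nodes(raw: str) -> list[str]:
    """Parse a comma-separated list of IP addresses (assumed format)."""
    nodes = []
    for item in raw.split(","):
        item = item.strip()
        if item:
            # ip_address raises ValueError on anything that is not an IP.
            nodes.append(str(ipaddress.ip_address(item)))
    return nodes


# Example addresses only; inside a cluster container you would pass
# os.environ["NODES"] instead of this literal.
example = "10.0.0.1, 10.0.0.2,10.0.0.3"
print(cluster_nodes(example))
```

Validating each entry with the standard `ipaddress` module means a wrong guess about the delimiter fails loudly instead of producing a bad node list.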
Services [in progress]
Services provide a means to expose containers on a stable IP address.
Namespaces [in progress]
Namespaces provide a means to segregate groups of resources across clouds. Resources within a given namespace are network-isolated using Tailnet and can be reached by using their name as the hostname, e.g. http://foo:8080.
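To illustrate the addressing scheme, this small Python sketch builds the URL for a resource from its name, following the http://foo:8080 example above. `resource_url` is a hypothetical helper, and the resulting hostname only resolves from inside the same namespace.

```python
from urllib.parse import urlunsplit


def resource_url(name: str, port: int, path: str = "") -> str:
    """Build an HTTP URL that uses the resource name as the hostname."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return urlunsplit(("http", f"{name}:{port}", path, "", ""))


print(resource_url("foo", 8080))
print(resource_url("foo", 8080, "/health"))  # "/health" is an illustrative path
```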
Contributing
Please open an issue or submit a PR.