AWS EC2 deployer
Deploy a custom binary (and configuration) to any number of EC2 instances across multiple regions. View metrics and logs from all instances with Grafana.
§Features
- Automated creation, update, and destruction of EC2 instances across multiple regions
- Provide a unique name, instance type, region, binary, and configuration for each deployed instance
- Collect metrics, profiles (when enabled), and logs from all deployed instances on a long-lived monitoring instance (accessible only to the deployer’s IP)
§Architecture
                  Deployer's Machine (Public IP)
                                 |
                                 |
                                 v
               +-----------------------------------+
               | Monitoring VPC (us-east-1)        |
               |  - Monitoring Instance            |
               |    - Prometheus                   |
               |    - Loki                         |
               |    - Pyroscope                    |
               |    - Tempo                        |
               |    - Grafana                      |
               |  - Security Group                 |
               |    - All: Deployer IP             |
               |    - 3100: Binary VPCs            |
               |    - 4040: Binary VPCs            |
               |    - 4318: Binary VPCs            |
               +-----------------------------------+
                ^                                ^
           (Telemetry)                      (Telemetry)
                |                                |
                |                                |
+------------------------------+  +------------------------------+
| Binary VPC 1                 |  | Binary VPC 2                 |
|  - Binary Instance           |  |  - Binary Instance           |
|    - Binary A                |  |    - Binary B                |
|    - Promtail                |  |    - Promtail                |
|    - Node Exporter           |  |    - Node Exporter           |
|    - Pyroscope Agent         |  |    - Pyroscope Agent         |
|  - Security Group            |  |  - Security Group            |
|    - All: Deployer IP        |  |    - All: Deployer IP        |
|    - 9090: Monitoring IP     |  |    - 9090: Monitoring IP     |
|    - 9100: Monitoring IP     |  |    - 9100: Monitoring IP     |
|    - 8012: 0.0.0.0/0         |  |    - 8765: 12.3.7.9/32       |
+------------------------------+  +------------------------------+
§Instances
§Monitoring
- Deployed in us-east-1 with a configurable instance type (e.g., t4g.small for ARM64, t3.small for x86_64) and storage (e.g., 10GB gp2). Architecture is auto-detected from the instance type.
- Runs:
  - Prometheus: Scrapes binary metrics from all instances at :9090 and system metrics from all instances at :9100.
  - Loki: Listens at :3100, storing logs in /loki/chunks with a TSDB index at /loki/index.
  - Pyroscope: Listens at :4040, storing profiles in /var/lib/pyroscope.
  - Tempo: Listens at :4318, storing traces in /var/lib/tempo.
  - Grafana: Hosted at :3000, provisioned with Prometheus, Loki, and Tempo datasources and a custom dashboard.
- Ingress:
  - Allows deployer IP access (TCP 0-65535).
  - Allows binary instance traffic to Loki (TCP 3100), Pyroscope (TCP 4040), and Tempo (TCP 4318).
§Binary
- Deployed in user-specified regions with configurable ARM64 or AMD64 instance types and storage.
- Run:
  - Custom Binary: Executes with --hosts=/home/ubuntu/hosts.yaml --config=/home/ubuntu/config.conf, exposing metrics at :9090.
  - Promtail: Forwards /var/log/binary.log to Loki on the monitoring instance.
  - Node Exporter: Exposes system metrics at :9100.
  - Pyroscope Agent: Forwards perf profiles to Pyroscope on the monitoring instance.
- Ingress:
  - Deployer IP access (TCP 0-65535).
  - Monitoring IP access to :9090 and :9100 for Prometheus.
  - User-defined ports from the configuration.
§Networking
§VPCs
One per region with CIDR 10.<region-index>.0.0/16 (e.g., 10.0.0.0/16 for us-east-1).
§Subnets
One subnet per availability zone that supports any required instance type in the region
(e.g., 10.<region-index>.<az-index>.0/24), linked to a shared route table with an internet gateway.
Each instance is placed in an AZ that supports its instance type, distributed round-robin across
eligible AZs, with automatic fallback to other AZs on capacity errors.
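Concretely, the addressing scheme works out as follows (the region ordering and the set of eligible AZs shown here are illustrative, not fixed by the deployer):

us-east-1 (region index 0): VPC 10.0.0.0/16, subnets 10.0.0.0/24, 10.0.1.0/24, ...
us-west-2 (region index 1): VPC 10.1.0.0/16, subnets 10.1.0.0/24, 10.1.1.0/24, ...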
§VPC Peering
Connects the monitoring VPC to each binary VPC, with routes added to route tables for private communication.
§Security Groups
Separate groups for the monitoring instance ({tag}) and binary instances ({tag}-binary), dynamically configured for deployer and inter-instance traffic.
§Workflow
§aws create
- Validates configuration and generates an SSH key pair, stored in $HOME/.commonware_deployer/{tag}/id_rsa_{tag}.
- Persists deployment metadata (tag, regions, instance names) to $HOME/.commonware_deployer/{tag}/metadata.yaml. This enables destroy --tag cleanup if creation fails.
- Ensures the shared S3 bucket exists and caches observability tools (Prometheus, Grafana, Loki, etc.) if not already present.
- Uploads deployment-specific files (binaries, configs) to S3.
- Creates VPCs, subnets, internet gateways, route tables, and security groups per region (concurrently).
- Establishes VPC peering between the monitoring region and binary regions.
- Launches the monitoring instance.
- Launches binary instances.
- Caches all static config files and uploads per-instance configs (hosts.yaml, promtail, pyroscope) to S3.
- Configures monitoring and binary instances in parallel via SSH (BBR, service installation, service startup).
- Updates the monitoring security group to allow telemetry traffic from binary instances.
- Marks completion with $HOME/.commonware_deployer/{tag}/created.
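A creation run might be invoked as follows (assuming create accepts the same --config flag used by the profile and destroy subcommands):

# Provision all infrastructure described in config.yaml (the --config flag is an assumption)
deployer aws create --config config.yaml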
§aws update
Performs rolling updates across all binary instances:
- Uploads the latest binary and configuration to S3.
- For each instance (up to --concurrency at a time, default 128):
  a. Stops the binary service.
  b. Downloads the updated files from S3 via pre-signed URLs.
  c. Restarts the binary service.
  d. Waits for the service to become active before proceeding.
Use --concurrency 1 for fully sequential updates that wait for each instance to be healthy
before updating the next.
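As a sketch (again assuming update takes the same --config flag as the other subcommands), a cautious sequential rollout versus a more parallel one might look like:

# Fully sequential: each instance must report healthy before the next is touched
deployer aws update --config config.yaml --concurrency 1

# Update up to 16 instances at a time
deployer aws update --config config.yaml --concurrency 16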
§aws authorize
- Obtains the deployer’s current public IP address (or parses the one provided).
- For each security group in the deployment, adds an ingress rule for the IP (if it doesn’t already exist).
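For example (the --config flag here is an assumption based on the other subcommands):

# Add the current public IP to every security group in the deployment
deployer aws authorize --config config.yaml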
§aws destroy
Can be invoked with either --config <path> or --tag <tag>. When using --tag, the command
reads regions from the persisted metadata.yaml file, allowing destruction without the original
config file.
- Terminates all instances across regions.
- Deletes security groups, subnets, route tables, VPC peering connections, internet gateways, key pairs, and VPCs in dependency order.
- Deletes deployment-specific data from S3 (cached tools remain for future deployments).
- Marks destruction with $HOME/.commonware_deployer/{tag}/destroyed, retaining the directory to prevent tag reuse.
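Both invocation forms might look like this, using the tag from the example configuration below:

# Destroy with the original config file
deployer aws destroy --config config.yaml

# Destroy with only the tag; regions are read from the persisted metadata.yaml
deployer aws destroy --tag ffa638a0-991c-442c-8ec4-aa4e418213a5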
§aws clean
- Deletes the shared S3 bucket and all its contents (cached tools and any remaining deployment data).
- Use this to fully clean up when you no longer need the deployer cache.
§aws list
Lists all active deployments (created but not destroyed). For each deployment, displays the tag, creation timestamp, regions, and number of instances.
§aws profile
- Loads the deployment configuration and locates the specified instance.
- Caches the samply binary in S3 if not already present.
- SSHes to the instance, downloads samply, and records a CPU profile of the running binary for the specified duration.
- Downloads the profile locally via SCP.
- Opens Firefox Profiler with symbols resolved from your local debug binary.
§Profiling
The deployer supports two profiling modes:
§Continuous Profiling (Pyroscope)
Enable continuous CPU profiling by setting profiling: true in your instance config. This runs
Pyroscope in the background, continuously collecting profiles that are viewable in the Grafana
dashboard on the monitoring instance.
For best results, build and deploy your binary with debug symbols and frame pointers:
CARGO_PROFILE_RELEASE_DEBUG=true RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
§On-Demand Profiling (samply)
To generate an on-demand CPU profile (viewable in the Firefox Profiler UI), run the following:
deployer aws profile --config config.yaml --instance <name> --binary <path-to-binary-with-debug>
This captures a 30-second profile (configurable with --duration) using samply on the remote
instance, downloads it, and opens it in Firefox Profiler. Unlike Continuous Profiling, this mode
does not require deploying a binary with debug symbols (reducing deployment time).
As with continuous profiling, build your binary with debug symbols (frame pointers are not required):
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
Now, strip symbols and deploy via aws create (preserve the original binary for profile symbolication
when you run the aws profile command shown above):
cp target/release/my-binary target/release/my-binary-debug
strip target/release/my-binary
§Persistence
- A directory $HOME/.commonware_deployer/{tag} stores:
  - SSH private key (id_rsa_{tag})
  - Deployment metadata (metadata.yaml) containing tag, creation timestamp, regions, and instance names
  - Status files (created, destroyed)
- The deployment state is tracked via these files, ensuring operations respect prior create/destroy actions.
- The metadata.yaml file enables aws destroy --tag and aws list to work without the original config file.
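For the tag used in the example configuration below, the directory would look roughly like this after a successful create:

$HOME/.commonware_deployer/ffa638a0-991c-442c-8ec4-aa4e418213a5/
├── id_rsa_ffa638a0-991c-442c-8ec4-aa4e418213a5
├── metadata.yaml
└── created            # a destroyed marker is added after aws destroy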
§S3 Caching
A shared S3 bucket (commonware-deployer-cache) is used to cache deployment artifacts. The bucket
uses a fixed name intentionally so that all users within the same AWS account share the cache. This
design provides two benefits:
- Faster deployments: Observability tools (Prometheus, Grafana, Loki, etc.) are downloaded from upstream sources once and cached in S3. Subsequent deployments by any user skip the download and use pre-signed URLs to fetch directly from S3.
- Reduced bandwidth: Instead of requiring the deployer to push binaries to each instance, unique binaries are uploaded once to S3 and then pulled from there.
Per-deployment data (binaries, configs, hosts files) is isolated under deployments/{tag}/ to prevent
conflicts between concurrent deployments.
The bucket stores:
- tools/binaries/{tool}/{version}/{platform}/{filename} - Tool binaries (e.g., prometheus, grafana)
- tools/configs/{deployer-version}/{component}/{file} - Static configs and service files
- deployments/{tag}/ - Deployment-specific files:
  - monitoring/ - Prometheus config, dashboard
  - instances/{name}/ - Binary, config, hosts.yaml, promtail config, pyroscope script
Tool binaries are namespaced by tool version and platform. Static configs are namespaced by deployer version to ensure cache invalidation when the deployer is updated.
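As an illustration only (the version, platform, and file-name segments below are hypothetical, not actual cached keys), the layout resolves to object keys such as:

tools/binaries/prometheus/2.53.0/linux-arm64/prometheus.tar.gz    # hypothetical version/platform
tools/configs/0.0.1/loki/loki.yml                                 # hypothetical deployer version
deployments/ffa638a0-991c-442c-8ec4-aa4e418213a5/instances/node1/binary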
§Example Configuration
tag: ffa638a0-991c-442c-8ec4-aa4e418213a5
monitoring:
  instance_type: t4g.small # ARM64 (Graviton)
  storage_size: 10
  storage_class: gp2
  dashboard: /path/to/dashboard.json
instances:
  - name: node1
    region: us-east-1
    instance_type: t4g.small # ARM64 (Graviton)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-arm64
    config: /path/to/config.conf
    profiling: true
  - name: node2
    region: us-west-2
    instance_type: t3.small # x86_64 (Intel/AMD)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-x86
    config: /path/to/config2.conf
    profiling: false
ports:
  - protocol: tcp
    port: 4545
    cidr: 0.0.0.0/0
Modules§
- ec2
- AWS EC2 SDK function wrappers
- s3
- AWS S3 SDK function wrappers for caching deployer artifacts
- services
- Service configuration for Prometheus, Loki, Grafana, Promtail, and a caller-provided binary
- utils
- Utility functions for interacting with EC2 instances
Structs§
- Config
- Deployer configuration
- Host
- Host deployment information
- Hosts
- List of hosts
- InstanceConfig
- Instance configuration
- Metadata
- Metadata persisted during deployment creation
- MonitoringConfig
- Monitoring configuration
- PortConfig
- Port configuration
Enums§
- Architecture
- CPU architecture for EC2 instances
- BucketForbiddenReason
- Reasons why accessing a bucket may be forbidden
- Error
- Errors that can occur when deploying infrastructure on AWS
- S3Operation
- S3 operations that can fail
Constants§
- AUTHORIZE_CMD
- Authorize subcommand name
- CLEAN_CMD
- Clean subcommand name
- CMD
- Subcommand name
- CREATE_CMD
- Create subcommand name
- DEFAULT_CONCURRENCY
- Maximum instances to manipulate at one time
- DESTROY_CMD
- Destroy subcommand name
- LIST_CMD
- List subcommand name
- METRICS_PORT
- Port on binary where metrics are exposed
- PROFILE_CMD
- Profile subcommand name
- UPDATE_CMD
- Update subcommand name
Functions§
- authorize
- Adds the deployer’s IP (or the one provided) to all security groups.
- clean
- Deletes the shared S3 cache bucket and all its contents
- create
- Sets up EC2 instances, deploys files, and configures monitoring and logging
- destroy
- Tears down all resources associated with the deployment tag
- list
- Lists all active deployments (created but not destroyed)
- profile
- Captures a CPU profile from a running instance using samply
- update
- Updates the binary and configuration on all binary nodes