AWS EC2 deployer
Deploy a custom binary (and configuration) to any number of EC2 instances across multiple regions. View metrics and logs from all instances with Grafana.
§Features
- Automated creation, update, and destruction of EC2 instances across multiple regions
- Provide a unique name, instance type, region, binary, and configuration for each deployed instance
- Collect metrics, profiles (when enabled), and logs from all deployed instances on a long-lived monitoring instance (accessible only to the deployer’s IP)
§Architecture
                 Deployer's Machine (Public IP)
                                |
                                |
                                v
              +-----------------------------------+
              | Monitoring VPC (us-east-1)        |
              |  - Monitoring Instance            |
              |    - Prometheus                   |
              |    - Loki                         |
              |    - Pyroscope                    |
              |    - Tempo                        |
              |    - Grafana                      |
              |  - Security Group                 |
              |    - All: Deployer IP             |
              |    - 3100: Binary VPCs            |
              |    - 4040: Binary VPCs            |
              |    - 4318: Binary VPCs            |
              +-----------------------------------+
                    ^                       ^
               (Telemetry)             (Telemetry)
                    |                       |
                    |                       |
+------------------------------+  +------------------------------+
| Binary VPC 1                 |  | Binary VPC 2                 |
|  - Binary Instance           |  |  - Binary Instance           |
|    - Binary A                |  |    - Binary B                |
|    - Promtail                |  |    - Promtail                |
|    - Node Exporter           |  |    - Node Exporter           |
|    - Pyroscope Agent         |  |    - Pyroscope Agent         |
|  - Security Group            |  |  - Security Group            |
|    - All: Deployer IP        |  |    - All: Deployer IP        |
|    - 9090: Monitoring IP     |  |    - 9090: Monitoring IP     |
|    - 9100: Monitoring IP     |  |    - 9100: Monitoring IP     |
|    - 8012: 0.0.0.0/0         |  |    - 8765: 12.3.7.9/32       |
+------------------------------+  +------------------------------+
§Instances
§Monitoring
- Deployed in us-east-1 with a configurable instance type (e.g., t4g.small for ARM64, t3.small for x86_64) and storage (e.g., 10GB gp2). Architecture is auto-detected from the instance type.
- Runs:
  - Prometheus: Scrapes binary metrics from all instances at :9090 and system metrics from all instances at :9100.
  - Loki: Listens at :3100, storing logs in /loki/chunks with a TSDB index at /loki/index.
  - Pyroscope: Listens at :4040, storing profiles in /var/lib/pyroscope.
  - Tempo: Listens at :4318, storing traces in /var/lib/tempo.
  - Grafana: Hosted at :3000, provisioned with Prometheus, Loki, and Tempo datasources and a custom dashboard.
- Ingress:
  - Allows deployer IP access (TCP 0-65535).
  - Binary instance traffic to Loki (TCP 3100) and Tempo (TCP 4318).
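As a rough illustration of the Prometheus scraping described above, the sketch below builds the two scrape targets for each binary instance. The function `scrape_targets` and the constant name `NODE_EXPORTER_PORT` are hypothetical and not part of this crate; `METRICS_PORT` is assumed to be 9090, matching the port stated in the text.

```rust
/// Port where each binary exposes metrics (assumed to be 9090, per the text above).
const METRICS_PORT: u16 = 9090;
/// Hypothetical constant name for the Node Exporter port (9100 in the text above).
const NODE_EXPORTER_PORT: u16 = 9100;

/// Hypothetical helper: list the Prometheus targets for a set of instance IPs.
/// Each instance contributes one binary-metrics target and one system-metrics target.
fn scrape_targets(instance_ips: &[&str]) -> Vec<String> {
    let mut targets = Vec::with_capacity(instance_ips.len() * 2);
    for ip in instance_ips {
        targets.push(format!("{}:{}", ip, METRICS_PORT));
        targets.push(format!("{}:{}", ip, NODE_EXPORTER_PORT));
    }
    targets
}
```

Calling `scrape_targets(&["10.1.1.5"])` yields one target on port 9090 and one on port 9100 for that address. How the deployer actually renders its Prometheus configuration may differ.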
§Binary
- Deployed in user-specified regions with configurable ARM64 or AMD64 instance types and storage.
- Run:
  - Custom Binary: Executes with --hosts=/home/ubuntu/hosts.yaml --config=/home/ubuntu/config.conf, exposing metrics at :9090.
  - Promtail: Forwards /var/log/binary.log to Loki on the monitoring instance.
  - Node Exporter: Exposes system metrics at :9100.
  - Pyroscope Agent: Forwards perf profiles to Pyroscope on the monitoring instance.
- Ingress:
  - Deployer IP access (TCP 0-65535).
  - Monitoring IP access to :9090 and :9100 for Prometheus.
  - User-defined ports from the configuration.
§Networking
§VPCs
One per region with CIDR 10.<region-index>.0.0/16 (e.g., 10.0.0.0/16 for us-east-1).
§Subnets
Single subnet per VPC (e.g., 10.<region-index>.1.0/24), linked to a route table with an internet gateway.
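The addressing scheme for VPCs and subnets can be sketched as two small functions. Both helper names are hypothetical, and the sketch assumes the region index fits in a single octet, which the 10.<region-index>.0.0/16 pattern implies.

```rust
/// Hypothetical helper: CIDR block of the VPC for the region at `region_index`.
fn vpc_cidr(region_index: u8) -> String {
    format!("10.{}.0.0/16", region_index)
}

/// Hypothetical helper: CIDR block of the single subnet inside that VPC.
fn subnet_cidr(region_index: u8) -> String {
    format!("10.{}.1.0/24", region_index)
}
```

Because each region gets a distinct second octet, the VPC ranges never overlap, which is a precondition for peering them.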
§VPC Peering
Connects the monitoring VPC to each binary VPC, with routes added to route tables for private communication.
§Security Groups
Separate security groups for monitoring ({tag}) and binary instances ({tag}-binary), dynamically configured for deployer and inter-instance traffic.
§Workflow
§ec2 create
- Validates configuration and generates an SSH key pair, stored in $HOME/.commonware_deployer/{tag}/id_rsa_{tag}.
- Ensures the shared S3 bucket exists and caches observability tools (Prometheus, Grafana, Loki, etc.) if not already present.
- Uploads deployment-specific files (binaries, configs) to S3.
- Creates VPCs, subnets, internet gateways, route tables, and security groups per region (concurrently).
- Establishes VPC peering between the monitoring region and binary regions.
- Launches the monitoring instance.
- Launches binary instances.
- Caches all static config files and uploads per-instance configs (hosts.yaml, promtail, pyroscope) to S3.
- Configures monitoring and binary instances in parallel via SSH (BBR, service installation, service startup).
- Updates the monitoring security group to allow telemetry traffic from binary instances.
- Marks completion with $HOME/.commonware_deployer/{tag}/created.
§ec2 update
- Uploads the latest binary and configuration to S3.
- Stops the binary service on each binary instance.
- Instances download the updated files from S3 via pre-signed URLs.
- Restarts the binary service, ensuring minimal downtime.
§ec2 authorize
- Obtains the deployer’s current public IP address (or parses the one provided).
- For each security group in the deployment, adds an ingress rule for the IP (if it doesn’t already exist).
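The "add only if missing" behavior of authorize can be modeled in memory as follows. This is a minimal sketch, not the crate's implementation: the `IngressRule` type and `authorize_ip` function are hypothetical, and the /32 suffix and the TCP 0-65535 range are assumptions based on the deployer ingress rules described earlier.

```rust
use std::net::Ipv4Addr;

/// Hypothetical in-memory model of one security group ingress rule.
#[derive(Debug, Clone, PartialEq, Eq)]
struct IngressRule {
    protocol: String,
    from_port: u16,
    to_port: u16,
    cidr: String,
}

/// Hypothetical helper: add an all-TCP rule for `ip` unless an identical rule exists.
/// Returns true when a rule was added and false when the group was left unchanged.
fn authorize_ip(rules: &mut Vec<IngressRule>, ip: Ipv4Addr) -> bool {
    let rule = IngressRule {
        protocol: "tcp".to_string(),
        from_port: 0,
        to_port: 65535,
        cidr: format!("{}/32", ip), // assumption: a single-host CIDR
    };
    if rules.contains(&rule) {
        return false;
    }
    rules.push(rule);
    true
}
```

Running the function twice with the same address leaves a single rule in place, so repeating the command is harmless.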
§ec2 destroy
- Terminates all instances across regions.
- Deletes security groups, subnets, route tables, VPC peering connections, internet gateways, key pairs, and VPCs in dependency order.
- Deletes deployment-specific data from S3 (cached tools remain for future deployments).
- Marks destruction with $HOME/.commonware_deployer/{tag}/destroyed, retaining the directory to prevent tag reuse.
§ec2 clean
- Deletes the shared S3 bucket and all its contents (cached tools and any remaining deployment data).
- Use this to fully clean up when you no longer need the deployer cache.
§Persistence
- A directory $HOME/.commonware_deployer/{tag} stores the SSH private key and status files (created, destroyed).
- The deployment state is tracked via these files, ensuring operations respect prior create/destroy actions.
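One way to read the status files is sketched below. The `DeploymentState` enum and `deployment_state` function are hypothetical names, and the precedence (destroyed wins over created) is an assumption drawn from the note that the directory is retained after destruction.

```rust
use std::path::Path;

/// Hypothetical summary of what the status files in a tag directory say.
#[derive(Debug, PartialEq, Eq)]
enum DeploymentState {
    NotCreated,
    Created,
    Destroyed,
}

/// Hypothetical helper: inspect `tag_dir` (e.g., $HOME/.commonware_deployer/{tag})
/// and report the state implied by the `created` and `destroyed` marker files.
fn deployment_state(tag_dir: &Path) -> DeploymentState {
    if tag_dir.join("destroyed").exists() {
        // Assumption: a destroyed marker takes precedence, so the tag is not reused.
        DeploymentState::Destroyed
    } else if tag_dir.join("created").exists() {
        DeploymentState::Created
    } else {
        DeploymentState::NotCreated
    }
}
```

A command such as update could refuse to run unless the state is `Created`, and create could refuse to run when it is `Destroyed`.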
§S3 Caching
A shared S3 bucket (commonware-deployer-cache) is used to cache deployment artifacts. The bucket
uses a fixed name intentionally so that all users within the same AWS account share the cache. This
design provides two benefits:
- Faster deployments: Observability tools (Prometheus, Grafana, Loki, etc.) are downloaded from upstream sources once and cached in S3. Subsequent deployments by any user skip the download and use pre-signed URLs to fetch directly from S3.
- Reduced bandwidth: Instead of requiring the deployer to push binaries to each instance, unique binaries are uploaded once to S3 and then pulled from there.
Per-deployment data (binaries, configs, hosts files) is isolated under deployments/{tag}/ to prevent
conflicts between concurrent deployments.
The bucket stores:
- tools/binaries/{tool}/{version}/{platform}/{filename} - Tool binaries (e.g., prometheus, grafana)
- tools/configs/{deployer-version}/{component}/{file} - Static configs and service files
- deployments/{tag}/ - Deployment-specific files:
  - monitoring/ - Prometheus config, dashboard
  - instances/{name}/ - Binary, config, hosts.yaml, promtail config, pyroscope script
Tool binaries are namespaced by tool version and platform. Static configs are namespaced by deployer version to ensure cache invalidation when the deployer is updated.
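The key layout above can be expressed as a few string builders. The function names are hypothetical, and the example values used with them (such as a platform string of "linux-arm64") are assumptions for illustration only.

```rust
/// Hypothetical helper: S3 key for a cached tool binary.
fn tool_binary_key(tool: &str, version: &str, platform: &str, filename: &str) -> String {
    format!("tools/binaries/{}/{}/{}/{}", tool, version, platform, filename)
}

/// Hypothetical helper: S3 key for a static config, namespaced by deployer version.
fn static_config_key(deployer_version: &str, component: &str, file: &str) -> String {
    format!("tools/configs/{}/{}/{}", deployer_version, component, file)
}

/// Hypothetical helper: S3 prefix holding the files of one instance in one deployment.
fn instance_prefix(tag: &str, name: &str) -> String {
    format!("deployments/{}/instances/{}/", tag, name)
}
```

Because the tag is part of every deployment key, two deployments with different tags never write to the same prefix, while tool binaries with the same version and platform resolve to the same key and can be shared.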
§Example Configuration
tag: ffa638a0-991c-442c-8ec4-aa4e418213a5
monitoring:
  instance_type: t4g.small # ARM64 (Graviton)
  storage_size: 10
  storage_class: gp2
  dashboard: /path/to/dashboard.json
instances:
  - name: node1
    region: us-east-1
    instance_type: t4g.small # ARM64 (Graviton)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-arm64
    config: /path/to/config.conf
    profiling: true
  - name: node2
    region: us-west-2
    instance_type: t3.small # x86_64 (Intel/AMD)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-x86
    config: /path/to/config2.conf
    profiling: false
ports:
  - protocol: tcp
    port: 4545
    cidr: 0.0.0.0/0
Modules§
- aws
- AWS EC2 SDK function wrappers
- s3
- AWS S3 SDK function wrappers for caching deployer artifacts
- services
- Service configuration for Prometheus, Loki, Grafana, Promtail, and a caller-provided binary
- utils
- Utility functions for interacting with EC2 instances
Structs§
- Config
- Deployer configuration
- Host
- Host deployment information
- Hosts
- List of hosts
- InstanceConfig
- Instance configuration
- MonitoringConfig
- Monitoring configuration
- PortConfig
- Port configuration
Enums§
- Architecture
- CPU architecture for EC2 instances
- BucketForbiddenReason
- Reasons why accessing a bucket may be forbidden
- Error
- Errors that can occur when deploying infrastructure on AWS
- S3Operation
- S3 operations that can fail
Constants§
- AUTHORIZE_CMD
- Authorize subcommand name
- CLEAN_CMD
- Clean subcommand name
- CMD
- Subcommand name
- CREATE_CMD
- Create subcommand name
- DESTROY_CMD
- Destroy subcommand name
- METRICS_PORT
- Port on binary where metrics are exposed
- UPDATE_CMD
- Update subcommand name
Functions§
- authorize
- Adds the deployer’s IP (or the one provided) to all security groups.
- clean
- Deletes the shared S3 cache bucket and all its contents
- create
- Sets up EC2 instances, deploys files, and configures monitoring and logging
- destroy
- Tears down all resources associated with the deployment tag
- update
- Updates the binary and configuration on all binary nodes