Module ec2

AWS EC2 deployer

Deploy a custom binary (and configuration) to any number of EC2 instances across multiple regions. View metrics and logs from all instances with Grafana.

§Features

  • Automated creation, update, and destruction of EC2 instances across multiple regions
  • Provide a unique name, instance type, region, binary, and configuration for each deployed instance
  • Collect metrics, profiles (when enabled), and logs from all deployed instances on a long-lived monitoring instance (accessible only to the deployer’s IP)

§Architecture

                   Deployer's Machine (Public IP)
                                 |
                                 |
                                 v
              +-----------------------------------+
              | Monitoring VPC (us-east-1)        |
              |  - Monitoring Instance            |
              |    - Prometheus                   |
              |    - Loki                         |
              |    - Pyroscope                    |
              |    - Tempo                        |
              |    - Grafana                      |
              |  - Security Group                 |
              |    - All: Deployer IP             |
              |    - 3100: Binary VPCs            |
              |    - 4040: Binary VPCs            |
              |    - 4318: Binary VPCs            |
              +-----------------------------------+
                    ^                       ^
               (Telemetry)             (Telemetry)
                    |                       |
                    |                       |
+------------------------------+  +------------------------------+
| Binary VPC 1                 |  | Binary VPC 2                 |
|  - Binary Instance           |  |  - Binary Instance           |
|    - Binary A                |  |    - Binary B                |
|    - Promtail                |  |    - Promtail                |
|    - Node Exporter           |  |    - Node Exporter           |
|    - Pyroscope Agent         |  |    - Pyroscope Agent         |
|  - Security Group            |  |  - Security Group            |
|    - All: Deployer IP        |  |    - All: Deployer IP        |
|    - 9090: Monitoring IP     |  |    - 9090: Monitoring IP     |
|    - 9100: Monitoring IP     |  |    - 9100: Monitoring IP     |
|    - 8012: 0.0.0.0/0         |  |    - 8765: 12.3.7.9/32       |
+------------------------------+  +------------------------------+

§Instances

§Monitoring

  • Deployed in us-east-1 with a configurable instance type (e.g., t4g.small for ARM64, t3.small for x86_64) and storage (e.g., 10GB gp2). Architecture is auto-detected from the instance type.
  • Runs:
    • Prometheus: Scrapes binary metrics from every instance at :9090 and system metrics at :9100 (see the scrape-target sketch after this list).
    • Loki: Listens at :3100, storing logs in /loki/chunks with a TSDB index at /loki/index.
    • Pyroscope: Listens at :4040, storing profiles in /var/lib/pyroscope.
    • Tempo: Listens at :4318, storing traces in /var/lib/tempo.
    • Grafana: Hosted at :3000, provisioned with Prometheus, Loki, and Tempo datasources and a custom dashboard.
  • Ingress:
    • Allows deployer IP access (TCP 0-65535).
    • Binary instance traffic to Loki (TCP 3100), Pyroscope (TCP 4040), and Tempo (TCP 4318).
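
The scrape configuration falls out of this layout: two endpoints per binary instance. A minimal std-only sketch of assembling the target list (the helper and the sample IPs are illustrative, not part of this crate's API):

/// Illustrative only: build Prometheus scrape targets for a set of
/// binary instances, one pair of endpoints per host (binary metrics
/// on 9090, node_exporter on 9100), as described above.
fn scrape_targets(private_ips: &[&str]) -> Vec<String> {
    private_ips
        .iter()
        .flat_map(|ip| [format!("{ip}:9090"), format!("{ip}:9100")])
        .collect()
}

fn main() {
    // Hypothetical private IPs of two binary instances.
    for target in scrape_targets(&["10.1.1.10", "10.2.1.10"]) {
        println!("{target}");
    }
}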

§Binary

  • Deployed in user-specified regions with configurable ARM64 or AMD64 instance types and storage.
  • Run:
    • Custom Binary: Executes with --hosts=/home/ubuntu/hosts.yaml --config=/home/ubuntu/config.conf, exposing metrics at :9090 (a minimal sketch of this contract follows the list).
    • Promtail: Forwards /var/log/binary.log to Loki on the monitoring instance.
    • Node Exporter: Exposes system metrics at :9100.
    • Pyroscope Agent: Forwards perf profiles to Pyroscope on the monitoring instance.
  • Ingress:
    • Deployer IP access (TCP 0-65535).
    • Monitoring IP access to :9090 and :9100 for Prometheus.
    • User-defined ports from the configuration.
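
Any binary handed to the deployer must accept the two flags above and expose Prometheus metrics on :9090. A minimal sketch of that contract, assuming a plain-text exposition (a real binary would use a proper metrics library):

use std::io::Write;
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // The deployer invokes the binary with exactly these flags:
    //   --hosts=/home/ubuntu/hosts.yaml --config=/home/ubuntu/config.conf
    let mut hosts = String::new();
    let mut config = String::new();
    for arg in std::env::args().skip(1) {
        if let Some(v) = arg.strip_prefix("--hosts=") {
            hosts = v.to_string();
        } else if let Some(v) = arg.strip_prefix("--config=") {
            config = v.to_string();
        }
    }
    eprintln!("hosts={hosts} config={config}");

    // Serve a trivial Prometheus exposition on :9090 so the monitoring
    // instance has something to scrape.
    let listener = TcpListener::bind("0.0.0.0:9090")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let body = "binary_up 1\n";
        let resp = format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(resp.as_bytes())?;
    }
    Ok(())
}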

§Networking

§VPCs

One per region with CIDR 10.<region-index>.0.0/16 (e.g., 10.0.0.0/16 for us-east-1).

§Subnets

Single subnet per VPC (e.g., 10.<region-index>.1.0/24), linked to a route table with an internet gateway.
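
The addressing scheme is mechanical; a small sketch (function names are illustrative):

/// VPC CIDR for the region at `index` in the deployment's region list,
/// per the 10.<region-index>.0.0/16 scheme above.
fn vpc_cidr(index: u8) -> String {
    format!("10.{index}.0.0/16")
}

/// The single /24 subnet carved out of that VPC.
fn subnet_cidr(index: u8) -> String {
    format!("10.{index}.1.0/24")
}

fn main() {
    assert_eq!(vpc_cidr(0), "10.0.0.0/16");   // e.g., us-east-1
    assert_eq!(subnet_cidr(0), "10.0.1.0/24");
}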

§VPC Peering

Connects the monitoring VPC to each binary VPC, with routes added to route tables for private communication.

§Security Groups

Separate groups for monitoring ({tag}) and binary instances ({tag}-binary), dynamically configured for deployer and inter-instance traffic.

§Workflow

§ec2 create

  1. Validates configuration and generates an SSH key pair, stored in $HOME/.commonware_deployer/{tag}/id_rsa_{tag}.
  2. Ensures the shared S3 bucket exists and caches observability tools (Prometheus, Grafana, Loki, etc.) if not already present.
  3. Uploads deployment-specific files (binaries, configs) to S3.
  4. Creates VPCs, subnets, internet gateways, route tables, and security groups per region (concurrently).
  5. Establishes VPC peering between the monitoring region and binary regions.
  6. Launches the monitoring instance.
  7. Launches binary instances.
  8. Caches all static config files and uploads per-instance configs (hosts.yaml, promtail, pyroscope) to S3.
  9. Configures monitoring and binary instances in parallel via SSH (BBR, service installation, service startup).
  10. Updates the monitoring security group to allow telemetry traffic from binary instances.
  11. Marks completion with $HOME/.commonware_deployer/{tag}/created.
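
Steps 4 and 9 fan out concurrently; a std-only sketch of the pattern, where setup_network is a stand-in for the real per-region VPC/subnet/security-group creation:

use std::thread;

// Stand-in for step 4's per-region network creation; not the crate's API.
fn setup_network(region: &str) {
    println!("creating VPC, subnet, gateway, route table, SG in {region}");
}

fn main() {
    let regions = ["us-east-1", "us-west-2"];
    // Each region's network resources are created on its own thread.
    thread::scope(|s| {
        for region in regions {
            s.spawn(move || setup_network(region));
        }
    });
}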

§ec2 update

  1. Uploads the latest binary and configuration to S3.
  2. Stops the binary service on each binary instance.
  3. Instances download the updated files from S3 via pre-signed URLs (see the presigning sketch after this list).
  4. Restarts the binary service, ensuring minimal downtime.
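
On the deployer side, a pre-signed URL for an instance's binary might be produced along these lines, assuming aws-sdk-s3's presigning support (the object key follows §S3 Caching; the key name and expiry are illustrative, and the real deployer may differ):

use aws_sdk_s3::presigning::PresigningConfig;
use std::time::Duration;

// Sketch: generate a time-limited GET URL for an instance's binary so
// the instance can fetch it without AWS credentials. The "binary" key
// name is an assumption; see §S3 Caching for the layout.
async fn presign_binary(
    client: &aws_sdk_s3::Client,
    tag: &str,
    name: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let req = client
        .get_object()
        .bucket("commonware-deployer-cache")
        .key(format!("deployments/{tag}/instances/{name}/binary"))
        .presigned(PresigningConfig::expires_in(Duration::from_secs(600))?)
        .await?;
    Ok(req.uri().to_string())
}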

§ec2 authorize

  1. Obtains the deployer’s current public IP address (or parses the one provided).
  2. For each security group in the deployment, adds an ingress rule for the IP (if it doesn’t already exist).
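
The "add only if absent" check in step 2 is plain set logic; an illustrative std-only sketch:

use std::collections::HashSet;

/// An ingress rule as (protocol, port range, CIDR). Illustrative; the
/// deployer works against real EC2 security group state.
type Rule = (&'static str, (u16, u16), String);

/// Add a deployer-IP rule unless an identical one already exists.
fn authorize_ip(existing: &mut HashSet<Rule>, ip: &str) -> bool {
    existing.insert(("tcp", (0, 65535), format!("{ip}/32")))
}

fn main() {
    let mut rules = HashSet::new();
    assert!(authorize_ip(&mut rules, "12.3.7.9"));  // added
    assert!(!authorize_ip(&mut rules, "12.3.7.9")); // already present
}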

§ec2 destroy

  1. Terminates all instances across regions.
  2. Deletes security groups, subnets, route tables, VPC peering connections, internet gateways, key pairs, and VPCs in dependency order.
  3. Deletes deployment-specific data from S3 (cached tools remain for future deployments).
  4. Marks destruction with $HOME/.commonware_deployer/{tag}/destroyed, retaining the directory to prevent tag reuse.

§ec2 clean

  1. Deletes the shared S3 bucket and all its contents (cached tools and any remaining deployment data).
  2. Use this to fully clean up when you no longer need the deployer cache.

§Persistence

  • A directory $HOME/.commonware_deployer/{tag} stores the SSH private key and status files (created, destroyed).
  • The deployment state is tracked via these files, ensuring operations respect prior create/destroy actions.
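
A sketch of inspecting that state (paths follow the layout above; the helper is illustrative):

use std::path::PathBuf;

/// Per-tag state directory: $HOME/.commonware_deployer/{tag}
fn state_dir(tag: &str) -> PathBuf {
    PathBuf::from(std::env::var("HOME").expect("HOME not set"))
        .join(".commonware_deployer")
        .join(tag)
}

fn main() {
    let tag = "ffa638a0-991c-442c-8ec4-aa4e418213a5";
    let dir = state_dir(tag);
    // Private key written at create time; marker files gate re-runs.
    println!("key:       {}", dir.join(format!("id_rsa_{tag}")).display());
    println!("created:   {}", dir.join("created").exists());
    println!("destroyed: {}", dir.join("destroyed").exists());
}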

§S3 Caching

A shared S3 bucket (commonware-deployer-cache) is used to cache deployment artifacts. The bucket uses a fixed name intentionally so that all users within the same AWS account share the cache. This design provides two benefits:

  1. Faster deployments: Observability tools (Prometheus, Grafana, Loki, etc.) are downloaded from upstream sources once and cached in S3. Subsequent deployments by any user skip the download and use pre-signed URLs to fetch directly from S3.

  2. Reduced bandwidth: Instead of requiring the deployer to push binaries to each instance, unique binaries are uploaded once to S3 and then pulled from there.

Per-deployment data (binaries, configs, hosts files) is isolated under deployments/{tag}/ to prevent conflicts between concurrent deployments.

The bucket stores:

  • tools/binaries/{tool}/{version}/{platform}/{filename} - Tool binaries (e.g., prometheus, grafana)
  • tools/configs/{deployer-version}/{component}/{file} - Static configs and service files
  • deployments/{tag}/ - Deployment-specific files:
    • monitoring/ - Prometheus config, dashboard
    • instances/{name}/ - Binary, config, hosts.yaml, promtail config, pyroscope script

Tool binaries are namespaced by tool version and platform. Static configs are namespaced by deployer version to ensure cache invalidation when the deployer is updated.
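
The key scheme reduces to two formatting helpers (names and the sample version/platform strings are placeholders):

/// Cache key for a tool binary, namespaced by version and platform.
fn tool_key(tool: &str, version: &str, platform: &str, file: &str) -> String {
    format!("tools/binaries/{tool}/{version}/{platform}/{file}")
}

/// Deployment-specific key, isolated under the deployment's tag.
fn instance_key(tag: &str, name: &str, file: &str) -> String {
    format!("deployments/{tag}/instances/{name}/{file}")
}

fn main() {
    // Version and platform are hypothetical examples.
    println!("{}", tool_key("prometheus", "2.53.0", "linux-arm64", "prometheus"));
    println!("{}", instance_key("my-tag", "node1", "hosts.yaml"));
}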

§Example Configuration

tag: ffa638a0-991c-442c-8ec4-aa4e418213a5
monitoring:
  instance_type: t4g.small  # ARM64 (Graviton)
  storage_size: 10
  storage_class: gp2
  dashboard: /path/to/dashboard.json
instances:
  - name: node1
    region: us-east-1
    instance_type: t4g.small  # ARM64 (Graviton)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-arm64
    config: /path/to/config.conf
    profiling: true
  - name: node2
    region: us-west-2
    instance_type: t3.small  # x86_64 (Intel/AMD)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-x86
    config: /path/to/config2.conf
    profiling: false
ports:
  - protocol: tcp
    port: 4545
    cidr: 0.0.0.0/0
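
For orientation, the example maps onto the Config, MonitoringConfig, InstanceConfig, and PortConfig structs listed below. An illustrative serde mirror, with field names inferred from the YAML rather than taken from the crate's definitions:

use serde::Deserialize;

// Illustrative mirror of the example configuration; the crate's own
// `Config` (see Structs below) is the source of truth.
#[derive(Debug, Deserialize)]
struct Config {
    tag: String,
    monitoring: Monitoring,
    instances: Vec<Instance>,
    ports: Vec<Port>,
}

#[derive(Debug, Deserialize)]
struct Monitoring {
    instance_type: String,
    storage_size: u32,
    storage_class: String,
    dashboard: String,
}

#[derive(Debug, Deserialize)]
struct Instance {
    name: String,
    region: String,
    instance_type: String,
    storage_size: u32,
    storage_class: String,
    binary: String,
    config: String,
    profiling: bool,
}

#[derive(Debug, Deserialize)]
struct Port {
    protocol: String,
    port: u16,
    cidr: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg: Config = serde_yaml::from_str(&std::fs::read_to_string("config.yaml")?)?;
    println!("{} instances under tag {}", cfg.instances.len(), cfg.tag);
    Ok(())
}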

Modules§

aws
AWS EC2 SDK function wrappers
s3
AWS S3 SDK function wrappers for caching deployer artifacts
services
Service configuration for Prometheus, Loki, Grafana, Promtail, and a caller-provided binary
utils
Utility functions for interacting with EC2 instances

Structs§

Config
Deployer configuration
Host
Host deployment information
Hosts
List of hosts
InstanceConfig
Instance configuration
MonitoringConfig
Monitoring configuration
PortConfig
Port configuration

Enums§

Architecture
CPU architecture for EC2 instances
BucketForbiddenReason
Reasons why accessing a bucket may be forbidden
Error
Errors that can occur when deploying infrastructure on AWS
S3Operation
S3 operations that can fail

Constants§

AUTHORIZE_CMD
Authorize subcommand name
CLEAN_CMD
Clean subcommand name
CMD
Subcommand name
CREATE_CMD
Create subcommand name
DESTROY_CMD
Destroy subcommand name
METRICS_PORT
Port on binary where metrics are exposed
UPDATE_CMD
Update subcommand name

Functions§

authorize
Adds the deployer’s IP (or the one provided) to all security groups
clean
Deletes the shared S3 cache bucket and all its contents
create
Sets up EC2 instances, deploys files, and configures monitoring and logging
destroy
Tears down all resources associated with the deployment tag
update
Updates the binary and configuration on all binary nodes