Module aws

AWS EC2 deployer

Deploy a custom binary (and configuration) to any number of EC2 instances across multiple regions. View metrics and logs from all instances with Grafana.

§Features

  • Automated creation, update, and destruction of EC2 instances across multiple regions
  • Provide a unique name, instance type, region, binary, and configuration for each deployed instance
  • Collect metrics, profiles (when enabled), and logs from all deployed instances on a long-lived monitoring instance (accessible only to the deployer’s IP)

§Architecture

                   Deployer's Machine (Public IP)
                                 |
                                 |
                                 v
              +-----------------------------------+
              | Monitoring VPC (us-east-1)        |
              |  - Monitoring Instance            |
              |    - Prometheus                   |
              |    - Loki                         |
              |    - Pyroscope                    |
              |    - Tempo                        |
              |    - Grafana                      |
              |  - Security Group                 |
              |    - All: Deployer IP             |
              |    - 3100: Binary VPCs            |
              |    - 4040: Binary VPCs            |
              |    - 4318: Binary VPCs            |
              +-----------------------------------+
                    ^                       ^
               (Telemetry)             (Telemetry)
                    |                       |
                    |                       |
+------------------------------+  +------------------------------+
| Binary VPC 1                 |  | Binary VPC 2                 |
|  - Binary Instance           |  |  - Binary Instance           |
|    - Binary A                |  |    - Binary B                |
|    - Promtail                |  |    - Promtail                |
|    - Node Exporter           |  |    - Node Exporter           |
|    - Pyroscope Agent         |  |    - Pyroscope Agent         |
|  - Security Group            |  |  - Security Group            |
|    - All: Deployer IP        |  |    - All: Deployer IP        |
|    - 9090: Monitoring IP     |  |    - 9090: Monitoring IP     |
|    - 9100: Monitoring IP     |  |    - 9100: Monitoring IP     |
|    - 8012: 0.0.0.0/0         |  |    - 8765: 12.3.7.9/32       |
+------------------------------+  +------------------------------+

§Instances

§Monitoring

  • Deployed in us-east-1 with a configurable instance type (e.g., t4g.small for ARM64, t3.small for x86_64) and storage (e.g., 10GB gp2). Architecture is auto-detected from the instance type.
  • Runs:
    • Prometheus: Scrapes binary metrics (:9090) and system metrics (:9100) from all instances.
    • Loki: Listens at :3100, storing logs in /loki/chunks with a TSDB index at /loki/index.
    • Pyroscope: Listens at :4040, storing profiles in /var/lib/pyroscope.
    • Tempo: Listens at :4318, storing traces in /var/lib/tempo.
    • Grafana: Hosted at :3000, provisioned with Prometheus, Loki, and Tempo datasources and a custom dashboard.
  • Ingress:
    • Allows deployer IP access (TCP 0-65535).
    • Allows binary instance traffic to Loki (TCP 3100), Pyroscope (TCP 4040), and Tempo (TCP 4318).

§Binary

  • Deployed in user-specified regions with configurable ARM64 or AMD64 instance types and storage.
  • Run:
    • Custom Binary: Executes with --hosts=/home/ubuntu/hosts.yaml --config=/home/ubuntu/config.conf, exposing metrics at :9090.
    • Promtail: Forwards /var/log/binary.log to Loki on the monitoring instance.
    • Node Exporter: Exposes system metrics at :9100.
    • Pyroscope Agent: Forwards perf profiles to Pyroscope on the monitoring instance.
  • Ingress:
    • Deployer IP access (TCP 0-65535).
    • Monitoring IP access to :9090 and :9100 for Prometheus.
    • User-defined ports from the configuration.

§Networking

§VPCs

One per region with CIDR 10.<region-index>.0.0/16 (e.g., 10.0.0.0/16 for us-east-1).

§Subnets

One subnet per availability zone that supports any required instance type in the region (e.g., 10.<region-index>.<az-index>.0/24), linked to a shared route table with an internet gateway. Each instance is placed in an AZ that supports its instance type, distributed round-robin across eligible AZs, with automatic fallback to other AZs on capacity errors.

§VPC Peering

Connects the monitoring VPC to each binary VPC, with routes added to route tables for private communication.

§Security Groups

Separate groups for the monitoring instance ({tag}) and binary instances ({tag}-binary), dynamically configured for deployer and inter-instance traffic.

§Workflow

§aws create

  1. Validates configuration and generates an SSH key pair, stored in $HOME/.commonware_deployer/{tag}/id_rsa_{tag}.
  2. Persists deployment metadata (tag, regions, instance names) to $HOME/.commonware_deployer/{tag}/metadata.yaml. This enables destroy --tag cleanup if creation fails.
  3. Ensures the shared S3 bucket exists and caches observability tools (Prometheus, Grafana, Loki, etc.) if not already present.
  4. Uploads deployment-specific files (binaries, configs) to S3.
  5. Creates VPCs, subnets, internet gateways, route tables, and security groups per region (concurrently).
  6. Establishes VPC peering between the monitoring region and binary regions.
  7. Launches the monitoring instance.
  8. Launches binary instances.
  9. Caches all static config files and uploads per-instance configs (hosts.yaml, promtail, pyroscope) to S3.
  10. Configures monitoring and binary instances in parallel via SSH (BBR, service installation, service startup).
  11. Updates the monitoring security group to allow telemetry traffic from binary instances.
  12. Marks completion with $HOME/.commonware_deployer/{tag}/created.
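
A typical invocation, assuming aws create takes the configuration path via a --config flag (as aws profile, shown later, does):

deployer aws create --config config.yaml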

§aws update

Performs rolling updates across all binary instances:

  1. Uploads the latest binary and configuration to S3.
  2. For each instance (up to --concurrency at a time, default 128):
     a. Stops the binary service.
     b. Downloads the updated files from S3 via pre-signed URLs.
     c. Restarts the binary service.
     d. Waits for the service to become active before proceeding.

Use --concurrency 1 for fully sequential updates that wait for each instance to be healthy before updating the next.
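
For example, a fully sequential rollout (the --config flag is assumed, as above):

deployer aws update --config config.yaml --concurrency 1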

§aws authorize

  1. Obtains the deployer’s current public IP address (or parses the one provided).
  2. For each security group in the deployment, adds an ingress rule for the IP (if it doesn’t already exist).
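
For example, to re-authorize after your public IP changes (the --config flag is assumed, as with the other subcommands):

deployer aws authorize --config config.yaml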

§aws destroy

Can be invoked with either --config <path> or --tag <tag>. When using --tag, the command reads regions from the persisted metadata.yaml file, allowing destruction without the original config file.

  1. Terminates all instances across regions.
  2. Deletes security groups, subnets, route tables, VPC peering connections, internet gateways, key pairs, and VPCs in dependency order.
  3. Deletes deployment-specific data from S3 (cached tools remain for future deployments).
  4. Marks destruction with $HOME/.commonware_deployer/{tag}/destroyed, retaining the directory to prevent tag reuse.
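
Both invocation forms (the tag below is the one from §Example Configuration):

deployer aws destroy --config config.yaml
deployer aws destroy --tag ffa638a0-991c-442c-8ec4-aa4e418213a5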

§aws clean

Deletes the shared S3 bucket and all its contents (cached tools and any remaining deployment data). Use this to fully clean up when you no longer need the deployer cache.
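
Invocation, assuming no additional arguments are required:

deployer aws clean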

§aws list

Lists all active deployments (created but not destroyed). For each deployment, displays the tag, creation timestamp, regions, and number of instances.
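
Invocation, assuming no arguments are required:

deployer aws list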

§aws profile

  1. Loads the deployment configuration and locates the specified instance.
  2. Caches the samply binary in S3 if not already present.
  3. SSHes to the instance, downloads samply, and records a CPU profile of the running binary for the specified duration.
  4. Downloads the profile locally via SCP.
  5. Opens Firefox Profiler with symbols resolved from your local debug binary.

§Profiling

The deployer supports two profiling modes:

§Continuous Profiling (Pyroscope)

Enable continuous CPU profiling by setting profiling: true in your instance config. This runs Pyroscope in the background, continuously collecting profiles that are viewable in the Grafana dashboard on the monitoring instance.

For best results, build and deploy your binary with debug symbols and frame pointers:

CARGO_PROFILE_RELEASE_DEBUG=true RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release

§On-Demand Profiling (samply)

To generate an on-demand CPU profile (viewable in the Firefox Profiler UI), run the following:

deployer aws profile --config config.yaml --instance <name> --binary <path-to-binary-with-debug>

This captures a 30-second profile (configurable with --duration) using samply on the remote instance, downloads it, and opens it in Firefox Profiler. Unlike Continuous Profiling, this mode does not require deploying a binary with debug symbols (reducing deployment time).

As above, build your binary with debug symbols (frame pointers are not required):

CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release

Now, strip symbols and deploy via aws create (preserve the original binary for profile symbolication when you run the aws profile command shown above):

cp target/release/my-binary target/release/my-binary-debug
strip target/release/my-binary
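
Then profile against the preserved debug copy (the instance name node1 comes from §Example Configuration; adjust paths for your project):

deployer aws profile --config config.yaml --instance node1 --binary target/release/my-binary-debug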

§Persistence

  • A directory $HOME/.commonware_deployer/{tag} stores:
    • SSH private key (id_rsa_{tag})
    • Deployment metadata (metadata.yaml) containing tag, creation timestamp, regions, and instance names
    • Status files (created, destroyed)
  • The deployment state is tracked via these files, ensuring operations respect prior create/destroy actions.
  • The metadata.yaml file enables aws destroy --tag and aws list to work without the original config file.
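
As an illustrative sketch (using the tag from §Example Configuration), the directory after a successful create might contain:

ls $HOME/.commonware_deployer/ffa638a0-991c-442c-8ec4-aa4e418213a5
# created  id_rsa_ffa638a0-991c-442c-8ec4-aa4e418213a5  metadata.yaml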

§S3 Caching

A shared S3 bucket (commonware-deployer-cache) is used to cache deployment artifacts. The bucket uses a fixed name intentionally so that all users within the same AWS account share the cache. This design provides two benefits:

  1. Faster deployments: Observability tools (Prometheus, Grafana, Loki, etc.) are downloaded from upstream sources once and cached in S3. Subsequent deployments by any user skip the download and use pre-signed URLs to fetch directly from S3.

  2. Reduced bandwidth: Instead of requiring the deployer to push binaries to each instance, unique binaries are uploaded once to S3 and then pulled from there.

Per-deployment data (binaries, configs, hosts files) is isolated under deployments/{tag}/ to prevent conflicts between concurrent deployments.

The bucket stores:

  • tools/binaries/{tool}/{version}/{platform}/{filename} - Tool binaries (e.g., prometheus, grafana)
  • tools/configs/{deployer-version}/{component}/{file} - Static configs and service files
  • deployments/{tag}/ - Deployment-specific files:
    • monitoring/ - Prometheus config, dashboard
    • instances/{name}/ - Binary, config, hosts.yaml, promtail config, pyroscope script

Tool binaries are namespaced by tool version and platform. Static configs are namespaced by deployer version to ensure cache invalidation when the deployer is updated.
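
For example, the per-instance files for node1 from §Example Configuration would live under keys like the following (file names are illustrative):

deployments/ffa638a0-991c-442c-8ec4-aa4e418213a5/instances/node1/hosts.yaml
deployments/ffa638a0-991c-442c-8ec4-aa4e418213a5/instances/node1/config.conf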

§Example Configuration

tag: ffa638a0-991c-442c-8ec4-aa4e418213a5
monitoring:
  instance_type: t4g.small  # ARM64 (Graviton)
  storage_size: 10
  storage_class: gp2
  dashboard: /path/to/dashboard.json
instances:
  - name: node1
    region: us-east-1
    instance_type: t4g.small  # ARM64 (Graviton)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-arm64
    config: /path/to/config.conf
    profiling: true
  - name: node2
    region: us-west-2
    instance_type: t3.small  # x86_64 (Intel/AMD)
    storage_size: 10
    storage_class: gp2
    binary: /path/to/binary-x86
    config: /path/to/config2.conf
    profiling: false
ports:
  - protocol: tcp
    port: 4545
    cidr: 0.0.0.0/0

Modules§

  • ec2: AWS EC2 SDK function wrappers
  • s3: AWS S3 SDK function wrappers for caching deployer artifacts
  • services: Service configuration for Prometheus, Loki, Grafana, Promtail, and a caller-provided binary
  • utils: Utility functions for interacting with EC2 instances

Structs§

  • Config: Deployer configuration
  • Host: Host deployment information
  • Hosts: List of hosts
  • InstanceConfig: Instance configuration
  • Metadata: Metadata persisted during deployment creation
  • MonitoringConfig: Monitoring configuration
  • PortConfig: Port configuration

Enums§

  • Architecture: CPU architecture for EC2 instances
  • BucketForbiddenReason: Reasons why accessing a bucket may be forbidden
  • Error: Errors that can occur when deploying infrastructure on AWS
  • S3Operation: S3 operations that can fail

Constants§

  • AUTHORIZE_CMD: Authorize subcommand name
  • CLEAN_CMD: Clean subcommand name
  • CMD: Subcommand name
  • CREATE_CMD: Create subcommand name
  • DEFAULT_CONCURRENCY: Maximum instances to manipulate at one time
  • DESTROY_CMD: Destroy subcommand name
  • LIST_CMD: List subcommand name
  • METRICS_PORT: Port on the binary where metrics are exposed
  • PROFILE_CMD: Profile subcommand name
  • UPDATE_CMD: Update subcommand name

Functions§

  • authorize: Adds the deployer’s IP (or the one provided) to all security groups
  • clean: Deletes the shared S3 cache bucket and all its contents
  • create: Sets up EC2 instances, deploys files, and configures monitoring and logging
  • destroy: Tears down all resources associated with the deployment tag
  • list: Lists all active deployments (created but not destroyed)
  • profile: Captures a CPU profile from a running instance using samply
  • update: Updates the binary and configuration on all binary nodes