Amazon EMR is a web service that makes it easier to process large amounts of data efficiently. Amazon EMR uses Hadoop processing combined with several AWS services to do tasks such as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehouse management.

If you’re using the service, you’re probably looking for EmrClient and Emr.

Structs

Input to an AddInstanceGroups call.

Output from an AddInstanceGroups call.

The input argument to the AddJobFlowSteps operation.

The output for the AddJobFlowSteps operation.

This input identifies a cluster and a list of tags to attach.

This output indicates the result of adding tags to a resource.

With Amazon EMR release version 4.0 and later, the only accepted parameter is the application name. To pass arguments to applications, you use configuration classifications specified using configuration JSON objects. For more information, see Configuring Applications.

With earlier Amazon EMR releases, the application is any Amazon or third-party software that you can add to the cluster. This structure contains a list of strings that indicates the software to use with the cluster and accepts a user argument list. Amazon EMR accepts and forwards the argument list to the corresponding installation script as bootstrap action arguments.

An automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. An automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric. See PutAutoScalingPolicy.

An automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. The automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric. See PutAutoScalingPolicy.

The status of an automatic scaling policy.

A configuration for Amazon EMR block public access. When BlockPublicSecurityGroupRules is set to true, Amazon EMR prevents cluster creation if one of the cluster's security groups has a rule that allows inbound traffic from 0.0.0.0/0 or ::/0 on a port, unless the port is specified as an exception using PermittedPublicSecurityGroupRuleRanges.
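The rule described above can be modeled as a small predicate. This is a minimal sketch in plain Rust (standard library only), not the rusoto_emr API: `PortRange`, `violates_block_public_access`, and the parameter names are hypothetical, and port 22 is used only as an example exception.

```rust
/// Hypothetical exception range of ports (not the crate's own type).
/// A single port uses the same value for `min_range` and `max_range`.
struct PortRange {
    min_range: u16,
    max_range: u16,
}

impl PortRange {
    /// True if `port` lies within this inclusive range.
    fn contains(&self, port: u16) -> bool {
        self.min_range <= port && port <= self.max_range
    }
}

/// Returns true if an inbound rule opening `from_port..=to_port` to `cidr`
/// would cause cluster creation to be blocked.
fn violates_block_public_access(
    block_public_security_group_rules: bool,
    cidr: &str,
    from_port: u16,
    to_port: u16,
    permitted: &[PortRange],
) -> bool {
    // The check is only enforced when the setting is turned on.
    if !block_public_security_group_rules {
        return false;
    }
    // Only rules open to all IPv4 or all IPv6 addresses count as public.
    if cidr != "0.0.0.0/0" && cidr != "::/0" {
        return false;
    }
    // Blocked if any opened port falls outside every permitted exception range.
    (from_port..=to_port).any(|port| !permitted.iter().any(|range| range.contains(port)))
}
```

With port 22 as the only exception, a public rule on port 22 passes, while a public rule on port 80, or on the range 20 to 23, is rejected.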

Properties that describe the AWS principal that created the BlockPublicAccessConfiguration using the PutBlockPublicAccessConfiguration action as well as the date and time that the configuration was created. Each time a configuration for block public access is updated, Amazon EMR updates this metadata.

Configuration of a bootstrap action.

Reports the configuration of a bootstrap action in a cluster (job flow).

Specification of the status of a CancelSteps request. Available only in Amazon EMR version 4.8.0 and later, excluding version 5.0.0.

The input argument to the CancelSteps operation.

The output for the CancelSteps operation.

The definition of a CloudWatch metric alarm, which determines when an automatic scaling activity is triggered. When the defined alarm conditions are satisfied, scaling activity begins.
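To illustrate how an alarm definition decides when scaling begins, here is a simplified sketch in plain Rust. It assumes the alarm fires when the most recent `evaluation_periods` datapoints all breach the threshold; real CloudWatch evaluation has more nuance (for example, handling of missing data). `ComparisonOperator` and `alarm_fires` are hypothetical names, and the four operators are modeled on the GREATER_THAN_OR_EQUAL, GREATER_THAN, LESS_THAN, and LESS_THAN_OR_EQUAL comparisons.

```rust
/// Hypothetical comparison operators for an alarm threshold.
#[derive(Clone, Copy)]
enum ComparisonOperator {
    GreaterThanOrEqual,
    GreaterThan,
    LessThan,
    LessThanOrEqual,
}

impl ComparisonOperator {
    /// True if `value` breaches `threshold` under this operator.
    fn breaches(self, value: f64, threshold: f64) -> bool {
        match self {
            ComparisonOperator::GreaterThanOrEqual => value >= threshold,
            ComparisonOperator::GreaterThan => value > threshold,
            ComparisonOperator::LessThan => value < threshold,
            ComparisonOperator::LessThanOrEqual => value <= threshold,
        }
    }
}

/// Simplified rule (an assumption): fire when the last `evaluation_periods`
/// datapoints, one per period, all breach the threshold.
fn alarm_fires(
    datapoints: &[f64],
    op: ComparisonOperator,
    threshold: f64,
    evaluation_periods: usize,
) -> bool {
    if evaluation_periods == 0 || datapoints.len() < evaluation_periods {
        return false;
    }
    datapoints[datapoints.len() - evaluation_periods..]
        .iter()
        .all(|&v| op.breaches(v, threshold))
}
```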

The detailed description of the cluster.

The reason that the cluster changed to its current state.

The detailed status of the cluster.

The summary description of the cluster.

Represents the timeline of the cluster's lifecycle.

An entity describing an executable that runs on a cluster.

The EC2 unit limits for a managed scaling policy. The managed scaling activity of a cluster cannot exceed or fall below these limits. The limits apply only to the core and task nodes. The master node cannot be scaled after initial configuration.
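As a rough illustration of how such limits bound a scaling decision, the sketch below clamps a requested capacity into the minimum/maximum range and then caps the core share, assigning the remainder to task nodes. The struct, its field names, and the core/task split rule are hypothetical simplifications in plain Rust, not the rusoto_emr types or the exact managed scaling algorithm.

```rust
/// Hypothetical, simplified compute limits (units apply to core and task nodes only).
struct ComputeLimits {
    minimum_capacity_units: u32,
    maximum_capacity_units: u32,
    maximum_core_capacity_units: u32,
}

/// Clamps `requested` into [minimum, maximum] and splits it into (core, task).
/// Panics if minimum_capacity_units > maximum_capacity_units (as `clamp` does).
fn plan_capacity(requested: u32, limits: &ComputeLimits) -> (u32, u32) {
    let total = requested.clamp(limits.minimum_capacity_units, limits.maximum_capacity_units);
    // Assumed split: core nodes up to their own cap, the rest goes to task nodes.
    let core = total.min(limits.maximum_core_capacity_units);
    (core, total - core)
}
```

For example, with limits of 2 to 20 units and at most 5 core units, a request for 50 units yields 5 core and 15 task units.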

Available in Amazon EMR releases 4.x or later.

An optional configuration specification to be used when provisioning cluster instances, which can include configurations for applications and software bundled with Amazon EMR. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file. For more information, see Configuring Applications.
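The classification/properties/nested-configurations shape described above maps naturally onto a recursive type. This is a minimal sketch in plain Rust: the `Configuration` struct here is hypothetical (not the crate's), and `to_json` performs no string escaping, so it assumes keys and values contain no quotes or backslashes. A `BTreeMap` keeps property order deterministic.

```rust
use std::collections::BTreeMap;

/// Hypothetical configuration object: a classification (an application-specific
/// configuration file), the properties to change in it, and nested configurations.
struct Configuration {
    classification: String,
    properties: BTreeMap<String, String>,
    configurations: Vec<Configuration>,
}

impl Configuration {
    /// Renders compact JSON. No escaping is performed (see lead-in assumption).
    fn to_json(&self) -> String {
        let props: Vec<String> = self
            .properties
            .iter()
            .map(|(k, v)| format!("\"{}\":\"{}\"", k, v))
            .collect();
        let nested: Vec<String> = self.configurations.iter().map(|c| c.to_json()).collect();
        format!(
            "{{\"Classification\":\"{}\",\"Properties\":{{{}}},\"Configurations\":[{}]}}",
            self.classification,
            props.join(","),
            nested.join(",")
        )
    }
}
```

A `spark-defaults` classification with the property `spark.executor.memory` set to `4g` renders as `{"Classification":"spark-defaults","Properties":{"spark.executor.memory":"4g"},"Configurations":[]}`.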

This input determines which cluster to describe.

This output contains the description of the cluster.

The input for the DescribeJobFlows operation.

The output for the DescribeJobFlows operation.

This input determines which step to describe.

This output contains the description of the cluster step.

Configuration of the requested EBS block device associated with the instance group.

Configuration of the requested EBS block device associated with the instance group, with the number of volumes to associate with every instance.

The Amazon EBS configuration of a cluster instance.

EBS block device that's attached to an EC2 instance.

Provides information about the EC2 instances in a cluster, grouped by category, such as key name, subnet ID, and IAM instance profile.

A client for the Amazon EMR API.

Specifies the execution engine (cluster) to run the notebook and perform the notebook execution, for example, an EMR cluster.

The details of the step failure. The service attempts to detect the root cause for many common failures.

A job flow step consisting of a JAR file whose main function will be executed. The main function submits a job for Hadoop to execute and waits for the job to finish or fail.

A cluster step consisting of a JAR file whose main function will be executed. The main function submits a job for Hadoop to execute and waits for the job to finish or fail.

Represents an EC2 instance provisioned as part of a cluster.

Describes an instance fleet, which is a group of EC2 instances that host a particular node type (master, core, or task) in an Amazon EMR cluster. Instance fleets can consist of a mix of instance types and On-Demand and Spot Instances, which are provisioned to meet a defined target capacity.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
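The version rule stated above ("4.8.0 and later, excluding 5.0.x") can be checked mechanically. This sketch in plain Rust assumes release labels of the form `emr-X.Y.Z`; the function names are hypothetical. It also encodes the 5.12.1 threshold mentioned for On-Demand and Spot allocation strategies.

```rust
/// Parses a label such as "emr-5.12.1" into (major, minor, patch).
/// Assumes the "emr-X.Y.Z" format; anything else returns None.
fn parse_release_label(label: &str) -> Option<(u32, u32, u32)> {
    let version = label.strip_prefix("emr-")?;
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Instance fleets: 4.8.0 and later, excluding every 5.0.x release.
fn supports_instance_fleets(label: &str) -> bool {
    match parse_release_label(label) {
        Some((5, 0, _)) => false,
        Some(v) => v >= (4, 8, 0), // tuples compare lexicographically
        None => false,
    }
}

/// Allocation strategies: fleet support plus version 5.12.1 or later.
fn supports_allocation_strategy(label: &str) -> bool {
    supports_instance_fleets(label)
        && parse_release_label(label).map_or(false, |v| v >= (5, 12, 1))
}
```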

The configuration that defines an instance fleet.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

Configuration parameters for an instance fleet modification request.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

The launch specification for Spot Instances in the fleet, which determines the defined duration, provisioning timeout behavior, and allocation strategy.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. On-Demand and Spot Instance allocation strategies are available in Amazon EMR version 5.12.1 and later.

Provides status change reason details for the instance fleet.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

The status of the instance fleet.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

Provides historical timestamps for the instance fleet, including the time of creation, the time it became ready to run jobs, and the time of termination.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

This entity represents an instance group, which is a group of instances that have a common purpose. For example, the CORE instance group is used for HDFS.

Configuration defining a new instance group.

Detailed information about an instance group.

Modify the size or configurations of an instance group.

The status change reason details for the instance group.

The details of the instance group status.

The timeline of the instance group lifecycle.

Custom policy for requesting termination protection or termination of specific instances when shrinking an instance group.
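The interplay between an explicit termination list and a protection list can be sketched as a selection function. This is purely illustrative plain Rust: `pick_instances_to_remove` is hypothetical, and the order used for instances on neither list is an assumption, not a description of how Amazon EMR actually chooses them.

```rust
/// Picks up to `count` instance IDs to remove when shrinking a group.
/// Explicitly requested, unprotected instances come first; protected
/// instances are never chosen; other instances follow in input order (assumed).
fn pick_instances_to_remove(
    instances: &[&str],
    to_terminate: &[&str],
    to_protect: &[&str],
    count: usize,
) -> Vec<String> {
    let requested = instances
        .iter()
        .filter(|id| to_terminate.contains(*id) && !to_protect.contains(*id));
    let others = instances
        .iter()
        .filter(|id| !to_terminate.contains(*id) && !to_protect.contains(*id));
    requested
        .chain(others)
        .take(count)
        .map(|id| id.to_string())
        .collect()
}
```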

The details of the status change reason for the instance.

The instance status details.

The timeline of the instance lifecycle.

An instance type configuration for each instance type in an instance fleet, which determines the EC2 instances Amazon EMR attempts to provision to fulfill On-Demand and Spot target capacities. There can be a maximum of five instance type configurations in a fleet.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
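The five-configuration limit, and the arithmetic of reaching a target capacity with weighted instances, can be sketched in plain Rust. `InstanceTypeConfig`, `validate_fleet`, and `instances_needed` are hypothetical; requiring a positive weight is this sketch's own check.

```rust
/// Maximum number of instance type configurations per fleet, per the description above.
const MAX_INSTANCE_TYPE_CONFIGS: usize = 5;

/// Hypothetical entry: an instance type and the capacity units one instance provides.
struct InstanceTypeConfig {
    instance_type: String,
    weighted_capacity: u32,
}

/// Enforces the per-fleet limit and rejects zero weights.
fn validate_fleet(configs: &[InstanceTypeConfig]) -> Result<(), String> {
    if configs.len() > MAX_INSTANCE_TYPE_CONFIGS {
        return Err(format!(
            "{} instance type configurations given; at most {} are allowed",
            configs.len(),
            MAX_INSTANCE_TYPE_CONFIGS
        ));
    }
    for config in configs {
        if config.weighted_capacity == 0 {
            return Err(format!("{} has a weighted capacity of 0", config.instance_type));
        }
    }
    Ok(())
}

/// Instances of one type needed to reach `target_capacity` units, rounded up.
/// `weighted_capacity` must be positive (see `validate_fleet`).
fn instances_needed(target_capacity: u32, weighted_capacity: u32) -> u32 {
    let whole = target_capacity / weighted_capacity;
    if target_capacity % weighted_capacity == 0 { whole } else { whole + 1 }
}
```

For example, a target of 10 units with an instance weighted at 4 units requires 3 instances.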

The configuration specification for each instance type in an instance fleet.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.

A description of a cluster (job flow).

Describes the status of the cluster (job flow).

A description of the Amazon EC2 instances on which the cluster (job flow) runs. A valid JobFlowInstancesConfig must contain either InstanceGroups or InstanceFleets; they cannot be used together. Alternatively, you can specify MasterInstanceType, SlaveInstanceType, and InstanceCount (all three must be present), but we don't recommend this configuration.
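The validity rules above translate into a small check. This sketch uses a hypothetical, simplified struct in plain Rust (instance groups and fleets are reduced to lists of names), not the crate's `JobFlowInstancesConfig`; the field names only mirror the parameters named in the text.

```rust
/// Hypothetical, simplified view of the instance-related fields.
#[derive(Default)]
struct JobFlowInstancesConfig {
    instance_groups: Option<Vec<String>>,
    instance_fleets: Option<Vec<String>>,
    master_instance_type: Option<String>,
    slave_instance_type: Option<String>,
    instance_count: Option<i64>,
}

/// Checks: groups and fleets are mutually exclusive; the three legacy
/// fields are all-or-nothing; at least one form must be present.
fn validate_instances_config(config: &JobFlowInstancesConfig) -> Result<(), &'static str> {
    let groups = config.instance_groups.is_some();
    let fleets = config.instance_fleets.is_some();
    let legacy = [
        config.master_instance_type.is_some(),
        config.slave_instance_type.is_some(),
        config.instance_count.is_some(),
    ];
    let legacy_count = legacy.iter().filter(|&&set| set).count();

    if groups && fleets {
        return Err("InstanceGroups and InstanceFleets cannot be used together");
    }
    if legacy_count != 0 && legacy_count != 3 {
        return Err("MasterInstanceType, SlaveInstanceType, and InstanceCount must all be present");
    }
    if !groups && !fleets && legacy_count == 0 {
        return Err("specify InstanceGroups, InstanceFleets, or the three legacy fields");
    }
    Ok(())
}
```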

Specify the type of Amazon EC2 instances that the cluster (job flow) runs on.

Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. For more information, see Use Kerberos Authentication in the Amazon EMR Management Guide.

A key-value pair.

This input determines which bootstrap actions to retrieve.

This output contains the bootstrap actions detail.

This input determines how the ListClusters action filters the list of clusters that it returns.

This contains a ClusterSummaryList with the cluster details; for example, the cluster IDs, names, and status.

This input determines which instance groups to retrieve.

This output contains the list of instance groups.

This input determines which instances to list.

This output contains the list of instances.

This input determines which steps to list.

This output contains the list of steps returned in reverse order. This means that the last step is the first element in the list.
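Because the list is newest-first, the most recent step is simply the first element, and reversing the list restores submission order. A minimal sketch in plain Rust; `StepSummary` here is a hypothetical stand-in with only an ID and a name, and the step IDs are placeholders.

```rust
/// Hypothetical, minimal stand-in for a listed step.
#[derive(Debug, Clone, PartialEq)]
struct StepSummary {
    id: String,
    name: String,
}

/// The list is newest-first, so the first element is the most recent step.
fn most_recent(steps: &[StepSummary]) -> Option<&StepSummary> {
    steps.first()
}

/// Reverses the list so steps appear in the order they were added (oldest first).
fn oldest_first(steps: &[StepSummary]) -> Vec<StepSummary> {
    steps.iter().rev().cloned().collect()
}
```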

Managed scaling policy for an Amazon EMR cluster. The policy specifies the limits for resources that can be added to or terminated from a cluster. The policy applies only to the core and task nodes. The master node cannot be scaled after initial configuration.

A CloudWatch dimension, which is specified using a Key (known as a Name in CloudWatch), Value pair. By default, Amazon EMR uses one dimension whose Key is JobFlowID and Value is a variable representing the cluster ID, which is ${emr.clusterId}. This enables the rule to bootstrap when the cluster ID becomes available.
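The default dimension and its `${emr.clusterId}` placeholder can be illustrated with a simple substitution. This sketch in plain Rust uses a hypothetical `MetricDimension` type and a placeholder cluster ID; it only shows the idea of resolving the variable once the ID is known.

```rust
/// Hypothetical CloudWatch dimension as a key-value pair.
struct MetricDimension {
    key: String,
    value: String,
}

/// The default dimension described above: Key JobFlowID, Value ${emr.clusterId}.
fn default_dimension() -> MetricDimension {
    MetricDimension {
        key: "JobFlowID".to_string(),
        value: "${emr.clusterId}".to_string(),
    }
}

/// Substitutes the actual cluster ID for the `${emr.clusterId}` placeholder.
fn resolve_dimension(dim: &MetricDimension, cluster_id: &str) -> MetricDimension {
    MetricDimension {
        key: dim.key.clone(),
        value: dim.value.replace("${emr.clusterId}", cluster_id),
    }
}
```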

Change the size of some instance groups.

A notebook execution. An execution is a specific instance of an EMR Notebook run using the StartNotebookExecution action.

Describes the strategy for using unused Capacity Reservations for fulfilling On-Demand capacity.

The launch specification for On-Demand Instances in the instance fleet, which determines the allocation strategy.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. On-Demand Instances allocation strategy is available in Amazon EMR version 5.12.1 and later.

Placement group configuration for an Amazon EMR cluster. The configuration specifies the placement strategy that can be applied to instance roles during cluster creation.

To use this configuration, consider attaching the managed policy AmazonElasticMapReducePlacementGroupPolicy to the EMR role.

The Amazon EC2 Availability Zone configuration of the cluster (job flow).

A list of port ranges that are permitted to allow inbound traffic from all public IP addresses. To specify a single port, use the same value for MinRange and MaxRange.

This input identifies a cluster and a list of tags to remove.

This output indicates the result of removing tags from a resource.

Input to the RunJobFlow operation.

The result of the RunJobFlow operation.

The type of adjustment the automatic scaling activity makes when triggered, and the periodicity of the adjustment.

The upper and lower EC2 instance limits for an automatic scaling policy. Automatic scaling activities triggered by automatic scaling rules will not cause an instance group to grow above or shrink below these limits.

A scale-in or scale-out rule that defines scaling activity, including the CloudWatch metric alarm that triggers activity, how EC2 instances are added or removed, and the periodicity of adjustments. The automatic scaling policy for an instance group can comprise one or more automatic scaling rules.

The conditions that trigger an automatic scaling activity.

Configuration of the script to run during a bootstrap action.

The creation date and time, and name, of a security configuration.

Details for an Amazon EMR Studio session mapping including creation time, user or group ID, Studio ID, and so on.

Details for an Amazon EMR Studio session mapping. The details do not include the time the session mapping was last modified.

The input argument to the SetTerminationProtection operation.

The input to the SetVisibleToAllUsers action.

Policy for customizing shrink operations. Allows configuration of decommissioning timeout and targeted instance shrinking.

An automatic scaling configuration, which describes how the policy adds or removes instances, the cooldown period, and the number of EC2 instances that will be added each time the CloudWatch metric alarm condition is satisfied.
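One scaling activity can be sketched as "apply the adjustment, then stay within the upper and lower instance limits". This plain-Rust sketch is hypothetical: the three adjustment kinds are modeled on the CHANGE_IN_CAPACITY, PERCENT_CHANGE_IN_CAPACITY, and EXACT_CAPACITY adjustment types, percent changes are truncated toward zero (an assumption), and the cooldown period is not modeled.

```rust
/// Hypothetical adjustment kinds.
enum AdjustmentType {
    ChangeInCapacity,
    PercentChangeInCapacity,
    ExactCapacity,
}

/// Instance count after one scaling activity, clamped to [min_capacity, max_capacity].
/// Panics if min_capacity > max_capacity (as `clamp` does).
fn next_capacity(
    current: i64,
    adjustment_type: AdjustmentType,
    scaling_adjustment: i64,
    min_capacity: i64,
    max_capacity: i64,
) -> i64 {
    let proposed = match adjustment_type {
        // Add (or, if negative, remove) a fixed number of instances.
        AdjustmentType::ChangeInCapacity => current + scaling_adjustment,
        // Change by a percentage of current capacity; integer division truncates.
        AdjustmentType::PercentChangeInCapacity => current + current * scaling_adjustment / 100,
        // Set the instance count directly.
        AdjustmentType::ExactCapacity => scaling_adjustment,
    };
    proposed.clamp(min_capacity, max_capacity)
}
```

For example, adding 3 instances to a group of 10 with an upper limit of 12 stops at 12.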

The launch specification for Spot Instances in the instance fleet, which determines the defined duration, provisioning timeout behavior, and allocation strategy.

The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. Spot Instance allocation strategy is available in Amazon EMR version 5.12.1 and later.

This represents a step in a cluster.

Specification of a cluster (job flow) step.

Combines the execution state and configuration of a step.

The execution state of a step.

The details of the step state change reason.

The execution status details of the cluster step.

The summary of the cluster step.

The timeline of the cluster step lifecycle.

Details for an Amazon EMR Studio including ID, creation time, name, and so on.

Details for an Amazon EMR Studio, including ID, Name, VPC, and Description. The details do not include subnets, IAM roles, security groups, or tags associated with the Studio.

The list of supported product configurations that allow user-supplied arguments. EMR accepts these arguments and forwards them to the corresponding installation script as bootstrap action arguments.

A key-value pair containing user-defined metadata that you can associate with an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters.

Input to the TerminateJobFlows operation.

EBS volume specifications such as volume type, IOPS, and size (GiB) that will be requested for the EBS volume attached to an EC2 instance in the cluster.

Enums

Errors returned by AddInstanceFleet

Errors returned by AddInstanceGroups

Errors returned by AddJobFlowSteps

Errors returned by AddTags

Errors returned by CancelSteps

Errors returned by CreateSecurityConfiguration

Errors returned by CreateStudio

Errors returned by CreateStudioSessionMapping

Errors returned by DeleteSecurityConfiguration

Errors returned by DeleteStudio

Errors returned by DeleteStudioSessionMapping

Errors returned by DescribeCluster

Errors returned by DescribeJobFlows

Errors returned by DescribeNotebookExecution

Errors returned by DescribeSecurityConfiguration

Errors returned by DescribeStep

Errors returned by DescribeStudio

Errors returned by GetBlockPublicAccessConfiguration

Errors returned by GetManagedScalingPolicy

Errors returned by GetStudioSessionMapping

Errors returned by ListBootstrapActions

Errors returned by ListClusters

Errors returned by ListInstanceFleets

Errors returned by ListInstanceGroups

Errors returned by ListInstances

Errors returned by ListNotebookExecutions

Errors returned by ListSecurityConfigurations

Errors returned by ListSteps

Errors returned by ListStudioSessionMappings

Errors returned by ListStudios

Errors returned by ModifyCluster

Errors returned by ModifyInstanceFleet

Errors returned by ModifyInstanceGroups

Errors returned by PutAutoScalingPolicy

Errors returned by PutBlockPublicAccessConfiguration

Errors returned by PutManagedScalingPolicy

Errors returned by RemoveAutoScalingPolicy

Errors returned by RemoveManagedScalingPolicy

Errors returned by RemoveTags

Errors returned by RunJobFlow

Errors returned by SetTerminationProtection

Errors returned by SetVisibleToAllUsers

Errors returned by StartNotebookExecution

Errors returned by StopNotebookExecution

Errors returned by TerminateJobFlows

Errors returned by UpdateStudio

Errors returned by UpdateStudioSessionMapping

Traits

Trait representing the capabilities of the Amazon EMR API. Amazon EMR clients implement this trait.