Amazon EMR is a web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses Hadoop processing combined with several AWS products to do tasks such as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehousing.
If you’re using the service, you’re probably looking for EmrClient and Emr.
Structs
- AddInstanceFleetInput
- AddInstanceFleetOutput
- AddInstanceGroupsInput: Input to an AddInstanceGroups call.
- AddInstanceGroupsOutput: Output from an AddInstanceGroups call.
- AddJobFlowStepsInput: The input argument to the AddJobFlowSteps operation.
- AddJobFlowStepsOutput: The output for the AddJobFlowSteps operation.
- AddTagsInput: This input identifies a cluster and a list of tags to attach.
- AddTagsOutput: This output indicates the result of adding tags to a resource.
- Application: With Amazon EMR release version 4.0 and later, the only accepted parameter is the application name. To pass arguments to applications, you use configuration classifications specified using configuration JSON objects. For more information, see Configuring Applications.
  With earlier Amazon EMR releases, the application is any Amazon or third-party software that you can add to the cluster. This structure contains a list of strings that indicates the software to use with the cluster and accepts a user argument list. Amazon EMR accepts and forwards the argument list to the corresponding installation script as bootstrap action arguments.
- AutoScalingPolicy: An automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. An automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric. See PutAutoScalingPolicy.
- AutoScalingPolicyDescription: An automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. The automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric. See PutAutoScalingPolicy.
- AutoScalingPolicyStateChangeReason: The reason for an AutoScalingPolicyStatus change.
- AutoScalingPolicyStatus: The status of an automatic scaling policy.
- BlockPublicAccessConfiguration: A configuration for Amazon EMR block public access. When BlockPublicSecurityGroupRules is set to true, Amazon EMR prevents cluster creation if one of the cluster's security groups has a rule that allows inbound traffic from 0.0.0.0/0 or ::/0 on a port, unless the port is specified as an exception using PermittedPublicSecurityGroupRuleRanges.
- BlockPublicAccessConfigurationMetadata: Properties that describe the AWS principal that created the BlockPublicAccessConfiguration using the PutBlockPublicAccessConfiguration action, as well as the date and time that the configuration was created. Each time a configuration for block public access is updated, Amazon EMR updates this metadata.
- BootstrapActionConfig: Configuration of a bootstrap action.
- BootstrapActionDetail: Reports the configuration of a bootstrap action in a cluster (job flow).
- CancelStepsInfo: Specification of the status of a CancelSteps request. Available only in Amazon EMR version 4.8.0 and later, excluding version 5.0.0.
- CancelStepsInput: The input argument to the CancelSteps operation.
- CancelStepsOutput: The output for the CancelSteps operation.
- CloudWatchAlarmDefinition: The definition of a CloudWatch metric alarm, which determines when an automatic scaling activity is triggered. When the defined alarm conditions are satisfied, scaling activity begins.
- Cluster: The detailed description of the cluster.
- ClusterStateChangeReason: The reason that the cluster changed to its current state.
- ClusterStatus: The detailed status of the cluster.
- ClusterSummary: The summary description of the cluster.
- ClusterTimeline: Represents the timeline of the cluster's lifecycle.
- Command: An entity describing an executable that runs on a cluster.
- ComputeLimits: The EC2 unit limits for a managed scaling policy. The managed scaling activity of a cluster cannot be above or below these limits. The limits apply only to the core and task nodes. The master node cannot be scaled after initial configuration.
- Configuration: Applies to Amazon EMR releases 4.x or later.
  An optional configuration specification to be used when provisioning cluster instances, which can include configurations for applications and software bundled with Amazon EMR. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file. For more information, see Configuring Applications.
- CreateSecurityConfigurationInput
- CreateSecurityConfigurationOutput
- DeleteSecurityConfigurationInput
- DeleteSecurityConfigurationOutput
- DescribeClusterInput: This input determines which cluster to describe.
- DescribeClusterOutput: This output contains the description of the cluster.
- DescribeJobFlowsInput: The input for the DescribeJobFlows operation.
- DescribeJobFlowsOutput: The output for the DescribeJobFlows operation.
- DescribeSecurityConfigurationInput
- DescribeSecurityConfigurationOutput
- DescribeStepInput: This input determines which step to describe.
- DescribeStepOutput: This output contains the description of the cluster step.
- EbsBlockDevice: Configuration of requested EBS block device associated with the instance group.
- EbsBlockDeviceConfig: Configuration of requested EBS block device associated with the instance group, along with the count of volumes that will be associated with every instance.
- EbsConfiguration: The Amazon EBS configuration of a cluster instance.
- EbsVolume: EBS block device that's attached to an EC2 instance.
- Ec2InstanceAttributes: Provides information about the EC2 instances in a cluster grouped by category. For example, key name, subnet ID, IAM instance profile, and so on.
- EmrClient: A client for the Amazon EMR API.
- FailureDetails: The details of the step failure. The service attempts to detect the root cause for many common failures.
- GetBlockPublicAccessConfigurationInput
- GetBlockPublicAccessConfigurationOutput
- GetManagedScalingPolicyInput
- GetManagedScalingPolicyOutput
- HadoopJarStepConfig: A job flow step consisting of a JAR file whose main function will be executed. The main function submits a job for Hadoop to execute and waits for the job to finish or fail.
- HadoopStepConfig: A cluster step consisting of a JAR file whose main function will be executed. The main function submits a job for Hadoop to execute and waits for the job to finish or fail.
- Instance: Represents an EC2 instance provisioned as part of a cluster.
- InstanceFleet: Describes an instance fleet, which is a group of EC2 instances that host a particular node type (master, core, or task) in an Amazon EMR cluster. Instance fleets can consist of a mix of instance types and On-Demand and Spot instances, which are provisioned to meet a defined target capacity.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceFleetConfig: The configuration that defines an instance fleet.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceFleetModifyConfig: Configuration parameters for an instance fleet modification request.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceFleetProvisioningSpecifications: The launch specification for Spot instances in the fleet, which determines the defined duration, provisioning timeout behavior, and allocation strategy.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. On-Demand and Spot instance allocation strategies are available in Amazon EMR version 5.12.1 and later.
- InstanceFleetStateChangeReason: Provides status change reason details for the instance fleet.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceFleetStatus: The status of the instance fleet.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceFleetTimeline: Provides historical timestamps for the instance fleet, including the time of creation, the time it became ready to run jobs, and the time of termination.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceGroup: This entity represents an instance group, which is a group of instances that have a common purpose. For example, the CORE instance group is used for HDFS.
- InstanceGroupConfig: Configuration defining a new instance group.
- InstanceGroupDetail: Detailed information about an instance group.
- InstanceGroupModifyConfig: Modify the size or configurations of an instance group.
- InstanceGroupStateChangeReason: The status change reason details for the instance group.
- InstanceGroupStatus: The details of the instance group status.
- InstanceGroupTimeline: The timeline of the instance group lifecycle.
- InstanceResizePolicy: Custom policy for requesting termination protection or termination of specific instances when shrinking an instance group.
- InstanceStateChangeReason: The details of the status change reason for the instance.
- InstanceStatus: The instance status details.
- InstanceTimeline: The timeline of the instance lifecycle.
- InstanceTypeConfig: An instance type configuration for each instance type in an instance fleet, which determines the EC2 instances Amazon EMR attempts to provision to fulfill On-Demand and Spot target capacities. There can be a maximum of 5 instance type configurations in a fleet.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- InstanceTypeSpecification: The configuration specification for each instance type in an instance fleet.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
- JobFlowDetail: A description of a cluster (job flow).
- JobFlowExecutionStatusDetail: Describes the status of the cluster (job flow).
- JobFlowInstancesConfig: A description of the Amazon EC2 instance on which the cluster (job flow) runs. A valid JobFlowInstancesConfig must contain either InstanceGroups or InstanceFleets, which is the recommended configuration. They cannot be used together. You may also have MasterInstanceType, SlaveInstanceType, and InstanceCount (all three must be present), but we don't recommend this configuration.
- JobFlowInstancesDetail: Specify the type of Amazon EC2 instances that the cluster (job flow) runs on.
- KerberosAttributes: Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. For more information, see Use Kerberos Authentication in the EMR Management Guide.
- KeyValue: A key-value pair.
- ListBootstrapActionsInput: This input determines which bootstrap actions to retrieve.
- ListBootstrapActionsOutput: This output contains the bootstrap actions detail.
- ListClustersInput: This input determines how the ListClusters action filters the list of clusters that it returns.
- ListClustersOutput: This contains a ClusterSummaryList with the cluster details; for example, the cluster IDs, names, and status.
- ListInstanceFleetsInput
- ListInstanceFleetsOutput
- ListInstanceGroupsInput: This input determines which instance groups to retrieve.
- ListInstanceGroupsOutput: This output contains the instance groups that were retrieved.
- ListInstancesInput: This input determines which instances to list.
- ListInstancesOutput: This output contains the list of instances.
- ListSecurityConfigurationsInput
- ListSecurityConfigurationsOutput
- ListStepsInput: This input determines which steps to list.
- ListStepsOutput: This output contains the list of steps returned in reverse order. This means that the last step is the first element in the list.
- ManagedScalingPolicy: Managed scaling policy for an Amazon EMR cluster. The policy specifies the limits for resources that can be added or terminated from a cluster. The policy only applies to the core and task nodes. The master node cannot be scaled after initial configuration.
- MetricDimension: A CloudWatch dimension, which is specified using a Key (known as a Name in CloudWatch), Value pair. By default, Amazon EMR uses one dimension whose Key is JobFlowID and Value is a variable representing the cluster ID, which is ${emr.clusterId}. This enables the rule to bootstrap when the cluster ID becomes available.
- ModifyClusterInput
- ModifyClusterOutput
- ModifyInstanceFleetInput
- ModifyInstanceGroupsInput: Change the size of some instance groups.
- OnDemandProvisioningSpecification: The launch specification for On-Demand instances in the instance fleet, which determines the allocation strategy.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. The On-Demand instance allocation strategy is available in Amazon EMR version 5.12.1 and later.
- PlacementType: The Amazon EC2 Availability Zone configuration of the cluster (job flow).
- PortRange: A list of port ranges that are permitted to allow inbound traffic from all public IP addresses. To specify a single port, use the same value for MinRange and MaxRange.
- PutAutoScalingPolicyInput
- PutAutoScalingPolicyOutput
- PutBlockPublicAccessConfigurationInput
- PutBlockPublicAccessConfigurationOutput
- PutManagedScalingPolicyInput
- PutManagedScalingPolicyOutput
- RemoveAutoScalingPolicyInput
- RemoveAutoScalingPolicyOutput
- RemoveManagedScalingPolicyInput
- RemoveManagedScalingPolicyOutput
- RemoveTagsInput: This input identifies a cluster and a list of tags to remove.
- RemoveTagsOutput: This output indicates the result of removing tags from a resource.
- RunJobFlowInput: Input to the RunJobFlow operation.
- RunJobFlowOutput: The result of the RunJobFlow operation.
- ScalingAction: The type of adjustment the automatic scaling activity makes when triggered, and the periodicity of the adjustment.
- ScalingConstraints: The upper and lower EC2 instance limits for an automatic scaling policy. Automatic scaling activities triggered by automatic scaling rules will not cause an instance group to grow above or shrink below these limits.
- ScalingRule: A scale-in or scale-out rule that defines scaling activity, including the CloudWatch metric alarm that triggers activity, how EC2 instances are added or removed, and the periodicity of adjustments. The automatic scaling policy for an instance group can comprise one or more automatic scaling rules.
- ScalingTrigger: The conditions that trigger an automatic scaling activity.
- ScriptBootstrapActionConfig: Configuration of the script to run during a bootstrap action.
- SecurityConfigurationSummary: The creation date and time, and name, of a security configuration.
- SetTerminationProtectionInput: The input argument to the SetTerminationProtection operation.
- SetVisibleToAllUsersInput: The input to the SetVisibleToAllUsers action.
- ShrinkPolicy: Policy for customizing shrink operations. Allows configuration of decommissioning timeout and targeted instance shrinking.
- SimpleScalingPolicyConfiguration: An automatic scaling configuration, which describes how the policy adds or removes instances, the cooldown period, and the number of EC2 instances that will be added each time the CloudWatch metric alarm condition is satisfied.
- SpotProvisioningSpecification: The launch specification for Spot instances in the instance fleet, which determines the defined duration, provisioning timeout behavior, and allocation strategy.
  The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. Spot instance allocation strategy is available in Amazon EMR version 5.12.1 and later.
- Step: This represents a step in a cluster.
- StepConfig: Specification of a cluster (job flow) step.
- StepDetail: Combines the execution state and configuration of a step.
- StepExecutionStatusDetail: The execution state of a step.
- StepStateChangeReason: The details of the step state change reason.
- StepStatus: The execution status details of the cluster step.
- StepSummary: The summary of the cluster step.
- StepTimeline: The timeline of the cluster step lifecycle.
- SupportedProductConfig: The list of supported product configurations which allow user-supplied arguments. EMR accepts these arguments and forwards them to the corresponding installation script as bootstrap action arguments.
- Tag: A key/value pair containing user-defined metadata that you can associate with an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters.
- TerminateJobFlowsInput: Input to the TerminateJobFlows operation.
- VolumeSpecification: EBS volume specifications such as volume type, IOPS, and size (GiB) that will be requested for the EBS volume attached to an EC2 instance in the cluster.
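The block public access rule described for BlockPublicAccessConfiguration and PortRange can be sketched in plain Rust. This is only an illustration of the documented behavior: `PermittedRange`, `InboundRule`, `single_port_rule`, and `creation_blocked` are hypothetical names defined here, not items from this crate, and treating a rule as exempt only when one permitted range covers its whole port span is an assumption.

```rust
/// Hypothetical stand-in for a permitted port range (inclusive bounds).
#[derive(Debug, Clone, Copy)]
pub struct PermittedRange {
    pub min_range: u16,
    pub max_range: u16,
}

/// Hypothetical stand-in for one inbound security group rule.
#[derive(Debug, Clone)]
pub struct InboundRule {
    pub source_cidr: String,
    pub from_port: u16,
    pub to_port: u16,
}

/// Builds a rule that opens exactly one port to the given source.
pub fn single_port_rule(source_cidr: &str, port: u16) -> InboundRule {
    InboundRule {
        source_cidr: source_cidr.to_string(),
        from_port: port,
        to_port: port,
    }
}

/// True when the rule admits traffic from every IPv4 or IPv6 address.
fn is_public(rule: &InboundRule) -> bool {
    rule.source_cidr == "0.0.0.0/0" || rule.source_cidr == "::/0"
}

/// Assumption: a rule is exempt only if a single permitted range
/// covers all of the ports the rule opens.
fn is_exempt(rule: &InboundRule, permitted: &[PermittedRange]) -> bool {
    permitted
        .iter()
        .any(|r| r.min_range <= rule.from_port && rule.to_port <= r.max_range)
}

/// Returns true when, under this sketch, cluster creation would be refused:
/// blocking is enabled and some public rule is not covered by an exception.
pub fn creation_blocked(
    block_public_rules: bool,
    rules: &[InboundRule],
    permitted: &[PermittedRange],
) -> bool {
    block_public_rules
        && rules
            .iter()
            .any(|rule| is_public(rule) && !is_exempt(rule, permitted))
}
```

With a single permitted range whose minimum and maximum are both 22, a public rule on port 22 passes, while a public rule on port 8080 causes `creation_blocked` to return true.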
Enums
- AddInstanceFleetError: Errors returned by AddInstanceFleet
- AddInstanceGroupsError: Errors returned by AddInstanceGroups
- AddJobFlowStepsError: Errors returned by AddJobFlowSteps
- AddTagsError: Errors returned by AddTags
- CancelStepsError: Errors returned by CancelSteps
- CreateSecurityConfigurationError: Errors returned by CreateSecurityConfiguration
- DeleteSecurityConfigurationError: Errors returned by DeleteSecurityConfiguration
- DescribeClusterError: Errors returned by DescribeCluster
- DescribeJobFlowsError: Errors returned by DescribeJobFlows
- DescribeSecurityConfigurationError: Errors returned by DescribeSecurityConfiguration
- DescribeStepError: Errors returned by DescribeStep
- GetBlockPublicAccessConfigurationError: Errors returned by GetBlockPublicAccessConfiguration
- GetManagedScalingPolicyError: Errors returned by GetManagedScalingPolicy
- ListBootstrapActionsError: Errors returned by ListBootstrapActions
- ListClustersError: Errors returned by ListClusters
- ListInstanceFleetsError: Errors returned by ListInstanceFleets
- ListInstanceGroupsError: Errors returned by ListInstanceGroups
- ListInstancesError: Errors returned by ListInstances
- ListSecurityConfigurationsError: Errors returned by ListSecurityConfigurations
- ListStepsError: Errors returned by ListSteps
- ModifyClusterError: Errors returned by ModifyCluster
- ModifyInstanceFleetError: Errors returned by ModifyInstanceFleet
- ModifyInstanceGroupsError: Errors returned by ModifyInstanceGroups
- PutAutoScalingPolicyError: Errors returned by PutAutoScalingPolicy
- PutBlockPublicAccessConfigurationError: Errors returned by PutBlockPublicAccessConfiguration
- PutManagedScalingPolicyError: Errors returned by PutManagedScalingPolicy
- RemoveAutoScalingPolicyError: Errors returned by RemoveAutoScalingPolicy
- RemoveManagedScalingPolicyError: Errors returned by RemoveManagedScalingPolicy
- RemoveTagsError: Errors returned by RemoveTags
- RunJobFlowError: Errors returned by RunJobFlow
- SetTerminationProtectionError: Errors returned by SetTerminationProtection
- SetVisibleToAllUsersError: Errors returned by SetVisibleToAllUsers
- TerminateJobFlowsError: Errors returned by TerminateJobFlows
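Because each operation has its own error enum, callers can match on the variant to decide how to react. The sketch below shows one possible pattern: retrying only on server-side failures. `ListClustersErrorSketch` is a hypothetical local stand-in whose variant names are assumptions, and `is_retryable` and `call_with_retries` are illustrative helpers, not part of this crate.

```rust
/// Hypothetical stand-in for a per-operation error enum.
/// The variant names are assumptions made for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum ListClustersErrorSketch {
    /// The service failed internally; the same request may succeed later.
    InternalServerError(String),
    /// The request itself was malformed; retrying will not help.
    InvalidRequest(String),
}

/// Decides whether repeating the same request could plausibly succeed.
pub fn is_retryable(err: &ListClustersErrorSketch) -> bool {
    match err {
        ListClustersErrorSketch::InternalServerError(_) => true,
        ListClustersErrorSketch::InvalidRequest(_) => false,
    }
}

/// Runs `op` until it succeeds, fails with a non-retryable error,
/// or has been attempted `max_attempts` times (at least once).
pub fn call_with_retries<T, F>(
    max_attempts: u32,
    mut op: F,
) -> Result<T, ListClustersErrorSketch>
where
    F: FnMut() -> Result<T, ListClustersErrorSketch>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only while the error is retryable and attempts remain.
            Err(e) if is_retryable(&e) && attempt < max_attempts => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}
```

A production retry loop would normally also wait between attempts; that is left out here to keep the sketch dependency-free.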
Traits
- Emr: Trait representing the capabilities of the Amazon EMR API. Amazon EMR clients implement this trait.
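Since clients implement a trait, application code can depend on the trait rather than on a concrete client, which makes it possible to substitute a fake in tests. The sketch below illustrates that idea with a deliberately simplified, synchronous stand-in: `ClusterApi`, `FakeClusters`, and `describe_inventory` are hypothetical names, and the real trait's methods and signatures are not reproduced here.

```rust
/// Hypothetical, simplified stand-in for a service trait.
/// It exposes a single synchronous method for illustration only.
pub trait ClusterApi {
    fn list_cluster_ids(&self) -> Result<Vec<String>, String>;
}

/// In-memory fake that returns a fixed set of cluster IDs.
pub struct FakeClusters {
    pub ids: Vec<String>,
}

impl ClusterApi for FakeClusters {
    fn list_cluster_ids(&self) -> Result<Vec<String>, String> {
        Ok(self.ids.clone())
    }
}

/// Application logic written against the trait, so it works with
/// any implementation, whether real or fake.
pub fn describe_inventory<A: ClusterApi>(api: &A) -> Result<String, String> {
    let ids = api.list_cluster_ids()?;
    Ok(format!("{} cluster(s): {}", ids.len(), ids.join(", ")))
}
```

In a test, `describe_inventory` can be called with a `FakeClusters` value, so no network access or credentials are needed.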