
AWS Data Pipeline configures and manages a data-driven workflow called a pipeline. AWS Data Pipeline handles the details of scheduling and ensuring that data dependencies are met so that your application can focus on processing the data.

AWS Data Pipeline provides a JAR implementation of a task runner called AWS Data Pipeline Task Runner. AWS Data Pipeline Task Runner provides logic for common data management scenarios, such as performing database queries and running data analysis using Amazon Elastic MapReduce (Amazon EMR). You can use AWS Data Pipeline Task Runner as your task runner, or you can write your own task runner to provide custom data management.

AWS Data Pipeline implements two main sets of functionality. Use the first set to create a pipeline and define data sources, schedules, dependencies, and the transforms to be performed on the data. Use the second set in your task runner application to receive the next task ready for processing. The logic for performing the task, such as querying the data, running data analysis, or converting the data from one format to another, is contained within the task runner. The task runner performs the task assigned to it by the web service, reporting progress to the web service as it does so. When the task is done, the task runner reports the final success or failure of the task to the web service.
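The task runner flow described above (receive a task, perform it, report progress, report the final result) can be sketched as a small loop. This is only an illustration and does not use this crate's client: the `TaskService` trait, `InMemoryService`, `Task`, `TaskStatus`, and `run_task_runner` are hypothetical names introduced here, loosely modelled on the PollForTask, ReportTaskProgress, and SetTaskStatus operations. A real task runner would call the web service instead and would typically report progress repeatedly while the work is running.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a task assigned by the web service.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub task_id: String,
    pub payload: String,
}

/// Final outcome that the task runner reports for a task.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskStatus {
    Finished,
    Failed(String),
}

/// Hypothetical abstraction over the task-runner operations
/// (assumption: not the real client trait of this crate).
pub trait TaskService {
    fn poll_for_task(&mut self) -> Option<Task>;
    fn report_task_progress(&mut self, task_id: &str);
    fn set_task_status(&mut self, task_id: &str, status: TaskStatus);
}

/// In-memory service used only to exercise the loop without a network.
#[derive(Default)]
pub struct InMemoryService {
    pub queue: VecDeque<Task>,
    pub progress_reports: Vec<String>,
    pub results: Vec<(String, TaskStatus)>,
}

impl TaskService for InMemoryService {
    fn poll_for_task(&mut self) -> Option<Task> {
        self.queue.pop_front()
    }

    fn report_task_progress(&mut self, task_id: &str) {
        self.progress_reports.push(task_id.to_string());
    }

    fn set_task_status(&mut self, task_id: &str, status: TaskStatus) {
        self.results.push((task_id.to_string(), status));
    }
}

/// Processes tasks until the service has none left and returns how many
/// tasks were handled. `work` holds the task logic itself.
pub fn run_task_runner<S, F>(service: &mut S, mut work: F) -> usize
where
    S: TaskService,
    F: FnMut(&Task) -> Result<(), String>,
{
    let mut processed = 0;
    while let Some(task) = service.poll_for_task() {
        // Simplification: progress is reported once, before the work starts.
        service.report_task_progress(&task.task_id);
        let status = match work(&task) {
            Ok(()) => TaskStatus::Finished,
            Err(reason) => TaskStatus::Failed(reason),
        };
        // Report the final success or failure of the task.
        service.set_task_status(&task.task_id, status);
        processed += 1;
    }
    processed
}
```

Keeping the service behind a trait lets the loop be tested against an in-memory queue, as here, and later be pointed at an implementation that wraps the real client.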

If you’re using the service, you’re probably looking for DataPipelineClient and DataPipeline.

Structs

Contains the parameters for ActivatePipeline.

Contains the output of ActivatePipeline.

Contains the parameters for AddTags.

Contains the output of AddTags.

Contains the parameters for CreatePipeline.

Contains the output of CreatePipeline.

A client for the AWS Data Pipeline API.

Contains the parameters for DeactivatePipeline.

Contains the output of DeactivatePipeline.

Contains the parameters for DeletePipeline.

Contains the parameters for DescribeObjects.

Contains the output of DescribeObjects.

Contains the parameters for DescribePipelines.

Contains the output of DescribePipelines.

Contains the parameters for EvaluateExpression.

Contains the output of EvaluateExpression.

A key-value pair that describes a property of a pipeline object. The value is specified as either a string value (StringValue) or a reference to another object (RefValue) but not as both.

Contains the parameters for GetPipelineDefinition.

Contains the output of GetPipelineDefinition.

Identity information for the EC2 instance that is hosting the task runner. You can get this value by calling a metadata URI from the EC2 instance. For more information, see Instance Metadata in the Amazon Elastic Compute Cloud User Guide. Passing in this value proves that your task runner is running on an EC2 instance, and ensures the proper AWS Data Pipeline service charges are applied to your pipeline.

Contains the parameters for ListPipelines.

Contains the output of ListPipelines.

Contains a logical operation for comparing the value of a field with a specified value.

The attributes allowed or specified with a parameter object.

Contains information about a parameter object.

A value or list of parameter values.

Contains pipeline metadata.

Contains the name and identifier of a pipeline.

Contains information about a pipeline object. This can be a logical, physical, or physical attempt pipeline object. The complete set of components of a pipeline defines the pipeline.

Contains the parameters for PollForTask.

Contains the output of PollForTask.

Contains the parameters for PutPipelineDefinition.

Contains the output of PutPipelineDefinition.

Defines the query to run against an object.

Contains the parameters for QueryObjects.

Contains the output of QueryObjects.

Contains the parameters for RemoveTags.

Contains the output of RemoveTags.

Contains the parameters for ReportTaskProgress.

Contains the output of ReportTaskProgress.

Contains the parameters for ReportTaskRunnerHeartbeat.

Contains the output of ReportTaskRunnerHeartbeat.

A comparison that is used to determine whether a query should return this object.

Contains the parameters for SetStatus.

Contains the parameters for SetTaskStatus.

Contains the output of SetTaskStatus.

Tags are key/value pairs defined by a user and associated with a pipeline to control access. AWS Data Pipeline allows you to associate ten tags per pipeline. For more information, see Controlling User Access to Pipelines in the AWS Data Pipeline Developer Guide.

Contains information about a pipeline task that is assigned to a task runner.

Contains the parameters for ValidatePipelineDefinition.

Contains the output of ValidatePipelineDefinition.

Defines a validation error. Validation errors prevent pipeline activation. The set of validation errors that can be returned is defined by AWS Data Pipeline.

Defines a validation warning. Validation warnings do not prevent pipeline activation. The set of validation warnings that can be returned is defined by AWS Data Pipeline.
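The difference between validation errors and validation warnings can be illustrated with a small check. The types below are hypothetical local mirrors written for this sketch (assumption: simplified, non-optional fields), not the structs exported by this crate, and `ValidationReport`, `can_activate`, and `summary` are names invented here.

```rust
/// Hypothetical mirror of a validation error: the id of the offending
/// object and the messages reported for it.
#[derive(Debug, Clone, PartialEq)]
pub struct ValidationError {
    pub id: String,
    pub errors: Vec<String>,
}

/// Hypothetical mirror of a validation warning.
#[derive(Debug, Clone, PartialEq)]
pub struct ValidationWarning {
    pub id: String,
    pub warnings: Vec<String>,
}

/// Hypothetical container for the result of validating a definition.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ValidationReport {
    pub errors: Vec<ValidationError>,
    pub warnings: Vec<ValidationWarning>,
}

impl ValidationReport {
    /// Only errors prevent activation; warnings do not.
    pub fn can_activate(&self) -> bool {
        self.errors.is_empty()
    }

    /// Flattens the report into printable lines, errors first.
    pub fn summary(&self) -> Vec<String> {
        let mut lines = Vec::new();
        for e in &self.errors {
            for msg in &e.errors {
                lines.push(format!("error [{}]: {}", e.id, msg));
            }
        }
        for w in &self.warnings {
            for msg in &w.warnings {
                lines.push(format!("warning [{}]: {}", w.id, msg));
            }
        }
        lines
    }
}
```

A caller could check `can_activate()` before attempting activation and log `summary()` either way, so that warnings stay visible even when they do not block anything.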

Enums

Errors returned by ActivatePipeline

Errors returned by AddTags

Errors returned by CreatePipeline

Errors returned by DeactivatePipeline

Errors returned by DeletePipeline

Errors returned by DescribeObjects

Errors returned by DescribePipelines

Errors returned by EvaluateExpression

Errors returned by GetPipelineDefinition

Errors returned by ListPipelines

Errors returned by PollForTask

Errors returned by PutPipelineDefinition

Errors returned by QueryObjects

Errors returned by RemoveTags

Errors returned by ReportTaskProgress

Errors returned by ReportTaskRunnerHeartbeat

Errors returned by SetStatus

Errors returned by SetTaskStatus

Errors returned by ValidatePipelineDefinition

Traits

Trait representing the capabilities of the AWS Data Pipeline API. AWS Data Pipeline clients implement this trait.