AWS Data Pipeline configures and manages a data-driven workflow called a pipeline. AWS Data Pipeline handles the details of scheduling and ensuring that data dependencies are met so that your application can focus on processing the data.
AWS Data Pipeline provides a JAR implementation of a task runner called AWS Data Pipeline Task Runner. AWS Data Pipeline Task Runner provides logic for common data management scenarios, such as performing database queries and running data analysis using Amazon Elastic MapReduce (Amazon EMR). You can use AWS Data Pipeline Task Runner as your task runner, or you can write your own task runner to provide custom data management.
AWS Data Pipeline implements two main sets of functionality. Use the first set to create a pipeline and define data sources, schedules, dependencies, and the transforms to be performed on the data. Use the second set in your task runner application to receive the next task ready for processing. The logic for performing the task, such as querying the data, running data analysis, or converting the data from one format to another, is contained within the task runner. The task runner performs the task assigned to it by the web service, reporting progress to the web service as it does so. When the task is done, the task runner reports the final success or failure of the task to the web service.
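The task runner cycle described above (receive a task, report progress, report the final outcome) can be sketched as a small loop. This is a minimal, std-only illustration, not this crate's API: the `TaskService` trait, the `InMemoryService` stand-in, and `run_task_runner` are hypothetical names invented for the example, and their signatures are simplified assumptions that only echo the PollForTask, ReportTaskProgress, and SetTaskStatus operations.

```rust
use std::collections::VecDeque;

/// Final outcome a task runner reports for a task.
#[derive(Debug, Clone, PartialEq)]
enum TaskStatus {
    Finished,
    Failed,
}

/// Hypothetical stand-in for the task-runner half of the web service.
/// Method names echo PollForTask, ReportTaskProgress, and SetTaskStatus;
/// the signatures are simplified assumptions, not the real API.
trait TaskService {
    fn poll_for_task(&mut self) -> Option<String>;
    fn report_task_progress(&mut self, task_id: &str);
    fn set_task_status(&mut self, task_id: &str, status: TaskStatus);
}

/// In-memory service used only to exercise the loop.
#[derive(Default)]
struct InMemoryService {
    queue: VecDeque<String>,
    progress_reports: Vec<String>,
    final_statuses: Vec<(String, TaskStatus)>,
}

impl TaskService for InMemoryService {
    fn poll_for_task(&mut self) -> Option<String> {
        self.queue.pop_front()
    }

    fn report_task_progress(&mut self, task_id: &str) {
        self.progress_reports.push(task_id.to_string());
    }

    fn set_task_status(&mut self, task_id: &str, status: TaskStatus) {
        self.final_statuses.push((task_id.to_string(), status));
    }
}

/// Polls until no task is left. For each task: report progress, run the
/// caller-supplied work, then report success or failure.
/// Returns the number of tasks handled.
fn run_task_runner<S, F>(service: &mut S, mut work: F) -> usize
where
    S: TaskService,
    F: FnMut(&str) -> Result<(), String>,
{
    let mut handled = 0;
    while let Some(task_id) = service.poll_for_task() {
        service.report_task_progress(&task_id);
        let status = match work(task_id.as_str()) {
            Ok(()) => TaskStatus::Finished,
            Err(_) => TaskStatus::Failed,
        };
        service.set_task_status(&task_id, status);
        handled += 1;
    }
    handled
}
```

The work itself is passed in as a closure, which mirrors the point that the processing logic lives in the task runner rather than in the service. A real runner would typically keep polling instead of stopping when the queue is empty.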
If you’re using the service, you’re probably looking for DataPipelineClient and DataPipeline.
Structs§
- ActivatePipelineInput
Contains the parameters for ActivatePipeline.
- ActivatePipelineOutput
Contains the output of ActivatePipeline.
- AddTagsInput
Contains the parameters for AddTags.
- AddTagsOutput
Contains the output of AddTags.
- CreatePipelineInput
Contains the parameters for CreatePipeline.
- CreatePipelineOutput
Contains the output of CreatePipeline.
- DataPipelineClient
A client for the AWS Data Pipeline API.
- DeactivatePipelineInput
Contains the parameters for DeactivatePipeline.
- DeactivatePipelineOutput
Contains the output of DeactivatePipeline.
- DeletePipelineInput
Contains the parameters for DeletePipeline.
- DescribeObjectsInput
Contains the parameters for DescribeObjects.
- DescribeObjectsOutput
Contains the output of DescribeObjects.
- DescribePipelinesInput
Contains the parameters for DescribePipelines.
- DescribePipelinesOutput
Contains the output of DescribePipelines.
- EvaluateExpressionInput
Contains the parameters for EvaluateExpression.
- EvaluateExpressionOutput
Contains the output of EvaluateExpression.
- Field
A key-value pair that describes a property of a pipeline object. The value is specified as either a string value (StringValue) or a reference to another object (RefValue), but not as both.
- GetPipelineDefinitionInput
Contains the parameters for GetPipelineDefinition.
- GetPipelineDefinitionOutput
Contains the output of GetPipelineDefinition.
- InstanceIdentity
Identity information for the EC2 instance that is hosting the task runner. You can get this value by calling a metadata URI from the EC2 instance. For more information, see Instance Metadata in the Amazon Elastic Compute Cloud User Guide. Passing in this value proves that your task runner is running on an EC2 instance, and ensures the proper AWS Data Pipeline service charges are applied to your pipeline.
- ListPipelinesInput
Contains the parameters for ListPipelines.
- ListPipelinesOutput
Contains the output of ListPipelines.
- Operator
Contains a logical operation for comparing the value of a field with a specified value.
- ParameterAttribute
The attributes allowed or specified with a parameter object.
- ParameterObject
Contains information about a parameter object.
- ParameterValue
A value or list of parameter values.
- PipelineDescription
Contains pipeline metadata.
- PipelineIdName
Contains the name and identifier of a pipeline.
- PipelineObject
Contains information about a pipeline object. This can be a logical, physical, or physical attempt pipeline object. The complete set of components of a pipeline defines the pipeline.
- PollForTaskInput
Contains the parameters for PollForTask.
- PollForTaskOutput
Contains the output of PollForTask.
- PutPipelineDefinitionInput
Contains the parameters for PutPipelineDefinition.
- PutPipelineDefinitionOutput
Contains the output of PutPipelineDefinition.
- Query
Defines the query to run against an object.
- QueryObjectsInput
Contains the parameters for QueryObjects.
- QueryObjectsOutput
Contains the output of QueryObjects.
- RemoveTagsInput
Contains the parameters for RemoveTags.
- RemoveTagsOutput
Contains the output of RemoveTags.
- ReportTaskProgressInput
Contains the parameters for ReportTaskProgress.
- ReportTaskProgressOutput
Contains the output of ReportTaskProgress.
- ReportTaskRunnerHeartbeatInput
Contains the parameters for ReportTaskRunnerHeartbeat.
- ReportTaskRunnerHeartbeatOutput
Contains the output of ReportTaskRunnerHeartbeat.
- Selector
A comparison that is used to determine whether a query should return this object.
- SetStatusInput
Contains the parameters for SetStatus.
- SetTaskStatusInput
Contains the parameters for SetTaskStatus.
- SetTaskStatusOutput
Contains the output of SetTaskStatus.
- Tag
Tags are key/value pairs defined by a user and associated with a pipeline to control access. AWS Data Pipeline allows you to associate up to ten tags per pipeline. For more information, see Controlling User Access to Pipelines in the AWS Data Pipeline Developer Guide.
- TaskObject
Contains information about a pipeline task that is assigned to a task runner.
- ValidatePipelineDefinitionInput
Contains the parameters for ValidatePipelineDefinition.
- ValidatePipelineDefinitionOutput
Contains the output of ValidatePipelineDefinition.
- ValidationError
Defines a validation error. Validation errors prevent pipeline activation. The set of validation errors that can be returned is defined by AWS Data Pipeline.
- ValidationWarning
Defines a validation warning. Validation warnings do not prevent pipeline activation. The set of validation warnings that can be returned is defined by AWS Data Pipeline.
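The Field entry in the list above says a value is given either as a string (StringValue) or as a reference (RefValue), but not as both. The sketch below shows one way to enforce that rule before sending a definition. It is a std-only illustration under stated assumptions: the local `Field` struct is a simplified mirror written for this example, not the crate's definition, and `FieldValue` and `field_value` are hypothetical helpers. Treating a field with neither value as an error is also an assumption.

```rust
/// Local mirror of the documented shape: a key plus two optional values.
/// An assumption for illustration, not the crate's own definition.
#[derive(Debug, Clone, PartialEq, Default)]
struct Field {
    key: String,
    string_value: Option<String>,
    ref_value: Option<String>,
}

/// Hypothetical view of a field once the either/or rule has been checked.
#[derive(Debug, PartialEq)]
enum FieldValue<'a> {
    /// A literal string value.
    Str(&'a str),
    /// A reference to another pipeline object, by identifier.
    Ref(&'a str),
}

/// Returns the single value carried by `field`, or an error message when
/// both values are set or (by assumption) when neither is set.
fn field_value(field: &Field) -> Result<FieldValue<'_>, String> {
    match (field.string_value.as_deref(), field.ref_value.as_deref()) {
        (Some(s), None) => Ok(FieldValue::Str(s)),
        (None, Some(r)) => Ok(FieldValue::Ref(r)),
        (Some(_), Some(_)) => Err(format!(
            "field `{}` sets both a string value and a reference",
            field.key
        )),
        (None, None) => Err(format!(
            "field `{}` sets neither a string value nor a reference",
            field.key
        )),
    }
}
```

Modelling the checked value as an enum makes the "not both" rule impossible to violate after validation, while the two-option struct keeps the shape described in the list.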
Enums§
- ActivatePipelineError
Errors returned by ActivatePipeline
- AddTagsError
Errors returned by AddTags
- CreatePipelineError
Errors returned by CreatePipeline
- DeactivatePipelineError
Errors returned by DeactivatePipeline
- DeletePipelineError
Errors returned by DeletePipeline
- DescribeObjectsError
Errors returned by DescribeObjects
- DescribePipelinesError
Errors returned by DescribePipelines
- EvaluateExpressionError
Errors returned by EvaluateExpression
- GetPipelineDefinitionError
Errors returned by GetPipelineDefinition
- ListPipelinesError
Errors returned by ListPipelines
- PollForTaskError
Errors returned by PollForTask
- PutPipelineDefinitionError
Errors returned by PutPipelineDefinition
- QueryObjectsError
Errors returned by QueryObjects
- RemoveTagsError
Errors returned by RemoveTags
- ReportTaskProgressError
Errors returned by ReportTaskProgress
- ReportTaskRunnerHeartbeatError
Errors returned by ReportTaskRunnerHeartbeat
- SetStatusError
Errors returned by SetStatus
- SetTaskStatusError
Errors returned by SetTaskStatus
- ValidatePipelineDefinitionError
Errors returned by ValidatePipelineDefinition
Traits§
- DataPipeline
Trait representing the capabilities of the AWS Data Pipeline API. AWS Data Pipeline clients implement this trait.