
Glue

Defines the public endpoint for the Glue service.

If you’re using the service, you’re probably looking for GlueClient and Glue.

Structs

Defines an action to be initiated by a trigger.

A list of errors that can occur when registering partition indexes for an existing table.

These errors explain why an index registration failed and include a limited number of the affected partitions in the response, so that you can fix the partitions at fault and try registering the index again. The most common errors are categorized as follows:

  • EncryptedPartitionError: The partitions are encrypted.

  • InvalidPartitionTypeDataError: The partition value doesn't match the data type for that partition column.

  • MissingPartitionValueError: A value is missing for one or more partition columns.

  • UnsupportedPartitionCharacterError: The partition value contains unsupported characters, for example U+0000, U+0001, or U+0002.

  • InternalError: Any error which does not belong to other error codes.
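As an illustration of how a caller might act on these codes, here is a std-only Rust sketch. The PartitionIndexErrorCode enum and the partitions_to_fix helper are hypothetical (they are not types from this crate). The sketch assumes each error arrives as a pair of partition value and error-code string, and it treats unrecognized codes as InternalError, the catch-all bucket above.

```rust
/// Hypothetical mirror of the error codes listed above (not a type from this crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionIndexErrorCode {
    EncryptedPartition,
    InvalidPartitionTypeData,
    MissingPartitionValue,
    UnsupportedPartitionCharacter,
    Internal,
}

impl PartitionIndexErrorCode {
    /// Maps an error-code string to a variant; anything unrecognized is treated
    /// as InternalError, the catch-all bucket.
    pub fn from_code(code: &str) -> Self {
        match code {
            "EncryptedPartitionError" => Self::EncryptedPartition,
            "InvalidPartitionTypeDataError" => Self::InvalidPartitionTypeData,
            "MissingPartitionValueError" => Self::MissingPartitionValue,
            "UnsupportedPartitionCharacterError" => Self::UnsupportedPartitionCharacter,
            _ => Self::Internal,
        }
    }

    /// True when the partition data itself must be fixed before retrying.
    pub fn partition_needs_fix(self) -> bool {
        !matches!(self, Self::Internal)
    }
}

/// Given (partition value, error code) pairs, returns the partitions to fix.
pub fn partitions_to_fix<'a>(errors: &[(&'a str, &str)]) -> Vec<&'a str> {
    errors
        .iter()
        .filter(|(_, code)| PartitionIndexErrorCode::from_code(code).partition_needs_fix())
        .map(|(partition, _)| *partition)
        .collect()
}
```

After fixing the returned partitions, you would retry the index registration. Service-side InternalError entries are left for a plain retry.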

Records an error that occurred when attempting to stop a specified job run.

Records a successful request to stop a specified JobRun.

Contains information about a batch update partition error.

A structure that contains the values and structure used to update a partition.

Defines column statistics supported for bit sequence data values.

Defines column statistics supported for Boolean data columns.

Specifies a table definition in the Glue Data Catalog.

A structure containing migration status information.

Specifies a Glue Data Catalog target.

Classifiers are triggered during a crawl task. A classifier checks whether a given file is in a format it can handle. If it is, the classifier creates a schema in the form of a StructType object that matches that data format.

You can use the standard classifiers that Glue provides, or you can write your own classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A classifier can be a grok classifier, an XML classifier, a JSON classifier, or a custom CSV classifier, as specified in one of the fields in the Classifier object.
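The following is a minimal sketch of the classifier idea. It assumes a simplified model in which classifiers are tried in order and the first one that recognizes the content supplies the schema. The Classifier trait and the two toy classifiers are hypothetical. In Glue, you configure classifiers through the structures in this module rather than implementing them in client code, and the service's selection logic is more involved.

```rust
/// A column in an inferred schema (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String,
}

/// Hypothetical classifier interface: returns a schema if it can handle the content.
pub trait Classifier {
    fn name(&self) -> &str;
    fn classify(&self, content: &str) -> Option<Vec<Column>>;
}

/// Toy JSON classifier: accepts content that looks like a single object.
/// It does not infer fields, so the schema it returns is empty.
pub struct ToyJsonClassifier;

impl Classifier for ToyJsonClassifier {
    fn name(&self) -> &str {
        "json"
    }
    fn classify(&self, content: &str) -> Option<Vec<Column>> {
        let t = content.trim();
        (t.starts_with('{') && t.ends_with('}')).then(Vec::new)
    }
}

/// Toy CSV classifier: treats a comma-separated first line as the header
/// and types every column as string.
pub struct ToyCsvClassifier;

impl Classifier for ToyCsvClassifier {
    fn name(&self) -> &str {
        "csv"
    }
    fn classify(&self, content: &str) -> Option<Vec<Column>> {
        let header = content.lines().next()?;
        if !header.contains(',') {
            return None;
        }
        let columns: Vec<Column> = header
            .split(',')
            .map(|h| Column { name: h.trim().to_string(), data_type: "string".to_string() })
            .collect();
        Some(columns)
    }
}

/// Tries classifiers in order; the first one that recognizes the content wins.
pub fn classify<'a>(
    classifiers: &'a [Box<dyn Classifier>],
    content: &str,
) -> Option<(&'a str, Vec<Column>)> {
    classifiers
        .iter()
        .find_map(|c| c.classify(content).map(|schema| (c.name(), schema)))
}
```

The JSON classifier is listed first in the usage below because JSON objects often contain commas, which would otherwise satisfy the toy CSV check.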

Specifies how Amazon CloudWatch data should be encrypted.

Represents a directional edge in a directed acyclic graph (DAG).

Represents a node in a directed acyclic graph (DAG).

An argument or property of a node.

A column in a Table.

Encapsulates a column name that failed and the reason for failure.

A structure containing the column name and column importance score for a column.

Column importance helps you understand how columns contribute to your model by identifying which columns in your records are more important than others.
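For example, picking the most important columns is a descending sort on the score. The ColumnImportance struct below is a hypothetical simplification that uses plain fields instead of this crate's types.

```rust
/// Hypothetical, simplified column-importance record.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnImportance {
    pub column_name: String,
    pub importance: f64,
}

/// Returns the n most important columns, highest score first.
pub fn top_columns(mut columns: Vec<ColumnImportance>, n: usize) -> Vec<ColumnImportance> {
    // f64::total_cmp gives a total order, so NaN scores cannot make the sort inconsistent.
    columns.sort_by(|a, b| b.importance.total_cmp(&a.importance));
    columns.truncate(n);
    columns
}
```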

Represents the generated column-level statistics for a table or partition.

Contains the individual types of column statistics data. Only one data object should be set, and the Type attribute indicates which one.
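In Rust, the rule of exactly one payload selected by Type maps naturally onto an enum, where the variant plays the role of the Type attribute. The sketch below is hypothetical. It models only two statistics kinds, trims the payload fields, and assumes the LONG and BOOLEAN type strings for illustration.

```rust
/// Hypothetical, trimmed statistics payloads.
#[derive(Debug, Clone, PartialEq)]
pub struct LongStats {
    pub min: i64,
    pub max: i64,
    pub nulls: i64,
    pub distinct: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BooleanStats {
    pub trues: i64,
    pub falses: i64,
    pub nulls: i64,
}

/// The variant stands in for the Type attribute, so two payloads can never be set at once.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnStatisticsData {
    Long(LongStats),
    Boolean(BooleanStats),
}

impl ColumnStatisticsData {
    /// The Type string this variant corresponds to (assumed names).
    pub fn type_name(&self) -> &'static str {
        match self {
            ColumnStatisticsData::Long(_) => "LONG",
            ColumnStatisticsData::Boolean(_) => "BOOLEAN",
        }
    }

    /// Number of null values, whichever payload is present.
    pub fn null_count(&self) -> i64 {
        match self {
            ColumnStatisticsData::Long(s) => s.nulls,
            ColumnStatisticsData::Boolean(s) => s.nulls,
        }
    }
}
```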

Encapsulates a ColumnStatistics object that failed and the reason for failure.

Defines a condition under which a trigger fires.

The confusion matrix shows you what your transform is predicting accurately and what types of errors it is making.

For more information, see Confusion matrix on Wikipedia.
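The usual summary metrics are computed directly from the four cells of a binary confusion matrix: true and false positives, and true and false negatives. The following std-only sketch uses hypothetical field names. It returns 0.0 whenever a denominator is zero; that is a choice made for this sketch.

```rust
/// Counts from a binary confusion matrix (hypothetical field names).
#[derive(Debug, Clone, Copy)]
pub struct ConfusionMatrix {
    pub true_positives: u64,
    pub false_positives: u64,
    pub true_negatives: u64,
    pub false_negatives: u64,
}

/// Safe division: returns 0.0 when the denominator is zero.
fn ratio(num: u64, den: u64) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

impl ConfusionMatrix {
    /// Of the records predicted to match, the fraction that really match: TP / (TP + FP).
    pub fn precision(&self) -> f64 {
        ratio(self.true_positives, self.true_positives + self.false_positives)
    }

    /// Of the records that really match, the fraction predicted to match: TP / (TP + FN).
    pub fn recall(&self) -> f64 {
        ratio(self.true_positives, self.true_positives + self.false_negatives)
    }

    /// Harmonic mean of precision and recall.
    pub fn f1(&self) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
    }

    /// Fraction of all records classified correctly.
    pub fn accuracy(&self) -> f64 {
        let correct = self.true_positives + self.true_negatives;
        let total = correct + self.false_positives + self.false_negatives;
        ratio(correct, total)
    }
}
```

In record matching, true negatives usually dominate, so accuracy can look high even when matches are missed. Precision and recall are usually the more informative pair.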

Defines a connection to a data source.

A structure that is used to specify a connection to create or update.

The data structure used by the Data Catalog to encrypt the password as part of CreateConnection or UpdateConnection and store it in the ENCRYPTED_PASSWORD field in the connection properties. You can enable catalog encryption or only password encryption.

When a CreateConnection request arrives containing a password, the Data Catalog first encrypts the password using your KMS key. It then encrypts the whole connection object again if catalog encryption is also enabled.

This encryption requires that you set KMS key permissions to enable or restrict access on the password key according to your security requirements. For example, you might want only administrators to have decrypt permission on the password key.

Specifies the connections used by a job.

The details of a crawl in the workflow.

Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog.

Metrics for a specified crawler.

The details of a Crawler node present in the workflow.

Specifies data stores to crawl.

Specifies a custom CSV classifier for CreateClassifier to create.

Specifies a grok classifier for CreateClassifier to create.

Specifies a JSON classifier for CreateClassifier to create.

Specifies an XML classifier for CreateClassifier to create.

A classifier for custom CSV content.

Contains configuration information for maintaining Data Catalog security.

The Lake Formation principal.

The Database object represents a logical grouping of tables that might reside in a Hive metastore or an RDBMS.

A structure that describes a target database for resource linking.

The structure used to create or update a database.

Defines column statistics supported for timestamp data columns.

Defines column statistics supported for fixed-point number data columns.

Contains a numeric value in decimal format.

A development endpoint where a developer can remotely debug extract, transform, and load (ETL) scripts.

Custom libraries to be loaded into a development endpoint.

Defines column statistics supported for floating-point number data columns.

Specifies an Amazon DynamoDB table to crawl.

An edge represents a directed connection between two Glue components that are part of the workflow the edge belongs to.

Specifies the encryption-at-rest configuration for the Data Catalog.

Specifies an encryption configuration.

Contains details about an error.

An object containing error details.

Evaluation metrics provide an estimate of the quality of your machine learning transform.

An execution property of a job.

Specifies configuration properties for an exporting labels task run.

The evaluation metrics for the find matches algorithm. The quality of your machine learning transform is measured by getting your transform to predict some matches and comparing the results to known matches from the same dataset. The quality metrics are based on a subset of your data, so they are not precise.

The parameters to configure the find matches transform.

Specifies configuration properties for a Find Matches task run.

Filters the connection definitions that are returned by the GetConnections API operation.

A client for the AWS Glue API.

A structure for returning a resource policy.

The database and table in the Glue Data Catalog that is used for input or output data.

A classifier that uses grok patterns.

Specifies configuration properties for an importing labels task run.

Specifies a JDBC data store to crawl.

Specifies a job definition.

Defines a point from which a job can resume processing.

Specifies how job bookmark data should be encrypted.

Specifies code that runs when a job is run.

The details of a Job node present in the workflow.

Contains information about a job run.

Specifies information used to update an existing job definition. The previous job definition is completely overwritten by this information.
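Because the update replaces the whole definition, any field you omit is lost. A common defensive pattern is read-modify-write: start from the current definition and change only what you need. The JobDefinition struct and the helper functions below are hypothetical stand-ins that simulate this behavior locally. They make no API calls.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down job definition.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct JobDefinition {
    pub role: String,
    pub script_location: String,
    pub max_retries: u32,
    pub default_arguments: HashMap<String, String>,
}

/// Simulates an update that replaces the stored definition wholesale.
pub fn overwrite(stored: &mut JobDefinition, update: JobDefinition) {
    *stored = update;
}

/// Read-modify-write: copy the current definition so untouched fields survive.
pub fn with_max_retries(current: &JobDefinition, retries: u32) -> JobDefinition {
    JobDefinition { max_retries: retries, ..current.clone() }
}
```

The same reasoning applies to trigger updates, which also overwrite the previous definition completely.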

A classifier for JSON content.

A partition key pair consisting of a name and a type.

Specifies configuration properties for a labeling set generation task run.

Status and error information about the most recent crawl.

Specifies data lineage configuration settings for the crawler.

The location of resources.

Defines column statistics supported for integer data columns.

A structure for a machine learning transform.

The encryption-at-rest settings of the transform that apply to accessing user data.

Defines a mapping.

A structure containing metadata information for a schema version.

A structure containing a key value pair for metadata.

Specifies an Amazon DocumentDB or MongoDB data store to crawl.

A node represents a Glue component, such as a trigger or a job, that is part of a workflow.

Specifies configuration properties of a notification.

Specifies the sort order of a sorted column.

A structure containing other metadata for a schema version belonging to the same metadata key.

Represents a slice of table data.

Contains information about a partition error.

A structure for a partition index.

A descriptor for a partition index in a table.

The structure used to create and update a partition.

Contains a list of values defining partitions.
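Partition values are positional: each value lines up with one of the table's partition keys. As an illustration, the sketch below pairs keys with values to build a Hive-style key=value path. The partition_path helper is hypothetical and does not escape special characters.

```rust
/// Builds a Hive-style relative path such as year=2021/month=07 from partition
/// keys and the matching list of values. Returns None if the lengths differ.
pub fn partition_path(keys: &[&str], values: &[&str]) -> Option<String> {
    if keys.len() != values.len() {
        return None;
    }
    let parts: Vec<String> = keys
        .iter()
        .zip(values)
        .map(|(k, v)| format!("{k}={v}"))
        .collect();
    Some(parts.join("/"))
}
```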

Specifies the physical requirements for a connection.

A job run that was used in the predicate of a conditional trigger that triggered this job run.

Defines the predicate of the trigger, which determines when it fires.
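The sketch below shows how such a predicate might be evaluated. It assumes a simplified model with AND/ANY combination and conditions of the form "job X reached state Y". All types are hypothetical, and only a subset of job-run states is included.

```rust
use std::collections::HashMap;

/// A subset of job-run states (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Succeeded,
    Failed,
    Stopped,
    Timeout,
}

/// "Job job_name has reached state"; equality is the only comparison modeled.
pub struct Condition {
    pub job_name: String,
    pub state: JobState,
}

/// How conditions combine: all must hold, or at least one.
pub enum Logical {
    And,
    Any,
}

pub struct Predicate {
    pub logical: Logical,
    pub conditions: Vec<Condition>,
}

impl Predicate {
    /// Evaluates the predicate against the latest known state of each watched job.
    pub fn fires(&self, latest: &HashMap<String, JobState>) -> bool {
        let holds = |c: &Condition| latest.get(&c.job_name) == Some(&c.state);
        match self.logical {
            Logical::And => self.conditions.iter().all(holds),
            Logical::Any => self.conditions.iter().any(holds),
        }
    }
}
```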

Permissions granted to a principal.

Defines a property predicate.

When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in Glue in the developer guide.
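The two recrawl choices can be modeled as a set difference over folder paths. The RecrawlBehavior enum below is a hypothetical mirror of the crawl-everything and crawl-new-folders-only options. It does not reproduce how the crawler actually detects changes.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the two recrawl options described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecrawlBehavior {
    CrawlEverything,
    CrawlNewFoldersOnly,
}

/// Returns the folders to crawl on this run, in sorted order.
pub fn folders_to_crawl(
    behavior: RecrawlBehavior,
    previously_crawled: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Vec<String> {
    match behavior {
        RecrawlBehavior::CrawlEverything => current.iter().cloned().collect(),
        // Only folders that did not exist at the last crawl.
        RecrawlBehavior::CrawlNewFoldersOnly => {
            current.difference(previously_crawled).cloned().collect()
        }
    }
}
```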

A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

A structure containing the details for a registry.

The URIs for function resources.

Specifies how Amazon Simple Storage Service (Amazon S3) data should be encrypted.

Specifies a data store in Amazon Simple Storage Service (Amazon S3).

A scheduling object using a cron statement to schedule an event.
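Glue schedules are written as cron(...) expressions with six space-separated fields: minutes, hours, day-of-month, month, day-of-week, and year. For example, cron(15 12 * * ? *) runs at 12:15 UTC every day. The parser below is a hypothetical, minimal structural check. It assumes that one of day-of-month and day-of-week must be ?, and it does not validate the values inside each field.

```rust
/// Splits a Glue-style cron(...) expression into its six fields:
/// minutes, hours, day-of-month, month, day-of-week, year.
/// Hypothetical helper: checks structure only, not the values in each field.
pub fn parse_cron(expr: &str) -> Result<Vec<String>, String> {
    let inner = expr
        .trim()
        .strip_prefix("cron(")
        .and_then(|s| s.strip_suffix(')'))
        .ok_or_else(|| format!("expected cron(...), got {expr:?}"))?;
    let fields: Vec<&str> = inner.split_whitespace().collect();
    if fields.len() != 6 {
        return Err(format!("expected 6 fields, found {}", fields.len()));
    }
    // Assumed rule: day-of-month (index 2) and day-of-week (index 4) cannot both be set.
    if fields[2] != "?" && fields[4] != "?" {
        return Err("one of day-of-month and day-of-week must be '?'".to_string());
    }
    Ok(fields.iter().map(|f| f.to_string()).collect())
}
```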

A policy that specifies update and deletion behaviors for the crawler.

A key-value pair representing a column and data type that this transform can run against. The Schema parameter of the MLTransform may contain up to 100 of these structures.

The unique ID of the schema in the Glue schema registry.

An object that contains minimal details for a schema.

An object that references a schema stored in the Glue Schema Registry.

An object that contains the error details for an operation on a schema version.

An object containing the details about a schema version.

A structure containing the schema version information.

Specifies a security configuration.

Defines a non-overlapping region of a table's partitions, allowing multiple requests to be run in parallel.
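Each request asks for segment k of n. Because the segments are disjoint, the results can be fetched concurrently and concatenated. The sketch below simulates this locally with scoped threads. fetch_segment is a hypothetical stand-in for a real paginated call and uses a simple modulo split.

```rust
use std::thread;

/// Hypothetical stand-in for "get partitions for segment k of n":
/// segment k owns every partition whose index is congruent to k mod total.
fn fetch_segment(all: &[String], segment: usize, total: usize) -> Vec<String> {
    all.iter()
        .enumerate()
        .filter(|(i, _)| i % total == segment)
        .map(|(_, p)| p.clone())
        .collect()
}

/// Fetches every segment concurrently (one thread each) and concatenates the results.
pub fn fetch_all(all: &[String], total_segments: usize) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..total_segments)
            .map(|k| s.spawn(move || fetch_segment(all, k, total_segments)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("segment thread panicked"))
            .collect()
    })
}
```

The segments do not overlap, so the union contains each partition exactly once. Only the order differs from a sequential fetch.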

Information about a serialization/deserialization program (SerDe) that serves as an extractor and loader.

Specifies skewed values in a table. Skewed values are those that occur with very high frequency.
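To find candidate skewed values in a sample of a column, a frequency count is enough. The skewed_values helper and its share threshold are hypothetical and not part of this crate.

```rust
use std::collections::HashMap;

/// Returns values whose share of rows is at least min_share (for example 0.25 = 25%),
/// most frequent first, with ties broken alphabetically.
pub fn skewed_values(column: &[&str], min_share: f64) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &v in column {
        *counts.entry(v).or_insert(0) += 1;
    }
    let total = column.len() as f64;
    let mut skewed: Vec<(&str, usize)> = counts
        .into_iter()
        .filter(|&(_, c)| c as f64 / total >= min_share)
        .collect();
    skewed.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    skewed.into_iter().map(|(v, _)| v.to_string()).collect()
}
```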

Specifies a field to sort by and a sort order.

Describes the physical storage of table data.

Defines column statistics supported for character sequence data values.

Represents a collection of related data organized in columns and rows.

An error record for table operations.

A structure that describes a target table for resource linking.

A structure used to define a table.

Specifies a version of a table.

An error record for table-version operations.

The sampling parameters that are associated with the machine learning transform.

The criteria that are used to filter the task runs for the machine learning transform.

The configuration properties for the task run.

The sorting criteria that are used to sort the list of task runs for the machine learning transform.

The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.

Additionally, imported labels and trained transforms can be encrypted using a customer-provided KMS key.

The criteria used to filter the machine learning transforms.

The algorithm-specific parameters that are associated with the machine learning transform.

The sorting criteria that are associated with the machine learning transform.

Information about a specific trigger.

The details of a Trigger node present in the workflow.

A structure used to provide information used to update a trigger. This object updates the previous trigger definition by overwriting it completely.

Specifies a custom CSV classifier to be updated.

Specifies a grok classifier to update when passed to UpdateClassifier.

Specifies a JSON classifier to be updated.

Specifies an XML classifier to be updated.

Represents the equivalent of a Hive user-defined function (UDF) definition.

A structure used to create or update a user-defined function.

A workflow represents a flow in which Glue components should be run to complete a logical task.

A workflow graph represents the complete workflow containing all the Glue components present in the workflow and all the directed connections between them.
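Because the graph is a DAG, a valid run order is a topological sort. The sketch below applies Kahn's algorithm to plain node names and returns None if the edges contain a cycle. The run_order function is hypothetical: Glue identifies nodes by its own IDs and schedules runs itself.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns one valid run order for a workflow-like DAG, or None on a cycle.
/// An edge (a, b) means "a must finish before b starts".
pub fn run_order(nodes: &[&str], edges: &[(&str, &str)]) -> Option<Vec<String>> {
    // BTreeMap keeps the output deterministic when several nodes are ready at once.
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
        successors.entry(from).or_default().push(to);
    }

    // Kahn's algorithm: start from nodes with no incoming edges.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&n, _)| n)
        .collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n.to_string());
        if let Some(nexts) = successors.get(&n) {
            for &next in nexts {
                let d = indegree.get_mut(&next).expect("edge targets are in the map");
                *d -= 1;
                if *d == 0 {
                    ready.push_back(next);
                }
            }
        }
    }
    // Any node never emitted still has incoming edges, so it lies on or behind a cycle.
    (order.len() == indegree.len()).then_some(order)
}
```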

A workflow run is an execution of a workflow providing all the runtime information.

Workflow run statistics provide statistics about the workflow run.

A classifier for XML content.

Enums

Errors returned by BatchCreatePartition

Errors returned by BatchDeleteConnection

Errors returned by BatchDeletePartition

Errors returned by BatchDeleteTable

Errors returned by BatchDeleteTableVersion

Errors returned by BatchGetCrawlers

Errors returned by BatchGetDevEndpoints

Errors returned by BatchGetJobs

Errors returned by BatchGetPartition

Errors returned by BatchGetTriggers

Errors returned by BatchGetWorkflows

Errors returned by BatchUpdatePartition

Errors returned by CancelMLTaskRun

Errors returned by CheckSchemaVersionValidity

Errors returned by CreateClassifier

Errors returned by CreateConnection

Errors returned by CreateCrawler

Errors returned by CreateDatabase

Errors returned by CreateDevEndpoint

Errors returned by CreateJob

Errors returned by CreateMLTransform

Errors returned by CreatePartition

Errors returned by CreatePartitionIndex

Errors returned by CreateRegistry

Errors returned by CreateSchema

Errors returned by CreateScript

Errors returned by CreateSecurityConfiguration

Errors returned by CreateTable

Errors returned by CreateTrigger

Errors returned by CreateUserDefinedFunction

Errors returned by CreateWorkflow

Errors returned by DeleteClassifier

Errors returned by DeleteColumnStatisticsForPartition

Errors returned by DeleteColumnStatisticsForTable

Errors returned by DeleteConnection

Errors returned by DeleteCrawler

Errors returned by DeleteDatabase

Errors returned by DeleteDevEndpoint

Errors returned by DeleteJob

Errors returned by DeleteMLTransform

Errors returned by DeletePartition

Errors returned by DeletePartitionIndex

Errors returned by DeleteRegistry

Errors returned by DeleteResourcePolicy

Errors returned by DeleteSchema

Errors returned by DeleteSchemaVersions

Errors returned by DeleteSecurityConfiguration

Errors returned by DeleteTable

Errors returned by DeleteTableVersion

Errors returned by DeleteTrigger

Errors returned by DeleteUserDefinedFunction

Errors returned by DeleteWorkflow

Errors returned by GetCatalogImportStatus

Errors returned by GetClassifier

Errors returned by GetClassifiers

Errors returned by GetColumnStatisticsForPartition

Errors returned by GetColumnStatisticsForTable

Errors returned by GetConnection

Errors returned by GetConnections

Errors returned by GetCrawler

Errors returned by GetCrawlerMetrics

Errors returned by GetCrawlers

Errors returned by GetDataCatalogEncryptionSettings

Errors returned by GetDatabase

Errors returned by GetDatabases

Errors returned by GetDataflowGraph

Errors returned by GetDevEndpoint

Errors returned by GetDevEndpoints

Errors returned by GetJobBookmark

Errors returned by GetJob

Errors returned by GetJobRun

Errors returned by GetJobRuns

Errors returned by GetJobs

Errors returned by GetMLTaskRun

Errors returned by GetMLTaskRuns

Errors returned by GetMLTransform

Errors returned by GetMLTransforms

Errors returned by GetMapping

Errors returned by GetPartition

Errors returned by GetPartitionIndexes

Errors returned by GetPartitions

Errors returned by GetPlan

Errors returned by GetRegistry

Errors returned by GetResourcePolicies

Errors returned by GetResourcePolicy

Errors returned by GetSchemaByDefinition

Errors returned by GetSchema

Errors returned by GetSchemaVersion

Errors returned by GetSchemaVersionsDiff

Errors returned by GetSecurityConfiguration

Errors returned by GetSecurityConfigurations

Errors returned by GetTable

Errors returned by GetTableVersion

Errors returned by GetTableVersions

Errors returned by GetTables

Errors returned by GetTags

Errors returned by GetTrigger

Errors returned by GetTriggers

Errors returned by GetUserDefinedFunction

Errors returned by GetUserDefinedFunctions

Errors returned by GetWorkflow

Errors returned by GetWorkflowRun

Errors returned by GetWorkflowRunProperties

Errors returned by GetWorkflowRuns

Errors returned by BatchStopJobRun

Errors returned by ImportCatalogToGlue

Errors returned by ListCrawlers

Errors returned by ListDevEndpoints

Errors returned by ListJobs

Errors returned by ListMLTransforms

Errors returned by ListRegistries

Errors returned by ListSchemaVersions

Errors returned by ListSchemas

Errors returned by ListTriggers

Errors returned by ListWorkflows

Errors returned by PutDataCatalogEncryptionSettings

Errors returned by PutResourcePolicy

Errors returned by PutSchemaVersionMetadata

Errors returned by PutWorkflowRunProperties

Errors returned by QuerySchemaVersionMetadata

Errors returned by RegisterSchemaVersion

Errors returned by RemoveSchemaVersionMetadata

Errors returned by ResetJobBookmark

Errors returned by ResumeWorkflowRun

Errors returned by SearchTables

Errors returned by StartCrawler

Errors returned by StartCrawlerSchedule

Errors returned by StartExportLabelsTaskRun

Errors returned by StartImportLabelsTaskRun

Errors returned by StartJobRun

Errors returned by StartMLEvaluationTaskRun

Errors returned by StartMLLabelingSetGenerationTaskRun

Errors returned by StartTrigger

Errors returned by StartWorkflowRun

Errors returned by StopCrawler

Errors returned by StopCrawlerSchedule

Errors returned by StopTrigger

Errors returned by StopWorkflowRun

Errors returned by TagResource

Errors returned by UntagResource

Errors returned by UpdateClassifier

Errors returned by UpdateColumnStatisticsForPartition

Errors returned by UpdateColumnStatisticsForTable

Errors returned by UpdateConnection

Errors returned by UpdateCrawler

Errors returned by UpdateCrawlerSchedule

Errors returned by UpdateDatabase

Errors returned by UpdateDevEndpoint

Errors returned by UpdateJob

Errors returned by UpdateMLTransform

Errors returned by UpdatePartition

Errors returned by UpdateRegistry

Errors returned by UpdateSchema

Errors returned by UpdateTable

Errors returned by UpdateTrigger

Errors returned by UpdateUserDefinedFunction

Errors returned by UpdateWorkflow
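Each enum above is specific to one operation, which lets callers match exhaustively on the failures that operation can produce. The ExampleOperationError enum below is illustrative only, and its variants do not reproduce any enum in this crate. It shows a common pattern: implement Display and std::error::Error, then branch on the variant, for example to decide whether a retry makes sense.

```rust
use std::error::Error;
use std::fmt;

/// Illustrative per-operation error enum (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum ExampleOperationError {
    EntityNotFound(String),
    InvalidInput(String),
    InternalService(String),
    OperationTimeout(String),
}

impl fmt::Display for ExampleOperationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EntityNotFound(m) => write!(f, "entity not found: {m}"),
            Self::InvalidInput(m) => write!(f, "invalid input: {m}"),
            Self::InternalService(m) => write!(f, "internal service error: {m}"),
            Self::OperationTimeout(m) => write!(f, "operation timed out: {m}"),
        }
    }
}

impl Error for ExampleOperationError {}

impl ExampleOperationError {
    /// Transient service-side failures are worth retrying; caller mistakes are not.
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::InternalService(_) | Self::OperationTimeout(_))
    }
}
```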

Traits

Trait representing the capabilities of the AWS Glue API. AWS Glue clients implement this trait.
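Because callers can depend on a trait rather than a concrete client, tests can substitute an in-memory implementation. The CatalogApi trait below is a hypothetical, synchronous stand-in; the real trait's methods, request and response types, and async signatures differ.

```rust
/// Hypothetical, synchronous slice of a Glue-like API trait.
pub trait CatalogApi {
    fn get_table_names(&self, database: &str) -> Result<Vec<String>, String>;
}

/// In-memory mock used in tests instead of a networked client.
pub struct MockCatalog {
    /// (database, table) pairs.
    pub tables: Vec<(String, String)>,
}

impl CatalogApi for MockCatalog {
    fn get_table_names(&self, database: &str) -> Result<Vec<String>, String> {
        let names: Vec<String> = self
            .tables
            .iter()
            .filter(|(db, _)| db == database)
            .map(|(_, t)| t.clone())
            .collect();
        if names.is_empty() {
            Err(format!("database {database} not found"))
        } else {
            Ok(names)
        }
    }
}

/// Application code depends only on the trait, so any implementation can be injected.
pub fn count_tables(api: &impl CatalogApi, database: &str) -> usize {
    api.get_table_names(database).map(|v| v.len()).unwrap_or(0)
}
```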