Struct aws_sdk_glue::types::S3ParquetSource

#[non_exhaustive]
pub struct S3ParquetSource {
    pub name: String,
    pub paths: Vec<String>,
    pub compression_type: Option<ParquetCompressionType>,
    pub exclusions: Option<Vec<String>>,
    pub group_size: Option<String>,
    pub group_files: Option<String>,
    pub recurse: Option<bool>,
    pub max_band: Option<i32>,
    pub max_files_in_band: Option<i32>,
    pub additional_options: Option<S3DirectSourceAdditionalOptions>,
    pub output_schemas: Option<Vec<GlueSchema>>,
}

Specifies an Apache Parquet data store stored in Amazon S3.

Fields (Non-exhaustive)§

This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
§name: String

The name of the data store.

§paths: Vec<String>

A list of the Amazon S3 paths to read from.

§compression_type: Option<ParquetCompressionType>

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

§exclusions: Option<Vec<String>>

A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
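
As a minimal sketch of the JSON-list form described above, the helper below turns a slice of glob patterns into such a string. The name `exclusions_json` is hypothetical and not part of aws_sdk_glue, and only backslashes and double quotes are escaped, which is an assumption that the patterns contain no control characters.

```rust
/// Hypothetical helper (not part of aws_sdk_glue): renders glob patterns
/// as a JSON array string such as ["**.pdf"].
fn exclusions_json(patterns: &[&str]) -> String {
    let quoted: Vec<String> = patterns
        .iter()
        .map(|p| {
            // Escape backslashes first, then double quotes, so each pattern
            // becomes a valid JSON string literal.
            let escaped = p.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{}\"", escaped)
        })
        .collect();
    format!("[{}]", quoted.join(","))
}
```

Because the field type is Vec<String>, whether each element should hold one such JSON string or a bare pattern is not stated here; check the service behavior before relying on either form.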

§group_size: Option<String>

The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.

§group_files: Option<String>

Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
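
The interaction between group_files, group_size, and the 50,000-file threshold can be sketched as follows. Both functions are hypothetical helpers that only model the rules stated above; the behavior at exactly 50,000 files and for unrecognized values is not documented here, so the fallback branch is an assumption.

```rust
/// File-count threshold named in the field descriptions above.
const GROUPING_THRESHOLD: usize = 50_000;

/// Hypothetical model: would file grouping be active for this input?
fn grouping_active(file_count: usize, group_files: Option<&str>) -> bool {
    match group_files {
        Some("inPartition") => true,
        Some("none") => false,
        // Unset (or any other value, which is an assumption): use the
        // default, which is on only above the threshold.
        _ => file_count > GROUPING_THRESHOLD,
    }
}

/// Hypothetical model: group_size only takes effect when grouping is active.
fn effective_group_size<'a>(
    file_count: usize,
    group_files: Option<&str>,
    group_size: Option<&'a str>,
) -> Option<&'a str> {
    if grouping_active(file_count, group_files) {
        group_size
    } else {
        None
    }
}
```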

§recurse: Option<bool>

If set to true, recursively reads files in all subdirectories under the specified paths.

§max_band: Option<i32>

This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
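
To make the arithmetic concrete (900000 ms is 15 minutes), here is a small sketch of how a band check could look. `band_duration` and `is_in_band` are hypothetical helpers, timestamps are assumed to be milliseconds since some shared epoch, and clamping negative values to zero is an assumption rather than documented behavior.

```rust
use std::time::Duration;

/// Default stated in the field description: 900000 ms (15 minutes).
const DEFAULT_MAX_BAND_MS: i32 = 900_000;

/// Hypothetical helper: resolves max_band to a Duration, falling back to
/// the documented default when the option is unset.
fn band_duration(max_band: Option<i32>) -> Duration {
    // Negative values are clamped to zero (an assumption).
    let ms = max_band.unwrap_or(DEFAULT_MAX_BAND_MS).max(0) as u64;
    Duration::from_millis(ms)
}

/// Hypothetical helper: true when the file was modified within the last
/// max_band milliseconds before `now_ms`.
fn is_in_band(now_ms: u64, modified_ms: u64, max_band: Option<i32>) -> bool {
    let age = Duration::from_millis(now_ms.saturating_sub(modified_ms));
    age <= band_duration(max_band)
}
```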

§max_files_in_band: Option<i32>

This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
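
One way to picture the cap is as a split of the in-band file list into files kept for this run and files deferred to the next one. `split_band` is a hypothetical helper; treating an unset value as "no limit" is purely for illustration, since the service default is not stated here.

```rust
/// Hypothetical helper: splits in-band files into (kept, deferred)
/// according to max_files_in_band.
fn split_band<T>(in_band: &[T], max_files_in_band: Option<i32>) -> (&[T], &[T]) {
    let keep = match max_files_in_band {
        // Unset: keep everything (illustrative assumption only).
        None => in_band.len(),
        // Negative limits are treated as zero; the limit never exceeds
        // the number of files actually present.
        Some(max) => (max.max(0) as usize).min(in_band.len()),
    };
    in_band.split_at(keep)
}
```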

§additional_options: Option<S3DirectSourceAdditionalOptions>

Specifies additional connection options.

§output_schemas: Option<Vec<GlueSchema>>

Specifies the data schema for the S3 Parquet source.

Implementations§

impl S3ParquetSource

pub fn name(&self) -> &str

The name of the data store.

pub fn paths(&self) -> &[String]

A list of the Amazon S3 paths to read from.

pub fn compression_type(&self) -> Option<&ParquetCompressionType>

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

pub fn exclusions(&self) -> &[String]

A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.

If no value was sent for this field, a default will be set. If you want to determine if no value was sent, use .exclusions.is_none().
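
The difference between the accessor and the raw field can be sketched with a local stand-in type. `ExclusionsSketch` is hypothetical and only mirrors the documented signature `exclusions(&self) -> &[String]`; the body shown is one plausible way to get that behavior, not necessarily the crate's own code.

```rust
/// Local stand-in for illustration only; not the real S3ParquetSource.
#[derive(Debug)]
struct ExclusionsSketch {
    exclusions: Option<Vec<String>>,
}

impl ExclusionsSketch {
    /// Mirrors the documented accessor shape: an unset field collapses
    /// to an empty slice.
    fn exclusions(&self) -> &[String] {
        self.exclusions.as_deref().unwrap_or_default()
    }
}

/// Reports whether the field was sent at all, which the accessor alone
/// cannot tell you.
fn was_sent(source: &ExclusionsSketch) -> bool {
    source.exclusions.is_some()
}
```

With this shape, an unset field and an explicitly empty list both yield an empty slice from the accessor, so inspecting the Option directly is the only way to tell them apart.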

pub fn group_size(&self) -> Option<&str>

The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.

pub fn group_files(&self) -> Option<&str>

Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".

pub fn recurse(&self) -> Option<bool>

If set to true, recursively reads files in all subdirectories under the specified paths.

pub fn max_band(&self) -> Option<i32>

This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.

pub fn max_files_in_band(&self) -> Option<i32>

This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.

pub fn additional_options(&self) -> Option<&S3DirectSourceAdditionalOptions>

Specifies additional connection options.

pub fn output_schemas(&self) -> &[GlueSchema]

Specifies the data schema for the S3 Parquet source.

If no value was sent for this field, a default will be set. If you want to determine if no value was sent, use .output_schemas.is_none().

impl S3ParquetSource

pub fn builder() -> S3ParquetSourceBuilder

Creates a new builder-style object to manufacture S3ParquetSource.
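
Because the struct is non-exhaustive, external code goes through the builder rather than a struct literal. The sketch below shows the general pattern with local stand-in types; `SourceSketch` and `SourceSketchBuilder` are hypothetical, and the method names, the append-per-call behavior of `paths`, and the fallible `build` are assumptions, since the methods of S3ParquetSourceBuilder are not listed on this page.

```rust
/// Local stand-in with a subset of fields; not the real S3ParquetSource.
#[derive(Debug, Clone, PartialEq)]
struct SourceSketch {
    name: String,
    paths: Vec<String>,
    recurse: Option<bool>,
}

/// Hypothetical builder that collects optional values before validation.
#[derive(Debug, Default)]
struct SourceSketchBuilder {
    name: Option<String>,
    paths: Option<Vec<String>>,
    recurse: Option<bool>,
}

impl SourceSketch {
    fn builder() -> SourceSketchBuilder {
        SourceSketchBuilder::default()
    }
}

impl SourceSketchBuilder {
    fn name(mut self, input: impl Into<String>) -> Self {
        self.name = Some(input.into());
        self
    }

    /// Appends one path per call (assumed behavior).
    fn paths(mut self, input: impl Into<String>) -> Self {
        self.paths.get_or_insert_with(Vec::new).push(input.into());
        self
    }

    fn recurse(mut self, input: bool) -> Self {
        self.recurse = Some(input);
        self
    }

    /// Fails when a required (non-Option) field was never set.
    fn build(self) -> Result<SourceSketch, String> {
        Ok(SourceSketch {
            name: self.name.ok_or("name is required")?,
            paths: self.paths.ok_or("paths is required")?,
            recurse: self.recurse,
        })
    }
}
```

A call chain such as `SourceSketch::builder().name("orders").paths("s3://example-bucket/orders/").build()` then yields `Ok(..)`, while omitting `name` yields an error instead of a partially initialized value.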

Trait Implementations§

impl Clone for S3ParquetSource

fn clone(&self) -> S3ParquetSource

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for S3ParquetSource

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for S3ParquetSource

fn eq(&self, other: &S3ParquetSource) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for S3ParquetSource

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<Unshared, Shared> IntoShared<Shared> for Unshared
where Shared: FromUnshared<Unshared>,

fn into_shared(self) -> Shared

Creates a shared type from an unshared type.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.