Struct aws_sdk_sagemaker::types::TabularJobConfig
#[non_exhaustive]
pub struct TabularJobConfig {
pub candidate_generation_config: Option<CandidateGenerationConfig>,
pub completion_criteria: Option<AutoMlJobCompletionCriteria>,
pub feature_specification_s3_uri: Option<String>,
pub mode: Option<AutoMlMode>,
pub generate_candidate_definitions_only: Option<bool>,
pub problem_type: Option<ProblemType>,
pub target_attribute_name: Option<String>,
pub sample_weight_attribute_name: Option<String>,
}
The collection of settings used by an AutoML job V2 for the tabular problem type.
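The #[non_exhaustive] attribute on the declaration above affects how downstream code may destructure the struct. The following is a minimal sketch of the pattern shape, using a hypothetical stand-in type (JobConfig and target_of are not part of the SDK). Inside a single crate the attribute is not enforced, so the sketch only illustrates the trailing `..` that code in another crate would be required to write.

```rust
// Hypothetical stand-in for a non-exhaustive SDK struct; not the real type.
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct JobConfig {
    pub target_attribute_name: Option<String>,
    pub generate_candidate_definitions_only: Option<bool>,
}

/// Reads one field by destructuring. The trailing `..` ignores every other
/// field, so the pattern keeps compiling if more fields are added later.
pub fn target_of(config: &JobConfig) -> Option<&str> {
    let JobConfig { target_attribute_name, .. } = config;
    target_attribute_name.as_deref()
}
```

In practice the accessor methods listed under Implementations are usually the simpler way to read a field, since they do not depend on the struct's layout at all.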
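Fields (Non-exhaustive)
This struct is marked as non-exhaustive. Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
candidate_generation_config: Option<CandidateGenerationConfig>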
The configuration information of how model candidates are generated.
completion_criteria: Option<AutoMlJobCompletionCriteria>
How long a job is allowed to run, or how many candidates a job is allowed to generate.
feature_specification_s3_uri: Option<String>
A URL to the Amazon S3 data source containing selected features from the input data source to run an Autopilot job V2. You can input FeatureAttributeNames (optional) in JSON format as shown below:
{ "FeatureAttributeNames":["col1", "col2", ...] }
You can also specify the data type of the feature (optional) in the format shown below:
{ "FeatureDataTypes":{"col1":"numeric", "col2":"categorical" ... } }
These column keys may not include the target column.
In ensembling mode, Autopilot only supports the following data types: numeric, categorical, text, and datetime. In HPO mode, Autopilot can support numeric, categorical, text, datetime, and sequence.
If only FeatureDataTypes is provided, the column keys (col1, col2, ...) should be a subset of the column names in the input data.
If both FeatureDataTypes and FeatureAttributeNames are provided, then the column keys should be a subset of the column names provided in FeatureAttributeNames.
The key name FeatureAttributeNames is fixed. The values listed in ["col1", "col2", ...] are case sensitive and should be a list of strings containing unique values that are a subset of the column names in the input data. The list of columns provided must not include the target column.
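The column rules above can be checked locally before a feature specification is uploaded. The following is a rough sketch under stated assumptions: check_feature_spec and its parameters are hypothetical names, the JSON is assumed to be parsed already, and the service may apply further checks that are not described here.

```rust
use std::collections::HashSet;

/// Checks the column rules described above. Hypothetical helper, not part
/// of the SDK. Comparisons are exact, so names are case sensitive.
pub fn check_feature_spec(
    input_columns: &[&str],
    target_column: &str,
    feature_attribute_names: Option<&[&str]>,
    feature_data_type_keys: &[&str],
) -> Result<(), String> {
    let input: HashSet<&str> = input_columns.iter().copied().collect();

    // FeatureAttributeNames: unique, a subset of the input columns,
    // and never the target column.
    if let Some(names) = feature_attribute_names {
        let mut seen: HashSet<&str> = HashSet::new();
        for &name in names {
            if name == target_column {
                return Err(format!("{} is the target column", name));
            }
            if !input.contains(name) {
                return Err(format!("{} is not an input column", name));
            }
            if !seen.insert(name) {
                return Err(format!("{} is listed more than once", name));
            }
        }
    }

    // FeatureDataTypes keys: a subset of FeatureAttributeNames when that
    // list is given, otherwise a subset of the input columns.
    let allowed: HashSet<&str> = match feature_attribute_names {
        Some(names) => names.iter().copied().collect(),
        None => input.clone(),
    };
    for &key in feature_data_type_keys {
        if key == target_column {
            return Err(format!("{} is the target column", key));
        }
        if !allowed.contains(key) {
            return Err(format!("{} is not an allowed column key", key));
        }
    }
    Ok(())
}
```

The function stops at the first violation and returns a message naming the offending column; collecting all violations instead would be an equally reasonable design.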
mode: Option<AutoMlMode>
The method that Autopilot uses to train the data. You can either specify the mode manually or let Autopilot choose for you based on the dataset size by selecting AUTO. In AUTO mode, Autopilot chooses ENSEMBLING for datasets smaller than 100 MB, and HYPERPARAMETER_TUNING for larger ones.
The ENSEMBLING mode uses a multi-stack ensemble model to predict classification and regression tasks directly from your dataset. This machine learning mode combines several base models to produce an optimal predictive model. It then uses a stacking ensemble method to combine predictions from contributing members. A multi-stack ensemble model can provide better performance over a single model by combining the predictive capabilities of multiple models. See Autopilot algorithm support for a list of algorithms supported by ENSEMBLING mode.
The HYPERPARAMETER_TUNING (HPO) mode uses the best hyperparameters to train the best version of a model. HPO automatically selects an algorithm for the type of problem you want to solve. Then HPO finds the best hyperparameters according to your objective metric. See Autopilot algorithm support for a list of algorithms supported by HYPERPARAMETER_TUNING mode.
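The AUTO selection rule described above amounts to a single size comparison. The sketch below illustrates it with hypothetical names (ResolvedMode, AUTO_THRESHOLD_BYTES, resolve_auto_mode). Two assumptions are made that the text does not settle: "100 MB" is read as 100 * 1024 * 1024 bytes, and a dataset of exactly that size is treated as one of the "larger ones".

```rust
/// Hypothetical stand-in for the two concrete training modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedMode {
    Ensembling,
    HyperparameterTuning,
}

/// Assumption: "100 MB" means 100 * 1024 * 1024 bytes.
pub const AUTO_THRESHOLD_BYTES: u64 = 100 * 1024 * 1024;

/// Mirrors the documented AUTO rule: smaller datasets use ensembling,
/// everything else uses hyperparameter tuning.
pub fn resolve_auto_mode(dataset_size_bytes: u64) -> ResolvedMode {
    if dataset_size_bytes < AUTO_THRESHOLD_BYTES {
        ResolvedMode::Ensembling
    } else {
        ResolvedMode::HyperparameterTuning
    }
}
```

The actual choice is made by the service; a local estimate like this is only useful for anticipating which mode a job is likely to run in.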
generate_candidate_definitions_only: Option<bool>
Generates possible candidates without training the models. A model candidate is a combination of data preprocessors, algorithms, and algorithm parameter settings.
problem_type: Option<ProblemType>
The type of supervised learning problem available for the model candidates of the AutoML job V2. For more information, see Amazon SageMaker Autopilot problem types.
You must either specify the type of supervised learning problem in ProblemType and provide the AutoMLJobObjective metric, or none at all.
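The "both or neither" rule for ProblemType and the objective metric can be expressed as a match on the pair of options. This is a minimal sketch with a hypothetical helper, check_problem_type_pairing; plain strings stand in for the SDK's enum types so that the example stays self-contained.

```rust
/// Accepts the two valid combinations (both set, both unset) and rejects
/// the two mixed ones. Hypothetical helper, not part of the SDK.
pub fn check_problem_type_pairing(
    problem_type: Option<&str>,
    objective_metric: Option<&str>,
) -> Result<(), &'static str> {
    match (problem_type, objective_metric) {
        (Some(_), Some(_)) | (None, None) => Ok(()),
        (Some(_), None) => Err("ProblemType is set but the objective metric is missing"),
        (None, Some(_)) => Err("the objective metric is set but ProblemType is missing"),
    }
}
```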
target_attribute_name: Option<String>
The name of the target variable in supervised learning, usually represented by 'y'.
sample_weight_attribute_name: Option<String>
If specified, this column name indicates which column of the dataset should be treated as sample weights for use by the objective metric during the training, evaluation, and the selection of the best model. This column is not considered as a predictive feature. For more information on Autopilot metrics, see Metrics and validation.
Sample weights should be numeric, non-negative, with larger values indicating which rows are more important than others. Data points that have invalid or no weight value are excluded.
Support for sample weights is available in Ensembling mode only.
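The sample-weight rules above (numeric, non-negative, rows with invalid or missing weights excluded) can be sketched as a filter over raw rows. The helper usable_rows and the row representation are hypothetical. Treating non-finite values such as NaN or infinity as invalid is an assumption, since the text does not define "invalid".

```rust
/// Keeps only rows whose weight parses as a finite, non-negative number.
/// Each row is (row id, raw weight cell); `None` models a missing weight.
pub fn usable_rows<'a>(rows: &[(&'a str, Option<&'a str>)]) -> Vec<(&'a str, f64)> {
    rows.iter()
        .filter_map(|&(id, raw)| {
            // A missing or unparsable weight excludes the row.
            let weight = raw?.trim().parse::<f64>().ok()?;
            // Assumption: NaN and infinities count as invalid.
            if weight.is_finite() && weight >= 0.0 {
                Some((id, weight))
            } else {
                None
            }
        })
        .collect()
}
```

A weight of zero passes the filter because it is non-negative; whether such a row has any effect on the objective metric is not stated in the text.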
Implementations
impl TabularJobConfig
pub fn candidate_generation_config(&self) -> Option<&CandidateGenerationConfig>
The configuration information of how model candidates are generated.
pub fn completion_criteria(&self) -> Option<&AutoMlJobCompletionCriteria>
How long a job is allowed to run, or how many candidates a job is allowed to generate.
pub fn feature_specification_s3_uri(&self) -> Option<&str>
A URL to the Amazon S3 data source containing selected features from the input data source to run an Autopilot job V2. You can input FeatureAttributeNames (optional) in JSON format as shown below:
{ "FeatureAttributeNames":["col1", "col2", ...] }
You can also specify the data type of the feature (optional) in the format shown below:
{ "FeatureDataTypes":{"col1":"numeric", "col2":"categorical" ... } }
These column keys may not include the target column.
In ensembling mode, Autopilot only supports the following data types: numeric, categorical, text, and datetime. In HPO mode, Autopilot can support numeric, categorical, text, datetime, and sequence.
If only FeatureDataTypes is provided, the column keys (col1, col2, ...) should be a subset of the column names in the input data.
If both FeatureDataTypes and FeatureAttributeNames are provided, then the column keys should be a subset of the column names provided in FeatureAttributeNames.
The key name FeatureAttributeNames is fixed. The values listed in ["col1", "col2", ...] are case sensitive and should be a list of strings containing unique values that are a subset of the column names in the input data. The list of columns provided must not include the target column.
pub fn mode(&self) -> Option<&AutoMlMode>
The method that Autopilot uses to train the data. You can either specify the mode manually or let Autopilot choose for you based on the dataset size by selecting AUTO. In AUTO mode, Autopilot chooses ENSEMBLING for datasets smaller than 100 MB, and HYPERPARAMETER_TUNING for larger ones.
The ENSEMBLING mode uses a multi-stack ensemble model to predict classification and regression tasks directly from your dataset. This machine learning mode combines several base models to produce an optimal predictive model. It then uses a stacking ensemble method to combine predictions from contributing members. A multi-stack ensemble model can provide better performance over a single model by combining the predictive capabilities of multiple models. See Autopilot algorithm support for a list of algorithms supported by ENSEMBLING mode.
The HYPERPARAMETER_TUNING (HPO) mode uses the best hyperparameters to train the best version of a model. HPO automatically selects an algorithm for the type of problem you want to solve. Then HPO finds the best hyperparameters according to your objective metric. See Autopilot algorithm support for a list of algorithms supported by HYPERPARAMETER_TUNING mode.
pub fn generate_candidate_definitions_only(&self) -> Option<bool>
Generates possible candidates without training the models. A model candidate is a combination of data preprocessors, algorithms, and algorithm parameter settings.
pub fn problem_type(&self) -> Option<&ProblemType>
The type of supervised learning problem available for the model candidates of the AutoML job V2. For more information, see Amazon SageMaker Autopilot problem types.
You must either specify the type of supervised learning problem in ProblemType and provide the AutoMLJobObjective metric, or none at all.
pub fn target_attribute_name(&self) -> Option<&str>
The name of the target variable in supervised learning, usually represented by 'y'.
pub fn sample_weight_attribute_name(&self) -> Option<&str>
If specified, this column name indicates which column of the dataset should be treated as sample weights for use by the objective metric during the training, evaluation, and the selection of the best model. This column is not considered as a predictive feature. For more information on Autopilot metrics, see Metrics and validation.
Sample weights should be numeric, non-negative, with larger values indicating which rows are more important than others. Data points that have invalid or no weight value are excluded.
Support for sample weights is available in Ensembling mode only.
impl TabularJobConfig
pub fn builder() -> TabularJobConfigBuilder
Creates a new builder-style object to manufacture TabularJobConfig.
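To illustrate how a builder and the borrowing accessors listed above fit together, here is a cut-down sketch built on hypothetical stand-in types (MiniConfig and MiniConfigBuilder). It is not the SDK's generated code. In particular, whether the real build() returns the struct directly or a Result is not stated on this page and may depend on the crate version.

```rust
/// Hypothetical two-field stand-in for the real struct.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct MiniConfig {
    target_attribute_name: Option<String>,
    generate_candidate_definitions_only: Option<bool>,
}

impl MiniConfig {
    /// Entry point, mirroring the `builder()` associated function.
    pub fn builder() -> MiniConfigBuilder {
        MiniConfigBuilder::default()
    }

    /// String fields are exposed as `Option<&str>` via `as_deref`,
    /// so callers borrow instead of cloning.
    pub fn target_attribute_name(&self) -> Option<&str> {
        self.target_attribute_name.as_deref()
    }

    /// `Option<bool>` is `Copy`, so it is returned by value.
    pub fn generate_candidate_definitions_only(&self) -> Option<bool> {
        self.generate_candidate_definitions_only
    }
}

/// Hypothetical builder: every setter consumes and returns `self`
/// so that calls can be chained.
#[derive(Debug, Default)]
pub struct MiniConfigBuilder {
    target_attribute_name: Option<String>,
    generate_candidate_definitions_only: Option<bool>,
}

impl MiniConfigBuilder {
    pub fn target_attribute_name(mut self, name: impl Into<String>) -> Self {
        self.target_attribute_name = Some(name.into());
        self
    }

    pub fn generate_candidate_definitions_only(mut self, flag: bool) -> Self {
        self.generate_candidate_definitions_only = Some(flag);
        self
    }

    /// Fields that were never set stay `None`.
    pub fn build(self) -> MiniConfig {
        MiniConfig {
            target_attribute_name: self.target_attribute_name,
            generate_candidate_definitions_only: self.generate_candidate_definitions_only,
        }
    }
}
```

A chained call such as MiniConfig::builder().target_attribute_name("y").build() yields a value whose unset fields read back as None through the accessors. A builder is the natural construction path for a non-exhaustive struct, because struct literal syntax is unavailable to external crates.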
Trait Implementations
impl Clone for TabularJobConfig
fn clone(&self) -> TabularJobConfig
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for TabularJobConfig
impl PartialEq for TabularJobConfig
fn eq(&self, other: &TabularJobConfig) -> bool
Tests for self and other values to be equal, and is used by ==.