Struct forust_ml::gradientbooster::GradientBooster
pub struct GradientBooster {
pub objective_type: ObjectiveType,
pub iterations: usize,
pub learning_rate: f32,
pub max_depth: usize,
pub max_leaves: usize,
pub l2: f32,
pub gamma: f32,
pub min_leaf_weight: f32,
pub base_score: f64,
pub nbins: u16,
pub parallel: bool,
pub allow_missing_splits: bool,
pub monotone_constraints: Option<ConstraintMap>,
pub subsample: f32,
pub top_rate: f64,
pub other_rate: f64,
pub seed: u64,
pub missing: f64,
pub create_missing_branch: bool,
pub sample_method: SampleMethod,
pub grow_policy: GrowPolicy,
pub evaluation_metric: Option<Metric>,
pub early_stopping_rounds: Option<usize>,
pub initialize_base_score: bool,
pub evaluation_history: Option<RowMajorMatrix<f64>>,
pub best_iteration: Option<usize>,
pub prediction_iteration: Option<usize>,
pub trees: Vec<Tree>,
/* private fields */
}
Gradient Booster object.
objective_type - The name of the objective function to optimize. Valid options are "LogLoss", to use logistic loss, or "SquaredLoss", to use squared error as the objective function.
iterations - Total number of trees to train in the ensemble.
learning_rate - Step size to use at each iteration. Each leaf weight is multiplied by this number; the smaller the value, the more conservative the weights will be.
max_depth - Maximum depth of an individual tree. Valid values are 0 to infinity.
max_leaves - Maximum number of leaves allowed on a tree, i.e. the total number of final nodes. Valid values are 0 to infinity.
l2 - L2 regularization term applied to the weights of the tree. Valid values are 0 to infinity.
gamma - The minimum amount of loss required to further split a node. Valid values are 0 to infinity.
min_leaf_weight - Minimum sum of the hessian values of the loss function required to be in a node.
base_score - The initial prediction value of the model.
nbins - Number of bins used to partition the data. A smaller number results in faster training, while potentially sacrificing accuracy. If there are more bins than unique values in a column, all unique values will be used.
parallel - Should the algorithm be run in parallel?
allow_missing_splits - Should the algorithm allow splits that completely separate missing and non-missing values, in the case where create_missing_branch is false? When create_missing_branch is true, setting this to true will result in the missing branch being further split.
monotone_constraints - Constraints used to enforce a specific relationship between the training features and the target variable.
subsample - Percent of records to randomly sample at each iteration when training a tree.
top_rate - Used only in GOSS; the retain ratio of large gradient data.
other_rate - Used only in GOSS; the retain ratio of small gradient data.
seed - Integer value used to seed any randomness used in the algorithm.
missing - Value to consider missing.
create_missing_branch - Should missing values be split out into their own separate branch?
sample_method - The method by which records should be sampled when training.
evaluation_metric - The evaluation metric to record at each iteration.
early_stopping_rounds - Number of rounds in which the evaluation metric value must improve to keep training.
initialize_base_score - If true, the base_score will be calculated using the sample_weight and y data in accordance with the requested objective_type.
Fields§
objective_type: ObjectiveType
iterations: usize
learning_rate: f32
max_depth: usize
max_leaves: usize
l2: f32
gamma: f32
min_leaf_weight: f32
base_score: f64
nbins: u16
parallel: bool
allow_missing_splits: bool
monotone_constraints: Option<ConstraintMap>
subsample: f32
top_rate: f64
other_rate: f64
seed: u64
missing: f64
create_missing_branch: bool
sample_method: SampleMethod
grow_policy: GrowPolicy
evaluation_metric: Option<Metric>
early_stopping_rounds: Option<usize>
initialize_base_score: bool
evaluation_history: Option<RowMajorMatrix<f64>>
best_iteration: Option<usize>
prediction_iteration: Option<usize> - Number of trees to use when predicting; defaults to best_iteration if that is defined.
trees: Vec<Tree>
Implementations§
impl GradientBooster
pub fn new(
objective_type: ObjectiveType,
iterations: usize,
learning_rate: f32,
max_depth: usize,
max_leaves: usize,
l2: f32,
gamma: f32,
min_leaf_weight: f32,
base_score: Option<f64>,
nbins: u16,
parallel: bool,
allow_missing_splits: bool,
monotone_constraints: Option<ConstraintMap>,
subsample: f32,
top_rate: f64,
other_rate: f64,
seed: u64,
missing: f64,
create_missing_branch: bool,
sample_method: SampleMethod,
grow_policy: GrowPolicy,
evaluation_metric: Option<Metric>,
early_stopping_rounds: Option<usize>,
initialize_base_score: bool
) -> Result<Self, ForustError>
Create a new GradientBooster object.
objective_type - The name of the objective function to optimize. Valid options are "LogLoss", to use logistic loss, or "SquaredLoss", to use squared error as the objective function.
iterations - Total number of trees to train in the ensemble.
learning_rate - Step size to use at each iteration. Each leaf weight is multiplied by this number; the smaller the value, the more conservative the weights will be.
max_depth - Maximum depth of an individual tree. Valid values are 0 to infinity.
max_leaves - Maximum number of leaves allowed on a tree, i.e. the total number of final nodes. Valid values are 0 to infinity.
l2 - L2 regularization term applied to the weights of the tree. Valid values are 0 to infinity.
gamma - The minimum amount of loss required to further split a node. Valid values are 0 to infinity.
min_leaf_weight - Minimum sum of the hessian values of the loss function required to be in a node.
base_score - The initial prediction value of the model. If set to None, the parameter initialize_base_score will automatically be set to true, in which case the base score will be chosen based on the objective function at fit time.
nbins - Number of bins used to partition the data. A smaller number results in faster training, while potentially sacrificing accuracy. If there are more bins than unique values in a column, all unique values will be used.
parallel - Should the algorithm be run in parallel?
allow_missing_splits - Should the algorithm allow splits that completely separate missing and non-missing values, in the case where create_missing_branch is false? When create_missing_branch is true, setting this to true will result in the missing branch being further split.
monotone_constraints - Constraints used to enforce a specific relationship between the training features and the target variable.
subsample - Percent of records to randomly sample at each iteration when training a tree.
top_rate - Used only in GOSS; the retain ratio of large gradient data.
other_rate - Used only in GOSS; the retain ratio of small gradient data.
seed - Integer value used to seed any randomness used in the algorithm.
missing - Value to consider missing.
create_missing_branch - Should missing values be split out into their own separate branch?
sample_method - The method by which records should be sampled when training.
evaluation_metric - The evaluation metric to record at each iteration.
early_stopping_rounds - Number of rounds in which the evaluation metric value must improve to keep training.
initialize_base_score - If true, the base_score will be calculated using the sample_weight and y data in accordance with the requested objective_type.
pub fn fit(
    &mut self,
    data: &Matrix<'_, f64>,
    y: &[f64],
    sample_weight: &[f64],
    evaluation_data: Option<Vec<EvaluationData<'_>>>
) -> Result<(), ForustError>
Fit the gradient booster on a provided dataset.
data - The training data, as a Matrix of f64 values.
y - The target values, as a slice of f64.
sample_weight - Instance weights to use when training the model. To weight every record equally, pass a weight of 1 for each record.
evaluation_data - Optional evaluation datasets on which the evaluation metric is recorded at each iteration.
pub fn fit_unweighted(
    &mut self,
    data: &Matrix<'_, f64>,
    y: &[f64],
    evaluation_data: Option<Vec<EvaluationData<'_>>>
) -> Result<(), ForustError>
Fit the gradient booster on a provided dataset without any weights; a weight of 1 is used for every record.
data - The training data, as a Matrix of f64 values.
y - The target values, as a slice of f64.
pub fn predict(&self, data: &Matrix<'_, f64>, parallel: bool) -> Vec<f64>
Generate predictions on data using the gradient booster.
data - The data to predict on, as a Matrix of f64 values.
parallel - Should the predictions be generated in parallel?
pub fn predict_contributions(
    &self,
    data: &Matrix<'_, f64>,
    method: ContributionsMethod,
    parallel: bool
) -> Vec<f64>
pub fn value_partial_dependence(&self, feature: usize, value: f64) -> f64
Calculate the partial dependence of the model for a given feature at a given value.
feature - The index of the feature.
value - The value for which to calculate the partial dependence.
pub fn save_booster(&self, path: &str) -> Result<(), ForustError>
Save a booster as a JSON object to a file.
path - Path to save the booster to.
pub fn json_dump(&self) -> Result<String, ForustError>
Dump a booster as a JSON object.
pub fn from_json(json_str: &str) -> Result<Self, ForustError>
Load a booster from a JSON string.
json_str - A JSON string representing a serialized booster.
pub fn load_booster(path: &str) -> Result<Self, ForustError>
Load a booster from a path to a JSON booster object.
path - Path to load the booster from.
pub fn set_objective_type(self, objective_type: ObjectiveType) -> Self
Set the objective_type on the booster.
objective_type - The objective type of the booster.
pub fn set_iterations(self, iterations: usize) -> Self
Set the iterations on the booster.
iterations - The number of iterations of the booster.
pub fn set_learning_rate(self, learning_rate: f32) -> Self
Set the learning_rate on the booster.
learning_rate - The learning rate of the booster.
pub fn set_max_depth(self, max_depth: usize) -> Self
Set the max_depth on the booster.
max_depth - The maximum tree depth of the booster.
pub fn set_max_leaves(self, max_leaves: usize) -> Self
Set the max_leaves on the booster.
max_leaves - The maximum number of leaves of the booster.
pub fn set_nbins(self, nbins: u16) -> Self
Set nbins on the booster.
nbins - Number of bins used to partition the data. A smaller number results in faster training, while potentially sacrificing accuracy. If there are more bins than unique values in a column, all unique values will be used.
pub fn set_l2(self, l2: f32) -> Self
Set the l2 on the booster.
l2 - The l2 regularization term of the booster.
pub fn set_gamma(self, gamma: f32) -> Self
Set the gamma on the booster.
gamma - The gamma value of the booster.
pub fn set_min_leaf_weight(self, min_leaf_weight: f32) -> Self
Set the min_leaf_weight on the booster.
min_leaf_weight - The minimum sum of the hessian values allowed in a node of a tree of the booster.
pub fn set_base_score(self, base_score: f64) -> Self
Set the base_score on the booster.
base_score - The base score of the booster.
pub fn set_initialize_base_score(self, initialize_base_score: bool) -> Self
Set initialize_base_score on the booster.
initialize_base_score - Whether the base_score should be calculated at fit time in accordance with the objective_type.
pub fn set_parallel(self, parallel: bool) -> Self
Set parallel on the booster.
parallel - Set whether the booster should be trained in parallel.
pub fn set_allow_missing_splits(self, allow_missing_splits: bool) -> Self
Set allow_missing_splits on the booster.
allow_missing_splits - Set whether missing splits are allowed for the booster.
pub fn set_monotone_constraints(
    self,
    monotone_constraints: Option<ConstraintMap>
) -> Self
Set the monotone_constraints on the booster.
monotone_constraints - The monotone constraints of the booster.
pub fn set_subsample(self, subsample: f32) -> Self
Set the subsample on the booster.
subsample - Percent of the data to randomly sample when training each tree.
pub fn set_seed(self, seed: u64) -> Self
Set the seed on the booster.
seed - Integer value used to seed any randomness used in the algorithm.
pub fn set_missing(self, missing: f64) -> Self
Set the missing value of the booster.
missing - Float value to consider as missing.
pub fn set_create_missing_branch(self, create_missing_branch: bool) -> Self
Set create_missing_branch on the booster.
create_missing_branch - Bool specifying whether missing values should get their own branch.
pub fn set_sample_method(self, sample_method: SampleMethod) -> Self
Set the sample method on the booster.
sample_method - The sample method to use when training.
pub fn set_evaluation_metric(self, evaluation_metric: Option<Metric>) -> Self
Set the evaluation metric on the booster.
evaluation_metric - The evaluation metric to record at each iteration.
pub fn set_early_stopping_rounds(
    self,
    early_stopping_rounds: Option<usize>
) -> Self
Set early stopping rounds.
early_stopping_rounds - Number of rounds in which the evaluation metric value must improve to keep training.
pub fn set_prediction_iteration(
    self,
    prediction_iteration: Option<usize>
) -> Self
Set the prediction iteration.
prediction_iteration - Number of trees to use when predicting; defaults to best_iteration if that is defined.
pub fn insert_metadata(&mut self, key: String, value: String)
Insert metadata.
key - String value for the metadata key.
value - Value to assign to the metadata key.
pub fn get_metadata(&self, key: &String) -> Option<String>
Get metadata.
key - The metadata key whose associated value to retrieve.