pub struct DTrees { /* private fields */ }
The class represents a single decision tree or a collection of decision trees.
The current public interface of the class allows the user to train only a single decision tree; however,
the class is capable of storing multiple decision trees and using them for prediction (by summing
responses or using a voting scheme), and the classes derived from DTrees (such as RTrees and Boost)
use this capability to implement decision tree ensembles.
[ml_intro_trees]
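The two ways of combining several trees mentioned above (summing responses or voting) can be sketched in plain Rust. This is only an illustration under stated assumptions: the function names `sum_responses` and `majority_vote` are hypothetical, the tie-breaking rule is an arbitrary choice, and nothing here is taken from the OpenCV implementation.

```rust
use std::collections::HashMap;

/// Regression-style combination: add up the responses of all trees.
/// (Hypothetical helper for this sketch, not part of the crate.)
fn sum_responses(responses: &[f32]) -> f32 {
    responses.iter().sum()
}

/// Classification-style combination: return the label chosen by the most trees.
/// Ties go to the smallest label, which is an arbitrary choice made for this sketch.
/// Returns None when there are no votes at all.
fn majority_vote(labels: &[i32]) -> Option<i32> {
    let mut counts: HashMap<i32, usize> = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts
        .into_iter()
        // Higher count wins; on equal counts the smaller label wins.
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(label, _)| label)
}
```

Derived ensemble classes may weight or normalize the individual responses differently; the sketch only shows the basic idea of reducing many per-tree outputs to one prediction.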
Creates the empty model
The static method creates an empty decision tree with the specified parameters. It should then be
trained using the train method (see StatModel::train). Alternatively, you can load the model from
a file using Algorithm::load<DTrees>(filename).
Loads and creates a serialized DTrees from a file
Use DTrees::save to serialize and store a DTrees model to disk.
Load the model from this file again by calling this function with the path to the file.
Optionally specify the node in the file that contains the classifier.
- filepath: path to the serialized DTrees model
- nodeName: name of the node containing the classifier
This alternative version of the [load] function uses default values for its optional arguments.
Clears the algorithm state
Reads algorithm parameters from a file storage
Stores algorithm parameters in a file storage
Deprecated. Note: this alternative version of the [write_with_name] function uses default values for its arguments.
Returns true if the Algorithm is empty (e.g. at the very beginning or after an unsuccessful read).
Saves the algorithm to a file.
In order to make this method work, the derived class must implement Algorithm::write(FileStorage& fs).
Returns the algorithm string identifier.
This string is used as the top-level XML/YML node tag when the object is saved to a file or string.
Return the underlying raw pointer while consuming this wrapper.
Return the underlying mutable raw pointer
Cluster possible values of a categorical variable into K <= maxCategories clusters to
find a suboptimal split.
If a discrete variable on which the training procedure tries to make a split takes more than
maxCategories values, finding the precise best subset may take a very long time because the
algorithm is exponential. Instead, many decision tree engines (including our implementation)
try to find a suboptimal split in this case by clustering all the samples into maxCategories
clusters; that is, some categories are merged together. The clustering is applied only in
classification problems with n > 2 classes, for categorical variables with N > maxCategories possible
values. In the case of regression and 2-class classification, the optimal split can be found
efficiently without clustering, so the parameter is not used in these cases.
Default value is 10.
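To make the "exponential" remark concrete, the sketch below counts the candidate two-way splits of a categorical variable and encodes the rule stated above for when clustering is used. Both function names are hypothetical and exist only for this illustration; the count assumes an exhaustive search over all two-group partitions, which is one plausible reading of "precise best subset estimation".

```rust
/// Number of distinct ways to divide `n` category values into two non-empty groups.
/// Each value goes left or right (2^n assignments), mirrored splits count once (2^(n-1)),
/// and the split with one empty side is excluded, giving 2^(n-1) - 1.
/// Returns None for n == 0 or when the result would not fit in a u64.
fn binary_partitions(n: u32) -> Option<u64> {
    if n == 0 || n > 64 {
        return None;
    }
    Some((1u64 << (n - 1)) - 1)
}

/// Restates the rule from the text: clustering is applied only for classification
/// with more than two classes and more than `max_categories` distinct values.
fn uses_clustering(
    is_regression: bool,
    num_classes: usize,
    num_values: usize,
    max_categories: usize,
) -> bool {
    !is_regression && num_classes > 2 && num_values > max_categories
}
```

With 10 values there are 511 candidate splits, while 20 values already give 524287, which is why capping the number of clusters keeps the search tractable.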
The maximum possible depth of the tree.
That is, the training algorithm attempts to split a node while its depth is less than maxDepth.
The root node has zero depth. The actual depth may be smaller if the other termination criteria
are met (see the outline of the training procedure in [ml_intro_trees]), and/or if the
tree is pruned. Default value is INT_MAX.
If the number of samples in a node is less than this parameter then the node will not be split.
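The depth limit and the minimum sample count act together as stopping rules. The following is a minimal sketch of that interaction on a toy tree that halves its samples at every split. `StopParams`, `may_split` and `grow` are hypothetical names invented for this example, and the extra guard requiring at least two samples is an assumption added so the toy recursion always terminates.

```rust
/// Hypothetical parameter bundle; field names mirror the text, not the crate's API.
struct StopParams {
    max_depth: i32,
    min_sample_count: usize,
}

/// A node may be split only while its depth is below max_depth and it holds
/// at least min_sample_count samples. The `>= 2` guard is an assumption of this
/// sketch: a node with fewer than two samples cannot be divided any further.
fn may_split(depth: i32, sample_count: usize, p: &StopParams) -> bool {
    depth < p.max_depth && sample_count >= p.min_sample_count && sample_count >= 2
}

/// Grows a toy tree that halves the samples at each split and
/// returns the depth of the deepest node (the root has depth 0).
fn grow(depth: i32, sample_count: usize, p: &StopParams) -> i32 {
    if !may_split(depth, sample_count, p) {
        return depth;
    }
    let left = sample_count / 2;
    let right = sample_count - left;
    grow(depth + 1, left, p).max(grow(depth + 1, right, p))
}
```

For example, `grow(0, 100, &StopParams { max_depth: 3, min_sample_count: 1 })` stops at depth 3 because of the depth limit, whereas an effectively unlimited depth with `min_sample_count: 10` stops once the nodes become too small.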
If CVFolds > 1 then the algorithm prunes the built decision tree using a K-fold
cross-validation procedure, where K is equal to CVFolds.
Default value is 10.
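As a reminder of what a K-fold procedure involves, here is a minimal sketch that divides sample indices into K folds and produces the training/validation split for one held-out fold. The round-robin assignment and the names `k_folds` and `split_for_fold` are assumptions made for illustration; the text above does not say how the library assigns samples to folds.

```rust
/// Distributes sample indices 0..n over k folds in round-robin order.
/// (Round-robin is an assumption of this sketch.)
fn k_folds(n: usize, k: usize) -> Vec<Vec<usize>> {
    assert!(k > 1, "cross-validation needs more than one fold");
    let mut folds: Vec<Vec<usize>> = vec![Vec::new(); k];
    for i in 0..n {
        folds[i % k].push(i);
    }
    folds
}

/// For the fold `held_out`, returns (training indices, validation indices):
/// the held-out fold is used for validation, all other folds for training.
fn split_for_fold(folds: &[Vec<usize>], held_out: usize) -> (Vec<usize>, Vec<usize>) {
    let train: Vec<usize> = folds
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != held_out)
        .flat_map(|(_, fold)| fold.iter().copied())
        .collect();
    (train, folds[held_out].clone())
}
```

Each sample appears in exactly one validation fold, so every sample contributes once to the error estimate that guides pruning.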
If true then surrogate splits will be built.
These splits make it possible to work with missing data and to compute variable importance correctly.
Default value is false.
If true then pruning will be harsher.
This will make a tree more compact and more resistant to the training data noise but a bit less
accurate. Default value is true.
If true then pruned branches are physically removed from the tree.
Otherwise they are retained and it is possible to get results from the original unpruned (or
pruned less aggressively) tree. Default value is true.
Termination criteria for regression trees.
If all absolute differences between an estimated value in a node and the values of the training samples
in this node are less than this parameter, then the node will not be split further. Default
value is 0.01f.
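The termination test for regression trees can be written as a small predicate. This sketch assumes the node's estimated value is the mean of its training responses, which the text does not state; the name `node_is_accurate` is hypothetical.

```rust
/// Returns true when every training response in the node lies within `accuracy`
/// of the node's estimate, meaning the node would not be split further.
/// Assumption of this sketch: the estimate is the mean of the responses.
/// An empty node is treated as already accurate.
fn node_is_accurate(responses: &[f32], accuracy: f32) -> bool {
    if responses.is_empty() {
        return true;
    }
    let estimate = responses.iter().sum::<f32>() / responses.len() as f32;
    responses.iter().all(|&y| (y - estimate).abs() < accuracy)
}
```

With the default threshold of 0.01, a node holding responses 2.0 and 2.004 would stop splitting, while one holding 1.0 and 1.5 would remain a candidate for further splits.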
The array of a priori class probabilities, sorted by the class label value.
Returns indices of root nodes
Returns all the bitsets for categorical splits
Formats the value using the given formatter.
Executes the destructor for this type.
Converts to this type from the input type.
Returns the number of variables in training samples
Returns true if the model is trained
Returns true if the model is classifier
Computes error on the training or test dataset
Predicts response(s) for the provided sample(s)
The type returned in the event of a conversion error.
Performs the conversion.
Immutably borrows from an owned value.
Mutably borrows from an owned value.
Returns the argument unchanged.
Calls U::from(self). That is, this conversion is whatever the implementation of From<T> for U chooses to do.