`RandomForestRegressor`¶

Random forest regression.

Parent type: Regressor

Parameters:

Name	Type	Description	Default
`treeCount`	`Int`	The number of trees to be used in the random forest. Has to be greater than 0.	`100`
`maxDepth`	`Int?`	The maximum depth of each tree. If null, the depth is not limited. Has to be greater than 0.	`null`
`minSampleCountInLeaves`	`Int`	The minimum number of samples that must remain in the leaves of each tree. Has to be greater than 0.	`1`

Examples:

pipeline example {
    val training = Table.fromCsvFile("training.csv").toTabularDataset("target");
    val test = Table.fromCsvFile("test.csv").toTabularDataset("target");
    val regressor = RandomForestRegressor(treeCount = 10).fit(training);
    val meanSquaredError = regressor.meanSquaredError(test);
}

Stub code in RandomForestRegressor.sdsstub

class RandomForestRegressor(
    @PythonName("tree_count") const treeCount: Int = 100,
    @PythonName("max_depth") maxDepth: Int? = null,
    @PythonName("min_sample_count_in_leaves") const minSampleCountInLeaves: Int = 1,
) sub Regressor where {
    treeCount > 0,
    minSampleCountInLeaves > 0,
} {
    /**
     * The number of trees used in the random forest.
     */
    @PythonName("tree_count") attr treeCount: Int
    /**
     * The maximum depth of each tree.
     */
    @PythonName("max_depth") attr maxDepth: Int?
    /**
     * The minimum number of samples that must remain in the leaves of each tree.
     */
    @PythonName("min_sample_count_in_leaves") attr minSampleCountInLeaves: Int

    /**
     * Create a copy of this regressor and fit it with the given training data.
     *
     * This regressor is not modified.
     *
     * @param trainingSet The training data containing the feature and target vectors.
     *
     * @result fittedRegressor The fitted regressor.
     */
    @Pure
    @Category(DataScienceCategory.ModelingQClassicalRegression)
    fun fit(
        @PythonName("training_set") trainingSet: TabularDataset
    ) -> fittedRegressor: RandomForestRegressor
}

`isFitted`¶

Whether the model is fitted.

Type: Boolean

`maxDepth`¶

The maximum depth of each tree.

Type: Int?

`minSampleCountInLeaves`¶

The minimum number of samples that must remain in the leaves of each tree.

Type: Int

`treeCount`¶

The number of trees used in the random forest.

Type: Int

`coefficientOfDetermination`¶

Compute the coefficient of determination (R²) of the regressor on the given data.

The coefficient of determination compares the regressor's predictions to another model that always predicts the mean of the target values. It is a measure of how well the regressor explains the variance in the target values.

The higher the coefficient of determination, the better the regressor. Results range from negative infinity to 1.0. You can interpret the coefficient of determination as follows:

R²	Interpretation
1.0	The model perfectly predicts the target values. Did you overfit?
(0.0, 1.0)	The model is better than predicting the mean of the target values. You should be here.
0.0	The model is as good as predicting the mean of the target values. Try something else.
(-∞, 0.0)	The model is worse than predicting the mean of the target values. Something is very wrong.

Notes:

The model must be fitted.
Some other libraries call this metric `r2_score`.
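
The interpretation table above can be checked with a small plain-Python sketch. This is not the library's implementation. The function name and the sample values are illustrative assumptions, and the sketch assumes the expected target values are not all identical (otherwise the denominator is zero).

```python
def coefficient_of_determination(expected, predicted):
    """Illustrative R²: 1 - (residual sum of squares / total sum of squares)."""
    mean_expected = sum(expected) / len(expected)
    # Squared error of the regressor's predictions.
    ss_residual = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    # Squared error of a baseline that always predicts the mean.
    ss_total = sum((e - mean_expected) ** 2 for e in expected)
    return 1.0 - ss_residual / ss_total


expected = [1.0, 2.0, 3.0, 4.0]

perfect = coefficient_of_determination(expected, [1.0, 2.0, 3.0, 4.0])
mean_only = coefficient_of_determination(expected, [2.5, 2.5, 2.5, 2.5])
reversed_order = coefficient_of_determination(expected, [4.0, 3.0, 2.0, 1.0])

print(perfect)         # perfect predictions
print(mean_only)       # same as the mean baseline
print(reversed_order)  # worse than the mean baseline
```

The three calls correspond to the first, third, and fourth rows of the table: perfect predictions, predictions equal to the mean baseline, and predictions worse than the baseline.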

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`coefficientOfDetermination`	`Float`	The coefficient of determination of the regressor.

Stub code in Regressor.sdsstub

@Pure
@PythonName("coefficient_of_determination")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun coefficientOfDetermination(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> coefficientOfDetermination: Float

`fit`¶

Create a copy of this regressor and fit it with the given training data.

This regressor is not modified.

Parameters:

Name	Type	Description	Default
`trainingSet`	`TabularDataset`	The training data containing the feature and target vectors.	-

Results:

Name	Type	Description
`fittedRegressor`	`RandomForestRegressor`	The fitted regressor.

Stub code in RandomForestRegressor.sdsstub

@Pure
@Category(DataScienceCategory.ModelingQClassicalRegression)
fun fit(
    @PythonName("training_set") trainingSet: TabularDataset
) -> fittedRegressor: RandomForestRegressor

`getFeatureNames`¶

Return the names of the feature columns.

Note: The model must be fitted.

Results:

Name	Type	Description
`featureNames`	`List<String>`	The names of the feature columns.

Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_feature_names")
fun getFeatureNames() -> featureNames: List<String>

`getFeaturesSchema`¶

Return the schema of the feature columns.

Note: The model must be fitted.

Results:

Name	Type	Description
`featureSchema`	`Schema`	The schema of the feature columns.

Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_features_schema")
fun getFeaturesSchema() -> featureSchema: Schema

`getTargetName`¶

Return the name of the target column.

Note: The model must be fitted.

Results:

Name	Type	Description
`targetName`	`String`	The name of the target column.

Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_target_name")
fun getTargetName() -> targetName: String

`getTargetType`¶

Return the type of the target column.

Note: The model must be fitted.

Results:

Name	Type	Description
`targetType`	`ColumnType`	The type of the target column.

Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_target_type")
fun getTargetType() -> targetType: ColumnType

`meanAbsoluteError`¶

Compute the mean absolute error (MAE) of the regressor on the given data.

The mean absolute error is the average of the absolute differences between the predicted and expected target values. The lower the mean absolute error, the better the regressor. Results range from 0.0 to positive infinity.

Note: The model must be fitted.
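
As a rough illustration of the definition above, the following plain-Python sketch averages the absolute differences. It is not the library's implementation. The function name and the sample values are assumptions made for this example.

```python
def mean_absolute_error(expected, predicted):
    """Illustrative MAE: average of |expected - predicted|."""
    absolute_errors = [abs(e - p) for e, p in zip(expected, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Absolute errors are 1.0, 0.0, and 2.0, so the average is 1.0.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(mae)
```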

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`meanAbsoluteError`	`Float`	The mean absolute error of the regressor.

Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_absolute_error")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanAbsoluteError(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanAbsoluteError: Float

`meanDirectionalAccuracy`¶

Compute the mean directional accuracy (MDA) of the regressor on the given data.

This metric compares two consecutive target values and checks if the predicted direction (down/unchanged/up) matches the expected direction. The mean directional accuracy is the proportion of correctly predicted directions. The higher the mean directional accuracy, the better the regressor. Results range from 0.0 to 1.0.

This metric is useful for time series data, where the order of the target values has a meaning. It is not useful for other types of data. Because of this, it is not included in the output of the `summarizeMetrics` method.

Note: The model must be fitted.
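
The direction comparison described above might be sketched in plain Python as follows. This is not the library's implementation. It assumes that the predicted direction is derived from two consecutive predicted values, which the text above does not state explicitly. The function names and the sample values are illustrative.

```python
def sign(value):
    """Return -1 (down), 0 (unchanged), or 1 (up)."""
    return (value > 0) - (value < 0)


def mean_directional_accuracy(expected, predicted):
    """Illustrative MDA: share of steps whose direction is predicted correctly."""
    # Direction of each step between two consecutive values.
    expected_directions = [sign(b - a) for a, b in zip(expected, expected[1:])]
    predicted_directions = [sign(b - a) for a, b in zip(predicted, predicted[1:])]
    matches = [e == p for e, p in zip(expected_directions, predicted_directions)]
    return sum(matches) / len(matches)


# Expected directions:  up, unchanged, down
# Predicted directions: up, unchanged, up
mda = mean_directional_accuracy([1.0, 2.0, 2.0, 1.0], [1.5, 2.5, 2.5, 3.0])
print(mda)  # two of three directions match
```

Because the result depends on the order of the rows, shuffling the data changes it, which is why the metric only makes sense for ordered data such as time series.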

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`meanDirectionalAccuracy`	`Float`	The mean directional accuracy of the regressor.

Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_directional_accuracy")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanDirectionalAccuracy(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanDirectionalAccuracy: Float

`meanSquaredError`¶

Compute the mean squared error (MSE) of the regressor on the given data.

The mean squared error is the average of the squared differences between the predicted and expected target values. The lower the mean squared error, the better the regressor. Results range from 0.0 to positive infinity.

Notes:

The model must be fitted.
To get the root mean squared error (RMSE), take the square root of the result.
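
The notes above can be illustrated with a short plain-Python sketch that computes the MSE and then derives the RMSE from it. It is not the library's implementation. The function name and the sample values are assumptions made for this example.

```python
import math


def mean_squared_error(expected, predicted):
    """Illustrative MSE: average of (expected - predicted) squared."""
    squared_errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Only the last prediction is off, by 4.0, so the MSE is 16.0 / 4.
mse = mean_squared_error([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
# The RMSE is the square root of the MSE and has the same unit as the target.
rmse = math.sqrt(mse)
print(mse, rmse)
```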

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`meanSquaredError`	`Float`	The mean squared error of the regressor.

Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_squared_error")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanSquaredError(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanSquaredError: Float

`medianAbsoluteDeviation`¶

Compute the median absolute deviation (MAD) of the regressor on the given data.

The median absolute deviation is the median of the absolute differences between the predicted and expected target values. The lower the median absolute deviation, the better the regressor. Results range from 0.0 to positive infinity.

Note: The model must be fitted.
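
A minimal plain-Python sketch of the definition above, using the standard library's `statistics.median`. It is not the library's implementation. The function name and the sample values are illustrative assumptions.

```python
from statistics import median


def median_absolute_deviation(expected, predicted):
    """Illustrative MAD: median of |expected - predicted|."""
    absolute_errors = [abs(e - p) for e, p in zip(expected, predicted)]
    return median(absolute_errors)


expected = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 2.0, 3.0, 4.0, 105.0]  # one prediction is far off

# Absolute errors are 0, 0, 0, 0, and 100, so the median is 0.0.
mad = median_absolute_deviation(expected, predicted)
print(mad)
```

In this example a single large error does not move the median, whereas the mean of the same absolute errors would be 20.0.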

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`medianAbsoluteDeviation`	`Float`	The median absolute deviation of the regressor.

Stub code in Regressor.sdsstub

@Pure
@PythonName("median_absolute_deviation")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun medianAbsoluteDeviation(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> medianAbsoluteDeviation: Float

`predict`¶

Predict the target values on the given dataset.

Note: The model must be fitted.

Parameters:

Name	Type	Description	Default
`dataset`	`union<Table, TabularDataset>`	The dataset containing at least the features.	-

Results:

Name	Type	Description
`prediction`	`TabularDataset`	The given dataset with an additional column for the predicted target values.

Stub code in SupervisedModel.sdsstub

@Pure
fun predict(
    dataset: union<Table, TabularDataset>
) -> prediction: TabularDataset

`summarizeMetrics`¶

Summarize the regressor's metrics on the given data.

Note: The model must be fitted.

API Stability

Do not rely on the exact output of this method. In future versions, we may change the displayed metrics without prior notice.

Parameters:

Name	Type	Description	Default
`validationOrTestSet`	`union<Table, TabularDataset>`	The validation or test set.	-

Results:

Name	Type	Description
`metrics`	`Table`	A table containing the regressor's metrics.

Stub code in Regressor.sdsstub

@Pure
@PythonName("summarize_metrics")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun summarizeMetrics(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> metrics: Table
