
RandomForestRegressor

Random forest regression.

Parent type: Regressor

Parameters:

Name Type Description Default
treeCount Int The number of trees to be used in the random forest. Has to be greater than 0. 100
maxDepth Int? The maximum depth of each tree. If null, the depth is not limited. Has to be greater than 0. null
minSampleCountInLeaves Int The minimum number of samples that must remain in the leaves of each tree. Has to be greater than 0. 1
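To make the three parameters concrete, here is a minimal pure-Python sketch of bagged regression trees. Each of the treeCount trees is grown on a bootstrap sample. maxDepth caps the number of split levels, and None means no limit. minSampleCountInLeaves rejects any split that would leave fewer samples in a child node. This is an illustrative assumption, not the Safe-DS implementation. The function names are hypothetical, and the per-split random feature subsampling that many random forest implementations add is omitted for brevity.

```python
import random
import statistics
from typing import Optional

# Hypothetical sketch. A node is either ("leaf", value) or
# ("split", feature, threshold, left_subtree, right_subtree).


def _sse(values: list[float]) -> float:
    # Sum of squared deviations from the mean (impurity of a node)
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def _build_tree(rows, targets, depth, max_depth, min_leaf):
    leaf = ("leaf", statistics.fmean(targets))
    # Stop when the depth limit is reached or the node is already pure
    if (max_depth is not None and depth >= max_depth) or len(set(targets)) == 1:
        return leaf
    best = None  # (sse, feature, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            # Enforce the minimum number of samples in each child
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:
        return leaf
    _, f, threshold = best

    def grow(indices):
        return _build_tree([rows[i] for i in indices], [targets[i] for i in indices],
                           depth + 1, max_depth, min_leaf)

    left_idx = [i for i, r in enumerate(rows) if r[f] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] > threshold]
    return ("split", f, threshold, grow(left_idx), grow(right_idx))


def fit_forest(rows: list[list[float]], targets: list[float], tree_count: int = 100,
               max_depth: Optional[int] = None, min_leaf: int = 1, seed: int = 0):
    # Mirror the documented constraints on the parameters
    if tree_count <= 0 or min_leaf <= 0 or (max_depth is not None and max_depth <= 0):
        raise ValueError("tree_count, max_depth and min_leaf must be greater than 0")
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(tree_count):
        # Bootstrap sample: draw n row indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(_build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                  0, max_depth, min_leaf))
    return forest


def _predict_tree(node, row):
    while node[0] == "split":
        _, f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node[1]


def predict_forest(forest, row: list[float]) -> float:
    # The forest's prediction is the average of the individual tree predictions
    return statistics.fmean(_predict_tree(tree, row) for tree in forest)
```

Averaging many trees grown on different bootstrap samples is what reduces the variance of a single deep tree; a smaller maxDepth or a larger minSampleCountInLeaves makes each tree coarser.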

Examples:

pipeline example {
    val training = Table.fromCsvFile("training.csv").toTabularDataset("target");
    val test = Table.fromCsvFile("test.csv").toTabularDataset("target");
    val regressor = RandomForestRegressor(treeCount = 10).fit(training);
    val meanSquaredError = regressor.meanSquaredError(test);
}
Stub code in RandomForestRegressor.sdsstub

class RandomForestRegressor(
    @PythonName("tree_count") const treeCount: Int = 100,
    @PythonName("max_depth") maxDepth: Int? = null,
    @PythonName("min_sample_count_in_leaves") const minSampleCountInLeaves: Int = 1,
) sub Regressor where {
    treeCount > 0,
    minSampleCountInLeaves > 0,
} {
    /**
     * The number of trees used in the random forest.
     */
    @PythonName("tree_count") attr treeCount: Int
    /**
     * The maximum depth of each tree.
     */
    @PythonName("max_depth") attr maxDepth: Int?
    /**
     * The minimum number of samples that must remain in the leaves of each tree.
     */
    @PythonName("min_sample_count_in_leaves") attr minSampleCountInLeaves: Int

    /**
     * Create a copy of this regressor and fit it with the given training data.
     *
     * This regressor is not modified.
     *
     * @param trainingSet The training data containing the feature and target vectors.
     *
     * @result fittedRegressor The fitted regressor.
     */
    @Pure
    @Category(DataScienceCategory.ModelingQClassicalRegression)
    fun fit(
        @PythonName("training_set") trainingSet: TabularDataset
    ) -> fittedRegressor: RandomForestRegressor
}

isFitted

Whether the model is fitted.

Type: Boolean

maxDepth

The maximum depth of each tree.

Type: Int?

minSampleCountInLeaves

The minimum number of samples that must remain in the leaves of each tree.

Type: Int

treeCount

The number of trees used in the random forest.

Type: Int

coefficientOfDetermination

Compute the coefficient of determination (R²) of the regressor on the given data.

The coefficient of determination compares the regressor's predictions to those of a baseline model that always predicts the mean of the target values. It measures how much of the variance in the target values the regressor explains.

The higher the coefficient of determination, the better the regressor. Results range from negative infinity to 1.0. You can interpret the coefficient of determination as follows:

Value Interpretation
1.0 The model perfectly predicts the target values. Did you overfit?
(0.0, 1.0) The model is better than predicting the mean of the target values. You should be here.
0.0 The model is as good as predicting the mean of the target values. Try something else.
(-∞, 0.0) The model is worse than predicting the mean of the target values. Something is very wrong.

Notes:

  • The model must be fitted.
  • Some other libraries call this metric r2_score.
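As a sketch of the definition above, R² is one minus the ratio of the regressor's squared error to the squared error of the mean baseline. This assumes plain Python lists of expected and predicted values; the function name is hypothetical and not the Safe-DS API.

```python
import statistics


def coefficient_of_determination(expected: list[float], predicted: list[float]) -> float:
    # Hypothetical helper for illustration, not the Safe-DS API
    if len(expected) != len(predicted) or not expected:
        raise ValueError("expected and predicted must be non-empty and of equal length")
    mean = statistics.fmean(expected)
    # Squared error of the regressor
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    # Squared error of the baseline that always predicts the mean
    ss_tot = sum((e - mean) ** 2 for e in expected)
    if ss_tot == 0:
        # This sketch leaves R² undefined for constant targets
        raise ValueError("R² is undefined when all expected values are equal")
    return 1.0 - ss_res / ss_tot
```

For expected values [1, 2, 3], predicting [2, 2, 2] (the mean) gives 0.0, and predicting [3, 2, 1] gives -3.0, which falls in the "worse than the mean" row of the table.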

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
coefficientOfDetermination Float The coefficient of determination of the regressor.
Stub code in Regressor.sdsstub

@Pure
@PythonName("coefficient_of_determination")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun coefficientOfDetermination(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> coefficientOfDetermination: Float

fit

Create a copy of this regressor and fit it with the given training data.

This regressor is not modified.
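The copy-on-fit contract can be illustrated with a frozen Python dataclass. The class below is a hypothetical toy (a model that only learns the mean target), assumed for illustration; it shows fit returning a new, fitted object while the original keeps its hyperparameters and stays unfitted.

```python
import statistics
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyRegressor:  # hypothetical, not part of Safe-DS
    # Hyperparameter, copied unchanged into the fitted model
    tree_count: int = 100
    # Learned state; None means the model is not fitted yet
    mean_target: Optional[float] = None

    @property
    def is_fitted(self) -> bool:
        return self.mean_target is not None

    def fit(self, targets: list[float]) -> "ToyRegressor":
        # Return a modified copy; a frozen dataclass cannot be changed in place
        return replace(self, mean_target=statistics.fmean(targets))
```

Because the receiver is never mutated, the same unfitted configuration can be fitted on several training sets without interference.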

Parameters:

Name Type Description Default
trainingSet TabularDataset The training data containing the feature and target vectors. -

Results:

Name Type Description
fittedRegressor RandomForestRegressor The fitted regressor.
Stub code in RandomForestRegressor.sdsstub

@Pure
@Category(DataScienceCategory.ModelingQClassicalRegression)
fun fit(
    @PythonName("training_set") trainingSet: TabularDataset
) -> fittedRegressor: RandomForestRegressor

getFeatureNames

Return the names of the feature columns.

Note: The model must be fitted.

Results:

Name Type Description
featureNames List<String> The names of the feature columns.
Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_feature_names")
fun getFeatureNames() -> featureNames: List<String>

getFeaturesSchema

Return the schema of the feature columns.

Note: The model must be fitted.

Results:

Name Type Description
featureSchema Schema The schema of the feature columns.
Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_features_schema")
fun getFeaturesSchema() -> featureSchema: Schema

getTargetName

Return the name of the target column.

Note: The model must be fitted.

Results:

Name Type Description
targetName String The name of the target column.
Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_target_name")
fun getTargetName() -> targetName: String

getTargetType

Return the type of the target column.

Note: The model must be fitted.

Results:

Name Type Description
targetType ColumnType The type of the target column.
Stub code in SupervisedModel.sdsstub

@Pure
@PythonName("get_target_type")
fun getTargetType() -> targetType: ColumnType

meanAbsoluteError

Compute the mean absolute error (MAE) of the regressor on the given data.

The mean absolute error is the average of the absolute differences between the predicted and expected target values. The lower the mean absolute error, the better the regressor. Results range from 0.0 to positive infinity.
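A minimal sketch of this formula over plain Python lists follows; the function name is hypothetical and not the Safe-DS API.

```python
import statistics


def mean_absolute_error(expected: list[float], predicted: list[float]) -> float:
    # Hypothetical helper for illustration, not the Safe-DS API
    if len(expected) != len(predicted) or not expected:
        raise ValueError("expected and predicted must be non-empty and of equal length")
    # Average of |expected - predicted| over all rows
    return statistics.fmean(abs(e - p) for e, p in zip(expected, predicted))
```

For expected values [1, 2, 3] and predictions [1, 3, 5], the absolute differences are 0, 1 and 2, so the MAE is 1.0.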

Note: The model must be fitted.

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
meanAbsoluteError Float The mean absolute error of the regressor.
Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_absolute_error")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanAbsoluteError(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanAbsoluteError: Float

meanDirectionalAccuracy

Compute the mean directional accuracy (MDA) of the regressor on the given data.

This metric compares each pair of consecutive target values and checks whether the predicted direction (down/unchanged/up) matches the expected direction. The mean directional accuracy is the proportion of correctly predicted directions. The higher the mean directional accuracy, the better the regressor. Results range from 0.0 to 1.0.

This metric is useful for time series data, where the order of the target values is meaningful. It is not useful for other types of data, which is why it is not included in the summarizeMetrics method.
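A minimal sketch follows. It assumes that the predicted direction is the sign of the change between consecutive predictions and the expected direction is the sign of the change between consecutive expected values. Other MDA definitions compare each prediction against the previous actual value instead, so treat this as an illustration rather than the exact Safe-DS computation. The function name is hypothetical.

```python
def mean_directional_accuracy(expected: list[float], predicted: list[float]) -> float:
    # Hypothetical helper; the direction definition is an assumption (see above)
    if len(expected) != len(predicted) or len(expected) < 2:
        raise ValueError("need at least two values, with equal lengths")

    def sign(x: float) -> int:
        # -1 = down, 0 = unchanged, 1 = up
        return (x > 0) - (x < 0)

    hits = [
        sign(expected[i] - expected[i - 1]) == sign(predicted[i] - predicted[i - 1])
        for i in range(1, len(expected))
    ]
    # Proportion of consecutive pairs whose direction was predicted correctly
    return sum(hits) / len(hits)
```

For expected values [1, 2, 2, 1] (up, unchanged, down) and predictions [0, 1, 1, 5] (up, unchanged, up), two of three directions match, giving 2/3.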

Note: The model must be fitted.

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
meanDirectionalAccuracy Float The mean directional accuracy of the regressor.
Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_directional_accuracy")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanDirectionalAccuracy(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanDirectionalAccuracy: Float

meanSquaredError

Compute the mean squared error (MSE) of the regressor on the given data.

The mean squared error is the average of the squared differences between the predicted and expected target values. The lower the mean squared error, the better the regressor. Results range from 0.0 to positive infinity.

Notes:

  • The model must be fitted.
  • To get the root mean squared error (RMSE), take the square root of the result.
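A minimal sketch of MSE and the derived RMSE over plain Python lists follows; both function names are hypothetical and not the Safe-DS API.

```python
import math
import statistics


def mean_squared_error(expected: list[float], predicted: list[float]) -> float:
    # Hypothetical helper for illustration, not the Safe-DS API
    if len(expected) != len(predicted) or not expected:
        raise ValueError("expected and predicted must be non-empty and of equal length")
    # Average of (expected - predicted)^2 over all rows
    return statistics.fmean((e - p) ** 2 for e, p in zip(expected, predicted))


def root_mean_squared_error(expected: list[float], predicted: list[float]) -> float:
    # RMSE is the square root of the MSE and has the same unit as the target
    return math.sqrt(mean_squared_error(expected, predicted))
```

If every prediction is off by 2, the MSE is 4.0 and the RMSE is 2.0, which is why RMSE is often easier to read in the target's own unit.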

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
meanSquaredError Float The mean squared error of the regressor.
Stub code in Regressor.sdsstub

@Pure
@PythonName("mean_squared_error")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun meanSquaredError(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> meanSquaredError: Float

medianAbsoluteDeviation

Compute the median absolute deviation (MAD) of the regressor on the given data.

The median absolute deviation is the median of the absolute differences between the predicted and expected target values. The lower the median absolute deviation, the better the regressor. Results range from 0.0 to positive infinity.
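A minimal sketch of this definition over plain Python lists follows; the function name is hypothetical and not the Safe-DS API.

```python
import statistics


def median_absolute_deviation(expected: list[float], predicted: list[float]) -> float:
    # Hypothetical helper for illustration, not the Safe-DS API
    if len(expected) != len(predicted) or not expected:
        raise ValueError("expected and predicted must be non-empty and of equal length")
    # Median of |expected - predicted| over all rows
    return statistics.median([abs(e - p) for e, p in zip(expected, predicted)])
```

Because it uses the median, a single large error barely affects it: for expected values [1, 2, 3, 4] and predictions [1, 2, 3, 100], this metric is 0.0, while the mean absolute error is 24.0.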

Note: The model must be fitted.

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
medianAbsoluteDeviation Float The median absolute deviation of the regressor.
Stub code in Regressor.sdsstub

@Pure
@PythonName("median_absolute_deviation")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun medianAbsoluteDeviation(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> medianAbsoluteDeviation: Float

predict

Predict the target values on the given dataset.

Note: The model must be fitted.

Parameters:

Name Type Description Default
dataset union<Table, TabularDataset> The dataset containing at least the features. -

Results:

Name Type Description
prediction TabularDataset The given dataset with an additional column for the predicted target values.
Stub code in SupervisedModel.sdsstub

@Pure
fun predict(
    dataset: union<Table, TabularDataset>
) -> prediction: TabularDataset

summarizeMetrics

Summarize the regressor's metrics on the given data.

Note: The model must be fitted.

API Stability

Do not rely on the exact output of this method. In future versions, we may change the displayed metrics without prior notice.

Parameters:

Name Type Description Default
validationOrTestSet union<Table, TabularDataset> The validation or test set. -

Results:

Name Type Description
metrics Table A table containing the regressor's metrics.
Stub code in Regressor.sdsstub

@Pure
@PythonName("summarize_metrics")
@Category(DataScienceCategory.ModelEvaluationQMetric)
fun summarizeMetrics(
    @PythonName("validation_or_test_set") validationOrTestSet: union<Table, TabularDataset>
) -> metrics: Table