
OneHotEncoder

A way to deal with categorical features that is particularly useful for unordered (i.e. nominal) data.

It replaces a column with a set of columns, each representing a unique value in the original column. The value of each new column is 1 if the original column had that value, and 0 otherwise. Take the following table as an example:

| col1 |
| --- |
| "a" |
| "b" |
| "c" |
| "a" |

The one-hot encoding of this table is:

| col1__a | col1__b | col1__c |
| --- | --- | --- |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |
| 1 | 0 | 0 |

The name "one-hot" comes from the fact that, across the new columns, each row contains exactly one 1 and all other values are 0. One-hot encoding is closely related to the dummy variables (also called indicator variables) used in statistics.
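
The mapping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Safe-DS implementation: the function name `one_hot` is hypothetical, and sorting the categories is an assumption made to get a stable column order.

```python
def one_hot(column_name, values, separator="__"):
    """Return a dict mapping new column names to lists of 0/1 indicators."""
    # One new column per unique value; sorted here only to get a stable order.
    categories = sorted(set(values))
    return {
        f"{column_name}{separator}{category}": [int(v == category) for v in values]
        for category in categories
    }


encoded = one_hot("col1", ["a", "b", "c", "a"])
# Across the new columns, every row sums to exactly 1.
row_sums = [sum(row) for row in zip(*encoded.values())]
```

Here `encoded` holds the three indicator columns from the table above, and `row_sums` confirms the "exactly one 1 per row" property.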

Parent type: InvertibleTableTransformer

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| selector | union<List<String>, String?> | The list of columns used to fit the transformer. If null, all non-numeric columns are used. | null |
| separator | String | The separator used to separate the original column name from the value in the new column names. | "__" |
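
How the two parameters could shape the output can be sketched in plain Python. The helpers `resolve_selector` and `new_column_name` are hypothetical, and the rule used to decide that a column is non-numeric (any value that is not a number) is an assumption; the library may decide this differently, for example by column type.

```python
from numbers import Number


def resolve_selector(table, selector=None):
    """Return the column names to encode, given a column-name -> values dict."""
    if selector is None:
        # Assumed rule: a column is non-numeric if any value is not a number.
        return [
            name
            for name, values in table.items()
            if not all(isinstance(v, Number) for v in values)
        ]
    # A single column name is treated like a one-element list.
    return [selector] if isinstance(selector, str) else list(selector)


def new_column_name(column, value, separator="__"):
    """Build the name of an indicator column from the original name and a value."""
    return f"{column}{separator}{value}"


table = {"a": ["z", "y"], "b": [3, 4]}
selected = resolve_selector(table)  # no selector given: only "a" is non-numeric
names = [new_column_name("a", v, separator="=") for v in table["a"]]
```

With a custom separator such as "=", the new columns in this sketch would be named "a=z" and "a=y" instead of "a__z" and "a__y".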

Examples:

pipeline example {
   val table = Table({"a": ["z", "y"], "b": [3, 4]});
   val encoder = OneHotEncoder(selector=["a"]).fit(table);
   val transformedTable = encoder.transform(table);
   // Table({"a__z": [1, 0], "a__y": [0, 1], "b": [3, 4]})
   val originalTable = encoder.inverseTransform(transformedTable);
   // Table({"a": ["z", "y"], "b": [3, 4]})
}
Stub code in OneHotEncoder.sdsstub

class OneHotEncoder(
    selector: union<List<String>, String, Nothing?> = null,
    separator: String = "__"
) sub InvertibleTableTransformer {
    /**
     * The separator used to separate the original column name from the value in the new column names.
     */
    attr separator: String

    /**
     * Learn a transformation for a set of columns in a table.
     *
     * This transformer is not modified.
     *
     * @param table The table used to fit the transformer.
     *
     * @result fittedTransformer The fitted transformer.
     */
    @Pure
    @Category(DataScienceCategory.DataProcessingQTransformer)
    fun fit(
        table: Table
    ) -> fittedTransformer: OneHotEncoder

    /**
     * Learn a transformation for a set of columns in a table and apply the learned transformation to the same table.
     *
     * **Note:** Neither this transformer nor the given table is modified.
     *
     * @param table The table used to fit the transformer. The transformer is then applied to this table.
     *
     * @result fittedTransformer The fitted transformer.
     * @result transformedTable The transformed table.
     */
    @Pure
    @PythonName("fit_and_transform")
    @Category(DataScienceCategory.DataProcessingQTransformer)
    fun fitAndTransform(
        table: Table
    ) -> (fittedTransformer: OneHotEncoder, transformedTable: Table)
}

isFitted

Whether the transformer is fitted.

Type: Boolean

separator

The separator used to separate the original column name from the value in the new column names.

Type: String

fit

Learn a transformation for a set of columns in a table.

This transformer is not modified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table | Table | The table used to fit the transformer. | - |

Results:

| Name | Type | Description |
| --- | --- | --- |
| fittedTransformer | OneHotEncoder | The fitted transformer. |

Stub code in OneHotEncoder.sdsstub

@Pure
@Category(DataScienceCategory.DataProcessingQTransformer)
fun fit(
    table: Table
) -> fittedTransformer: OneHotEncoder

fitAndTransform

Learn a transformation for a set of columns in a table and apply the learned transformation to the same table.

Note: Neither this transformer nor the given table is modified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table | Table | The table used to fit the transformer. The transformer is then applied to this table. | - |

Results:

| Name | Type | Description |
| --- | --- | --- |
| fittedTransformer | OneHotEncoder | The fitted transformer. |
| transformedTable | Table | The transformed table. |

Stub code in OneHotEncoder.sdsstub

@Pure
@PythonName("fit_and_transform")
@Category(DataScienceCategory.DataProcessingQTransformer)
fun fitAndTransform(
    table: Table
) -> (fittedTransformer: OneHotEncoder, transformedTable: Table)
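
The relationship between fit, transform, and fitAndTransform, and the guarantee that neither the transformer nor the table is modified, can be illustrated with a small plain-Python stand-in. The class `SketchEncoder` is hypothetical and much simpler than the real transformer; it only assumes that fitting returns a new object and that transforming builds a new table.

```python
class SketchEncoder:
    """Hypothetical stand-in for a one-hot encoder over a dict of columns."""

    def __init__(self, columns, separator="__", categories=None):
        self.columns = list(columns)
        self.separator = separator
        self._categories = categories  # None until fitted

    @property
    def is_fitted(self):
        return self._categories is not None

    def fit(self, table):
        # Return a new fitted encoder; `self` is left untouched.
        learned = {c: list(dict.fromkeys(table[c])) for c in self.columns}
        return SketchEncoder(self.columns, self.separator, learned)

    def transform(self, table):
        if not self.is_fitted:
            raise RuntimeError("The encoder must be fitted first.")
        result = {}  # built fresh, so the input table is not modified
        for name, values in table.items():
            if name in self._categories:
                for category in self._categories[name]:
                    new_name = f"{name}{self.separator}{category}"
                    result[new_name] = [int(v == category) for v in values]
            else:
                result[name] = list(values)
        return result

    def fit_and_transform(self, table):
        # Equivalent to fitting first and then transforming the same table.
        fitted = self.fit(table)
        return fitted, fitted.transform(table)


table = {"a": ["z", "y"], "b": [3, 4]}
encoder = SketchEncoder(["a"])
fitted, transformed = encoder.fit_and_transform(table)
```

After this runs, `encoder` is still unfitted, `fitted` is a separate fitted object, and `table` is unchanged, which mirrors the behavior described in the note above.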

inverseTransform

Undo the learned transformation as well as possible.

Column order and types may differ from the original table. Likewise, some values might not be restored.

Note: The given table is not modified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| transformedTable | Table | The table to be transformed back to the original version. | - |

Results:

| Name | Type | Description |
| --- | --- | --- |
| originalTable | Table | The original table. |

Stub code in InvertibleTableTransformer.sdsstub

@Pure
@PythonName("inverse_transform")
fun inverseTransform(
    @PythonName("transformed_table") transformedTable: Table
) -> originalTable: Table
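
Why the inverse is only "as good as possible" can be seen in a plain-Python sketch. The function `inverse_one_hot` is hypothetical, and both the handling of all-zero rows and the placement of the restored column are assumptions chosen for illustration; the source does not specify how the library behaves in these cases.

```python
def inverse_one_hot(table, column, categories, separator="__"):
    """Collapse indicator columns back into one column (returns a new dict)."""
    indicator_names = [f"{column}{separator}{c}" for c in categories]
    n_rows = len(table[indicator_names[0]])
    restored = []
    for i in range(n_rows):
        # Pick the category whose indicator is 1; None if no indicator is set.
        hits = [
            c for c, name in zip(categories, indicator_names) if table[name][i] == 1
        ]
        restored.append(hits[0] if hits else None)
    # Assumed layout: the restored column comes first, then the untouched columns.
    result = {column: restored}
    result.update({k: list(v) for k, v in table.items() if k not in indicator_names})
    return result


encoded = {"b": [3, 4, 5], "a__z": [1, 0, 0], "a__y": [0, 1, 0]}
original = inverse_one_hot(encoded, "a", ["z", "y"])
```

In this sketch the third row has no indicator set, so its value cannot be recovered and becomes `None`, and the restored column is placed first regardless of where it originally stood.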

transform

Apply the learned transformation to a table.

Note: The given table is not modified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table | Table | The table to which the learned transformation is applied. | - |

Results:

| Name | Type | Description |
| --- | --- | --- |
| transformedTable | Table | The transformed table. |

Stub code in TableTransformer.sdsstub

@Pure
fun transform(
    table: Table
) -> transformedTable: Table