Categorical Encoding
- class CategoricalEncodingPreprocessor(method: str = 'label', cutoffs: List[float] | None = None, **kwargs)
Bases: BasePreprocessor
Preprocessor for categorical feature encoding.
Supports multiple encoding strategies including ordinal, one-hot, label, cyclic, and threshold encoding. Can encode both features and target variables based on configuration.
- Parameters:
- method : str or dict, default="label"
Encoding method: "ordinal", "onehot", "label", "cyclic", or "threshold". Alternatively, a dict mapping column names to methods, e.g. {"col1": "ordinal", "col2": "onehot"}. If the target's name matches a key in the dict, the target is encoded as well.
- cutoffs : list, optional
For threshold encoding: list of cutoff values that define the bins. For example, [20, 40] creates three bins: <20 maps to 0, 20-40 maps to 1, and >40 maps to 2.
- Attributes:
- encoders : dict
Dictionary mapping feature names to their fitted encoder objects
- target_encoder : object or None
Encoder for the target variable, set when the target name matches a key in the method dict
- is_fitted : bool
Whether the preprocessor has been fitted
Notes
The preprocessor supports the following encoding strategies:
- Ordinal: Maps categories to integers
- One-hot: Creates binary columns for each category
- Label: Maps categories to integers (for single column)
- Cyclic: Creates sin/cos features for cyclical data
- Threshold: Bins continuous values into categories
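As an illustration of the cyclic strategy listed above, the following is a minimal, dependency-free sketch of sin/cos encoding. The function name `cyclic_encode` and the `period` argument are assumptions made for this example; the documentation above does not say how the class determines the period or names its outputs.

```python
import math
from typing import List, Sequence, Tuple


def cyclic_encode(values: Sequence[float], period: float) -> List[Tuple[float, float]]:
    """Map each value to (sin, cos) of its angle on a circle of length `period`.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    encoded = []
    for value in values:
        angle = 2.0 * math.pi * (value / period)
        encoded.append((math.sin(angle), math.cos(angle)))
    return encoded


# Hours of the day: 23 and 0 are one hour apart, which the integer values hide.
hours = [0, 6, 12, 23]
for hour, (sin_part, cos_part) in zip(hours, cyclic_encode(hours, period=24)):
    print(f"hour={hour:2d} sin={sin_part:+.3f} cos={cos_part:+.3f}")
```

The point of the two-column representation is that values at opposite ends of the cycle (here, hour 23 and hour 0) end up close together in the encoded space.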
Examples
- Label encoding for all categorical features:
>>> preprocessor = CategoricalEncodingPreprocessor(method="label")
- One-hot encoding for all categorical features:
>>> preprocessor = CategoricalEncodingPreprocessor(method="onehot")
- Mixed encoding strategies:
>>> preprocessor = CategoricalEncodingPreprocessor(
...     method={"category1": "onehot", "category2": "ordinal"}
... )
- Threshold encoding with custom cutoffs:
>>> preprocessor = CategoricalEncodingPreprocessor(
...     method="threshold", cutoffs=[20, 40, 60]
... )
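The binning that the cutoffs parameter describes can be sketched with the standard library alone. This is not the class's implementation: `threshold_encode` is a hypothetical helper, and placing a value that equals a cutoff into the upper bin is an assumption, since the documentation does not specify boundary handling.

```python
from bisect import bisect_right
from typing import List, Sequence


def threshold_encode(values: Sequence[float], cutoffs: Sequence[float]) -> List[int]:
    """Return the bin index of each value, given ascending cutoff points.

    Hypothetical helper for illustration only. With n cutoffs there are
    n + 1 bins, numbered 0..n. Assumption: a value equal to a cutoff is
    assigned to the upper bin.
    """
    ordered = sorted(cutoffs)
    return [bisect_right(ordered, value) for value in values]


# Cutoffs [20, 40, 60] produce four bins: <20, 20-40, 40-60, and >60.
print(threshold_encode([5, 25, 45, 65], [20, 40, 60]))  # prints [0, 1, 2, 3]
```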
- export_params() -> Dict[str, Any]
Export parameters for serialization.
- Returns:
- Dict[str, Any]
Dictionary containing all parameters
- fit(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> CategoricalEncodingPreprocessor
Fit the encoders to the data.
Learns encoding parameters from the training data for each categorical feature and optionally the target variable.
- Parameters:
- X : pd.DataFrame
Training data
- y : pd.Series, optional
Target values
- categorical_features : List[str], optional
List of categorical feature names to encode
- Returns:
- self : CategoricalEncodingPreprocessor
Fitted preprocessor
Notes
The method fits encoders for each categorical feature based on the specified encoding method. If the target variable name matches a key in the method dictionary, it will also be encoded.
- fit_transform(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> DataFrame
Fit the encoders and transform the data.
- Parameters:
- X : pd.DataFrame
Data to fit and transform
- y : pd.Series, optional
Target values
- categorical_features : List[str], optional
List of categorical feature names to encode
- Returns:
- pd.DataFrame
Transformed data with encoded categorical features
Notes
This method combines fit and transform operations, encoding categorical features and optionally the target variable.
- get_feature_names(feature_names: List[str]) -> List[str]
Get the feature names after encoding.
- Parameters:
- feature_names : List[str]
Original feature names
- Returns:
- List[str]
Updated feature names after encoding
Notes
Feature names are updated based on the encoding method:
- One-hot encoding: Creates new features for each category
- Cyclic encoding: Creates sin and cos features
- Other methods: Preserve original feature names
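The name-expansion rules above can be sketched as follows. The helper `expand_feature_names` and the `<name>_<category>`, `<name>_sin`, and `<name>_cos` naming pattern are assumptions made for illustration; the documentation does not state the actual naming scheme the class uses.

```python
from typing import Dict, List


def expand_feature_names(
    feature_names: List[str],
    methods: Dict[str, str],
    categories: Dict[str, List[str]],
) -> List[str]:
    """Return output column names after encoding (hypothetical naming scheme)."""
    expanded: List[str] = []
    for name in feature_names:
        method = methods.get(name)
        if method == "onehot":
            # One output column per category seen for this feature.
            expanded.extend(f"{name}_{category}" for category in categories[name])
        elif method == "cyclic":
            # Two output columns: the sine and cosine components.
            expanded.extend([f"{name}_sin", f"{name}_cos"])
        else:
            # Ordinal, label, threshold, or unencoded: the name is preserved.
            expanded.append(name)
    return expanded


names = expand_feature_names(
    ["color", "hour", "size"],
    methods={"color": "onehot", "hour": "cyclic", "size": "ordinal"},
    categories={"color": ["blue", "red"]},
)
print(names)  # prints ['color_blue', 'color_red', 'hour_sin', 'hour_cos', 'size']
```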
- transform(X: DataFrame, y: Series | None = None) -> Tuple[DataFrame, Series]
Transform the data using the fitted encoders.
- Parameters:
- X : pd.DataFrame
Features to transform
- y : pd.Series, optional
Target values to transform (if the target name matches a key in the method dict)
- Returns:
- Tuple[pd.DataFrame, pd.Series or None]
Always returns a tuple of (transformed features, transformed target or None)
- Raises:
- ValueError
If preprocessor has not been fitted
Notes
The method applies the learned encoding to both features and optionally the target variable. It always returns a tuple for consistency.
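The calling pattern implied by this contract (unpack a two-element result, expect `ValueError` before fitting) can be sketched with a small stand-in. `StubLabelEncoder` is hypothetical and is not the documented class: it works on plain lists instead of DataFrames to stay dependency-free, and it passes `y` through unchanged rather than encoding it.

```python
from typing import Dict, List, Optional, Tuple


class StubLabelEncoder:
    """Hypothetical stand-in mimicking the documented fit/transform contract."""

    def __init__(self) -> None:
        self.mapping: Dict[str, int] = {}
        self.is_fitted = False

    def fit(self, X: List[str]) -> "StubLabelEncoder":
        # Assign integer codes in sorted category order.
        self.mapping = {cat: code for code, cat in enumerate(sorted(set(X)))}
        self.is_fitted = True
        return self

    def transform(
        self, X: List[str], y: Optional[List[int]] = None
    ) -> Tuple[List[int], Optional[List[int]]]:
        if not self.is_fitted:
            raise ValueError("Call fit() before transform().")
        X_encoded = [self.mapping[value] for value in X]
        # Always a tuple: the second element is None when no target is given.
        return X_encoded, y


encoder = StubLabelEncoder().fit(["red", "blue", "red"])
X_encoded, y_encoded = encoder.transform(["blue", "red"])
print(X_encoded, y_encoded)  # prints [0, 1] None
```

Because the second element may be `None`, callers that only need the features can unpack and discard it, while callers that pass `y` get both back in one call.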