Base Preprocessor

class BasePreprocessor(**kwargs)

Bases: ABC

Abstract base class for all preprocessors.

All preprocessors must implement the fit, transform, and export_params methods. The fit/transform pair follows the scikit-learn estimator interface pattern, which keeps all preprocessing operations in the Brisk framework consistent.

Parameters:
**kwargs

Additional parameters specific to each preprocessor implementation

Attributes:
is_fitted : bool

Whether the preprocessor has been fitted to data

Notes

This abstract base class provides the common interface that all preprocessors must implement. It includes parameter validation, the standard fit/transform pattern, and utility methods for feature name handling and parameter export.

Examples

Create a custom preprocessor:
>>> class CustomPreprocessor(BasePreprocessor):
...     def _validate_params(self, **kwargs):
...         # Validate parameters
...         pass
...     def fit(self, X, y=None, categorical_features=None):
...         # Fit logic goes here; mark the instance as fitted when done
...         self.is_fitted = True
...         return self
...     def transform(self, X, y=None):
...         # Transform logic
...         return X, y
...     def export_params(self):
...         # Export parameters
...         return {}

abstractmethod export_params() -> Dict[str, Any]

Export parameters for serialization and rerun functionality.

Returns:
Dict[str, Any]

Dictionary containing all parameters in JSON-serializable format

Notes

This method should return all parameters needed to recreate the preprocessor instance, suitable for JSON serialization.
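
As a rough illustration of the round trip described above, the sketch below serializes exported parameters to JSON and rebuilds an equivalent instance from them. It uses a stand-in class rather than the real BasePreprocessor, and `ScalerSketch`, `factor`, and `columns` are hypothetical names chosen for the example.

```python
import json
from typing import Any, Dict, List, Optional


class ScalerSketch:
    """Hypothetical stand-in for a BasePreprocessor subclass."""

    def __init__(self, factor: float = 1.0, columns: Optional[List[str]] = None):
        self.factor = factor
        self.columns = list(columns) if columns is not None else None

    def export_params(self) -> Dict[str, Any]:
        # Return only plain Python types so json.dumps() accepts the result.
        return {"factor": self.factor, "columns": self.columns}


original = ScalerSketch(factor=2.5, columns=["age", "income"])

# Serialize the exported parameters, then rebuild an equivalent instance.
payload = json.dumps(original.export_params())
restored = ScalerSketch(**json.loads(payload))
```

Keeping the exported keys identical to the constructor arguments is one simple way to make the rebuild step a single call. Whether Brisk relies on that convention is an assumption here.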

abstractmethod fit(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> BasePreprocessor

Fit the preprocessor to the data.

Parameters:
X : pd.DataFrame

Training data

y : pd.Series, optional

Target values

categorical_features : List[str], optional

List of categorical feature names

Returns:
self : BasePreprocessor

Fitted preprocessor instance

Notes

This method should fit the preprocessor to the training data and set the is_fitted flag to True upon completion.
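
A minimal sketch of that contract, assuming a preprocessor that learns per-column means: fit() stores the learned state, sets is_fitted, and returns self. `MeanCenterSketch` and its `means_` attribute are hypothetical and stand in for a real BasePreprocessor subclass.

```python
from typing import List, Optional

import pandas as pd


class MeanCenterSketch:
    """Hypothetical stand-in that follows the documented fit() contract."""

    def __init__(self):
        self.is_fitted = False
        self.means_ = None

    def fit(
        self,
        X: pd.DataFrame,
        y: Optional[pd.Series] = None,
        categorical_features: Optional[List[str]] = None,
    ) -> "MeanCenterSketch":
        # Skip categorical columns; learn a mean for each remaining column.
        skip = set(categorical_features or [])
        numeric_cols = [col for col in X.columns if col not in skip]
        self.means_ = X[numeric_cols].mean()
        self.is_fitted = True  # set only after the learned state is stored
        return self  # returning self allows chained calls


X_train = pd.DataFrame({"height": [1.0, 3.0], "color": ["red", "blue"]})
fitted = MeanCenterSketch().fit(X_train, categorical_features=["color"])
```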

fit_transform(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> Tuple[DataFrame, Series | None]

Fit the preprocessor and transform the data.

Convenience method that combines fit and transform operations in a single call.

Parameters:
X : pd.DataFrame

Training data

y : pd.Series, optional

Target values

categorical_features : List[str], optional

List of categorical feature names

Returns:
Tuple[pd.DataFrame, Optional[pd.Series]]

Tuple containing (transformed_X, transformed_y)

Notes

This method is equivalent to calling fit() followed by transform() on the same data. It’s provided for convenience and follows the scikit-learn pattern.
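
The equivalence can be checked with a small stand-in class. The fit_transform() body below is one plausible way to implement the documented behavior, not the confirmed Brisk source, and `ShiftSketch` and `min_` are hypothetical names.

```python
import pandas as pd


class ShiftSketch:
    """Hypothetical stand-in; subtracts the training minimum from each column."""

    def __init__(self):
        self.is_fitted = False
        self.min_ = None

    def fit(self, X, y=None, categorical_features=None):
        self.min_ = X.min()
        self.is_fitted = True
        return self

    def transform(self, X, y=None):
        return X - self.min_, y

    def fit_transform(self, X, y=None, categorical_features=None):
        # Assumed behavior: delegate to fit(), then transform() the same data.
        return self.fit(X, y, categorical_features).transform(X, y)


X = pd.DataFrame({"a": [2.0, 5.0], "b": [10.0, 30.0]})
y = pd.Series([0, 1])

# One call ...
X_one_call, y_one_call = ShiftSketch().fit_transform(X, y)

# ... versus the two-step form on the same data.
two_step = ShiftSketch()
two_step.fit(X, y)
X_two_step, y_two_step = two_step.transform(X, y)
```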

get_feature_names(feature_names: List[str]) -> List[str]

Get the feature names after preprocessing.

Parameters:
feature_names : List[str]

Original feature names

Returns:
List[str]

Feature names after preprocessing

Notes

By default, this method returns the original feature names unchanged. Subclasses should override this method if preprocessing changes the number or names of features (e.g., one-hot encoding).
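
For the one-hot encoding case mentioned above, an override might look like the following sketch. `OneHotNamesSketch`, its `categories` mapping, and the `name_value` naming scheme are assumptions made for illustration. A real subclass would inherit from BasePreprocessor and learn the categories during fit().

```python
from typing import Dict, List


class OneHotNamesSketch:
    """Hypothetical stand-in showing only the get_feature_names() override."""

    def __init__(self, categories: Dict[str, List[str]]):
        # Maps each encoded column to its category values (assumed to be
        # learned during fit() in a real preprocessor).
        self.categories = categories

    def get_feature_names(self, feature_names: List[str]) -> List[str]:
        output = []
        for name in feature_names:
            if name in self.categories:
                # One output column per category value.
                output.extend(f"{name}_{value}" for value in self.categories[name])
            else:
                output.append(name)  # untouched columns keep their names
        return output


encoder = OneHotNamesSketch({"color": ["red", "blue"]})
names = encoder.get_feature_names(["age", "color"])
```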

abstractmethod transform(X: DataFrame, y: Series | None = None) -> Tuple[DataFrame, Series | None]

Transform the data using the fitted preprocessor.

Parameters:
X : pd.DataFrame

Features to transform

y : pd.Series, optional

Target values to transform (if applicable)

Returns:
Tuple[pd.DataFrame, Optional[pd.Series]]

Tuple containing (transformed_X, transformed_y). The target y is None if it was not provided, transformed if the preprocessor modifies it (e.g., CategoricalEncodingPreprocessor), and otherwise returned unchanged.

Raises:
ValueError

If the preprocessor has not been fitted

Notes

All preprocessors must return a tuple (X, y). Even if a preprocessor doesn’t transform y, it must still return the tuple with y unchanged. This allows all preprocessors to be called uniformly: X, y = preprocessor.transform(X, y)
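
The sketch below shows why the uniform (X, y) return value is convenient: a list of preprocessors can be applied in a loop with the same call shape. It also shows a fitted check that raises ValueError. `DropMissingSketch` is a hypothetical stand-in, and the exact error message in Brisk is not known here.

```python
from typing import Optional, Tuple

import pandas as pd


class DropMissingSketch:
    """Hypothetical stand-in that drops rows with missing values."""

    def __init__(self):
        self.is_fitted = False

    def fit(self, X, y=None, categorical_features=None):
        self.is_fitted = True  # nothing to learn for this sketch
        return self

    def transform(
        self, X: pd.DataFrame, y: Optional[pd.Series] = None
    ) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
        if not self.is_fitted:
            raise ValueError("Preprocessor must be fitted before transform().")
        keep = X.notna().all(axis=1)
        # Filter y with the same mask so rows stay aligned; pass None through.
        return X[keep], (y[keep] if y is not None else None)


X = pd.DataFrame({"a": [1.0, None, 3.0]})
y = pd.Series([10, 20, 30])

steps = [DropMissingSketch(), DropMissingSketch()]
for step in steps:
    # Same call shape for every preprocessor in the list.
    X, y = step.fit(X, y).transform(X, y)
```

Because this preprocessor removes rows, it has to filter y as well. This is the kind of case where returning both values from transform() keeps features and targets aligned.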