Base Preprocessor
- class BasePreprocessor(**kwargs)
Bases: ABC
Abstract base class for all preprocessors.
All preprocessors must implement the fit and transform methods to follow the scikit-learn estimator interface pattern. This ensures consistency across all preprocessing operations in the Brisk framework.
- Parameters:
- **kwargs
Additional parameters specific to each preprocessor implementation
- Attributes:
- is_fitted : bool
Whether the preprocessor has been fitted to data
Notes
This abstract base class provides the common interface that all preprocessors must implement. It includes parameter validation, the standard fit/transform pattern, and utility methods for feature name handling and parameter export.
Examples
- Create a custom preprocessor:
>>> class CustomPreprocessor(BasePreprocessor):
...     def _validate_params(self, **kwargs):
...         # Validate parameters
...         pass
...     def fit(self, X, y=None, categorical_features=None):
...         # Fit logic
...         return self
...     def transform(self, X, y=None):
...         # Transform logic
...         return X, y
...     def export_params(self):
...         # Export parameters
...         return {}
- abstractmethod export_params() -> Dict[str, Any]
Export parameters for serialization and rerun functionality.
- Returns:
- Dict[str, Any]
Dictionary containing all parameters in JSON-serializable format
Notes
This method should return all parameters needed to recreate the preprocessor instance, suitable for JSON serialization.
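The round trip described in the note above can be sketched as follows. This is an illustration under stated assumptions, not Brisk's own code: `ClipPreprocessor`, `lower`, and `upper` are hypothetical names, the class is a plain stand-in rather than a real `BasePreprocessor` subclass, and it assumes the constructor accepts the exported dictionary as keyword arguments.

```python
import json
from typing import Any, Dict


class ClipPreprocessor:
    """Hypothetical preprocessor; stands in for a BasePreprocessor subclass."""

    def __init__(self, **kwargs):
        self.lower = kwargs.get("lower", 0.0)
        self.upper = kwargs.get("upper", 1.0)
        self.is_fitted = False

    def export_params(self) -> Dict[str, Any]:
        # Return only plain Python types so that json.dumps() succeeds.
        return {"lower": float(self.lower), "upper": float(self.upper)}


original = ClipPreprocessor(lower=-2, upper=2)
payload = json.dumps(original.export_params())

# Recreate an equivalent instance from the serialized parameters.
restored = ClipPreprocessor(**json.loads(payload))
```

Converting values to built-in types inside `export_params` matters in practice, because NumPy scalars and similar objects are not JSON-serializable by default.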
- abstractmethod fit(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> BasePreprocessor
Fit the preprocessor to the data.
- Parameters:
- X : pd.DataFrame
Training data
- y : pd.Series, optional
Target values
- categorical_features : List[str], optional
List of categorical feature names
- Returns:
- self : BasePreprocessor
Fitted preprocessor instance
Notes
This method should fit the preprocessor to the training data and set the is_fitted flag to True upon completion.
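A minimal sketch of a `fit` implementation that follows this contract is shown below. `MeanCenterPreprocessor` and its `means_` attribute are hypothetical, and the class is a stand-in that does not inherit from Brisk's `BasePreprocessor`; treating `categorical_features` as columns to skip is an assumption made for the example.

```python
from typing import List, Optional

import pandas as pd


class MeanCenterPreprocessor:
    """Hypothetical preprocessor; stands in for a BasePreprocessor subclass."""

    def __init__(self, **kwargs):
        self.is_fitted = False
        self.means_ = None

    def fit(
        self,
        X: pd.DataFrame,
        y: Optional[pd.Series] = None,
        categorical_features: Optional[List[str]] = None,
    ) -> "MeanCenterPreprocessor":
        categorical = set(categorical_features or [])
        numeric_cols = [c for c in X.columns if c not in categorical]
        # Learn per-column means from the training data only.
        self.means_ = X[numeric_cols].mean()
        self.is_fitted = True
        return self  # returning self allows method chaining


X_train = pd.DataFrame({"age": [20, 30, 40], "city": ["a", "b", "a"]})
prep = MeanCenterPreprocessor().fit(X_train, categorical_features=["city"])
```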
- fit_transform(X: DataFrame, y: Series | None = None, categorical_features: List[str] | None = None) -> Tuple[DataFrame, Series | None]
Fit the preprocessor and transform the data.
Convenience method that combines fit and transform operations in a single call.
- Parameters:
- X : pd.DataFrame
Training data
- y : pd.Series, optional
Target values
- categorical_features : List[str], optional
List of categorical feature names
- Returns:
- Tuple[pd.DataFrame, Optional[pd.Series]]
Tuple containing (transformed_X, transformed_y)
Notes
This method is equivalent to calling fit() followed by transform() on the same data. It’s provided for convenience and follows the scikit-learn pattern.
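The equivalence stated above can be illustrated with a stand-in base class. `PreprocessorSketch` and `ScaleByMax` are hypothetical names, and the body of `fit_transform` is an assumption about how such a method is typically written, not a copy of Brisk's source.

```python
from abc import ABC, abstractmethod

import pandas as pd


class PreprocessorSketch(ABC):
    """Stand-in for BasePreprocessor; not Brisk's actual implementation."""

    def __init__(self, **kwargs):
        self.is_fitted = False

    @abstractmethod
    def fit(self, X, y=None, categorical_features=None): ...

    @abstractmethod
    def transform(self, X, y=None): ...

    def fit_transform(self, X, y=None, categorical_features=None):
        # Assumed behavior: call fit(), then transform() on the same data.
        return self.fit(X, y, categorical_features).transform(X, y)


class ScaleByMax(PreprocessorSketch):
    """Hypothetical preprocessor that divides each column by its maximum."""

    def fit(self, X, y=None, categorical_features=None):
        self.max_ = X.max()
        self.is_fitted = True
        return self

    def transform(self, X, y=None):
        return X / self.max_, y


X = pd.DataFrame({"a": [1.0, 2.0, 4.0]})
one_step, _ = ScaleByMax().fit_transform(X)
two_step, _ = ScaleByMax().fit(X).transform(X)
```

Both calls produce the same transformed frame, which is the property the note describes.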
- get_feature_names(feature_names: List[str]) -> List[str]
Get the feature names after preprocessing.
- Parameters:
- feature_names : List[str]
Original feature names
- Returns:
- List[str]
Feature names after preprocessing
Notes
By default, this method returns the original feature names unchanged. Subclasses should override this method if preprocessing changes the number or names of features (e.g., one-hot encoding).
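As an illustration of such an override, the sketch below expands each categorical column into one name per category. `OneHotSketch`, its `categories` mapping, and the `column_category` naming scheme are all hypothetical; a real encoder would normally learn the categories during `fit`.

```python
from typing import Dict, List


class OneHotSketch:
    """Hypothetical encoder; stands in for a BasePreprocessor subclass."""

    def __init__(self, categories: Dict[str, List[str]]):
        # Maps each categorical column to its categories (assumed known here).
        self.categories = categories

    def get_feature_names(self, feature_names: List[str]) -> List[str]:
        expanded = []
        for name in feature_names:
            if name in self.categories:
                # One output column per category, e.g. "color_red".
                expanded.extend(f"{name}_{cat}" for cat in self.categories[name])
            else:
                # Columns that are not encoded keep their original name.
                expanded.append(name)
        return expanded


encoder = OneHotSketch({"color": ["red", "blue"]})
names = encoder.get_feature_names(["age", "color"])
```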
- abstractmethod transform(X: DataFrame, y: Series | None = None) -> Tuple[DataFrame, Series | None]
Transform the data using the fitted preprocessor.
- Parameters:
- X : pd.DataFrame
Features to transform
- y : pd.Series, optional
Target values to transform (if applicable)
- Returns:
- Tuple[pd.DataFrame, Optional[pd.Series]]
Tuple containing (transformed_X, transformed_y). The returned y is None if no y was provided, unchanged if the preprocessor does not modify it, and transformed if it does (e.g., CategoricalEncodingPreprocessor).
- Raises:
- ValueError
If preprocessor has not been fitted
Notes
All preprocessors must return a tuple (X, y). Even if a preprocessor doesn’t transform y, it must still return the tuple with y unchanged. This allows all preprocessors to be called uniformly: X, y = preprocessor.transform(X, y)
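The uniform calling convention and the unfitted-state check can be sketched as follows. `DropMissingRows` is a hypothetical stand-in that does not inherit from Brisk's `BasePreprocessor`, and the error message text is an assumption.

```python
import pandas as pd


class DropMissingRows:
    """Hypothetical preprocessor; stands in for a BasePreprocessor subclass."""

    def __init__(self, **kwargs):
        self.is_fitted = False

    def fit(self, X, y=None, categorical_features=None):
        # Nothing to learn for this step; only mark the instance as fitted.
        self.is_fitted = True
        return self

    def transform(self, X, y=None):
        if not self.is_fitted:
            raise ValueError("Preprocessor must be fitted before transform().")
        X_out = X.dropna()
        # Keep y aligned with the surviving rows; pass None through untouched.
        y_out = y.loc[X_out.index] if y is not None else None
        return X_out, y_out


X = pd.DataFrame({"a": [1.0, None, 3.0]})
y = pd.Series([0, 1, 0])

steps = [DropMissingRows().fit(X, y), DropMissingRows().fit(X, y)]
for step in steps:
    X, y = step.transform(X, y)  # identical call for every preprocessor
```

Because every step returns the `(X, y)` pair, the loop never needs to know whether a given preprocessor touches the target.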