Feature Selection

class FeatureSelectionPreprocessor(method: str = 'selectkbest', n_features_to_select: int = 5, feature_selection_cv: int = 3, estimator: Any | None = None, algorithm_config=None, feature_selection_estimator: str | None = None, problem_type: str = 'classification', **kwargs)

Bases: BasePreprocessor

Preprocessor for feature selection methods.

Supports several feature selection algorithms, including SelectKBest, RFECV, and SequentialFeatureSelector. The wrapper methods (RFECV and SequentialFeatureSelector) can be configured with different estimators.

Parameters:
method : str, default="selectkbest"

Feature selection method: "selectkbest", "rfecv", or "sequential".

n_features_to_select : int, default=5

Number of features to select.

feature_selection_cv : int, default=3

Number of cross-validation folds for RFECV and SequentialFeatureSelector.

estimator : Any, optional

Estimator to use directly for RFECV and SequentialFeatureSelector.

algorithm_config : AlgorithmCollection, optional

User-provided collection of AlgorithmWrapper objects to use for feature selection.

feature_selection_estimator : str, optional

Name of the estimator to use for feature selection. If not specified, defaults to the first algorithm in the relevant wrapper list.

problem_type : str, default="classification"

Type of problem ("classification" or "regression"). Used to choose the appropriate scoring function for SelectKBest.
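The problem_type parameter only matters for SelectKBest, where it selects the scoring function. The sketch below shows one plausible mapping using scikit-learn's ANOVA F-tests (f_classif for classification, f_regression for regression). Which score functions the preprocessor actually uses is not stated here, so treat this mapping and the helper name make_selectkbest as assumptions.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.feature_selection import SelectKBest, f_classif, f_regression


def make_selectkbest(problem_type: str = "classification", k: int = 5) -> SelectKBest:
    """Hypothetical helper: choose a univariate score function from problem_type."""
    if problem_type == "classification":
        score_func = f_classif  # ANOVA F-test between each feature and the class labels
    elif problem_type == "regression":
        score_func = f_regression  # F-test from a univariate linear regression per feature
    else:
        raise ValueError(f"Unknown problem_type: {problem_type!r}")
    return SelectKBest(score_func=score_func, k=k)


X_clf, y_clf = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=0)
X_reg, y_reg = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)

clf_selector = make_selectkbest("classification", k=5).fit(X_clf, y_clf)
reg_selector = make_selectkbest("regression", k=5).fit(X_reg, y_reg)
print(clf_selector.get_support().sum(), reg_selector.get_support().sum())
```

Using a classification test on a continuous target (or the reverse) would give meaningless scores, which is why the problem type has to be known up front.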

Attributes:
selector : sklearn.feature_selection selector

The fitted feature selector object.

scaler : sklearn.preprocessing scaler, optional

Fitted scaler for internal use, if one was provided.

is_fitted : bool

Whether the preprocessor has been fitted.

Notes

The preprocessor supports three main feature selection methods:

1. SelectKBest: selects the k best features based on statistical tests.
2. RFECV: recursive feature elimination with cross-validation.
3. SequentialFeatureSelector: sequential feature selection.

For wrapper methods (RFECV, SequentialFeatureSelector), an estimator must be provided either directly or through algorithm_config.
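The three methods correspond to scikit-learn's SelectKBest, RFECV, and SequentialFeatureSelector. The sketch below shows how a method string could be turned into a selector, including the estimator requirement for wrapper methods. build_selector is a hypothetical helper, and passing n_features_to_select to RFECV as min_features_to_select is an assumption, since RFECV chooses its final number of features by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression


def build_selector(method, n_features_to_select=5, cv=3, estimator=None):
    """Hypothetical factory mapping the documented method names to scikit-learn selectors."""
    if method == "selectkbest":
        # Filter method: features are scored by a univariate test, no estimator needed.
        return SelectKBest(score_func=f_classif, k=n_features_to_select)
    if method not in ("rfecv", "sequential"):
        raise ValueError(f"Unknown method: {method!r}")
    if estimator is None:
        # Wrapper methods repeatedly fit an estimator, so one is required.
        raise ValueError(f"method={method!r} needs an estimator")
    if method == "rfecv":
        # Assumption: RFECV picks the final feature count by cross-validation,
        # so n_features_to_select is used only as a lower bound here.
        return RFECV(estimator, min_features_to_select=n_features_to_select, cv=cv)
    return SequentialFeatureSelector(estimator, n_features_to_select=n_features_to_select, cv=cv)


X, y = make_classification(n_samples=120, n_features=10, n_informative=4, random_state=0)
estimator = LogisticRegression(max_iter=1000)
selectors = {
    method: build_selector(method, n_features_to_select=3, cv=3, estimator=estimator).fit(X, y)
    for method in ("selectkbest", "rfecv", "sequential")
}
for method, selector in selectors.items():
    print(method, int(selector.get_support().sum()))
```

SelectKBest and SequentialFeatureSelector keep exactly the requested number of features, while RFECV may keep more.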

Examples

SelectKBest for classification:

>>> preprocessor = FeatureSelectionPreprocessor(
...     method="selectkbest", n_features_to_select=10
... )

RFECV with custom estimator:

>>> from sklearn.ensemble import RandomForestClassifier
>>> preprocessor = FeatureSelectionPreprocessor(
...     method="rfecv", estimator=RandomForestClassifier()
... )

Sequential feature selection:

>>> preprocessor = FeatureSelectionPreprocessor(
...     method="sequential", n_features_to_select=5
... )
export_params() → Dict[str, Any]

Export parameters for serialization and rerun functionality.

Returns:
Dict[str, Any]

Dictionary of the parameters, intended for JSON serialization.

Notes

Returns all parameters needed to recreate the preprocessor instance. Simple values such as strings and integers serialize to JSON directly; complex objects such as estimators may not be directly serializable.
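Because estimator objects are not JSON-serializable, an export routine has to store something simpler in their place. The sketch below records only the estimator's class name. export_params_sketch is hypothetical, and the real export_params may handle estimators differently.

```python
import json

from sklearn.ensemble import RandomForestClassifier


def export_params_sketch(method, n_features_to_select, feature_selection_cv,
                         estimator=None, feature_selection_estimator=None,
                         problem_type="classification"):
    """Hypothetical sketch: a JSON-friendly dict of constructor parameters."""
    return {
        "method": method,
        "n_features_to_select": n_features_to_select,
        "feature_selection_cv": feature_selection_cv,
        # An estimator object is not JSON-serializable; keep only its class name.
        "estimator": type(estimator).__name__ if estimator is not None else None,
        "feature_selection_estimator": feature_selection_estimator,
        "problem_type": problem_type,
    }


params = export_params_sketch("rfecv", 5, 3, estimator=RandomForestClassifier())
print(json.dumps(params, indent=2))
```

Storing only the class name loses the estimator's hyperparameters; the estimator's get_params() output could be recorded as well, provided its values are JSON-compatible.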

fit(X: DataFrame, y: Series | None = None) → FeatureSelectionPreprocessor

Fit the feature selector to the data.

Learns feature selection parameters from the training data using the specified feature selection method. For wrapper methods (RFECV, SequentialFeatureSelector), the target variable is required.

Parameters:
X : pd.DataFrame

Training data features.

y : pd.Series, optional

Target values (required for RFECV and SequentialFeatureSelector).

Returns:
self : FeatureSelectionPreprocessor

Fitted preprocessor.

Raises:
ValueError

If y is not provided for a wrapper method (RFECV or SequentialFeatureSelector).

Notes

The method creates and fits the appropriate feature selector based on the specified method. If a scaler is provided, features are scaled before feature selection. The fitted selector can then be used to transform new data.
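The fit flow described above (check y for wrapper methods, optionally scale, then fit the selector) can be sketched as follows. fit_selector and WRAPPER_METHODS are hypothetical names, and StandardScaler stands in for whatever scaler is configured; this is an illustration under those assumptions, not the preprocessor's actual code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical: the methods documented as requiring a target.
WRAPPER_METHODS = {"rfecv", "sequential"}


def fit_selector(selector, X: pd.DataFrame, y=None, method="sequential", scaler=None):
    """Hypothetical sketch of the fit flow: validate y, optionally scale, fit the selector."""
    if method in WRAPPER_METHODS and y is None:
        raise ValueError(f"y is required for method={method!r}")
    # The scaled array is only what the selector sees; X itself is not modified.
    X_fit = scaler.fit_transform(X) if scaler is not None else X.to_numpy()
    selector.fit(X_fit, y)
    return selector


X_arr, y_arr = make_classification(n_samples=100, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
y = pd.Series(y_arr)

sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000), n_features_to_select=3, cv=3)
fit_selector(sfs, X, y, method="sequential", scaler=StandardScaler())
print(X.columns[sfs.get_support()].tolist())
```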

get_feature_names(feature_names: List[str]) → List[str]

Get the selected feature names after feature selection.

Parameters:
feature_names : List[str]

Original feature names

Returns:
List[str]

Names of selected features

Notes

Returns the names of features that were selected during fitting. If the preprocessor is not fitted or no selector is available, returns the original feature names unchanged.
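Selected names can be recovered from a fitted selector's boolean support mask, which scikit-learn selectors expose through get_support(). A minimal sketch, assuming the stored selector has that method; get_feature_names_sketch is a hypothetical stand-in for get_feature_names.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif


def get_feature_names_sketch(selector, feature_names, is_fitted):
    """Hypothetical sketch: map a fitted selector's support mask back to feature names."""
    if not is_fitted or selector is None:
        # Documented fallback: return the original names unchanged.
        return list(feature_names)
    mask = selector.get_support()  # boolean array, one entry per input feature
    return [name for name, keep in zip(feature_names, mask) if keep]


X, y = make_classification(n_samples=100, n_features=6, n_informative=2, n_redundant=0, random_state=0)
names = ["a", "b", "c", "d", "e", "f"]
selector = SelectKBest(f_classif, k=2).fit(X, y)
selected = get_feature_names_sketch(selector, names, is_fitted=True)
print(selected)
```

Because the mask is aligned with the input columns, the selected names come back in their original order.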

transform(X: DataFrame) → DataFrame

Transform the data using the fitted selector.

Applies the learned feature selection to new data, returning only the selected features from the original unscaled data.

Parameters:
X : pd.DataFrame

Data to transform

Returns:
pd.DataFrame

Data with only selected features

Raises:
ValueError

If the preprocessor has not been fitted before transform is called.

Notes

The method returns the selected features from the original unscaled data, not the scaled version used during fitting. This ensures that the output data maintains the original scale and meaning.
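The following sketch illustrates how the output can keep the original scale: the selector is fitted on scaled data, but its support mask is used to pick columns from the unscaled DataFrame. transform_sketch is hypothetical, and the toy data is made up for illustration.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler


def transform_sketch(selector, X: pd.DataFrame, is_fitted: bool) -> pd.DataFrame:
    """Hypothetical sketch: return the selected columns of the original, unscaled X."""
    if not is_fitted:
        raise ValueError("The preprocessor must be fitted before transform is called")
    mask = selector.get_support()
    # Index the original DataFrame with the mask rather than calling
    # selector.transform() on scaled data, so values keep their original scale.
    return X.loc[:, mask]


# Made-up toy data: "noise" has the same mean in both classes.
X = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [30_000, 52_000, 81_000, 90_000, 61_000, 40_000],
    "noise": [0.1, -0.3, 0.2, 0.0, -0.1, 0.3],
})
y = pd.Series([0, 0, 1, 1, 1, 0])

scaler = StandardScaler()
selector = SelectKBest(f_classif, k=2).fit(scaler.fit_transform(X), y)
out = transform_sketch(selector, X, is_fitted=True)
print(out)
```

The returned income column still holds values such as 30000 rather than standardized scores.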