Feature Selection
- class FeatureSelectionPreprocessor(method: str = 'selectkbest', n_features_to_select: int = 5, feature_selection_cv: int = 3, estimator: Any | None = None, algorithm_config=None, feature_selection_estimator: str | None = None, problem_type: str = 'classification', **kwargs)
Bases: BasePreprocessor
Preprocessor for feature selection methods.
Supports various feature selection algorithms including SelectKBest, RFECV, and SequentialFeatureSelector. Can use different estimators for wrapper methods.
- Parameters:
- method : str, default="selectkbest"
Feature selection method ("selectkbest", "rfecv", or "sequential").
- n_features_to_select : int, default=5
Number of features to select.
- feature_selection_cv : int, default=3
Number of CV folds for RFECV and SequentialFeatureSelector.
- estimator : Any, optional
Estimator passed directly to RFECV and SequentialFeatureSelector.
- algorithm_config : AlgorithmCollection, optional
User-provided collection of AlgorithmWrapper objects to use for feature selection.
- feature_selection_estimator : str, optional
The name of the estimator to use for feature selection. If not specified, defaults to the first algorithm in the relevant wrapper list.
- problem_type : str, default="classification"
The type of problem ("classification" or "regression"). Used to determine the appropriate scoring function for SelectKBest.
- Attributes:
- selector : sklearn.feature_selection selector
The fitted feature selector object
- scaler : sklearn.preprocessing scaler, optional
Fitted scaler for internal use (if provided)
- is_fitted : bool
Whether the preprocessor has been fitted
Notes
The preprocessor supports three main feature selection methods:
1. SelectKBest: Selects k best features based on statistical tests
2. RFECV: Recursive feature elimination with cross-validation
3. SequentialFeatureSelector: Sequential feature selection
For wrapper methods (RFECV, SequentialFeatureSelector), an estimator must be provided either directly or through algorithm_config.
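As a rough illustration of how the three method names could map onto scikit-learn selectors, here is a minimal sketch. The helper `build_selector` is hypothetical and not part of this class, and the choice of `f_classif` / `f_regression` as the SelectKBest scoring functions is an assumption; the source only says the scoring function depends on `problem_type`.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    RFECV,
    SelectKBest,
    SequentialFeatureSelector,
    f_classif,
    f_regression,
)
from sklearn.linear_model import LogisticRegression


def build_selector(method, n_features_to_select=5, cv=3, estimator=None,
                   problem_type="classification"):
    """Hypothetical helper: map a method name to a scikit-learn selector."""
    if method == "selectkbest":
        # Assumed mapping from problem type to a univariate scoring function.
        score_func = f_classif if problem_type == "classification" else f_regression
        return SelectKBest(score_func=score_func, k=n_features_to_select)
    # The remaining methods are wrapper methods and need an estimator.
    if estimator is None:
        raise ValueError(f"method={method!r} requires an estimator")
    if method == "rfecv":
        # RFECV picks the number of features itself via cross-validation.
        return RFECV(estimator=estimator, cv=cv)
    if method == "sequential":
        return SequentialFeatureSelector(
            estimator, n_features_to_select=n_features_to_select, cv=cv
        )
    raise ValueError(f"unknown method: {method!r}")


# Synthetic data for demonstration only.
X, y = make_classification(n_samples=120, n_features=8, n_informative=4,
                           random_state=0)
selector = build_selector("selectkbest", n_features_to_select=3).fit(X, y)
n_kept = int(selector.get_support().sum())  # number of features kept

wrapper = build_selector("sequential", n_features_to_select=2,
                         estimator=LogisticRegression(max_iter=1000))
```

The filter method needs no estimator, while both wrapper branches fail early with a `ValueError` when none is given, which mirrors the requirement stated in the note above.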
Examples
- SelectKBest for classification:
>>> preprocessor = FeatureSelectionPreprocessor(
...     method="selectkbest", n_features_to_select=10
... )
- RFECV with custom estimator:
>>> from sklearn.ensemble import RandomForestClassifier
>>> preprocessor = FeatureSelectionPreprocessor(
...     method="rfecv", estimator=RandomForestClassifier()
... )
- Sequential feature selection:
>>> preprocessor = FeatureSelectionPreprocessor(
...     method="sequential", n_features_to_select=5
... )
- export_params() -> Dict[str, Any]
Export parameters for serialization and rerun functionality.
- Returns:
- Dict[str, Any]
Dictionary containing all parameters in JSON-serializable format
Notes
Returns all parameters needed to recreate the preprocessor instance, suitable for JSON serialization. Note that complex objects like estimators may not be directly serializable.
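The serialization caveat can be made concrete with a small sketch. The helper `export_params_sketch` and the `DummyEstimator` class are hypothetical stand-ins, and replacing a non-serializable object with its class name is only one possible strategy; the source does not say how this class handles such values.

```python
import json
from typing import Any, Dict


def export_params_sketch(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a JSON-serializable copy of a parameter dict."""
    exported: Dict[str, Any] = {}
    for name, value in params.items():
        try:
            json.dumps(value)  # probe: is this value serializable as-is?
            exported[name] = value
        except TypeError:
            # Assumed fallback: keep only the class name of complex objects.
            exported[name] = type(value).__name__
    return exported


class DummyEstimator:
    """Stand-in for an estimator object that json cannot serialize."""


params = {
    "method": "rfecv",
    "n_features_to_select": 5,
    "feature_selection_cv": 3,
    "estimator": DummyEstimator(),
}
exported = export_params_sketch(params)
payload = json.dumps(exported, sort_keys=True)
```

With this fallback the estimator's class name survives a round trip through JSON, but its hyperparameters do not, so a rerun would have to rebuild the estimator from other information.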
- fit(X: DataFrame, y: Series | None = None) -> FeatureSelectionPreprocessor
Fit the feature selector to the data.
Learns feature selection parameters from the training data using the specified feature selection method. For wrapper methods (RFECV, SequentialFeatureSelector), the target variable is required.
- Parameters:
- X : pd.DataFrame
Training data features
- y : pd.Series, optional
Target values (required for RFECV and SequentialFeatureSelector)
- Returns:
- self : FeatureSelectionPreprocessor
Fitted preprocessor
- Raises:
- ValueError
If y is required but not provided for wrapper methods
If preprocessor has not been fitted before transform
Notes
The method creates and fits the appropriate feature selector based on the specified method. If a scaler is provided, features are scaled before feature selection. The fitted selector can then be used to transform new data.
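The fitting steps described above (validate `y`, optionally scale, then fit the selector) might look roughly like the following. This is a sketch under assumptions, not the class's actual code: `fit_selector` and `WRAPPER_METHODS` are hypothetical names, and only the SelectKBest branch is implemented.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

WRAPPER_METHODS = {"rfecv", "sequential"}  # hypothetical constant


def fit_selector(X, y=None, method="selectkbest", k=2, scaler=None):
    """Hypothetical fit step: validate y, optionally scale, fit a selector."""
    if method in WRAPPER_METHODS and y is None:
        raise ValueError(f"y is required for method={method!r}")
    if method != "selectkbest":
        raise NotImplementedError("this sketch only covers SelectKBest")
    # Scale a copy of the features if a scaler was supplied; X is not modified.
    X_fit = scaler.fit_transform(X) if scaler is not None else X.to_numpy()
    return SelectKBest(score_func=f_classif, k=k).fit(X_fit, y)


# Toy data: two columns separate the classes, one does not.
X = pd.DataFrame({
    "signal": [0.1, 0.2, 0.15, 5.0, 5.2, 5.1],
    "noise": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
    "scaled_signal": [100.0, 210.0, 150.0, 5000.0, 5100.0, 5300.0],
})
y = pd.Series([0, 0, 0, 1, 1, 1])

selector = fit_selector(X, y, k=2, scaler=StandardScaler())
selected = list(X.columns[selector.get_support()])
```

The selector only ever sees the scaled array, so the column names have to be recovered afterwards from the support mask and the original DataFrame.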
- get_feature_names(feature_names: List[str]) -> List[str]
Get the selected feature names after feature selection.
- Parameters:
- feature_names : List[str]
Original feature names
- Returns:
- List[str]
Names of selected features
Notes
Returns the names of features that were selected during fitting. If the preprocessor is not fitted or no selector is available, returns the original feature names unchanged.
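The fallback behavior described in these notes can be sketched in a few lines. The helper `selected_names` is hypothetical; `support_mask` stands in for the boolean mask that a fitted scikit-learn selector returns from `get_support()`, with `None` representing the unfitted case.

```python
from typing import List, Optional, Sequence


def selected_names(feature_names: List[str],
                   support_mask: Optional[Sequence[bool]]) -> List[str]:
    """Hypothetical helper: filter names by a selector's support mask."""
    if support_mask is None:
        # Not fitted or no selector available: names pass through unchanged.
        return list(feature_names)
    return [name for name, keep in zip(feature_names, support_mask) if keep]


names = ["age", "income", "height", "weight"]
kept = selected_names(names, [True, False, False, True])
unchanged = selected_names(names, None)
```

Returning the input names instead of raising keeps the call safe to use before fitting, at the cost of silently hiding the fact that no selection has happened yet.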
- transform(X: DataFrame) -> DataFrame
Transform the data using the fitted selector.
Applies the learned feature selection to new data, returning only the selected features from the original unscaled data.
- Parameters:
- X : pd.DataFrame
Data to transform
- Returns:
- pd.DataFrame
Data with only selected features
- Raises:
- ValueError
If preprocessor has not been fitted before transform
Notes
The method returns the selected features from the original unscaled data, not the scaled version used during fitting. This ensures that the output data maintains the original scale and meaning.
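To illustrate the point about scale, the sketch below fits a selector on scaled values but slices the selected columns out of the original DataFrame. The function `transform_unscaled` and the toy data are hypothetical, and this is one plausible way to get the documented behavior rather than the class's confirmed implementation.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

# Toy regression data: "area" tracks the target closely, "random" does not.
X = pd.DataFrame({
    "area": [50, 80, 120, 200],
    "random": [3, 1, 4, 1],
})
y = pd.Series([105, 155, 250, 390])

# Fit the selector on a scaled copy of the features.
scaler = StandardScaler()
selector = SelectKBest(score_func=f_regression, k=1).fit(scaler.fit_transform(X), y)


def transform_unscaled(X_new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transform: pick selected columns from the unscaled input."""
    mask = selector.get_support()  # boolean mask over the original columns
    return X_new.loc[:, mask]


out = transform_unscaled(X)
```

Because the mask is applied to the input DataFrame rather than to the scaler's output, the returned values are exactly the original ones, in their original units.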