Utility Service
- class UtilityService(name: str, group_index_train: Dict[str, ndarray] | None, group_index_test: Dict[str, ndarray] | None)
Bases: BaseService
Utility service providing helper functions for evaluators and cross-validation.
This service bundles utility functionality for the Brisk package: cross-validation splitter creation, algorithm wrapper lookup, group index handling, and plot settings management. It centralizes operations that are needed throughout the evaluation pipeline.
The service manages algorithm configurations, handles grouped data indices, creates appropriate cross-validation splitters based on data characteristics, and provides access to plot settings and configurations.
- Attributes:
- algorithm_config : Optional[algorithm_collection.AlgorithmCollection]
The algorithm configuration containing algorithm wrappers
- group_index_train : Optional[Dict[str, np.ndarray]]
The group index for the training data
- group_index_test : Optional[Dict[str, np.ndarray]]
The group index for the test data
- data_has_groups : bool
Boolean flag indicating whether the data has group information
- plot_settings : PlotSettings
The plot settings configuration
Notes
The service detects grouped data from the group indices: data is treated as grouped only when both the training and test group indices are provided. It then chooses a cross-validation splitter that matches the data characteristics (categorical vs. continuous target, grouped vs. ungrouped).
Examples
>>> from brisk.services.utility import UtilityService
>>> from brisk.configuration import AlgorithmCollection
>>>
>>> # Create utility service
>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> utility_service.set_algorithm_config(algorithm_config)
>>>
>>> # Get algorithm wrapper
>>> wrapper = utility_service.get_algo_wrapper("RandomForest")
>>>
>>> # Get cross-validation splitter
>>> splitter, indices = utility_service.get_cv_splitter(y, cv=5)
- get_algo_wrapper(wrapper_name: str) → AlgorithmWrapper
Get an AlgorithmWrapper instance by name.
This method retrieves an algorithm wrapper from the algorithm configuration by its name.
- Parameters:
- wrapper_name : str
The name of the AlgorithmWrapper to retrieve
- Returns:
- algorithm_wrapper.AlgorithmWrapper
The AlgorithmWrapper instance
- Raises:
- KeyError
If the wrapper name is not found in the algorithm configuration
- AttributeError
If the algorithm configuration is not set
Examples
>>> utility_service = UtilityService("utility", None, None)
>>> utility_service.set_algorithm_config(algorithm_config)
>>> wrapper = utility_service.get_algo_wrapper("RandomForest")
>>> print(wrapper.display_name)
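The lookup contract described above (KeyError for an unknown name, AttributeError when no configuration has been set) can be shown with a small stand-in. This is a sketch and not Brisk's implementation. `WrapperLookupSketch` and `AlgorithmWrapperStub` are hypothetical names, and a plain `dict` keyed by wrapper name stands in for `AlgorithmCollection`, on the assumption that the collection is indexed by name.

```python
from typing import Dict, Optional


class AlgorithmWrapperStub:
    """Hypothetical stand-in for brisk's AlgorithmWrapper."""

    def __init__(self, name: str, display_name: str) -> None:
        self.name = name
        self.display_name = display_name


class WrapperLookupSketch:
    """Sketch of the documented get_algo_wrapper() behaviour (not brisk's code)."""

    def __init__(self) -> None:
        # Assumption: a name-keyed dict models AlgorithmCollection.
        self.algorithm_config: Optional[Dict[str, AlgorithmWrapperStub]] = None

    def set_algorithm_config(self, config: Dict[str, AlgorithmWrapperStub]) -> None:
        self.algorithm_config = config

    def get_algo_wrapper(self, wrapper_name: str) -> AlgorithmWrapperStub:
        if self.algorithm_config is None:
            # Documented: AttributeError when no configuration has been set.
            raise AttributeError("algorithm configuration is not set")
        # Indexing a dict with an unknown key raises KeyError, as documented.
        return self.algorithm_config[wrapper_name]


svc = WrapperLookupSketch()
svc.set_algorithm_config({"rf": AlgorithmWrapperStub("rf", "Random Forest")})
print(svc.get_algo_wrapper("rf").display_name)  # Random Forest
```

Raising AttributeError before any dictionary access keeps the "not configured" case distinct from the "unknown name" case, which is the distinction the Raises section draws.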
- get_cv_splitter(y: Series, cv: int = 5, num_repeats: int | None = None) → Tuple[BaseCrossValidator, ndarray | None]
Get a cross-validation splitter suited to the data.
This method creates an appropriate cross-validation splitter based on the data characteristics. It considers whether the data is categorical or continuous, whether it has groups, and whether repeated splitting is requested.
- Parameters:
- y : pd.Series
The target variable used to determine data characteristics
- cv : int, default=5
The number of folds or splits to create
- num_repeats : Optional[int], default=None
The number of repeats for repeated cross-validation
- Returns:
- Tuple[model_select.BaseCrossValidator, Optional[np.ndarray]]
A tuple containing:
- The cross-validator splitter appropriate for the data
- The group index array (None if no groups)
Notes
The method automatically selects the appropriate splitter:
- For grouped data: StratifiedGroupKFold or GroupKFold
- For ungrouped data: StratifiedKFold or KFold
- With repeats: RepeatedStratifiedKFold or RepeatedKFold
- Categorical detection: based on the unique-value ratio (< 5%)
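The selection rules in the Notes can be expressed as a small decision function. The splitter names are real scikit-learn classes in `sklearn.model_selection`, but everything else is an assumption. `choose_splitter_name` and `is_categorical` are hypothetical helpers. The 5% rule is read as "number of unique target values divided by number of samples". Groups are assumed to take precedence over repeats, because scikit-learn does not ship a repeated group-aware K-fold splitter. To avoid a hard dependency, the sketch returns the class name instead of instantiating it.

```python
from typing import Optional, Sequence

# Assumption: "unique value ratio < 5%" means n_unique / n_samples < 0.05.
CATEGORICAL_RATIO = 0.05


def is_categorical(y: Sequence) -> bool:
    """Return True if unique values make up less than 5% of the samples."""
    if len(y) == 0:
        return False
    return len(set(y)) / len(y) < CATEGORICAL_RATIO


def choose_splitter_name(
    y: Sequence, has_groups: bool, num_repeats: Optional[int] = None
) -> str:
    """Return the scikit-learn splitter class name the Notes describe.

    Hypothetical helper: grouped data is checked before repeats because
    scikit-learn has no repeated group-aware K-fold splitter.
    """
    categorical = is_categorical(y)
    if has_groups:
        return "StratifiedGroupKFold" if categorical else "GroupKFold"
    if num_repeats:
        return "RepeatedStratifiedKFold" if categorical else "RepeatedKFold"
    return "StratifiedKFold" if categorical else "KFold"


labels = [0, 1] * 50           # 2 unique values in 100 samples -> categorical
continuous = list(range(100))  # 100 unique values -> continuous
print(choose_splitter_name(labels, has_groups=False))  # StratifiedKFold
print(choose_splitter_name(continuous, has_groups=False, num_repeats=3))  # RepeatedKFold
```

Stratified variants appear only for categorical targets because stratification needs discrete class labels to balance across folds.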
Examples
>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> splitter, indices = utility_service.get_cv_splitter(y, cv=5)
>>> for train_idx, val_idx in splitter.split(X, y, groups=indices):
...     # Use train_idx and val_idx for cross-validation
...     pass
- get_group_index(is_test: bool) → Dict[str, ndarray] | None
Get the group index for the training or test data.
This method returns the appropriate group index based on whether the data is from the training or test set. If the data doesn’t have groups, it returns None.
- Parameters:
- is_test : bool
If True, return the test-set group index; otherwise return the training-set group index
- Returns:
- Optional[Dict[str, np.ndarray]]
The group index for the training or test data, or None if the data doesn’t have groups
Examples
>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> train_groups = utility_service.get_group_index(is_test=False)
>>> test_groups = utility_service.get_group_index(is_test=True)
- get_plot_settings() → PlotSettings
Get the current plot settings configuration.
This method returns the current plot settings configuration that is being used for plot generation.
- Returns:
- PlotSettings
The current plot settings configuration
Examples
>>> utility_service = UtilityService("utility", None, None)
>>> plot_settings = utility_service.get_plot_settings()
>>> print(plot_settings.primary_color)
- set_algorithm_config(algorithm_config: AlgorithmCollection) → None
Set the algorithm configuration.
This method sets the algorithm configuration that contains all algorithm wrappers and their configurations.
- Parameters:
- algorithm_config : algorithm_collection.AlgorithmCollection
The algorithm configuration containing algorithm wrappers
Notes
The algorithm configuration must be set before calling get_algo_wrapper(); otherwise that method raises AttributeError.
- set_plot_settings(plot_settings: PlotSettings) → None
Set the plot settings configuration.
This method sets the plot settings that will be used for generating plots throughout the evaluation process.
- Parameters:
- plot_settings : PlotSettings
The plot settings configuration
Notes
The plot settings control various aspects of plot generation including theme, colors, dimensions, and file output formats.
- set_split_indices(group_index_train: Dict[str, ndarray] | None, group_index_test: Dict[str, ndarray] | None) → None
Set the split indices for grouped data.
This method sets the group indices for training and test data and automatically determines if the data has groups based on the presence of both indices.
- Parameters:
- group_index_train : Optional[Dict[str, np.ndarray]]
The group index for the training data
- group_index_test : Optional[Dict[str, np.ndarray]]
The group index for the test data
Notes
The data_has_groups flag is set to True only if both group indices are provided (not None). This flag is used by other methods to determine the appropriate cross-validation strategy.
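The interaction between set_split_indices() and get_group_index() can be sketched as follows: the flag requires both indices, and without it no index is returned. This is an illustration under stated assumptions, not Brisk's code. `GroupIndexSketch` is a hypothetical name, plain lists stand in for the `np.ndarray` values, and the dictionary key `"groups"` used in the usage lines is made up, since the actual keys of a group index are not documented here.

```python
from typing import Dict, List, Optional

# Hypothetical alias: lists stand in for the np.ndarray values of a group index.
GroupIndex = Dict[str, List[int]]


class GroupIndexSketch:
    """Sketch of the documented group-index handling (not brisk's code)."""

    def __init__(self) -> None:
        self.group_index_train: Optional[GroupIndex] = None
        self.group_index_test: Optional[GroupIndex] = None
        self.data_has_groups = False

    def set_split_indices(
        self,
        group_index_train: Optional[GroupIndex],
        group_index_test: Optional[GroupIndex],
    ) -> None:
        self.group_index_train = group_index_train
        self.group_index_test = group_index_test
        # Documented rule: grouped only if BOTH indices are provided.
        self.data_has_groups = (
            group_index_train is not None and group_index_test is not None
        )

    def get_group_index(self, is_test: bool) -> Optional[GroupIndex]:
        # Without groups, None is returned regardless of is_test.
        if not self.data_has_groups:
            return None
        return self.group_index_test if is_test else self.group_index_train


svc = GroupIndexSketch()
svc.set_split_indices({"groups": [0, 0, 1]}, None)  # test index missing
print(svc.data_has_groups)                    # False
print(svc.get_group_index(is_test=False))     # None
```

Supplying only the training index therefore behaves like ungrouped data, which matches the rule that the flag is set only when both indices are present.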