Utility Service

class UtilityService(name: str, group_index_train: Dict[str, ndarray] | None, group_index_test: Dict[str, ndarray] | None)

Bases: BaseService

Utility service providing helper functions for evaluators and cross-validation.

The service bundles the helpers that the Brisk evaluation pipeline shares: cross-validation splitter selection, algorithm wrapper lookup, group index handling, and plot settings access.

It stores the algorithm configuration and the group indices for grouped data, chooses a cross-validation splitter from the characteristics of the target, and exposes the current plot settings.

Attributes:

algorithm_config (Optional[algorithm_collection.AlgorithmCollection]): The algorithm configuration containing algorithm wrappers
group_index_train (Optional[Dict[str, np.ndarray]]): The group index for the training data
group_index_test (Optional[Dict[str, np.ndarray]]): The group index for the test data
data_has_groups (bool): Boolean flag indicating if the data has group information
plot_settings (PlotSettings): The plot settings configuration

Notes

The service treats the data as grouped only when both the training and test group indices are provided. It selects the cross-validation splitter from the data characteristics (categorical vs continuous, grouped vs ungrouped).

Examples

>>> from brisk.services.utility import UtilityService
>>> from brisk.configuration import AlgorithmCollection
>>> 
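>>> # Create utility service (group_index_train, group_index_test,
>>> # algorithm_config and y are assumed to be defined already)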
>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> utility_service.set_algorithm_config(algorithm_config)
>>> 
>>> # Get algorithm wrapper
>>> wrapper = utility_service.get_algo_wrapper("RandomForest")
>>> 
>>> # Get cross-validation splitter
>>> splitter, indices = utility_service.get_cv_splitter(y, cv=5)

get_algo_wrapper(wrapper_name: str) → AlgorithmWrapper

Get the AlgorithmWrapper instance.

This method retrieves an algorithm wrapper from the algorithm configuration by its name.

Parameters:

wrapper_name (str): The name of the AlgorithmWrapper to retrieve

Returns:

algorithm_wrapper.AlgorithmWrapper: The AlgorithmWrapper instance

Raises:

KeyError: If the wrapper name is not found in the algorithm configuration
AttributeError: If the algorithm configuration is not set

Examples

>>> utility_service = UtilityService("utility", None, None)
>>> utility_service.set_algorithm_config(algorithm_config)
>>> wrapper = utility_service.get_algo_wrapper("RandomForest")
>>> print(wrapper.display_name)
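
The two documented exceptions can be handled in one place when a missing wrapper should not stop the caller. The following is a minimal sketch, not part of Brisk: `find_wrapper` and `StubService` are hypothetical names, and the stub only imitates the documented behaviour (KeyError for an unknown name, AttributeError when no configuration is set).

```python
from typing import Any, Dict, Optional


def find_wrapper(service: Any, wrapper_name: str) -> Optional[Any]:
    """Return the wrapper, or None if it cannot be looked up."""
    try:
        return service.get_algo_wrapper(wrapper_name)
    except KeyError:
        # The name is not present in the algorithm configuration.
        return None
    except AttributeError:
        # The algorithm configuration has not been set yet.
        return None


class StubService:
    """Hypothetical stand-in used only to exercise the helper."""

    def __init__(self, wrappers: Optional[Dict[str, Any]] = None) -> None:
        self._wrappers = wrappers

    def get_algo_wrapper(self, wrapper_name: str) -> Any:
        if self._wrappers is None:
            raise AttributeError("algorithm configuration is not set")
        return self._wrappers[wrapper_name]


configured = StubService({"RandomForest": "rf-wrapper"})
unconfigured = StubService()

found = find_wrapper(configured, "RandomForest")
missing = find_wrapper(configured, "Unknown")
not_ready = find_wrapper(unconfigured, "RandomForest")
```

Catching AttributeError this broadly can also hide unrelated attribute bugs, so a caller that needs to tell the two failure modes apart may prefer to handle each exception separately.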

get_cv_splitter(y: Series, cv: int = 5, num_repeats: int | None = None) → Tuple[BaseCrossValidator, ndarray | None]

Get the cross-validator splitter for the data.

This method creates an appropriate cross-validation splitter based on the data characteristics. It considers whether the data is categorical or continuous, whether it has groups, and whether repeated splitting is requested.

Parameters:

y (pd.Series): The target variable used to determine data characteristics
cv (int, default=5): The number of folds or splits to create
num_repeats (Optional[int], default=None): The number of repeats for repeated cross-validation

Returns:

Tuple[model_select.BaseCrossValidator, Optional[np.ndarray]]: A tuple containing:

- The cross-validator splitter appropriate for the data
- The group index array (None if no groups)

Notes

The method automatically selects the appropriate splitter:

- For grouped data: StratifiedGroupKFold or GroupKFold
- For ungrouped data: StratifiedKFold or KFold
- With repeats: RepeatedStratifiedKFold or RepeatedKFold
- Categorical detection: Based on unique value ratio (< 5%)
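
The selection rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not Brisk's implementation: `is_categorical` and `choose_splitter_name` are hypothetical helpers, the 5% threshold is read as unique values divided by total values, the stratified variants are assumed to apply to categorical targets, and groups are assumed to take precedence over repeats because the notes do not say how the two combine.

```python
from typing import Optional, Sequence


def is_categorical(y: Sequence, threshold: float = 0.05) -> bool:
    """Treat the target as categorical when unique/total is below threshold.

    Assumption: the "unique value ratio" is len(set(y)) / len(y).
    """
    if len(y) == 0:
        raise ValueError("y must not be empty")
    return len(set(y)) / len(y) < threshold


def choose_splitter_name(
    y: Sequence,
    has_groups: bool,
    num_repeats: Optional[int] = None,
) -> str:
    """Return the name of the scikit-learn splitter the rules point to."""
    categorical = is_categorical(y)
    if has_groups:
        # Assumption: grouped data takes precedence over repeats.
        return "StratifiedGroupKFold" if categorical else "GroupKFold"
    if num_repeats:
        return "RepeatedStratifiedKFold" if categorical else "RepeatedKFold"
    return "StratifiedKFold" if categorical else "KFold"


# 100 labels with 2 unique values: ratio 0.02, treated as categorical.
labels = [0, 1] * 50
# 100 distinct values: ratio 1.0, treated as continuous.
values = list(range(100))

name_for_labels = choose_splitter_name(labels, has_groups=False)
name_for_values = choose_splitter_name(values, has_groups=False)
```

The function returns class names rather than splitter instances so the sketch runs without scikit-learn installed; all six names are classes in `sklearn.model_selection`.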

Examples

>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> splitter, indices = utility_service.get_cv_splitter(y, cv=5)
>>> for train_idx, val_idx in splitter.split(X, y, groups=indices):
...     # Use train_idx and val_idx for cross-validation
...     pass

get_group_index(is_test: bool) → Dict[str, ndarray] | None

Get the group index for the training or test data.

This method returns the appropriate group index based on whether the data is from the training or test set. If the data doesn’t have groups, it returns None.

Parameters:

is_test (bool): Whether the data is test data

Returns:

Optional[Dict[str, np.ndarray]]: The group index for the training or test data, or None if the data doesn’t have groups

Examples

>>> utility_service = UtilityService("utility", group_index_train, group_index_test)
>>> train_groups = utility_service.get_group_index(is_test=False)
>>> test_groups = utility_service.get_group_index(is_test=True)

get_plot_settings() → PlotSettings

Get the current plot settings configuration.

This method returns the current plot settings configuration that is being used for plot generation.

Returns:

PlotSettings: The current plot settings configuration

Examples

>>> utility_service = UtilityService("utility", None, None)
>>> plot_settings = utility_service.get_plot_settings()
>>> print(plot_settings.primary_color)

set_algorithm_config(algorithm_config: AlgorithmCollection) → None

Set the algorithm configuration.

This method sets the algorithm configuration that contains all algorithm wrappers and their configurations.

Parameters:

algorithm_config (algorithm_collection.AlgorithmCollection): The algorithm configuration containing algorithm wrappers

Notes

The algorithm configuration is required for accessing algorithm wrappers through the get_algo_wrapper() method.

set_plot_settings(plot_settings: PlotSettings) → None

Set the plot settings configuration.

This method sets the plot settings that will be used for generating plots throughout the evaluation process.

Parameters:

plot_settings (PlotSettings): The plot settings configuration

Notes

The plot settings control various aspects of plot generation including theme, colors, dimensions, and file output formats.

set_split_indices(group_index_train: Dict[str, ndarray] | None, group_index_test: Dict[str, ndarray] | None) → None

Set the split indices for grouped data.

This method sets the group indices for training and test data and automatically determines if the data has groups based on the presence of both indices.

Parameters:

group_index_train (Optional[Dict[str, np.ndarray]]): The group index for the training data
group_index_test (Optional[Dict[str, np.ndarray]]): The group index for the test data

Notes

The data_has_groups flag is set to True only if both group indices are provided (not None). This flag is used by other methods to determine the appropriate cross-validation strategy.
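
The flag rule can be illustrated with a small stand-in class. This is a sketch that only mirrors the documented behaviour: `GroupIndexHolder` is a hypothetical name rather than the Brisk class, the dictionary key `"values"` is invented for the example, and plain lists replace NumPy arrays so the snippet needs no third-party packages.

```python
from typing import Dict, Optional, Sequence

GroupIndex = Optional[Dict[str, Sequence[int]]]


class GroupIndexHolder:
    """Hypothetical stand-in that mirrors the documented flag behaviour."""

    def __init__(self) -> None:
        self.group_index_train: GroupIndex = None
        self.group_index_test: GroupIndex = None
        self.data_has_groups = False

    def set_split_indices(
        self, group_index_train: GroupIndex, group_index_test: GroupIndex
    ) -> None:
        self.group_index_train = group_index_train
        self.group_index_test = group_index_test
        # True only when both indices are provided (not None).
        self.data_has_groups = (
            group_index_train is not None and group_index_test is not None
        )

    def get_group_index(self, is_test: bool) -> GroupIndex:
        # Without groups there is nothing to return for either split.
        if not self.data_has_groups:
            return None
        return self.group_index_test if is_test else self.group_index_train


holder = GroupIndexHolder()

# Only the training index is given, so the data is not treated as grouped.
holder.set_split_indices({"values": [0, 0, 1]}, None)
flag_with_train_only = holder.data_has_groups

# Both indices are given, so the flag is set.
holder.set_split_indices({"values": [0, 0, 1]}, {"values": [2, 2]})
flag_with_both = holder.data_has_groups
test_groups = holder.get_group_index(is_test=True)
```

One consequence of the rule is that supplying only one of the two indices behaves the same as supplying none.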
