Configuration Manager

class ConfigurationManager(experiment_groups: List[ExperimentGroup], categorical_features: Dict[str, List[str]])

Manage experiment configurations and DataManager instances.

This class processes ExperimentGroup configurations and creates the minimum number of DataManager instances needed, reusing an instance whenever groups share the same data configuration.

Parameters:

experiment_groups (List[ExperimentGroup]): List of experiment group configurations to process
categorical_features (Dict[str, List[str]]): Dictionary mapping dataset identifiers to lists of categorical feature names
plot_settings (PlotSettings): Plot configuration settings for the experiments

Attributes:

experiment_groups (List[ExperimentGroup]): List of experiment group configurations
data_managers (Dict[str, DataManager]): Mapping of group names to DataManager instances
categorical_features (Dict[str, List[str]]): Mapping of dataset identifiers to categorical feature lists
project_root (Path): Root directory of the project
algorithm_config (AlgorithmCollection): Collection of algorithm configurations loaded from algorithms.py
base_data_manager (DataManager): Base configuration for data management loaded from data.py
experiment_queue (collections.deque): Queue of experiments ready to run
output_structure (Dict[str, Dict[str, Tuple[str, str]]]): Directory structure for experiment outputs
description_map (Dict[str, str]): Mapping of group names to descriptions
workflow_map (Dict[str, Type]): Mapping of workflow names to workflow classes
logfile (str): Markdown documentation of the configuration

Notes

The ConfigurationManager optimizes memory usage by reusing DataManager instances when experiment groups have identical data configurations. This is particularly important when running many experiments with similar data processing requirements.

Examples

Create a configuration manager:

>>> from brisk.configuration import ConfigurationManager
>>> manager = ConfigurationManager(
...     experiment_groups=groups,
...     categorical_features=categorical_features,
...     plot_settings=plot_settings
... )

create_data_managers() → Dict[str, DataManager]

Create the minimal set of DataManager instances.

Groups ExperimentGroups by their data_config and creates one DataManager instance per unique configuration. This optimization reduces memory usage by reusing data managers when configurations are identical.

Returns:

Dict[str, DataManager]: Dictionary mapping group names to DataManager instances

Notes

The method partitions experiment groups by their data configuration:

- Groups with identical data_config share the same DataManager.
- Groups with no data_config use the base DataManager.
- Preprocessor configurations are handled specially to ensure proper grouping based on preprocessor types.

This optimization is particularly important when running many experiments with similar data processing requirements.
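The reuse rule above can be illustrated with a small, self-contained sketch. It is not the library's implementation: DataManager and ExperimentGroup here are hypothetical stand-ins with only a couple of fields, and the hashable key built by the hypothetical helper config_key (preprocessors reduced to their class names, all other values assumed hashable) is one plausible way to realize "grouping based on preprocessor types". Group-specific managers are built by copying the base and applying the group's overrides with dataclasses.replace.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DataManager:
    """Hypothetical stand-in for DataManager; only two illustrative fields."""
    test_size: float = 0.2
    preprocessors: List[Any] = field(default_factory=list)


@dataclass
class ExperimentGroup:
    """Hypothetical stand-in holding only what this sketch needs."""
    name: str
    data_config: Optional[Dict[str, Any]] = None


def config_key(data_config: Dict[str, Any]) -> Tuple[Tuple[str, Any], ...]:
    """Turn a data_config into a hashable key.

    Preprocessors are keyed by their class names, so configs that use the
    same preprocessor types map to the same key. Every other value is
    assumed to be hashable.
    """
    items = []
    for name, value in sorted(data_config.items()):
        if name == "preprocessors":
            value = tuple(type(p).__name__ for p in value)
        items.append((name, value))
    return tuple(items)


def create_data_managers(
    groups: List[ExperimentGroup], base: DataManager
) -> Dict[str, DataManager]:
    managers: Dict[str, DataManager] = {}
    cache: Dict[Tuple[Tuple[str, Any], ...], DataManager] = {}
    for group in groups:
        if not group.data_config:
            # No data_config: the group shares the base DataManager.
            managers[group.name] = base
            continue
        key = config_key(group.data_config)
        if key not in cache:
            # First group with this configuration: copy the base, then override.
            cache[key] = replace(base, **group.data_config)
        managers[group.name] = cache[key]
    return managers


class StandardScaler:
    """Placeholder preprocessor type for the example."""


base = DataManager()
groups = [
    ExperimentGroup("baseline"),
    ExperimentGroup("scaled_a", {"preprocessors": [StandardScaler()]}),
    ExperimentGroup("scaled_b", {"preprocessors": [StandardScaler()]}),
    ExperimentGroup("large_test", {"test_size": 0.3}),
]
managers = create_data_managers(groups, base)
```

Four groups end up sharing three DataManager objects here: "scaled_a" and "scaled_b" reuse one instance because their preprocessors have the same type, even though they are distinct objects.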

create_data_splits() → None

Create DataSplitInfo instances for all datasets.

Creates data splits for each dataset in each experiment group using the appropriate DataManager instance. This prepares all datasets for cross-validation and train/test splitting.

Notes

The method processes each experiment group and:

1. Gets the appropriate DataManager for the group.
2. For each dataset in the group:
   - Determines the categorical features for the dataset.
   - Creates data splits using the DataManager.
   - Associates the splits with the group and dataset.

This ensures that all datasets are properly prepared for experiment execution with the correct feature categorization.
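The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: groups are reduced to a hypothetical mapping of group name to dataset identifiers, DataManager is a stand-in whose hypothetical split() method only returns a description string, and datasets without an entry in categorical_features are assumed to have no categorical features.

```python
from typing import Dict, List, Tuple


class DataManager:
    """Hypothetical stand-in; split() just describes what it would do."""

    def split(self, dataset: str, categorical_features: List[str]) -> str:
        return f"split({dataset}, categorical={categorical_features})"


def create_data_splits(
    groups: Dict[str, List[str]],
    data_managers: Dict[str, DataManager],
    categorical_features: Dict[str, List[str]],
) -> Dict[Tuple[str, str], str]:
    splits: Dict[Tuple[str, str], str] = {}
    for group_name, datasets in groups.items():
        # Step 1: the DataManager assigned to this group.
        manager = data_managers[group_name]
        for dataset in datasets:
            # Step 2a: categorical features, defaulting to none.
            features = categorical_features.get(dataset, [])
            # Steps 2b and 2c: split, then key the result by (group, dataset).
            splits[(group_name, dataset)] = manager.split(dataset, features)
    return splits


shared = DataManager()
splits = create_data_splits(
    groups={"baseline": ["iris.csv"], "extended": ["iris.csv", "wine.csv"]},
    data_managers={"baseline": shared, "extended": shared},
    categorical_features={"wine.csv": ["color"]},
)
```

Keying splits by (group, dataset) keeps the same dataset separate when it appears in more than one group, even if those groups share a DataManager.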

create_description_map() → Dict[str, str]

Create a mapping of group names to descriptions.

Creates a simple mapping of experiment group names to their descriptions, filtering out empty descriptions.

Returns:

Dict[str, str]: Mapping of group names to their descriptions, excluding empty descriptions

Notes

This mapping is used for generating reports and documentation where group descriptions are needed. Empty descriptions are filtered out to avoid cluttering the output with meaningless entries.
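A minimal sketch of the filtering described above, assuming each group exposes name and description attributes (ExperimentGroup below is a hypothetical stand-in) and that "empty" means a falsy string:

```python
from typing import Dict, List, NamedTuple


class ExperimentGroup(NamedTuple):
    """Hypothetical stand-in with only the fields used here."""
    name: str
    description: str = ""


def create_description_map(groups: List[ExperimentGroup]) -> Dict[str, str]:
    # Keep only groups whose description is a non-empty string.
    return {group.name: group.description for group in groups if group.description}


groups = [
    ExperimentGroup("baseline", "Default preprocessing and splits"),
    ExperimentGroup("tuned"),
]
description_map = create_description_map(groups)
```

The "tuned" group has no description, so it does not appear in the resulting mapping.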

create_experiment_queue() → deque

Create queue of experiments from all ExperimentGroups.

Creates an ExperimentFactory with the loaded algorithm configuration, then processes each ExperimentGroup to create Experiment instances. Loads the workflow classes and builds the complete experiment queue.

Returns:

collections.deque: Queue of Experiment instances ready to run

Notes

The method:

1. Creates an ExperimentFactory with the algorithm configuration.
2. Loads workflow classes for each experiment group.
3. Determines the number of data splits for each group.
4. Creates experiment instances for all algorithm-dataset combinations.
5. Adds all experiments to the execution queue.

The experiment queue is processed during experiment execution, with each experiment running independently.
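The queue-building steps can be sketched with collections.deque and itertools.product. Everything here is hypothetical: Experiment and GroupSpec are stand-ins, workflow loading and the ExperimentFactory are omitted, and the sketch assumes one experiment per algorithm, dataset, and split index, which is one reading of steps 3 and 4 above rather than confirmed behavior.

```python
import collections
import itertools
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Experiment:
    """Hypothetical stand-in for an Experiment instance."""
    group: str
    algorithm: str
    dataset: str
    split_index: int


@dataclass
class GroupSpec:
    """Hypothetical summary of one ExperimentGroup."""
    algorithms: List[str]
    datasets: List[str]
    num_splits: int


def create_experiment_queue(groups: Dict[str, GroupSpec]) -> Deque[Experiment]:
    queue: Deque[Experiment] = collections.deque()
    for name, spec in groups.items():
        # One experiment per (algorithm, dataset, split) combination.
        for algorithm, dataset, split_index in itertools.product(
            spec.algorithms, spec.datasets, range(spec.num_splits)
        ):
            queue.append(Experiment(name, algorithm, dataset, split_index))
    return queue


queue = create_experiment_queue(
    {"baseline": GroupSpec(["linear", "ridge"], ["iris.csv"], num_splits=2)}
)
first = queue.popleft()  # experiments are consumed in FIFO order
```

A deque gives O(1) removal from the front, which suits a runner that pops one independent experiment at a time.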

create_logfile() → None

Create a markdown string describing the configuration.

Generates comprehensive documentation of the experiment configuration including algorithm settings, experiment group details, data manager configurations, and dataset information.

Notes

The generated markdown includes:

- Default algorithm configurations with parameters and grids
- Experiment group descriptions and settings
- DataManager configurations for each group
- Dataset information including feature categorization
- Algorithm-specific configurations for each group

This documentation is saved as part of the experiment results and provides a complete record of the configuration used.
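A minimal sketch of assembling such a Markdown string from plain dictionaries. The input shapes, the helper name create_logfile_text, and the heading names are all hypothetical; only the first two kinds of content listed above are covered, and the real layout may differ.

```python
from typing import Any, Dict, List


def create_logfile_text(
    algorithms: Dict[str, Dict[str, Any]],
    groups: Dict[str, Dict[str, Any]],
) -> str:
    lines: List[str] = ["## Default Algorithm Configuration", ""]
    for name, params in algorithms.items():
        lines.append(f"### {name}")
        # One bullet per parameter or hyperparameter grid.
        for key, value in params.items():
            lines.append(f"- **{key}**: `{value}`")
        lines.append("")

    lines += ["## Experiment Groups", ""]
    for name, info in groups.items():
        lines.append(f"### {name}")
        if info.get("description"):
            lines.append(str(info["description"]))
        lines.append(f"- **Datasets**: {', '.join(info['datasets'])}")
        lines.append("")
    return "\n".join(lines)


logfile = create_logfile_text(
    algorithms={"ridge": {"alpha": 1.0, "grid": {"alpha": [0.1, 1.0, 10.0]}}},
    groups={"baseline": {"description": "Default settings", "datasets": ["iris.csv"]}},
)
```

Collecting lines in a list and joining once at the end avoids repeated string concatenation as the configuration grows.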

get_output_structure() → Dict[str, Dict[str, Tuple[str, str]]]

Get the directory structure for experiment outputs.

Creates a nested dictionary structure that maps experiment groups to their datasets and provides the necessary path information for organizing experiment results.

Returns:

Dict[str, Dict[str, Tuple[str, str]]]: Nested dictionary structure where:

- Top-level keys are experiment group names.
- Each second-level dictionary maps dataset names to (path, table_name) tuples.

Notes

The output structure is used to organize experiment results hierarchically:

- Group level: each experiment group gets its own directory.
- Dataset level: each dataset within a group gets its own subdirectory.
- File level: results are organized by dataset and table name.

This structure ensures that results from different experiments are properly separated and organized.
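To make the documented return shape concrete, the sketch below builds a hypothetical structure (the group names, database path, and table names are invented) and derives the group/dataset directory layout from it. The helper result_directories and the results root are assumptions for illustration, not part of the library.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Documented shape: group name -> dataset name -> (path, table_name)
OutputStructure = Dict[str, Dict[str, Tuple[str, str]]]


def result_directories(structure: OutputStructure, results_root: Path) -> List[Path]:
    """One directory per group and one subdirectory per dataset."""
    return [
        results_root / group_name / dataset_name
        for group_name, datasets in structure.items()
        for dataset_name in datasets
    ]


structure: OutputStructure = {
    "baseline": {"sales": ("data/housing.db", "sales")},
    "extended": {
        "sales": ("data/housing.db", "sales"),
        "rentals": ("data/housing.db", "rentals"),
    },
}
dirs = result_directories(structure, Path("results"))
```

Because the same dataset name appears under two groups, the group level is what keeps their results from overwriting each other.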

load_algorithm_config() → AlgorithmCollection

Load algorithm configuration from project’s algorithms.py.

Loads the complete algorithm configuration that defines all available algorithms, their default parameters, and hyperparameter grids for the experiments.

Returns:

AlgorithmCollection: Collection of AlgorithmWrapper instances from algorithms.py

Raises:

FileNotFoundError: If algorithms.py is not found in the project root
ImportError: If algorithms.py cannot be loaded or ALGORITHM_CONFIG is not defined

Notes

The algorithms.py file must define: ALGORITHM_CONFIG = AlgorithmCollection(…)

This configuration is used by the ExperimentFactory to create experiment instances with the appropriate algorithms.
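One way to load a module-level name such as ALGORITHM_CONFIG from a file in the project root is the standard library's importlib.util.spec_from_file_location. The sketch below is an assumption about the mechanism, not the library's actual loader; the helper name load_project_attribute is hypothetical. It maps a missing file to FileNotFoundError and both load failures and a missing attribute to ImportError, matching the Raises section above.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Any


def load_project_attribute(project_root: Path, filename: str, attribute: str) -> Any:
    """Execute project_root/filename and return one module-level attribute."""
    path = project_root / filename
    if not path.is_file():
        raise FileNotFoundError(f"{filename} not found in {project_root}")
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Could not create an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)  # runs the file's top-level code
    except Exception as exc:
        raise ImportError(f"Failed to load {path}") from exc
    if not hasattr(module, attribute):
        raise ImportError(f"{attribute} is not defined in {filename}")
    return getattr(module, attribute)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A toy algorithms.py; a real one would build an AlgorithmCollection.
    (root / "algorithms.py").write_text("ALGORITHM_CONFIG = ['linear', 'ridge']\n")
    config = load_project_attribute(root, "algorithms.py", "ALGORITHM_CONFIG")
    try:
        load_project_attribute(root, "data.py", "BASE_DATA_MANAGER")
        missing_file_detected = False
    except FileNotFoundError:
        missing_file_detected = True
```

The same helper would cover data.py and BASE_DATA_MANAGER, since both loaders follow the same "file must define a name" contract.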

load_base_data_manager() → DataManager

Load default DataManager configuration from project’s data.py.

Loads the base DataManager configuration that serves as the template for all experiment groups. This configuration defines default parameters for data processing, splitting, and preprocessing.

Returns:

DataManager: Configured DataManager instance loaded from data.py

Raises:

FileNotFoundError: If data.py is not found in the project root
ImportError: If data.py cannot be loaded or BASE_DATA_MANAGER is not defined

Notes

The data.py file must define: BASE_DATA_MANAGER = DataManager(…)

This base configuration is used as a template for creating group-specific data managers with custom parameters.
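Using the base as a template can be sketched with dataclasses.replace, which copies every field and applies only the overrides. DataManager here is a frozen, hypothetical stand-in whose field names (test_size, n_splits, split_method) are assumptions; the real class and its parameters may differ, and the helper group_data_manager is illustrative only.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class DataManager:
    """Hypothetical stand-in; the real DataManager's parameters may differ."""
    test_size: float = 0.2
    n_splits: int = 5
    split_method: str = "shuffle"


# What a project's data.py might define.
BASE_DATA_MANAGER = DataManager(test_size=0.25)


def group_data_manager(base: DataManager, overrides: Dict[str, Any]) -> DataManager:
    # Copy every base parameter, then apply the group's overrides.
    return replace(base, **overrides)


grouped = group_data_manager(BASE_DATA_MANAGER, {"n_splits": 10})
```

The group-specific manager inherits test_size=0.25 from the base while the base itself is left unchanged, so other groups can keep using it as their template.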

set_services(plot_settings: PlotSettings, services: ServiceBundle | None = None)
