Configuration Manager
- class ConfigurationManager(experiment_groups: List[ExperimentGroup], categorical_features: Dict[str, List[str]])
Manage experiment configurations and DataManager instances.
This class processes ExperimentGroup configurations and creates the minimum necessary DataManager instances, reusing them when configurations match.
- Parameters:
- experiment_groups : List[ExperimentGroup]
List of experiment group configurations to process
- categorical_features : Dict[str, List[str]]
Dictionary mapping dataset identifiers to lists of categorical feature names
- plot_settings : PlotSettings
Plot configuration settings for the experiments
- Attributes:
- experiment_groups : List[ExperimentGroup]
List of experiment group configurations
- data_managers : Dict[str, DataManager]
Mapping of group names to DataManager instances
- categorical_features : Dict[str, List[str]]
Mapping of dataset identifiers to categorical feature lists
- project_root : Path
Root directory of the project
- algorithm_config : AlgorithmCollection
Collection of algorithm configurations loaded from algorithms.py
- base_data_manager : DataManager
Base configuration for data management loaded from data.py
- experiment_queue : collections.deque
Queue of experiments ready to run
- output_structure : Dict[str, Dict[str, Tuple[str, str]]]
Directory structure for experiment outputs
- description_map : Dict[str, str]
Mapping of group names to descriptions
- workflow_map : Dict[str, Type]
Mapping of workflow names to workflow classes
- logfile : str
Markdown documentation of the configuration
Notes
The ConfigurationManager optimizes memory usage by reusing DataManager instances when experiment groups have identical data configurations. This is particularly important when running many experiments with similar data processing requirements.
Examples
- Create a configuration manager:
>>> from brisk.configuration import ConfigurationManager
>>> manager = ConfigurationManager(
...     experiment_groups=groups,
...     categorical_features=categorical_features,
...     plot_settings=plot_settings
... )
- create_data_managers() -> Dict[str, DataManager]
Create minimal set of DataManager instances.
Groups ExperimentGroups by their data_config and creates one DataManager instance per unique configuration. This optimization reduces memory usage by reusing data managers when configurations are identical.
- Returns:
- Dict[str, DataManager]
Dictionary mapping group names to DataManager instances
Notes
The method groups experiment groups by their data configuration:
- Groups with identical data_config share the same DataManager
- Groups with no data_config use the base DataManager
- Preprocessor configurations are handled specially to ensure proper grouping based on preprocessor types
This optimization is particularly important when running many experiments with similar data processing requirements.
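The reuse of DataManager instances described above can be sketched as follows. This is a minimal illustration and not the library's implementation: `FakeDataManager`, `FakeGroup`, `build_data_managers`, and the choice of a sorted tuple of `data_config` items as the cache key are all assumptions made for the example, and the special handling of preprocessors is left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


class FakeDataManager:
    """Hypothetical stand-in for DataManager; it only stores its settings."""

    def __init__(self, **params: Any) -> None:
        self.params = params


@dataclass
class FakeGroup:
    """Hypothetical stand-in for ExperimentGroup."""
    name: str
    data_config: Optional[Dict[str, Any]] = None


def build_data_managers(
    groups: List[FakeGroup], base_manager: FakeDataManager
) -> Dict[str, FakeDataManager]:
    """Return one manager per group, sharing instances for equal configs."""
    cache: Dict[Tuple, FakeDataManager] = {}
    managers: Dict[str, FakeDataManager] = {}
    for group in groups:
        if not group.data_config:
            # No custom data settings: fall back to the base manager.
            managers[group.name] = base_manager
            continue
        # Assumed cache key: the config items in sorted order
        # (this requires hashable config values).
        key = tuple(sorted(group.data_config.items()))
        if key not in cache:
            cache[key] = FakeDataManager(**group.data_config)
        managers[group.name] = cache[key]
    return managers


base = FakeDataManager(test_size=0.2)
groups = [
    FakeGroup("a", {"test_size": 0.3}),
    FakeGroup("b", {"test_size": 0.3}),
    FakeGroup("c"),
]
managers = build_data_managers(groups, base)
```

In this sketch, groups `a` and `b` end up with the same object, while `c` receives the base manager.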
- create_data_splits() -> None
Create DataSplitInfo instances for all datasets.
Creates data splits for each dataset in each experiment group using the appropriate DataManager instance. This prepares all datasets for cross-validation and train/test splitting.
Notes
The method processes each experiment group and:
1. Gets the appropriate DataManager for the group
2. For each dataset in the group:
   - Determines categorical features for the dataset
   - Creates data splits using the DataManager
   - Associates splits with the group and dataset
This ensures that all datasets are properly prepared for experiment execution with the correct feature categorization.
- create_description_map() -> Dict[str, str]
Create a mapping of group names to descriptions.
Creates a simple mapping of experiment group names to their descriptions, filtering out empty descriptions.
- Returns:
- Dict[str, str]
Mapping of group names to their descriptions, excluding empty descriptions
Notes
This mapping is used for generating reports and documentation where group descriptions are needed. Empty descriptions are filtered out to avoid cluttering the output with meaningless entries.
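The filtering step can be illustrated with a short sketch. `FakeGroup` and `build_description_map` are hypothetical names, and treating only the empty string as "empty" is an assumption; the actual method may also handle whitespace-only descriptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeGroup:
    """Hypothetical stand-in for ExperimentGroup."""
    name: str
    description: str = ""


def build_description_map(groups: List[FakeGroup]) -> Dict[str, str]:
    # Keep only the groups whose description is a non-empty string.
    return {g.name: g.description for g in groups if g.description}


groups = [
    FakeGroup("baseline", "Linear models on raw data"),
    FakeGroup("tuned"),  # no description, so it is left out of the map
]
description_map = build_description_map(groups)
```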
- create_experiment_queue() -> deque
Create queue of experiments from all ExperimentGroups.
Creates an ExperimentFactory with loaded algorithm configuration, then processes each ExperimentGroup to create Experiment instances. Loads workflow classes and creates the complete experiment queue.
- Returns:
- collections.deque
Queue of Experiment instances ready to run
Notes
The method:
1. Creates an ExperimentFactory with algorithm configuration
2. Loads workflow classes for each experiment group
3. Determines the number of data splits for each group
4. Creates experiment instances for all algorithm-dataset combinations
5. Adds all experiments to the execution queue
The experiment queue is processed during experiment execution, with each experiment running independently.
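As a rough sketch of the queue-building steps above, the example below enumerates algorithm-dataset combinations into a `collections.deque`. The plain dictionaries and tuples stand in for ExperimentGroup and Experiment objects, and `build_queue` and its field names are hypothetical; the ExperimentFactory and workflow loading are not modelled.

```python
from collections import deque
from itertools import product


def build_queue(groups):
    """Enqueue one record per dataset/algorithm pair in each group."""
    queue = deque()
    for group in groups:
        pairs = product(group["datasets"], group["algorithms"])
        for dataset, algorithm in pairs:
            # Each record carries the group's split count along with it.
            queue.append(
                (group["name"], dataset, algorithm, group["n_splits"])
            )
    return queue


groups = [
    {
        "name": "g1",
        "datasets": ["a.csv", "b.csv"],
        "algorithms": ["linear", "ridge"],
        "n_splits": 5,
    },
]
queue = build_queue(groups)
first = queue.popleft()  # experiments leave the queue in insertion order
```

A deque suits this use because removing items from the front is cheap, which matches a run loop that takes one experiment at a time.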
- create_logfile() -> None
Create a markdown string describing the configuration.
Generates comprehensive documentation of the experiment configuration including algorithm settings, experiment group details, data manager configurations, and dataset information.
Notes
The generated markdown includes:
- Default algorithm configurations with parameters and grids
- Experiment group descriptions and settings
- DataManager configurations for each group
- Dataset information including feature categorization
- Algorithm-specific configurations for each group
This documentation is saved as part of the experiment results and provides a complete record of the configuration used.
- get_output_structure() -> Dict[str, Dict[str, Tuple[str, str]]]
Get the directory structure for experiment outputs.
Creates a nested dictionary structure that maps experiment groups to their datasets and provides the necessary path information for organizing experiment results.
- Returns:
- Dict[str, Dict[str, Tuple[str, str]]]
Nested dictionary structure where:
- Top level keys are experiment group names
- Second level maps dataset names to (path, table_name) tuples
Notes
The output structure is used to organize experiment results in a hierarchical manner:
- Group level: Each experiment group gets its own directory
- Dataset level: Each dataset within a group gets its own subdirectory
- File level: Results are organized by dataset and table name
This structure ensures that results from different experiments are properly separated and organized.
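To make the shape of the return value concrete, the sketch below builds a dictionary with the documented nesting and walks it. The group names, dataset names, paths, and table names are invented for illustration, and `list_entries` is a hypothetical helper rather than part of the API.

```python
from typing import Dict, List, Tuple

OutputStructure = Dict[str, Dict[str, Tuple[str, str]]]

# Hypothetical values; only the nesting mirrors the documented return type.
structure: OutputStructure = {
    "group_one": {
        "housing": ("datasets/housing.db", "prices"),
        "wine": ("datasets/wine.db", "quality"),
    },
    "group_two": {
        "housing": ("datasets/housing.db", "prices"),
    },
}


def list_entries(structure: OutputStructure) -> List[Tuple[str, str, str, str]]:
    """Flatten the nested mapping into (group, dataset, path, table) rows."""
    rows = []
    for group_name, datasets in structure.items():
        for dataset_name, (path, table_name) in datasets.items():
            rows.append((group_name, dataset_name, path, table_name))
    return rows


entries = list_entries(structure)
```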
- load_algorithm_config() -> AlgorithmCollection
Load algorithm configuration from project’s algorithms.py.
Loads the complete algorithm configuration that defines all available algorithms, their default parameters, and hyperparameter grids for the experiments.
- Returns:
- AlgorithmCollection
Collection of AlgorithmWrapper instances from algorithms.py
- Raises:
- FileNotFoundError
If algorithms.py is not found in project root
- ImportError
If algorithms.py cannot be loaded or ALGORITHM_CONFIG is not defined
Notes
The algorithms.py file must define:
ALGORITHM_CONFIG = AlgorithmCollection(…)
This configuration is used by the ExperimentFactory to create experiment instances with the appropriate algorithms.
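One common way to load a named object from a project file and raise the errors listed above uses the standard library's `importlib.util`. The sketch below is an assumption about the general approach, not the library's code: `load_attribute` is a hypothetical helper, and a plain list stands in for the AlgorithmCollection.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_attribute(project_root: Path, filename: str, attribute: str):
    """Import project_root/filename and return the named module attribute."""
    module_path = project_root / filename
    if not module_path.exists():
        raise FileNotFoundError(f"{filename} not found in {project_root}")
    spec = importlib.util.spec_from_file_location(
        module_path.stem, module_path
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"Could not load {filename}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    if not hasattr(module, attribute):
        raise ImportError(f"{attribute} is not defined in {filename}")
    return getattr(module, attribute)


# Demonstration against a temporary project directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "algorithms.py").write_text(
        "ALGORITHM_CONFIG = ['linear', 'ridge']\n"
    )
    config = load_attribute(root, "algorithms.py", "ALGORITHM_CONFIG")
```

The same helper would cover a file such as data.py by passing a different filename and attribute name.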
- load_base_data_manager() -> DataManager
Load default DataManager configuration from project’s data.py.
Loads the base DataManager configuration that serves as the template for all experiment groups. This configuration defines default parameters for data processing, splitting, and preprocessing.
- Returns:
- DataManager
Configured DataManager instance loaded from data.py
- Raises:
- FileNotFoundError
If data.py is not found in project root
- ImportError
If data.py cannot be loaded or BASE_DATA_MANAGER is not defined
Notes
The data.py file must define:
BASE_DATA_MANAGER = DataManager(…)
This base configuration is used as a template for creating group-specific data managers with custom parameters.
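The template idea can be sketched with `dataclasses.replace`, which copies an object while overriding selected fields. Everything here is illustrative: `FakeDataManager`, its field names, and `manager_for_group` are hypothetical, and the real DataManager may derive group-specific instances differently.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class FakeDataManager:
    """Hypothetical stand-in for DataManager with invented fields."""
    test_size: float = 0.2
    n_splits: int = 5
    split_method: str = "shuffle"


BASE_DATA_MANAGER = FakeDataManager()


def manager_for_group(
    base: FakeDataManager, overrides: Dict[str, Any]
) -> FakeDataManager:
    # With no overrides, the base instance itself is returned unchanged.
    if not overrides:
        return base
    # Otherwise copy the base and apply the group's custom parameters.
    return replace(base, **overrides)


custom = manager_for_group(BASE_DATA_MANAGER, {"test_size": 0.3})
default = manager_for_group(BASE_DATA_MANAGER, {})
```

Fields that a group does not override keep the values from the base configuration.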
- set_services(plot_settings: PlotSettings, services: ServiceBundle | None = None)