TrainingManager

class TrainingManager(metric_config: MetricManager, config_manager: ConfigurationManager)

Manage the training and evaluation of machine learning models.

This class coordinates the entire lifecycle of machine learning experiments, from setup through execution to reporting. It manages model training across multiple datasets and algorithms, ensuring robust error handling and comprehensive result tracking.

The TrainingManager integrates with the rest of Brisk through services for logging, reporting, evaluation, and configuration management. It serves as the single entry point for orchestrating workflows that span multiple datasets and algorithms.

Parameters:
metric_config : MetricManager

Configuration for evaluation metrics and scoring

config_manager : ConfigurationManager

Instance containing all data and configuration needed to run experiments

Attributes:
services : ServiceBundle

Bundle of all available services (logging, reporting, I/O, etc.)

results_dir : str

Directory where experiment results are stored

metric_config : MetricManager

Configuration for evaluation metrics and scoring

eval_manager : EvaluationManager

Manager for handling model evaluation and metrics

data_managers : dict

Maps group names to their corresponding data managers

experiments : collections.deque

Queue of experiments to run

logfile : str

Path to the configuration log file

output_structure : dict

Structure of output data organization

description_map : dict

Mapping of names to descriptions

experiment_groups : dict

Mapping of experiment group names to their configurations

workflow_mapping : dict

Maps experiment group names to their assigned workflow classes

experiment_paths : defaultdict

Nested structure tracking experiment output paths

experiment_results : defaultdict

Stores results of all experiments with status and timing
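
As an illustration of what a nested defaultdict for tracking output paths can look like, here is a minimal sketch. The nesting levels (group, then dataset, then experiment name) and the example paths are assumptions made for this example; the layout Brisk actually uses is not stated on this page.

```python
from collections import defaultdict

# Assumed layout (hypothetical): group -> dataset -> experiment name -> path.
# The inner defaultdict(dict) lets new groups and datasets be added without
# creating the intermediate levels by hand.
experiment_paths = defaultdict(lambda: defaultdict(dict))

experiment_paths["group_a"]["housing.csv"]["ridge"] = "results/group_a/housing/ridge"
experiment_paths["group_a"]["housing.csv"]["lasso"] = "results/group_a/housing/lasso"
experiment_paths["group_b"]["iris.csv"]["knn"] = "results/group_b/iris/knn"

# List every experiment recorded for one group and dataset.
ridge_and_lasso = sorted(experiment_paths["group_a"]["housing.csv"])
```

Note that merely reading a missing key on a defaultdict creates it, so membership checks should use `in` rather than indexing.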

Notes

The TrainingManager uses a workflow-based approach where different experiment groups can use different workflow classes. This allows for flexibility in handling different types of machine learning tasks (classification, regression, etc.) with appropriate workflows.

Errors are handled per experiment: a failure in one experiment does not stop the remaining experiments, and detailed logs are kept for debugging and analysis.
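
The failure-isolation behaviour described above can be sketched as follows. This is not Brisk's implementation: `run_queue`, `demo_run`, the `"PASSED"`/`"FAILED"` status strings, and the record fields are all hypothetical names chosen for illustration. The sketch only shows the general pattern of draining a deque, catching exceptions per item, and recording status and timing.

```python
import time
from collections import defaultdict, deque


def run_queue(experiments, run_one):
    """Run every queued experiment, recording status and duration per item."""
    results = defaultdict(list)
    while experiments:
        group, name = experiments.popleft()
        start = time.perf_counter()
        try:
            run_one(group, name)
            status, error = "PASSED", None
        except Exception as exc:
            # Isolate the failure so the remaining experiments still run.
            status, error = "FAILED", str(exc)
        results[group].append({
            "experiment": name,
            "status": status,
            "error": error,
            "seconds": time.perf_counter() - start,
        })
    return results


def demo_run(group, name):
    # Hypothetical stand-in for real training: one experiment fails.
    if name == "bad_model":
        raise RuntimeError("training diverged")


queue = deque([
    ("group_a", "linear"),
    ("group_a", "bad_model"),
    ("group_b", "ridge"),
])
results = run_queue(queue, demo_run)
```

Here the second experiment raises, yet the third still runs, and every outcome ends up in `results` alongside its elapsed time.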

Examples

>>> from brisk.training.training_manager import TrainingManager
>>> from brisk.evaluation import metric_manager
>>> from brisk.configuration import configuration
>>> 
>>> # Create metric configuration
>>> metric_config = metric_manager.MetricManager()
>>> 
>>> # Create configuration manager with experiments
>>> config_manager = configuration.ConfigurationManager()
>>> 
>>> # Initialize training manager
>>> trainer = TrainingManager(metric_config, config_manager)
>>> 
>>> # Run all experiments with report generation
>>> trainer.run_experiments(create_report=True)
run_experiments(create_report: bool = True) -> None

Run all experiments in the queue and optionally generate an HTML report.

This method orchestrates the execution of all experiments in the queue, using the workflow_mapping to determine which workflow class to use for each experiment group. It provides comprehensive error handling and progress tracking throughout the process.

The method ensures that all experiments are attempted even if some fail, and provides detailed logging and result tracking. After all experiments complete, it can generate an HTML report summarizing the results.

Parameters:
create_report : bool, default=True

Whether to generate an HTML report after all experiments complete. If True, creates a comprehensive report with all results and visualizations.

Raises:
ValueError

If any experiment group does not have a workflow assigned in the workflow_mapping
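
A check of this kind might look like the sketch below. The helper `check_workflow_mapping`, the placeholder `RegressionWorkflow` class, and the wording of the error message are assumptions for illustration, not Brisk's actual code. The point is only that every group name is compared against the mapping before any work starts.

```python
def check_workflow_mapping(experiment_groups, workflow_mapping):
    """Raise ValueError if any experiment group has no workflow assigned."""
    missing = sorted(
        name for name in experiment_groups if name not in workflow_mapping
    )
    if missing:
        raise ValueError(
            "No workflow assigned for experiment group(s): " + ", ".join(missing)
        )


class RegressionWorkflow:
    """Hypothetical placeholder for a workflow class."""


# Every group is mapped: the check passes silently.
check_workflow_mapping(["group_a"], {"group_a": RegressionWorkflow})

# "group_b" has no workflow, so the check raises.
try:
    check_workflow_mapping(
        ["group_a", "group_b"], {"group_a": RegressionWorkflow}
    )
    message = None
except ValueError as exc:
    message = str(exc)
```

Validating up front, rather than when a group is first reached, avoids discovering a missing workflow only after earlier groups have already trained.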

Notes

The method performs the following steps:

1. Resets experiment results tracking
2. Creates a progress bar for monitoring
3. Processes each experiment in the queue
4. Handles individual experiment failures gracefully
5. Prints a summary of all experiment results
6. Exits early if all experiments failed
7. Generates HTML report if requested
8. Exports rerun configuration for reproducibility
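
Steps 5 and 6 above (summarising results, then stopping if nothing succeeded) can be sketched like this. The result shape (group name mapped to a list of records with a `"status"` field), the status strings, and the function names are assumptions for the example. The format `experiment_results` really uses is not documented here.

```python
from collections import Counter


def count_statuses(experiment_results):
    """Count how many experiments ended in each status across all groups."""
    return Counter(
        entry["status"]
        for entries in experiment_results.values()
        for entry in entries
    )


def should_continue(experiment_results):
    """Return False when nothing succeeded, so reporting can be skipped."""
    return count_statuses(experiment_results).get("PASSED", 0) > 0


# Hypothetical results in the assumed shape.
mixed = {
    "group_a": [{"status": "PASSED"}, {"status": "FAILED"}],
    "group_b": [{"status": "PASSED"}],
}
all_bad = {"group_a": [{"status": "FAILED"}, {"status": "FAILED"}]}

for status, count in sorted(count_statuses(mixed).items()):
    print(f"{status}: {count}")

if not should_continue(all_bad):
    print("All experiments failed; skipping report generation.")
```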

Examples

>>> trainer = TrainingManager(metric_config, config_manager)
>>> 
>>> # Run experiments with report generation
>>> trainer.run_experiments(create_report=True)
>>> 
>>> # Run experiments without report generation
>>> trainer.run_experiments(create_report=False)