TrainingManager

class TrainingManager(metric_config: MetricManager, config_manager: ConfigurationManager)

Manage the training and evaluation of machine learning models.

This class coordinates the full lifecycle of machine learning experiments, from setup through execution to reporting. It trains models across multiple datasets and algorithms, isolates failures so that one failed experiment does not halt the run, and records the status and timing of every experiment.

The TrainingManager draws on Brisk's shared services for logging, reporting, I/O, evaluation, and configuration management. It provides a single entry point for orchestrating multi-experiment machine learning workflows while keeping detailed records of each run.

Parameters:

metric_config (MetricManager): Configuration for evaluation metrics and scoring
config_manager (ConfigurationManager): Instance containing all data and configuration needed to run experiments

Attributes:

services (ServiceBundle): Bundle of all available services (logging, reporting, I/O, etc.)
results_dir (str): Directory where experiment results are stored
metric_config (MetricManager): Configuration for evaluation metrics and scoring
eval_manager (EvaluationManager): Manager for handling model evaluation and metrics
data_managers (dict): Maps group names to their corresponding data managers
experiments (collections.deque): Queue of experiments to run
logfile (str): Path to the configuration log file
output_structure (dict): Structure of output data organization
description_map (dict): Mapping of names to descriptions
experiment_groups (dict): Mapping of experiment group names to their configurations
workflow_mapping (dict): Maps experiment group names to their assigned workflow classes
experiment_paths (defaultdict): Nested structure tracking experiment output paths
experiment_results (defaultdict): Stores results of all experiments with status and timing
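The nested experiment_paths attribute can be pictured as a two-level defaultdict in which missing levels are created on first access. The sketch below assumes a group, then dataset, then experiment layout with made-up names and paths; Brisk's actual nesting and path format may differ.

```python
from collections import defaultdict

# Hypothetical layout: group -> dataset -> {experiment name: output dir}.
# The real nesting levels and path format used by Brisk are assumptions.
experiment_paths = defaultdict(lambda: defaultdict(dict))

# Missing group and dataset levels are created automatically on first access.
experiment_paths["baseline"]["housing.csv"]["linear"] = "results/baseline/housing/linear"
experiment_paths["baseline"]["housing.csv"]["ridge"] = "results/baseline/housing/ridge"

print(sorted(experiment_paths["baseline"]["housing.csv"]))  # ['linear', 'ridge']
```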

Notes

The TrainingManager uses a workflow-based approach where different experiment groups can use different workflow classes. This allows for flexibility in handling different types of machine learning tasks (classification, regression, etc.) with appropriate workflows.
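A minimal sketch of workflow-based dispatch, assuming workflow classes are looked up by experiment group name and instantiated once per experiment. Workflow, ClassificationWorkflow, RegressionWorkflow, run_with_workflow, and the group names are hypothetical stand-ins, not Brisk's API. A group with no entry raises ValueError, mirroring the behavior documented for run_experiments.

```python
class Workflow:
    """Hypothetical base class; Brisk's real workflow interface may differ."""

    def run(self, experiment_name: str) -> str:
        raise NotImplementedError


class ClassificationWorkflow(Workflow):
    def run(self, experiment_name: str) -> str:
        return f"classification:{experiment_name}"


class RegressionWorkflow(Workflow):
    def run(self, experiment_name: str) -> str:
        return f"regression:{experiment_name}"


# Maps experiment group names to workflow classes (classes, not instances).
workflow_mapping: dict[str, type[Workflow]] = {
    "churn_models": ClassificationWorkflow,
    "price_models": RegressionWorkflow,
}


def run_with_workflow(group_name: str, experiment_name: str) -> str:
    """Look up the group's workflow class, instantiate it, and run one experiment."""
    workflow_cls = workflow_mapping.get(group_name)
    if workflow_cls is None:
        raise ValueError(f"No workflow assigned to experiment group '{group_name}'")
    return workflow_cls().run(experiment_name)


print(run_with_workflow("price_models", "ridge"))  # regression:ridge
```

Storing classes rather than instances lets each experiment start from a fresh workflow object.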

Failures are isolated per experiment: one failed experiment does not stop the overall run, and each failure is logged in detail for debugging and analysis.
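Per-experiment failure isolation comes down to wrapping each experiment in its own try/except and recording the outcome either way. This is a sketch only: run_all, the (group, name, func) tuples, and the result keys status, error, and time_taken are hypothetical and are not the actual schema of experiment_results.

```python
import time
from collections import defaultdict


def run_all(experiments):
    """Run each (group, name, func) triple; one failure never stops the rest.

    Hypothetical sketch: the result fields ("status", "error", "time_taken")
    are assumptions, not Brisk's actual schema.
    """
    results = defaultdict(dict)
    for group, name, func in experiments:
        start = time.perf_counter()
        try:
            func()
            results[group][name] = {"status": "PASSED"}
        except Exception as exc:  # isolate this failure and move on
            results[group][name] = {"status": "FAILED", "error": str(exc)}
        # Timing is recorded for passed and failed experiments alike.
        results[group][name]["time_taken"] = time.perf_counter() - start
    return results
```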

Examples

>>> from brisk.training.training_manager import TrainingManager
>>> from brisk.evaluation import metric_manager
>>> from brisk.configuration import configuration
>>> 
>>> # Create metric configuration
>>> metric_config = metric_manager.MetricManager()
>>> 
>>> # Create configuration manager with experiments
>>> config_manager = configuration.ConfigurationManager()
>>> 
>>> # Initialize training manager
>>> trainer = TrainingManager(metric_config, config_manager)
>>> 
>>> # Run all experiments with report generation
>>> trainer.run_experiments(create_report=True)

run_experiments(create_report: bool = True) → None

Run all experiments in the queue and optionally generate an HTML report.

This method runs every experiment in the queue, using workflow_mapping to select the workflow class for each experiment group. It handles errors per experiment and tracks progress throughout the run.

The method ensures that all experiments are attempted even if some fail, and provides detailed logging and result tracking. After all experiments complete, it can generate an HTML report summarizing the results.

Parameters:

create_report (bool, default=True): Whether to generate an HTML report after all experiments complete. If True, creates a comprehensive report with all results and visualizations.

Raises:

ValueError: If any experiment group does not have a workflow assigned in the workflow_mapping

Notes

The method performs the following steps:

1. Resets experiment results tracking
2. Creates a progress bar for monitoring
3. Processes each experiment in the queue
4. Handles individual experiment failures gracefully
5. Prints a summary of all experiment results
6. Exits early if all experiments failed
7. Generates an HTML report if requested
8. Exports a rerun configuration for reproducibility
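The step order above can be outlined as a plain loop over a deque. In this hypothetical sketch, run_queue, run_one, and make_report are illustrative names rather than Brisk functions, a printed counter stands in for the progress bar, the early exit is a plain return, and step 8 (exporting the rerun configuration) is omitted.

```python
from collections import deque


def run_queue(experiments, run_one, create_report=True, make_report=None):
    """Hypothetical outline of the run_experiments step order.

    experiments: deque of experiment names (consumed in FIFO order).
    run_one: callable that runs one experiment and raises on failure.
    make_report: optional callable that receives the results dict.
    """
    results = {}                                   # 1. reset result tracking
    total = len(experiments)
    while experiments:                             # 3. process the queue
        name = experiments.popleft()
        try:
            run_one(name)
            results[name] = "PASSED"
        except Exception:                          # 4. record failure, keep going
            results[name] = "FAILED"
        # 2. a printed counter stands in for the progress bar
        print(f"[{len(results)}/{total}] {name}: {results[name]}")
    passed = sum(status == "PASSED" for status in results.values())
    print(f"{passed}/{total} experiments passed")  # 5. summary
    if passed == 0:                                # 6. stop if all failed
        return results
    if create_report and make_report is not None:  # 7. optional report
        make_report(results)
    # 8. exporting a rerun configuration is omitted from this sketch
    return results
```

Because the early exit returns before the report step, no report is produced when every experiment fails.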

Examples

>>> trainer = TrainingManager(metric_config, config_manager)
>>> 
>>> # Run experiments with report generation
>>> trainer.run_experiments(create_report=True)
>>> 
>>> # Run experiments without report generation
>>> trainer.run_experiments(create_report=False)
