TrainingManager
- class TrainingManager(metric_config: MetricManager, config_manager: ConfigurationManager)
Manage the training and evaluation of machine learning models.
This class coordinates the entire lifecycle of machine learning experiments, from setup through execution to reporting. It manages model training across multiple datasets and algorithms, ensuring robust error handling and comprehensive result tracking.
The TrainingManager integrates with the rest of Brisk through its services for logging, reporting, evaluation, and configuration management. It gives a single place from which to orchestrate multi-dataset, multi-algorithm workflows.
- Parameters:
- metric_config : MetricManager
Configuration for evaluation metrics and scoring
- config_manager : ConfigurationManager
Instance containing all data and configuration needed to run experiments
- Attributes:
- services : ServiceBundle
Bundle of all available services (logging, reporting, I/O, etc.)
- results_dir : str
Directory where experiment results are stored
- metric_config : MetricManager
Configuration for evaluation metrics and scoring
- eval_manager : EvaluationManager
Manager for handling model evaluation and metrics
- data_managers : dict
Maps group names to their corresponding data managers
- experiments : collections.deque
Queue of experiments to run
- logfile : str
Path to the configuration log file
- output_structure : dict
Structure of output data organization
- description_map : dict
Mapping of names to descriptions
- experiment_groups : dict
Mapping of experiment group names to their configurations
- workflow_mapping : dict
Maps experiment group names to their assigned workflow classes
- experiment_paths : defaultdict
Nested structure tracking experiment output paths
- experiment_results : defaultdict
Stores results of all experiments with status and timing
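As a rough illustration of the two defaultdict attributes, the sketch below builds similar nested containers with the standard library only. The key layout (group, then dataset, then experiment name) and the record fields (`status`, `time_taken`) are assumptions made for this example; the documentation above only states that paths are nested and that results carry status and timing.

```python
from collections import defaultdict

# Hypothetical layout: group -> dataset -> experiment name -> output path.
experiment_paths = defaultdict(lambda: defaultdict(dict))
experiment_paths["group_a"]["housing.csv"]["linear"] = (
    "results/group_a/housing/linear"
)

# Hypothetical layout: group -> dataset -> list of result records.
experiment_results = defaultdict(lambda: defaultdict(list))
experiment_results["group_a"]["housing.csv"].append(
    {"experiment": "linear", "status": "PASSED", "time_taken": "0m 2s"}
)

# Intermediate levels are created on first access, so no setup is needed
# before recording a new group or dataset.
print(dict(experiment_paths["group_a"]["housing.csv"]))
print(experiment_results["group_a"]["housing.csv"][0]["status"])
```

Using defaultdict here means a record can be stored for a group or dataset that has not been seen before without first checking whether the key exists.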
Notes
The TrainingManager uses a workflow-based approach where different experiment groups can use different workflow classes. This allows for flexibility in handling different types of machine learning tasks (classification, regression, etc.) with appropriate workflows.
A failure in one experiment does not stop the overall run: the remaining experiments are still attempted, and detailed logs are kept for debugging and analysis.
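The workflow-based approach described in these notes can be pictured as a dictionary lookup from group name to workflow class. The sketch below is standalone and does not use Brisk: the two workflow classes, their `run` method, and the group names are hypothetical stand-ins, not the library's actual workflow interface.

```python
class ClassificationWorkflow:
    """Hypothetical workflow for classification experiment groups."""

    def run(self, experiment_name: str) -> str:
        return f"classification:{experiment_name}"


class RegressionWorkflow:
    """Hypothetical workflow for regression experiment groups."""

    def run(self, experiment_name: str) -> str:
        return f"regression:{experiment_name}"


# Each experiment group is assigned the class (not an instance) to use.
workflow_mapping = {
    "churn": ClassificationWorkflow,
    "pricing": RegressionWorkflow,
}


def run_one(group_name: str, experiment_name: str) -> str:
    """Look up the group's workflow class, instantiate it, and run it."""
    workflow_cls = workflow_mapping[group_name]
    return workflow_cls().run(experiment_name)


print(run_one("churn", "logistic"))
print(run_one("pricing", "ridge"))
```

Storing classes rather than instances lets each experiment get a fresh workflow object, so state from one experiment cannot leak into the next.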
Examples
>>> from brisk.training.training_manager import TrainingManager
>>> from brisk.evaluation import metric_manager
>>> from brisk.configuration import configuration
>>>
>>> # Create metric configuration
>>> metric_config = metric_manager.MetricManager()
>>>
>>> # Create configuration manager with experiments
>>> config_manager = configuration.ConfigurationManager()
>>>
>>> # Initialize training manager
>>> trainer = TrainingManager(metric_config, config_manager)
>>>
>>> # Run all experiments with report generation
>>> trainer.run_experiments(create_report=True)
- run_experiments(create_report: bool = True) -> None
Run all experiments in the queue and, optionally, generate an HTML report.
This method orchestrates the execution of all experiments in the queue, using the workflow_mapping to determine which workflow class to use for each experiment group. It provides comprehensive error handling and progress tracking throughout the process.
The method ensures that all experiments are attempted even if some fail, and provides detailed logging and result tracking. After all experiments complete, it can generate an HTML report summarizing the results.
- Parameters:
- create_report : bool, default=True
Whether to generate an HTML report after all experiments complete. If True, creates a comprehensive report with all results and visualizations.
- Raises:
- ValueError
If any experiment group does not have a workflow assigned in the workflow_mapping
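The condition behind this ValueError can be sketched as a check that every experiment group has an entry in the workflow mapping. The helper name `check_workflow_mapping` and the error message are hypothetical, and whether Brisk performs the check up front or per experiment is not stated in the source; this version checks all groups before anything runs.

```python
def check_workflow_mapping(experiment_groups: dict, workflow_mapping: dict) -> None:
    """Raise ValueError if any experiment group has no workflow assigned."""
    missing = [name for name in experiment_groups if name not in workflow_mapping]
    if missing:
        raise ValueError(
            "No workflow assigned for experiment group(s): " + ", ".join(missing)
        )


# Hypothetical groups: "pricing" has no workflow, so the check fails.
groups = {"churn": {}, "pricing": {}}
mapping = {"churn": object}

try:
    check_workflow_mapping(groups, mapping)
except ValueError as error:
    print(error)
```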
Notes
The method performs the following steps:
1. Resets experiment results tracking
2. Creates a progress bar for monitoring
3. Processes each experiment in the queue
4. Handles individual experiment failures gracefully
5. Prints a summary of all experiment results
6. Exits early if all experiments failed
7. Generates HTML report if requested
8. Exports rerun configuration for reproducibility
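The steps listed above can be approximated with a short standalone loop. This is a sketch under stated assumptions, not Brisk's implementation: the function name `run_all`, the `(group, name, task)` tuple shape, and the record fields are hypothetical, and the progress bar (step 2) and rerun-configuration export (step 8) are left out.

```python
import time
from collections import defaultdict, deque


def run_all(experiments, create_report=True):
    """Run every queued task, recording status and timing for each."""
    results = defaultdict(list)  # Step 1: start from empty results.
    queue = deque(experiments)

    while queue:  # Step 3: process each experiment in the queue.
        group, name, task = queue.popleft()
        start = time.perf_counter()
        try:
            task()
            status = "PASSED"
        except Exception as error:  # Step 4: record the failure, keep going.
            status = f"FAILED: {error}"
        results[group].append(
            {
                "experiment": name,
                "status": status,
                "seconds": time.perf_counter() - start,
            }
        )

    # Step 5: print a summary of all experiment results.
    for group, records in results.items():
        for record in records:
            print(group, record["experiment"], record["status"])

    # Step 6: skip reporting when nothing succeeded.
    if all(r["status"] != "PASSED" for rs in results.values() for r in rs):
        return results

    if create_report:  # Step 7: placeholder for HTML report generation.
        print("report would be generated here")
    return results


def fails():
    raise RuntimeError("bad data")


results = run_all(
    [("group_a", "ok", lambda: None), ("group_a", "broken", fails)],
    create_report=False,
)
```

Catching the exception inside the loop body, rather than around the whole loop, is what allows the second experiment's failure to be recorded without affecting any other experiment.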
Examples
>>> trainer = TrainingManager(metric_config, config_manager)
>>>
>>> # Run experiments with report generation
>>> trainer.run_experiments(create_report=True)
>>>
>>> # Run experiments without report generation
>>> trainer.run_experiments(create_report=False)