Configuration
- class Configuration(default_workflow: str, default_algorithms: List[str], categorical_features: Dict[str, List[str]] | None = None, default_workflow_args: Dict[str, Any] | None = None, plot_settings: PlotSettings | None = None)
User interface for defining experiment configurations.
This class provides a simple interface for users to define experiment groups and their configurations. It handles default values, ensures unique group names, and provides validation for configuration parameters.
- Parameters:
- default_workflow : str
Default workflow name to use for experiment groups
- default_algorithms : List[str]
List of algorithm names to use as defaults when a group specifies none
- categorical_features : Dict[str, List[str]], optional
Dictionary mapping dataset identifiers to lists of categorical feature names, by default None
- default_workflow_args : Dict[str, Any], optional
Default values to assign as attributes of the Workflow class, by default None
- plot_settings : PlotSettings, optional
Plot configuration settings, by default None
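default_workflow_args is described as a set of values assigned as attributes of the Workflow class. The sketch below illustrates that mechanism with setattr. Workflow here is a hypothetical empty stand-in and apply_workflow_args is an illustrative helper, not part of the library; it also assumes the values are set on a workflow instance, which may differ from how the library actually attaches them.

```python
from typing import Any, Dict, Optional


class Workflow:
    """Hypothetical stand-in for a user-defined workflow class."""


def apply_workflow_args(
    workflow: Workflow, args: Optional[Dict[str, Any]]
) -> Workflow:
    """Set each key/value pair in ``args`` as an attribute of ``workflow``."""
    for name, value in (args or {}).items():
        setattr(workflow, name, value)
    return workflow


# Hypothetical argument names, chosen only for illustration
wf = apply_workflow_args(Workflow(), {"n_splits": 5, "scoring": "r2"})
print(wf.n_splits, wf.scoring)  # 5 r2
```

Inside the workflow, the assigned values are then available as ordinary attributes such as self.n_splits.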
- Attributes:
- default_workflow : str
Default workflow name for experiment groups
- experiment_groups : List[ExperimentGroup]
List of configured experiment groups
- default_algorithms : List[str]
List of default algorithm names
- categorical_features : Dict[str, List[str]]
Mapping of dataset identifiers to categorical feature lists
- default_workflow_args : Dict[str, Any]
Default workflow arguments
- plot_settings : PlotSettings
Plot configuration settings
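The attribute types above are not optional, which suggests that omitted constructor arguments are replaced with usable defaults. A minimal sketch of that pattern, assuming empty dictionaries for the mappings and a default-constructed PlotSettings; ConfigSketch and the empty PlotSettings placeholder are hypothetical and do not reproduce the library's code.

```python
from typing import Any, Dict, List, Optional


class PlotSettings:
    """Hypothetical placeholder; the real PlotSettings fields are not documented here."""


class ConfigSketch:
    def __init__(
        self,
        default_workflow: str,
        default_algorithms: List[str],
        categorical_features: Optional[Dict[str, List[str]]] = None,
        default_workflow_args: Optional[Dict[str, Any]] = None,
        plot_settings: Optional[PlotSettings] = None,
    ) -> None:
        self.default_workflow = default_workflow
        self.default_algorithms = list(default_algorithms)
        # Build fresh containers so instances never share one mutable default
        self.categorical_features = (
            categorical_features if categorical_features is not None else {}
        )
        self.default_workflow_args = (
            default_workflow_args if default_workflow_args is not None else {}
        )
        self.plot_settings = (
            plot_settings if plot_settings is not None else PlotSettings()
        )
        # Groups are appended later, one add_experiment_group() call at a time
        self.experiment_groups: List[Any] = []


config = ConfigSketch("workflow", ["linear", "ridge"])
```

Using None as the signature default and creating the container inside __init__ avoids Python's shared mutable default argument pitfall.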
Notes
The Configuration class serves as the main user interface for setting up experiments. It provides a straightforward API for adding experiment groups and automatically handles validation and default-value assignment.
Examples
- Create a simple configuration:
>>> config = Configuration(
...     default_workflow="workflow",
...     default_algorithms=["linear", "ridge"]
... )
- Add experiment groups:
>>> config.add_experiment_group(
...     name="baseline",
...     datasets=["data1.csv", "data2.csv"],
...     algorithms=["linear", "svm"]
... )
- Build the configuration manager:
>>> manager = config.build()
- add_experiment_group(*, name: str, datasets: List[str | Tuple[str, str]], data_config: Dict[str, Any] | None = None, algorithms: List[str] | None = None, algorithm_config: Dict[str, Dict[str, Any]] | None = None, description: str | None = '', workflow: str | None = None, workflow_args: Dict[str, Any] | None = None) -> None
Add a new ExperimentGroup to the configuration.
Adds a new experiment group with the specified parameters. Before adding the group, it validates that the group name is unique and that the datasets are correctly formatted.
- Parameters:
- name : str
Unique identifier for the experiment group
- datasets : List[str | Tuple[str, str]]
List of dataset paths relative to the datasets directory. Each entry is either a string (a dataset file) or a tuple of (dataset_file, table_name) for files that contain multiple tables, such as databases or multi-sheet spreadsheets
- data_config : Dict[str, Any], optional
Arguments for the DataManager used by this experiment group, by default None
- algorithms : List[str], optional
List of algorithm names to use. If None, uses default_algorithms, by default None
- algorithm_config : Dict[str, Dict[str, Any]], optional
Algorithm-specific configurations that override values set in algorithms.py, by default None
- description : str, optional
Human-readable description of the experiment group, by default ""
- workflow : str, optional
Name of the workflow file to use (without the .py extension). If None, uses default_workflow, by default None
- workflow_args : Dict[str, Any], optional
Values to assign as attributes in the Workflow class. Must have the same keys as default_workflow_args, by default None
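The algorithms, workflow, and workflow_args parameters fall back to configuration-level defaults when left as None. A sketch of that resolution, assuming it happens per group and that a supplied workflow_args replaces the defaults outright (its keys must match them anyway); resolve_group_settings is a hypothetical helper, not the library's API.

```python
from typing import Any, Dict, List, Optional


def resolve_group_settings(
    algorithms: Optional[List[str]],
    workflow: Optional[str],
    workflow_args: Optional[Dict[str, Any]],
    default_algorithms: List[str],
    default_workflow: str,
    default_workflow_args: Dict[str, Any],
) -> Dict[str, Any]:
    """Fill in group-level settings from configuration defaults (illustrative)."""
    return {
        "algorithms": list(
            algorithms if algorithms is not None else default_algorithms
        ),
        "workflow": workflow if workflow is not None else default_workflow,
        # Assumption: a supplied workflow_args replaces the defaults outright
        "workflow_args": dict(
            workflow_args if workflow_args is not None else default_workflow_args
        ),
    }


defaults = dict(
    default_algorithms=["linear", "ridge"],
    default_workflow="workflow",
    default_workflow_args={"n_splits": 5},
)
baseline = resolve_group_settings(None, None, None, **defaults)
advanced = resolve_group_settings(["svm", "rf"], None, {"n_splits": 10}, **defaults)
```

Copying the lists and dictionaries keeps later edits to one group from leaking into the shared defaults.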
- Raises:
- ValueError
If the group name already exists, or if the workflow_args keys don't match the default_workflow_args keys
- TypeError
If datasets contains invalid types (must be strings or tuples)
Notes
The method performs several validation and normalization steps:
1. Ensures the group name is unique
2. Validates the dataset format (strings or tuples of strings)
3. Validates that the workflow_args keys match default_workflow_args
4. Converts string datasets to (dataset, None) tuples
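The checks listed above can be sketched as a standalone function. This is a minimal illustration, not the library's implementation: validate_group is a hypothetical name, and it assumes a valid tuple has exactly two string elements and that workflow_args is only checked when supplied.

```python
from typing import Any, Dict, List, Optional, Set, Tuple, Union

DatasetSpec = Union[str, Tuple[str, str]]


def validate_group(
    name: str,
    datasets: List[DatasetSpec],
    workflow_args: Optional[Dict[str, Any]],
    existing_names: Set[str],
    default_workflow_args: Dict[str, Any],
) -> List[Tuple[str, Optional[str]]]:
    """Validate group inputs and return datasets as (file, table) tuples."""
    # Step 1: group names must be unique
    if name in existing_names:
        raise ValueError(f"Experiment group {name!r} already exists")

    # Steps 2 and 4: check each dataset's type and normalize strings
    normalized: List[Tuple[str, Optional[str]]] = []
    for ds in datasets:
        if isinstance(ds, str):
            normalized.append((ds, None))
        elif (
            isinstance(ds, tuple)
            and len(ds) == 2
            and all(isinstance(part, str) for part in ds)
        ):
            normalized.append((ds[0], ds[1]))
        else:
            raise TypeError(f"Invalid dataset {ds!r}: expected str or (str, str)")

    # Step 3: workflow_args must use exactly the default keys
    if workflow_args is not None and set(workflow_args) != set(default_workflow_args):
        raise ValueError("workflow_args keys must match default_workflow_args")

    return normalized


print(validate_group("baseline", ["data.csv", ("data.xlsx", "Sheet1")], None, set(), {}))
```

Normalizing every entry to a (file, table) pair lets downstream code handle single-table and multi-table sources through one code path.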
Examples
- Add a simple experiment group:
>>> config.add_experiment_group(
...     name="baseline",
...     datasets=["data.csv"]
... )
- Add group with custom settings:
>>> config.add_experiment_group(
...     name="advanced",
...     datasets=[("data.xlsx", "Sheet1"), "data2.csv"],
...     algorithms=["svm", "rf"],
...     data_config={"test_size": 0.3},
...     description="Advanced experiment with custom settings"
... )
- build() -> ConfigurationManager
Build and return a ConfigurationManager instance.
Processes all experiment groups and creates a ConfigurationManager that can execute the experiments. Exports configuration parameters for rerun functionality.
- Returns:
- ConfigurationManager
Fully configured manager ready to execute experiments
Notes
The build process:
1. Exports configuration parameters for rerun functionality
2. Creates a ConfigurationManager with all experiment groups
3. Sets up data managers, algorithm configurations, and workflows
4. Prepares the complete experiment execution environment
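Setting up algorithm configurations involves combining the base definitions in algorithms.py with each group's algorithm_config overrides. A minimal sketch, assuming the base definitions can be represented as a dictionary of parameter dictionaries and that overrides are applied as a shallow per-algorithm update; BASE_ALGORITHMS and merge_algorithm_config are hypothetical, and the real algorithms.py structure likely differs.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional

# Hypothetical base settings standing in for definitions in algorithms.py
BASE_ALGORITHMS: Dict[str, Dict[str, Any]] = {
    "linear": {"fit_intercept": True},
    "ridge": {"alpha": 1.0, "fit_intercept": True},
}


def merge_algorithm_config(
    algorithms: List[str],
    base: Dict[str, Dict[str, Any]],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return per-algorithm settings with group overrides applied (shallow update)."""
    overrides = overrides or {}
    merged: Dict[str, Dict[str, Any]] = {}
    for name in algorithms:
        # Copy so the shared base definitions are never mutated;
        # an unknown algorithm name raises KeyError here
        settings = deepcopy(base[name])
        settings.update(overrides.get(name, {}))
        merged[name] = settings
    return merged


group_settings = merge_algorithm_config(
    ["linear", "ridge"], BASE_ALGORITHMS, {"ridge": {"alpha": 0.1}}
)
```

Because each group gets its own copy, two groups can override the same algorithm differently without interfering with each other.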
Examples
- Build and use the configuration:
>>> config = Configuration("workflow", ["linear", "ridge"])
>>> config.add_experiment_group(name="test", datasets=["data.csv"])
>>> manager = config.build()
>>> # manager is ready to execute experiments
- export_params() -> None
Export configuration parameters for rerun functionality.
Serializes the current configuration to a format that can be used to recreate the experiment setup during rerun operations. This includes all experiment groups, categorical features, and plot settings.
Notes
The exported parameters include:
- Default workflow and algorithms
- Categorical features mapping
- Plot settings configuration
- All experiment group configurations
- Dataset metadata for validation
This data is used by the rerun system to ensure experiments can be reproduced with identical configurations.
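The serialization format is not specified here. As an illustration only, the sketch below assumes JSON and covers a subset of the exported fields; plot settings and dataset metadata are omitted because their structure is not documented. export_params_sketch and load_params_sketch are hypothetical helpers.

```python
import json
from typing import Any, Dict, List


def export_params_sketch(
    default_workflow: str,
    default_algorithms: List[str],
    categorical_features: Dict[str, List[str]],
    experiment_groups: List[Dict[str, Any]],
) -> str:
    """Serialize configuration values to a JSON string (illustrative format)."""
    payload = {
        "default_workflow": default_workflow,
        "default_algorithms": default_algorithms,
        "categorical_features": categorical_features,
        "experiment_groups": experiment_groups,
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def load_params_sketch(text: str) -> Dict[str, Any]:
    """Parse exported JSON and restore dataset entries to (file, table) tuples."""
    params = json.loads(text)
    for group in params["experiment_groups"]:
        # JSON has no tuple type, so tuples come back as lists
        group["datasets"] = [tuple(ds) for ds in group["datasets"]]
    return params


exported = export_params_sketch(
    "workflow",
    ["linear"],
    {"data.csv": ["color"]},
    [{"name": "test", "datasets": [("data.csv", None)]}],
)
restored = load_params_sketch(exported)
```

The round trip shows one detail any text-based export has to handle: tuple-valued dataset entries must be converted back after loading so a rerun sees the same (dataset, table) pairs.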
Examples
- Export is called automatically during build():
>>> config = Configuration("workflow", ["linear"])
>>> config.add_experiment_group(name="test", datasets=["data.csv"])
>>> manager = config.build()  # export_params() called automatically
- set_services(services: ServiceBundle | None = None)