EvaluationManager

class EvaluationManager(algorithm_config: AlgorithmCollection, metric_config: MetricManager, output_dir: str, split_metadata: Dict[str, Any], logger: Logger | None = None)

A class for evaluating machine learning models and plotting results.

This class provides methods for model evaluation, including calculating metrics, generating plots, comparing models, and hyperparameter tuning. It is designed to be used within a Workflow instance.

Parameters:
algorithm_config (AlgorithmCollection):

Configuration for algorithms.

metric_config (MetricManager):

Configuration for evaluation metrics.

output_dir (str):

Directory to save results.

split_metadata (Dict[str, Any]):

Metadata to include in metric calculations.

logger (Optional[logging.Logger]):

Logger instance to use. Defaults to None.

Attributes:
algorithm_config (AlgorithmCollection):

Configuration for algorithms.

metric_config (MetricManager):

Configuration for evaluation metrics.

output_dir (str):

Directory to save results.

split_metadata (Dict[str, Any]):

Metadata to include in metric calculations.

logger (Optional[logging.Logger]):

Logger instance to use.

primary_color (str):

Color for primary elements.

secondary_color (str):

Color for secondary elements.

background_color (str):

Color for background elements.

accent_color (str):

Color for accent elements.

important_color (str):

Color for important elements.

compare_models(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, calculate_diff: bool = False) -> Dict[str, Dict[str, float]]

Compare multiple models using specified metrics.

Parameters:
*models (BaseEstimator):

The models to compare.

X (DataFrame):

The input features.

y (Series):

The target values.

metrics (List[str]):

The names of the metrics to calculate.

filename (str):

The name of the output file (without extension).

calculate_diff (bool, optional):

Whether to calculate differences between models. Defaults to False.

Returns:
dict

Nested dictionary containing metric scores for each model
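
The shape of the returned dictionary can be illustrated with a minimal sketch. This is not the library's implementation: `ConstantModel`, `compare_models_sketch`, the `model_1`-style keys, and the difference key are all hypothetical, and metrics are passed here as callables rather than being looked up by name through the metric configuration.

```python
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


class ConstantModel:
    """Hypothetical stand-in for a fitted estimator with a predict method."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # Ignore the features and always predict the same value.
        return [self.value for _ in X]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models_sketch(
    *models: ConstantModel,
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    metrics: Dict[str, Metric],
    calculate_diff: bool = False,
) -> Dict[str, Dict[str, float]]:
    """Return {model label: {metric name: score}} (labels are assumed)."""
    results: Dict[str, Dict[str, float]] = {}
    for index, model in enumerate(models, start=1):
        y_pred = model.predict(X)
        results[f"model_{index}"] = {name: fn(y, y_pred) for name, fn in metrics.items()}
    if calculate_diff and len(models) > 1:
        first = results["model_1"]
        for index in range(2, len(models) + 1):
            other = results[f"model_{index}"]
            # Assumed convention: difference of each later model from the first.
            results[f"model_{index} - model_1"] = {
                name: other[name] - first[name] for name in metrics
            }
    return results


X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 3.0]
scores = compare_models_sketch(
    ConstantModel(2.0),
    ConstantModel(1.0),
    X=X,
    y=y,
    metrics={"mae": mean_absolute_error},
    calculate_diff=True,
)
```

Each outer key identifies a model (or a difference entry), and each inner key is a metric name mapped to a float, which matches the documented `Dict[str, Dict[str, float]]` return type.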

confusion_matrix(model: Any, X: ndarray, y: ndarray, filename: str) -> None

Generate and save a confusion matrix.

Parameters:
model (Any):

A trained classification model with a predict method.

X (ndarray):

The input features.

y (ndarray):

The true target values.

filename (str):

The name of the output file (without extension).
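
For readers unfamiliar with what a confusion matrix contains, the sketch below counts (true label, predicted label) pairs in plain Python. `confusion_counts` is a hypothetical helper, and the layout (rows are true labels, columns are predicted labels) is an assumption; the method itself may order or store the matrix differently.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[List[Hashable], List[List[int]]]:
    """Return the sorted labels and a matrix of counts.

    matrix[i][j] is the number of samples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]  # in practice these would come from model.predict(X)
labels, matrix = confusion_counts(y_true, y_pred)
print(labels)  # [0, 1]
print(matrix)  # [[1, 1], [1, 2]]
```

The diagonal holds the correctly classified samples; off-diagonal cells show which classes are confused with each other.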

evaluate_model(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str) -> None

Evaluate a model on the provided metrics and save the results.

Parameters:
model (BaseEstimator):

The trained model to evaluate.

X (pd.DataFrame):

The input features.

y (pd.Series):

The target data.

metrics (List[str]):

A list of metrics to calculate.

filename (str):

The name of the output file without extension.

evaluate_model_cv(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, cv: int = 5) -> None

Evaluate a model using cross-validation and save the scores.

Parameters:
model (BaseEstimator):

The model to evaluate.

X (pd.DataFrame):

The input features.

y (pd.Series):

The target data.

metrics (List[str]):

A list of metrics to calculate.

filename (str):

The name of the output file without extension.

cv (int):

The number of cross-validation folds. Defaults to 5.
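
The role of `cv` can be sketched without any library: the data is split into `cv` folds, and each fold is held out once while the remaining folds are used for fitting. The helpers below are hypothetical, use contiguous unshuffled folds, and score a mean-only baseline with mean absolute error; the actual method may shuffle, stratify, or aggregate scores differently.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, cv: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into cv contiguous (train, test) index pairs."""
    if not 2 <= cv <= n_samples:
        raise ValueError("cv must be between 2 and n_samples")
    folds = []
    start = 0
    for fold in range(cv):
        # Spread the remainder over the first folds so sizes differ by at most one.
        size = n_samples // cv + (1 if fold < n_samples % cv else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        folds.append((train, test))
        start += size
    return folds


def cross_validated_mae(y: Sequence[float], cv: int = 5) -> List[float]:
    """Score a mean-only baseline on each fold and return one MAE per fold."""
    scores = []
    for train, test in k_fold_indices(len(y), cv):
        prediction = sum(y[i] for i in train) / len(train)  # "fit" on the training folds
        scores.append(sum(abs(y[i] - prediction) for i in test) / len(test))
    return scores


fold_scores = cross_validated_mae(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], cv=5
)
```

With `cv=5` the result contains five scores, one per held-out fold, which are typically summarized by their mean and spread.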

hyperparameter_tuning(model: BaseEstimator, method: str, X_train: DataFrame, y_train: Series, scorer: str, kf: int, num_rep: int, n_jobs: int, plot_results: bool = False) -> BaseEstimator

Perform hyperparameter tuning using grid or random search.

Parameters:
model (BaseEstimator):

The model to be tuned.

method (str):

The search method to use (“grid” or “random”).

X_train (pd.DataFrame):

The training data.

y_train (pd.Series):

The target values for training.

scorer (str):

The scoring metric to use.

kf (int):

Number of splits for cross-validation.

num_rep (int):

Number of repetitions for cross-validation.

n_jobs (int):

Number of parallel jobs to run.

plot_results (bool):

Whether to plot the performance of hyperparameters. Defaults to False.

Returns:
BaseEstimator:

The tuned model.
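
The difference between the two `method` values can be sketched in plain Python: a grid search scores every combination in the parameter grid, while a random search scores only a sample of them. Everything below is hypothetical (`candidate_params`, `best_params`, `toy_score`, `n_iter`, `seed`, and the grid itself); in the real method each candidate would presumably be scored by repeated k-fold cross-validation (`kf` splits repeated `num_rep` times), and the refitted model would be returned rather than a parameter dictionary.

```python
import itertools
import random
from typing import Any, Callable, Dict, List, Sequence


def candidate_params(
    param_grid: Dict[str, Sequence[Any]],
    method: str,
    n_iter: int = 5,
    seed: int = 0,
) -> List[Dict[str, Any]]:
    """List the parameter combinations that a search would evaluate."""
    names = list(param_grid)
    grid = [
        dict(zip(names, values))
        for values in itertools.product(*(param_grid[name] for name in names))
    ]
    if method == "grid":
        return grid  # exhaustive: every combination
    if method == "random":
        # Sample without replacement from the full grid.
        return random.Random(seed).sample(grid, k=min(n_iter, len(grid)))
    raise ValueError(f"method must be 'grid' or 'random', got {method!r}")


def best_params(
    score_fn: Callable[..., float],
    param_grid: Dict[str, Sequence[Any]],
    method: str,
    **search_options: Any,
) -> Dict[str, Any]:
    """Return the candidate with the highest score (higher is better)."""
    candidates = candidate_params(param_grid, method, **search_options)
    return max(candidates, key=lambda params: score_fn(**params))


def toy_score(alpha: float, fit_intercept: bool) -> float:
    # Stand-in for a cross-validated score; peaks at alpha=1.0 with an intercept.
    return -((alpha - 1.0) ** 2) + (0.5 if fit_intercept else 0.0)


grid = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}
best_grid = best_params(toy_score, grid, "grid")
sampled = candidate_params(grid, "random", n_iter=3)
```

A grid search over this example evaluates six candidates, whereas the random search evaluates three, trading coverage for speed.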

load_model(filepath: str) -> BaseEstimator

Load a model from a pickle file.

Parameters:
filepath (str):

Path to saved model file

Returns:
BaseEstimator

Loaded model

Raises:
FileNotFoundError

If model file does not exist
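
The documented behavior (load from a pickle file, raise `FileNotFoundError` when the file is missing) can be sketched with the standard library alone. `save_pickled_model` and `load_pickled_model` are hypothetical helpers, and a plain dictionary stands in for a fitted estimator; this is not the library's code.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_pickled_model(model: Any, filepath: str) -> None:
    """Serialize a model object to filepath with pickle."""
    with open(filepath, "wb") as handle:
        pickle.dump(model, handle)


def load_pickled_model(filepath: str) -> Any:
    """Load a pickled model, raising FileNotFoundError if the file is missing."""
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"No model file found at {filepath}")
    with path.open("rb") as handle:
        # Only unpickle files you trust: pickle can execute arbitrary code on load.
        return pickle.load(handle)


# A dictionary of fitted parameters stands in for a real estimator here.
fitted = {"coef": [0.5, -1.2], "intercept": 3.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = str(Path(tmp_dir) / "model.pkl")
    save_pickled_model(fitted, model_path)
    restored = load_pickled_model(model_path)
    missing_path = str(Path(tmp_dir) / "missing.pkl")
    try:
        load_pickled_model(missing_path)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Because unpickling can run arbitrary code, model files should only be loaded from sources you control.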

plot_confusion_heatmap(model: Any, X: ndarray, y: ndarray, filename: str) -> None

Plot a heatmap of the confusion matrix for a model.

Parameters:
model (Any):

The trained classification model with a predict method.

X (np.ndarray):

The input features.

y (np.ndarray):

The target labels.

filename (str):

The path to save the confusion matrix heatmap image.

plot_feature_importance(model: BaseEstimator, X: DataFrame, y: Series, threshold: int | float, feature_names: List[str], filename: str, metric: str, num_rep: int) -> None

Plot the feature importance for the model and save the plot.

Parameters:
model (BaseEstimator):

The model to evaluate.

X (pd.DataFrame):

The input features.

y (pd.Series):

The target data.

threshold (Union[int, float]):

The number of features or the threshold to filter features by importance.

feature_names (List[str]):

A list of feature names corresponding to the columns in X.

filename (str):

The name of the output file (without extension).

metric (str):

The metric to use for evaluation.

num_rep (int):

The number of repetitions for calculating importance.
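
Because `threshold` accepts either an int or a float, one plausible reading is that an int keeps the top N features and a float keeps the features whose importance is at least that value. The sketch below implements that reading only as an assumption; `select_features` is a hypothetical helper, and the library may interpret a float differently (for example, as a proportion of features).

```python
from typing import Dict, Union


def select_features(
    importances: Dict[str, float], threshold: Union[int, float]
) -> Dict[str, float]:
    """Filter features by importance.

    Assumed semantics: an int keeps the top-N features, a float keeps
    every feature whose importance is >= threshold.
    """
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float, not a bool")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    if isinstance(threshold, int):
        return dict(ranked[:threshold])
    return {name: value for name, value in ranked if value >= threshold}


importances = {"age": 0.50, "income": 0.30, "height": 0.15, "zip_code": 0.05}
top_two = select_features(importances, 2)
above_cutoff = select_features(importances, 0.1)
```

Both results are ordered from most to least important, which is the order a bar chart of importances would usually follow.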

plot_learning_curve(model: BaseEstimator, X_train: DataFrame, y_train: Series, cv: int = 5, num_repeats: int = 1, n_jobs: int = -1, metric: str = 'neg_mean_absolute_error', filename: str = 'learning_curve') -> None

Plot learning curves showing model performance vs training size.

Parameters:
model (BaseEstimator):

The model to evaluate.

X_train (DataFrame):

The training features.

y_train (Series):

The training target values.

cv (int, optional):

The number of cross-validation folds. Defaults to 5.

num_repeats (int, optional):

The number of times to repeat cross-validation. Defaults to 1.

n_jobs (int, optional):

The number of parallel jobs. Defaults to -1.

metric (str, optional):

The scoring metric to use. Defaults to “neg_mean_absolute_error”.

filename (str, optional):

The name of the output file. Defaults to “learning_curve”.

plot_model_comparison(*models: BaseEstimator, X: DataFrame, y: Series, metric: str, filename: str) -> None

Plot a comparison of multiple models based on the specified metric.

Parameters:
*models (BaseEstimator):

A variable number of model instances to evaluate.

X (pd.DataFrame):

The input features.

y (pd.Series):

The target data.

metric (str):

The metric to evaluate and plot.

filename (str):

The name of the output file (without extension).

plot_precision_recall_curve(model: Any, X: ndarray, y: ndarray, filename: str) -> None

Plot a precision-recall curve with average precision.

Parameters:
model (Any):

The trained binary classification model.

X (np.ndarray):

The input features.

y (np.ndarray):

The true binary labels.

filename (str):

The path to save the plot.

plot_pred_vs_obs(model: BaseEstimator, X: DataFrame, y_true: Series, filename: str) -> None

Plot predicted vs. observed values and save the plot.

Parameters:
model (BaseEstimator):

The trained model.

X (pd.DataFrame):

The input features.

y_true (pd.Series):

The true target values.

filename (str):

The name of the output file (without extension).

plot_residuals(model: BaseEstimator, X: DataFrame, y: Series, filename: str, add_fit_line: bool = False) -> None

Plot the residuals of the model and save the plot.

Parameters:
model (BaseEstimator):

The trained model.

X (pd.DataFrame):

The input features.

y (pd.Series):

The true target values.

filename (str):

The name of the output file (without extension).

add_fit_line (bool):

Whether to add a line of best fit to the plot.
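
The quantities behind a residual plot can be sketched as follows. The sign convention (observed minus predicted), the choice of predicted values as the x-axis for the fit line, and both helper names are assumptions made for illustration; the method's actual plot may differ.

```python
from typing import List, Sequence, Tuple


def residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Residuals as observed minus predicted (assumed convention)."""
    return [observed - predicted for observed, predicted in zip(y_true, y_pred)]


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 3.0]  # in practice these would come from model.predict(X)
res = residuals(y_true, y_pred)
slope, intercept = least_squares_line(y_pred, res)
```

A fit line with a slope far from zero, as in this toy data, suggests a systematic pattern in the errors, which is the kind of structure `add_fit_line` helps make visible.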

plot_roc_curve(model: Any, X: ndarray, y: ndarray, filename: str) -> None

Plot a receiver operating characteristic (ROC) curve with the area under the curve (AUC).

Parameters:
model (Any):

The trained binary classification model.

X (np.ndarray):

The input features.

y (np.ndarray):

The true binary labels.

filename (str):

The path to save the ROC curve image.
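
To show what the curve and its area represent, the sketch below computes ROC points and the AUC in plain Python from true binary labels and predicted scores. `roc_points` and `trapezoid_auc` are hypothetical helpers, not the library's implementation, and the scores stand in for what a model's probability or decision output would provide.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0)."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("y_true must contain both classes")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold one distinct score at a time, handling ties together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
area = trapezoid_auc(points)
print(area)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.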

save_model(model: BaseEstimator, filename: str) -> None

Save a model to a pickle file.

Parameters:
model (BaseEstimator):

The model to save.

filename (str):

The name for the output file (without extension).