Common Measures

class EvaluateModel(method_name: str, description: str)

Bases: MeasureEvaluator

Evaluate a model on the provided measures and save the results.

This evaluator calculates specified performance measures for a single trained model on a given dataset. It supports any metric that is configured in the metric configuration manager.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

This evaluator provides a straightforward way to calculate performance measures for a single model. It uses the metric configuration manager to retrieve the appropriate metric functions and calculates scores for all specified metrics.

The evaluator supports both classification and regression metrics, depending on what is configured in the metric configuration manager.

Examples

Use the model evaluation evaluator:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model")
>>> evaluator.evaluate(model, X, y, ["accuracy", "f1_score"], "results")

calculate_measures(predictions: Dict[str, Any], y_true: Series, metrics: List[str]) → Dict[str, float]

Calculate the evaluation results for a model.

Calculates the specified performance measures for the given predictions and true values using the configured metric functions.

Parameters:

predictions (Dict[str, Any]): The predictions of the model (typically a pandas Series)
y_true (pd.Series): The true target values
metrics (List[str]): A list of metric names to calculate

Returns:

Dict[str, float]: A dictionary containing the evaluation results for each metric with display names as keys and scores as values

Notes

The method retrieves metric functions from the metric configuration manager and calculates scores for each specified metric. If a metric function is not found, it logs a warning and skips that metric.

The returned dictionary uses display names as keys for better readability in reports and logs.
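
The lookup-and-skip behaviour described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the `METRICS` dictionary is a hypothetical stand-in for the metric configuration manager, and `accuracy` and `calculate_measures_sketch` are names invented for the example.

```python
import logging
from typing import Any, Callable, Dict, List, Sequence, Tuple

logger = logging.getLogger(__name__)


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stand-in for the metric configuration manager:
# metric name -> (display name, scoring function).
METRICS: Dict[str, Tuple[str, Callable[..., float]]] = {
    "accuracy": ("Accuracy", accuracy),
}


def calculate_measures_sketch(
    y_pred: Sequence[Any], y_true: Sequence[Any], metrics: List[str]
) -> Dict[str, float]:
    """Score each known metric; warn about and skip unknown ones."""
    results: Dict[str, float] = {}
    for name in metrics:
        entry = METRICS.get(name)
        if entry is None:
            logger.warning("Metric %r not found; skipping.", name)
            continue
        display_name, scorer = entry
        # Results are keyed by display name, not by the lookup name.
        results[display_name] = scorer(y_true, y_pred)
    return results


scores = calculate_measures_sketch(
    [1, 0, 1, 1], [1, 0, 0, 1], ["accuracy", "unknown_metric"]
)
```

Here `scores` contains a single entry under the display name "Accuracy"; the unknown metric only produces a warning.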

log_results(results: Dict[str, float], filename: str) → None

Override default logging for model evaluation results.

Provides custom logging format for model evaluation results, showing each metric and its score in a readable format.

Parameters:

results (Dict[str, float]): The results of the evaluation
filename (str): The name of the file where results were saved

Returns:

None

Notes

The logging format shows each metric name and its score with 4 decimal places precision for numeric values. Non-numeric values are displayed as-is.

The metadata key is excluded from the logged results.
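
A minimal sketch of the formatting rules above, assuming the metadata entry is stored under a key named `"_metadata"` (the actual key name is not stated here, so treat it as hypothetical, along with the `format_results` helper):

```python
from typing import Any, Dict, List

METADATA_KEY = "_metadata"  # hypothetical key name, assumed for illustration


def format_results(results: Dict[str, Any]) -> List[str]:
    """Build one log line per metric, skipping the metadata entry."""
    lines = []
    for metric, score in results.items():
        if metric == METADATA_KEY:
            continue
        # Numeric scores get 4 decimal places; anything else is shown as-is.
        if isinstance(score, (int, float)) and not isinstance(score, bool):
            lines.append(f"{metric}: {score:.4f}")
        else:
            lines.append(f"{metric}: {score}")
    return lines


lines = format_results(
    {"Accuracy": 0.91234567, "Note": "n/a", "_metadata": {"model": "demo"}}
)
```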

report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]

Generate a report of the evaluation results.

Converts evaluation results into a format suitable for reporting with metric names and scores in a tabular format.

Parameters:

results (Dict[str, Any]): The results of the evaluation

Returns:

Tuple[List[str], List[List[Any]]]: A tuple containing:
- List of column headers: ["Metric", "Score"]
- Nested list of rows with metric names and scores

Notes

The report format is designed for easy display in tables or reports, with one row per metric showing the metric name and its corresponding score.

The metadata key is excluded from the report.
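
As a rough illustration of the return shape, the conversion might look like the following. The `"_metadata"` key name and the `report_sketch` function are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

METADATA_KEY = "_metadata"  # hypothetical key name, assumed for illustration


def report_sketch(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """Return (column headers, rows) with one row per metric."""
    columns = ["Metric", "Score"]
    rows = [
        [metric, score]
        for metric, score in results.items()
        if metric != METADATA_KEY
    ]
    return columns, rows


columns, rows = report_sketch(
    {"Accuracy": 0.9, "F1 Score": 0.8, "_metadata": {"model": "demo"}}
)
```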

class EvaluateModelCV(method_name: str, description: str)

Bases: MeasureEvaluator

Evaluate a model using cross-validation and save the scores.

This evaluator calculates performance measures for a model using cross-validation, providing more robust estimates of model performance by averaging scores across multiple train-test splits.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

Cross-validation provides a more reliable estimate of model performance by reducing the variance associated with a single train-test split. The evaluator calculates mean scores, standard deviations, and stores all individual fold scores for detailed analysis.

The evaluator uses the utility service to get the appropriate cross-validation splitter based on the data characteristics.

Examples

Use the cross-validation evaluator:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model_cv")
>>> evaluator.evaluate(
...     model, X, y, ["accuracy", "f1_score"], "cv_results", cv=5
... )

calculate_measures(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], cv: int = 5) → Dict[str, float]

Calculate the cross-validation results for a model.

Performs cross-validation evaluation for the specified metrics and returns comprehensive statistics including mean, standard deviation, and all individual fold scores.

Parameters:

model (base.BaseEstimator): The model to evaluate
X (pd.DataFrame): The input features for evaluation
y (pd.Series): The target data
metrics (List[str]): A list of metric names to calculate
cv (int, optional): The number of cross-validation folds, by default 5

Returns:

Dict[str, float]: A dictionary containing cross-validation results for each metric with display names as keys and statistics as values

Notes

The method uses scikit-learn’s cross_val_score function with the appropriate cross-validation splitter obtained from the utility service. The splitter is chosen based on data characteristics (e.g., stratified for classification, grouped if groups are specified).

Each metric result contains:
- mean_score: Average score across all folds
- std_dev: Standard deviation of scores
- all_scores: List of all individual fold scores
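
To make the per-metric structure concrete, here is a small sketch that builds it from a list of fold scores. The fold scores are made up, `summarize_folds` is a hypothetical helper, and the use of the population standard deviation is an assumption; the library may compute it differently.

```python
from statistics import mean, pstdev
from typing import Any, Dict, Sequence


def summarize_folds(fold_scores: Sequence[float]) -> Dict[str, Any]:
    """Collapse per-fold scores into the three documented fields."""
    return {
        "mean_score": mean(fold_scores),
        # Population standard deviation; an assumption for this sketch.
        "std_dev": pstdev(fold_scores),
        "all_scores": list(fold_scores),
    }


# Illustrative scores from a 3-fold run, keyed by display name.
cv_results = {"Accuracy": summarize_folds([0.8, 0.9, 1.0])}
```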

evaluate(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, cv: int = 5) → None

Evaluate a model using cross-validation and save the scores.

Executes the complete cross-validation evaluation workflow. This includes calculating scores across multiple folds, computing statistics, and saving the results with metadata.

Parameters:

model (base.BaseEstimator): The model to evaluate
X (pd.DataFrame): The input features for evaluation
y (pd.Series): The target data
metrics (List[str]): A list of metric names to calculate
filename (str): The name of the output file (without extension)
cv (int, optional): The number of cross-validation folds, by default 5

Returns:

None

Notes

The cross-validation process uses the utility service to get the appropriate splitter based on the data characteristics (e.g., stratified splits for classification, grouped splits if groups are specified).

Results include mean scores, standard deviations, and all individual fold scores for comprehensive analysis.

log_results(results: Dict[str, float], filename: str) → None

Override default logging for cross-validation results.

Provides custom logging format for cross-validation results, showing mean scores and standard deviations for each metric.

Parameters:

results (Dict[str, float]): The results of the cross-validation
filename (str): The name of the file where results were saved

Returns:

None

Notes

The logging format shows each metric with its mean score and standard deviation, providing a quick overview of model performance variability.

The metadata key is excluded from the logged results.

report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]

Generate a report of the cross-validation results.

Converts cross-validation results into a format suitable for reporting with mean scores, standard deviations, and all scores.

Parameters:

results (Dict[str, Any]): The results of the cross-validation

Returns:

Tuple[List[str], List[List[Any]]]: A tuple containing:
- List of column headers: ["Metric", "Mean Score", "All Scores"]
- Nested list of rows with metric statistics

Notes

The report format shows mean scores with standard deviations in parentheses, and all individual fold scores for detailed analysis.

The metadata key is excluded from the report.
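
One way to picture the "mean (standard deviation)" layout is the sketch below. The 4-decimal formatting, the comma-separated fold scores, the `"_metadata"` key name, and the `cv_report_sketch` function are all assumptions for illustration rather than the library's exact output.

```python
from typing import Any, Dict, List, Tuple

METADATA_KEY = "_metadata"  # hypothetical key name, assumed for illustration


def cv_report_sketch(
    results: Dict[str, Any]
) -> Tuple[List[str], List[List[Any]]]:
    """Return (column headers, rows) for cross-validation statistics."""
    columns = ["Metric", "Mean Score", "All Scores"]
    rows = []
    for metric, stats in results.items():
        if metric == METADATA_KEY:
            continue
        # Mean score with the standard deviation in parentheses.
        mean_cell = f"{stats['mean_score']:.4f} ({stats['std_dev']:.4f})"
        all_cell = ", ".join(f"{s:.4f}" for s in stats["all_scores"])
        rows.append([metric, mean_cell, all_cell])
    return columns, rows


cv_columns, cv_rows = cv_report_sketch(
    {
        "Accuracy": {
            "mean_score": 0.9,
            "std_dev": 0.05,
            "all_scores": [0.85, 0.95],
        },
        "_metadata": {"model": "demo"},
    }
)
```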

class CompareModels(method_name: str, description: str)

Bases: MeasureEvaluator

Compare multiple models using specified measures.

This evaluator allows comparison of multiple models on the same dataset using specified performance measures. It can optionally calculate differences between model performances for detailed analysis.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

This evaluator is particularly useful for model selection and performance comparison. It can compare any number of models on the same dataset using the same metrics, ensuring fair comparison.

When calculate_diff is True, the evaluator calculates pairwise differences between all model pairs for each metric, providing detailed performance comparisons.

Examples

Compare multiple models:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_compare_models")
>>> evaluator.evaluate(model1, model2, model3, X=X, y=y, 
...                   metrics=["accuracy", "f1_score"], 
...                   filename="comparison", calculate_diff=True)

calculate_measures(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], calculate_diff: bool = False) → Dict[str, Dict[str, float]]

Calculate the comparison results for multiple models.

Evaluates each model on the specified metrics and optionally calculates pairwise differences between models.

Parameters:

*models (base.BaseEstimator): Models to compare (variable number of arguments)
X (pd.DataFrame): Input features for evaluation
y (pd.Series): Target values for evaluation
metrics (List[str]): Names of metrics to calculate
calculate_diff (bool, optional): Whether to calculate differences between models, by default False

Returns:

Dict[str, Dict[str, float]]: Nested dictionary containing metric scores for each model and optionally pairwise differences

Raises:

ValueError: If no models are provided for comparison

Notes

The method evaluates each model individually and stores results in a nested dictionary structure. If calculate_diff is True, it also calculates pairwise differences between all model pairs for each metric.

The differences are calculated as model_b - model_a for each pair, showing the performance improvement (or degradation) when switching from model_a to model_b.
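
The `model_b - model_a` convention can be sketched with `itertools.combinations`. The input layout (model name, then metric, then score), the `"model_b - model_a"` key format, and the `pairwise_differences` helper are hypothetical choices for this example; the scores are made up.

```python
from itertools import combinations
from typing import Dict


def pairwise_differences(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """For each metric, compute model_b - model_a over all model pairs."""
    diffs: Dict[str, Dict[str, float]] = {}
    # combinations() yields each unordered pair once, in insertion order.
    for model_a, model_b in combinations(scores, 2):
        for metric, score_a in scores[model_a].items():
            label = f"{model_b} - {model_a}"
            diffs.setdefault(metric, {})[label] = scores[model_b][metric] - score_a
    return diffs


diffs = pairwise_differences(
    {"ridge": {"MAE": 2.0}, "lasso": {"MAE": 1.5}, "tree": {"MAE": 3.0}}
)
```

A negative value such as `diffs["MAE"]["lasso - ridge"]` means the second model scored lower than the first; whether that is an improvement depends on whether the metric is an error or a score.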

evaluate(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, calculate_diff: bool = False) → None

Compare multiple models using specified metrics.

Executes the complete model comparison workflow. This includes evaluating each model on the specified metrics and optionally calculating pairwise differences between models.

Parameters:

*models (base.BaseEstimator): Models to compare (variable number of arguments)
X (pd.DataFrame): Input features for evaluation
y (pd.Series): Target values for evaluation
metrics (List[str]): Names of metrics to calculate
filename (str): Name for output file (without extension)
calculate_diff (bool, optional): Whether to calculate differences between models, by default False

Returns:

None

Notes

The method evaluates each model individually on the same dataset using the same metrics, ensuring fair comparison. If calculate_diff is True, it also calculates pairwise differences between all model pairs for each metric.

Results are saved with metadata for later analysis and reporting.

log_results(results: Dict[str, float], filename: str) → None

Override default logging for model comparison results.

Provides custom logging format for model comparison results, showing each model’s performance on each metric.

Parameters:

results (Dict[str, float]): The results of the model comparison
filename (str): The name of the file where results were saved

Returns:

None

Notes

The logging format shows each model’s performance on each metric, making it easy to compare model performance at a glance.

The differences and metadata keys are excluded from the logged results.

report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]

Generate a report of the model comparison results.

Converts model comparison results into a tabular format suitable for reporting, with models as columns and metrics as rows.

Parameters:

results (Dict[str, Any]): The results of the model comparison

Returns:

Tuple[List[str], List[List[Any]]]: A tuple containing:
- List of column headers: ["Metric", model1_name, model2_name, …]
- Nested list of rows with metric names and scores for each model
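
A sketch of the metrics-as-rows, models-as-columns layout follows. It assumes the results are keyed by model name and then by metric, and that entries named `"differences"` and `"_metadata"` are skipped; the key names, the nesting, and the `comparison_report_sketch` function are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical names of non-model entries, assumed for illustration.
SKIPPED_KEYS = {"differences", "_metadata"}


def comparison_report_sketch(
    results: Dict[str, Any]
) -> Tuple[List[str], List[List[Any]]]:
    """Return (column headers, rows): one column per model, one row per metric."""
    models = [name for name in results if name not in SKIPPED_KEYS]
    columns = ["Metric"] + models
    # Take the metric order from the first model; all models share the metrics.
    metrics = list(results[models[0]]) if models else []
    rows = [[metric] + [results[m][metric] for m in models] for metric in metrics]
    return columns, rows


cmp_columns, cmp_rows = comparison_report_sketch(
    {
        "ridge": {"MAE": 2.0, "R2": 0.7},
        "lasso": {"MAE": 1.5, "R2": 0.8},
        "differences": {"MAE": {"lasso - ridge": -0.5}},
        "_metadata": {"dataset": "demo"},
    }
)
```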
