Common Measures

class EvaluateModel(method_name: str, description: str)

Bases: MeasureEvaluator

Evaluate a model on the provided measures and save the results.

This evaluator calculates specified performance measures for a single trained model on a given dataset. It supports any metric that is configured in the metric configuration manager.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

This evaluator provides a straightforward way to calculate performance measures for a single model. It uses the metric configuration manager to retrieve the appropriate metric functions and calculates scores for all specified metrics.

The evaluator supports both classification and regression metrics, depending on what is configured in the metric configuration manager.
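As a rough illustration of the name-based lookup described above, the sketch below maps metric names to scoring functions and computes a score for each requested name. The `METRIC_FUNCTIONS` table and the `score_predictions` helper are hypothetical stand-ins, not the metric configuration manager's actual interface, and the two metrics are hand-written only to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mean_absolute_error(y_true: Sequence, y_pred: Sequence) -> float:
    """Average absolute gap between predictions and true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical name-to-function table standing in for the metric manager.
METRIC_FUNCTIONS: Dict[str, Callable[[Sequence, Sequence], float]] = {
    "accuracy": accuracy,                        # classification metric
    "mean_absolute_error": mean_absolute_error,  # regression metric
}


def score_predictions(
    y_true: Sequence, y_pred: Sequence, metrics: List[str]
) -> Dict[str, float]:
    """Look up each requested metric by name and compute its score."""
    scores = {}
    for name in metrics:
        if name not in METRIC_FUNCTIONS:
            raise KeyError(f"Metric {name!r} is not configured")
        scores[name] = METRIC_FUNCTIONS[name](y_true, y_pred)
    return scores
```

A request for a name that is missing from the table fails loudly here. How the evaluator itself handles unknown metric names is not stated on this page.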

Examples

Use the model evaluation evaluator:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model")
>>> evaluator.evaluate(model, X, y, ["accuracy", "f1_score"], "results")

report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]

Generate a report of the evaluation results.

Converts evaluation results into a table of metric names and scores suitable for reporting.

Parameters:

results (Dict[str, Any]): The results of the evaluation

Returns:

Tuple[List[str], List[List[Any]]]: A tuple containing:

- List of column headers: ["Metric", "Score"]
- Nested list of rows with metric names and scores

Notes

The report format is designed for easy display in tables or reports, with one row per metric showing the metric name and its corresponding score.

The metadata key is excluded from the report.
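The conversion described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `build_report` is hypothetical, and the metadata key name `"_metadata"` is an assumption, since this page does not state the actual key.

```python
from typing import Any, Dict, List, Tuple

# Assumed name of the metadata entry; the real key is not given on this page.
METADATA_KEY = "_metadata"


def build_report(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """Turn a results mapping into column headers and one row per metric."""
    columns = ["Metric", "Score"]
    rows = [
        [metric, score]
        for metric, score in results.items()
        if metric != METADATA_KEY  # the metadata entry is not a metric
    ]
    return columns, rows


# Illustrative values only, not real evaluation output.
results = {"accuracy": 0.91, "f1_score": 0.88, "_metadata": {"method": "example"}}
columns, rows = build_report(results)
```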

class EvaluateModelCV(method_name: str, description: str)

Bases: MeasureEvaluator

Evaluate a model using cross-validation and save the scores.

This evaluator calculates performance measures for a model using cross-validation, providing more robust estimates of model performance by averaging scores across multiple train-test splits.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

Cross-validation provides a more reliable estimate of model performance by reducing the variance associated with a single train-test split. The evaluator calculates mean scores, standard deviations, and stores all individual fold scores for detailed analysis.

The evaluator uses the utility service to get the appropriate cross-validation splitter based on the data characteristics.
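The selection rules themselves live in the utility service and are not documented on this page. The sketch below only illustrates the kind of decision involved, using scikit-learn splitter class names as labels. The helper `choose_splitter_name` and the precedence of groups over stratification are assumptions made for illustration.

```python
from typing import Optional, Sequence


def choose_splitter_name(
    is_classification: bool, groups: Optional[Sequence] = None
) -> str:
    """Return the name of a scikit-learn splitter suited to the data."""
    if groups is not None:
        # Keep all samples that share a group in the same fold.
        return "GroupKFold"
    if is_classification:
        # Preserve class proportions in every fold.
        return "StratifiedKFold"
    # Plain k-fold for ungrouped regression data.
    return "KFold"
```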

Examples

Use the cross-validation evaluator:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model_cv")
>>> evaluator.evaluate(
...     model, X, y, ["accuracy", "f1_score"], "cv_results", cv=5
... )

evaluate(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, cv: int = 5) → None

Evaluate a model using cross-validation and save the scores.

Executes the complete cross-validation evaluation workflow. This includes calculating scores across multiple folds, computing statistics, and saving the results with metadata.

Parameters:

model (base.BaseEstimator): The model to evaluate
X (pd.DataFrame): The input features for evaluation
y (pd.Series): The target data
metrics (List[str]): A list of metric names to calculate
filename (str): The name of the output file (without extension)
cv (int, optional): The number of cross-validation folds, by default 5

Returns:

None

Notes

The cross-validation process uses the utility service to get the appropriate splitter based on the data characteristics (e.g., stratified splits for classification, grouped splits if groups are specified).

Results include mean scores, standard deviations, and all individual fold scores for comprehensive analysis.
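To make the stored statistics concrete, here is a small sketch that summarizes per-fold scores using only the standard library. The helper name `summarize_folds` and the field names `mean_score`, `std_dev`, and `all_scores` are assumptions chosen for illustration. The page does not say whether the population or the sample standard deviation is used, so the population form below is also an assumption.

```python
import statistics
from typing import Any, Dict, List


def summarize_folds(
    fold_scores: Dict[str, List[float]]
) -> Dict[str, Dict[str, Any]]:
    """Compute the mean and standard deviation of each metric's fold scores."""
    summary = {}
    for metric, scores in fold_scores.items():
        summary[metric] = {
            "mean_score": statistics.fmean(scores),
            # Population standard deviation; use statistics.stdev instead
            # if the sample standard deviation is preferred.
            "std_dev": statistics.pstdev(scores),
            # Keep every fold score for later inspection.
            "all_scores": list(scores),
        }
    return summary


# Illustrative fold scores only, not real evaluation output.
summary = summarize_folds({"accuracy": [0.8, 0.9, 1.0]})
```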

report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]

Generate a report of the cross-validation results.

Converts cross-validation results into a table suitable for reporting, with the mean score, standard deviation, and individual fold scores for each metric.

Parameters:

results (Dict[str, Any]): The results of the cross-validation

Returns:

Tuple[List[str], List[List[Any]]]: A tuple containing:

- List of column headers: ["Metric", "Mean Score", "All Scores"]
- Nested list of rows with metric statistics

Notes

The report format shows mean scores with standard deviations in parentheses, and all individual fold scores for detailed analysis.

The metadata key is excluded from the report.
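A minimal sketch of this row layout is shown below, assuming each metric's entry holds `mean_score`, `std_dev`, and `all_scores` fields and that the metadata entry is stored under `"_metadata"`. Those names, the helper `build_cv_report`, and the five-decimal formatting are all assumptions, not details confirmed by this page.

```python
from typing import Any, Dict, List, Tuple

# Assumed name of the metadata entry; the real key is not given on this page.
METADATA_KEY = "_metadata"


def build_cv_report(
    results: Dict[str, Any]
) -> Tuple[List[str], List[List[Any]]]:
    """Build one row per metric: name, 'mean (std)', and all fold scores."""
    columns = ["Metric", "Mean Score", "All Scores"]
    rows = []
    for metric, stats in results.items():
        if metric == METADATA_KEY:
            continue  # the metadata entry is not a metric
        # Mean score with the standard deviation in parentheses.
        mean_with_std = f"{stats['mean_score']:.5f} ({stats['std_dev']:.5f})"
        all_scores = ", ".join(f"{score:.5f}" for score in stats["all_scores"])
        rows.append([metric, mean_with_std, all_scores])
    return columns, rows
```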

class CompareModels(method_name: str, description: str)

Bases: MeasureEvaluator

Compare multiple models using specified measures.

This evaluator allows comparison of multiple models on the same dataset using specified performance measures. It can optionally calculate differences between model performances for detailed analysis.

Parameters:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output

Attributes:

method_name (str): The name of the evaluator
description (str): The description of the evaluator output
services (ServiceBundle or None): The global services bundle
metric_config (MetricManager or None): The metric configuration manager

Notes

This evaluator is particularly useful for model selection and performance comparison. It can compare any number of models on the same dataset using the same metrics, ensuring fair comparison.

When calculate_diff is True, the evaluator calculates pairwise differences between all model pairs for each metric, providing detailed performance comparisons.

Examples

Compare multiple models:

>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_compare_models")
>>> evaluator.evaluate(model1, model2, model3, X=X, y=y, 
...                   metrics=["accuracy", "f1_score"], 
...                   filename="comparison", calculate_diff=True)

evaluate(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, calculate_diff: bool = False) → None

Compare multiple models using specified metrics.

Executes the complete model comparison workflow. This includes evaluating each model on the specified metrics and optionally calculating pairwise differences between models.

Parameters:

*models (base.BaseEstimator): Models to compare (variable number of arguments)
X (pd.DataFrame): Input features for evaluation
y (pd.Series): Target values for evaluation
metrics (List[str]): Names of metrics to calculate
filename (str): Name for output file (without extension)
calculate_diff (bool, optional): Whether to calculate differences between models, by default False

Returns:

None

Notes

The method evaluates each model individually on the same dataset using the same metrics, ensuring fair comparison. If calculate_diff is True, it also calculates pairwise differences between all model pairs for each metric.

Results are saved with metadata for later analysis and reporting.
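The pairwise differences described above can be sketched with `itertools.combinations`, which yields each unordered model pair exactly once. The helper `pairwise_differences`, the nested `{model: {metric: score}}` layout, the sign convention (second model minus first), and the label format are all assumptions made for illustration, since the page does not specify how differences are stored.

```python
from itertools import combinations
from typing import Dict


def pairwise_differences(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """For each metric, compute score differences for every model pair."""
    if not scores:
        return {}
    # Assume every model was scored on the same metrics.
    metrics = list(next(iter(scores.values())))
    differences = {}
    for metric in metrics:
        differences[metric] = {
            # Second model's score minus the first model's score.
            f"{model_b} - {model_a}": scores[model_b][metric] - scores[model_a][metric]
            for model_a, model_b in combinations(scores, 2)
        }
    return differences


# Illustrative scores only, not real evaluation output.
example_scores = {
    "ridge": {"r2": 0.5},
    "lasso": {"r2": 0.25},
    "tree": {"r2": 0.75},
}
diffs = pairwise_differences(example_scores)
```

With three models this produces three pairs per metric. In general, n models give n(n-1)/2 pairs, so the output grows quadratically with the number of models compared.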
