Common Measures
- class EvaluateModel(method_name: str, description: str)
Bases: MeasureEvaluator
Evaluate a model on the provided measures and save the results.
This evaluator calculates specified performance measures for a single trained model on a given dataset. It supports any metric that is configured in the metric configuration manager.
- Parameters:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- services : ServiceBundle or None
The global services bundle
- metric_config : MetricManager or None
The metric configuration manager
Notes
This evaluator provides a straightforward way to calculate performance measures for a single model. It uses the metric configuration manager to retrieve the appropriate metric functions and calculates scores for all specified metrics.
The evaluator supports both classification and regression metrics, depending on what is configured in the metric configuration manager.
Examples
- Use the model evaluation evaluator:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model")
>>> evaluator.evaluate(model, X, y, ["accuracy", "f1_score"], "results")
- calculate_measures(predictions: Dict[str, Any], y_true: Series, metrics: List[str]) → Dict[str, float]
Calculate the evaluation results for a model.
Calculates the specified performance measures for the given predictions and true values using the configured metric functions.
- Parameters:
- predictions : Dict[str, Any]
The predictions of the model (typically a pandas Series)
- y_true : pd.Series
The true target values
- metrics : List[str]
A list of metric names to calculate
- Returns:
- Dict[str, float]
A dictionary containing the evaluation results for each metric with display names as keys and scores as values
Notes
The method retrieves metric functions from the metric configuration manager and calculates scores for each specified metric. If a metric function is not found, it logs a warning and skips that metric.
The returned dictionary uses display names as keys for better readability in reports and logs.
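The lookup-and-skip behavior described above can be sketched in plain Python. This is a minimal illustration, not brisk's implementation: the `REGISTRY` dictionary stands in for the metric configuration manager, and `calculate_measures_sketch` and the `accuracy` helper are hypothetical names introduced only for this example.

```python
import logging
from typing import Callable, Dict, List, Sequence, Tuple

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the metric configuration manager:
# metric name -> (display name, scoring function(y_true, y_pred)).
MetricRegistry = Dict[str, Tuple[str, Callable[[Sequence, Sequence], float]]]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that equal the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


REGISTRY: MetricRegistry = {"accuracy": ("Accuracy", accuracy)}


def calculate_measures_sketch(
    predictions: Sequence,
    y_true: Sequence,
    metrics: List[str],
    registry: MetricRegistry = REGISTRY,
) -> Dict[str, float]:
    """Score predictions on each requested metric, keyed by display name."""
    results: Dict[str, float] = {}
    for name in metrics:
        entry = registry.get(name)
        if entry is None:
            # Unknown metric: log a warning and skip it.
            logger.warning("Metric %r not found; skipping it.", name)
            continue
        display_name, func = entry
        results[display_name] = func(y_true, predictions)
    return results


print(calculate_measures_sketch([1, 0, 1, 1], [1, 0, 0, 1], ["accuracy", "f1_score"]))
# {'Accuracy': 0.75}
```

Here `"f1_score"` is not in the sketch registry, so it is skipped with a warning and only the accuracy score is returned under its display name.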
- log_results(results: Dict[str, float], filename: str) → None
Override the default logging for model evaluation results.
Provides a custom logging format for model evaluation results that shows each metric and its score in a readable form.
- Parameters:
- results : Dict[str, float]
The results of the evaluation
- filename : str
The name of the file where results were saved
- Returns:
- None
Notes
The logging format shows each metric name and its score to four decimal places for numeric values. Non-numeric values are displayed as-is.
The metadata key is excluded from the logged results.
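A minimal sketch of this logging format, assuming the results are a flat mapping of display names to scores. The metadata key name `_metadata` and the helper `format_results` are hypothetical; the sketch only builds the message string rather than calling brisk's logger.

```python
from typing import Any, Dict

# Assumed name: the docs only refer to "the metadata key".
METADATA_KEY = "_metadata"


def format_results(results: Dict[str, Any]) -> str:
    """Build one 'metric: score' line per metric, skipping metadata."""
    lines = []
    for metric, score in results.items():
        if metric == METADATA_KEY:
            continue
        if isinstance(score, (int, float)):
            lines.append(f"{metric}: {score:.4f}")  # numeric: 4 decimal places
        else:
            lines.append(f"{metric}: {score}")  # non-numeric: shown as-is
    return "\n".join(lines)


print(format_results({"Accuracy": 0.75, "Notes": "n/a", "_metadata": {}}))
# Accuracy: 0.7500
# Notes: n/a
```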
- report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]
Generate a report of the evaluation results.
Converts evaluation results into a tabular format suitable for reporting, pairing each metric name with its score.
- Parameters:
- results : Dict[str, Any]
The results of the evaluation
- Returns:
- Tuple[List[str], List[List[Any]]]
A tuple containing:
  - List of column headers: ["Metric", "Score"]
  - Nested list of rows with metric names and scores
Notes
The report format is designed for easy display in tables or reports, with one row per metric showing the metric name and its corresponding score.
The metadata key is excluded from the report.
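As an illustration of the report shape, the sketch below converts a flat results mapping into headers and rows. `report_sketch` and the `_metadata` key name are assumptions made for this example, not part of the documented API.

```python
from typing import Any, Dict, List, Tuple

METADATA_KEY = "_metadata"  # assumed key name


def report_sketch(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """Return (headers, rows) with one [metric, score] row per metric."""
    headers = ["Metric", "Score"]
    rows = [
        [metric, score]
        for metric, score in results.items()
        if metric != METADATA_KEY  # metadata is excluded from the report
    ]
    return headers, rows


headers, rows = report_sketch({"Accuracy": 0.75, "F1 Score": 0.8, "_metadata": {}})
print(headers)  # ['Metric', 'Score']
print(rows)     # [['Accuracy', 0.75], ['F1 Score', 0.8]]
```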
- class EvaluateModelCV(method_name: str, description: str)
Bases: MeasureEvaluator
Evaluate a model using cross-validation and save the scores.
This evaluator calculates performance measures for a model using cross-validation, providing more robust estimates of model performance by averaging scores across multiple train-test splits.
- Parameters:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- services : ServiceBundle or None
The global services bundle
- metric_config : MetricManager or None
The metric configuration manager
Notes
Cross-validation provides a more reliable estimate of model performance by reducing the variance associated with a single train-test split. The evaluator calculates the mean score and standard deviation for each metric and also stores all individual fold scores for detailed analysis.
The evaluator uses the utility service to get the appropriate cross-validation splitter based on the data characteristics.
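To show what averaging across train-test splits involves, here is a minimal unshuffled k-fold index generator in plain Python. It follows the fold-size rule of scikit-learn's `KFold` (the first `n_samples % n_splits` folds get one extra sample), but it is only a sketch: the splitter brisk actually uses comes from the utility service and may be stratified or grouped. `kfold_indices` is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def kfold_indices(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for unshuffled k-fold splits."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # The first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in kfold_indices(10, n_splits=3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Every sample lands in exactly one test fold; a model is fit on each training portion, scored on the matching test portion, and the per-fold scores are then averaged.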
Examples
- Use the cross-validation evaluator:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_evaluate_model_cv")
>>> evaluator.evaluate(
...     model, X, y, ["accuracy", "f1_score"], "cv_results", cv=5
... )
- calculate_measures(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], cv: int = 5) → Dict[str, float]
Calculate the cross-validation results for a model.
Performs cross-validation evaluation for the specified metrics and returns comprehensive statistics including mean, standard deviation, and all individual fold scores.
- Parameters:
- model : base.BaseEstimator
The model to evaluate
- X : pd.DataFrame
The input features for evaluation
- y : pd.Series
The target data
- metrics : List[str]
A list of metric names to calculate
- cv : int, optional
The number of cross-validation folds, by default 5
- Returns:
- Dict[str, float]
A dictionary containing cross-validation results for each metric with display names as keys and statistics as values
Notes
The method uses scikit-learn’s cross_val_score function with the appropriate cross-validation splitter obtained from the utility service. The splitter is chosen based on data characteristics (e.g., stratified for classification, grouped if groups are specified).
Each metric result contains:
  - mean_score: Average score across all folds
  - std_dev: Standard deviation of scores
  - all_scores: List of all individual fold scores
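The per-metric statistics listed above can be computed from a list of fold scores as follows. This sketch assumes the population standard deviation (ddof=0, which is also NumPy's `std` default); whether brisk uses the population or the sample standard deviation is not stated here. `summarize_fold_scores` is a hypothetical helper.

```python
import statistics
from typing import Dict, List, Union


def summarize_fold_scores(
    scores: List[float],
) -> Dict[str, Union[float, List[float]]]:
    """Collapse per-fold scores into the three documented statistics."""
    return {
        "mean_score": statistics.fmean(scores),
        # Population standard deviation (ddof=0); assumed, brisk may differ.
        "std_dev": statistics.pstdev(scores),
        "all_scores": list(scores),
    }


print(summarize_fold_scores([0.5, 0.75, 1.0]))
```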
- evaluate(model: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, cv: int = 5) → None
Evaluate a model using cross-validation and save the scores.
Executes the complete cross-validation evaluation workflow. This includes calculating scores across multiple folds, computing statistics, and saving the results with metadata.
- Parameters:
- model : base.BaseEstimator
The model to evaluate
- X : pd.DataFrame
The input features for evaluation
- y : pd.Series
The target data
- metrics : List[str]
A list of metric names to calculate
- filename : str
The name of the output file (without extension)
- cv : int, optional
The number of cross-validation folds, by default 5
- Returns:
- None
Notes
The cross-validation process uses the utility service to get the appropriate splitter based on the data characteristics (e.g., stratified splits for classification, grouped splits if groups are specified).
Results include mean scores, standard deviations, and all individual fold scores for comprehensive analysis.
- log_results(results: Dict[str, float], filename: str) → None
Override the default logging for cross-validation results.
Provides a custom logging format for cross-validation results that shows the mean score and standard deviation for each metric.
- Parameters:
- results : Dict[str, float]
The results of the cross-validation
- filename : str
The name of the file where results were saved
- Returns:
- None
Notes
The logging format shows each metric with its mean score and standard deviation, providing a quick overview of model performance variability.
The metadata key is excluded from the logged results.
- report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]
Generate a report of the cross-validation results.
Converts cross-validation results into a tabular format suitable for reporting, including mean scores, standard deviations, and all individual fold scores.
- Parameters:
- results : Dict[str, Any]
The results of the cross-validation
- Returns:
- Tuple[List[str], List[List[Any]]]
A tuple containing:
  - List of column headers: ["Metric", "Mean Score", "All Scores"]
  - Nested list of rows with metric statistics
Notes
The report format shows mean scores with standard deviations in parentheses, and all individual fold scores for detailed analysis.
The metadata key is excluded from the report.
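A sketch of this report layout, assuming each metric's value is a dictionary with the `mean_score`, `std_dev`, and `all_scores` keys listed under calculate_measures. The four-decimal cell formatting, the `_metadata` key name, and `cv_report_sketch` itself are assumptions; brisk's exact cell formatting may differ.

```python
from typing import Any, Dict, List, Tuple

METADATA_KEY = "_metadata"  # assumed key name


def cv_report_sketch(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """One row per metric: name, 'mean (std)', and the fold scores."""
    headers = ["Metric", "Mean Score", "All Scores"]
    rows: List[List[Any]] = []
    for metric, stats in results.items():
        if metric == METADATA_KEY:
            continue  # metadata is excluded from the report
        # Mean score with the standard deviation in parentheses.
        mean_cell = f"{stats['mean_score']:.4f} ({stats['std_dev']:.4f})"
        all_cell = ", ".join(f"{s:.4f}" for s in stats["all_scores"])
        rows.append([metric, mean_cell, all_cell])
    return headers, rows


print(cv_report_sketch({
    "Accuracy": {"mean_score": 0.75, "std_dev": 0.25, "all_scores": [0.5, 1.0]},
    "_metadata": {},
}))
# (['Metric', 'Mean Score', 'All Scores'], [['Accuracy', '0.7500 (0.2500)', '0.5000, 1.0000']])
```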
- class CompareModels(method_name: str, description: str)
Bases: MeasureEvaluator
Compare multiple models using specified measures.
This evaluator allows comparison of multiple models on the same dataset using specified performance measures. It can optionally calculate differences between model performances for detailed analysis.
- Parameters:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- services : ServiceBundle or None
The global services bundle
- metric_config : MetricManager or None
The metric configuration manager
Notes
This evaluator is particularly useful for model selection and performance comparison. It can compare any number of models on the same dataset using the same metrics, ensuring fair comparison.
When calculate_diff is True, the evaluator calculates pairwise differences between all model pairs for each metric, providing detailed performance comparisons.
Examples
- Compare multiple models:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_compare_models")
>>> evaluator.evaluate(model1, model2, model3, X=X, y=y,
...                    metrics=["accuracy", "f1_score"],
...                    filename="comparison", calculate_diff=True)
- calculate_measures(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], calculate_diff: bool = False) → Dict[str, Dict[str, float]]
Calculate the comparison results for multiple models.
Evaluates each model on the specified metrics and optionally calculates pairwise differences between models.
- Parameters:
- *models : base.BaseEstimator
Models to compare (variable number of arguments)
- X : pd.DataFrame
Input features for evaluation
- y : pd.Series
Target values for evaluation
- metrics : List[str]
Names of metrics to calculate
- calculate_diff : bool, optional
Whether to calculate differences between models, by default False
- Returns:
- Dict[str, Dict[str, float]]
Nested dictionary containing metric scores for each model and optionally pairwise differences
- Raises:
- ValueError
If no models are provided for comparison
Notes
The method evaluates each model individually and stores results in a nested dictionary structure. If calculate_diff is True, it also calculates pairwise differences between all model pairs for each metric.
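The differences are calculated as model_b - model_a for each pair, showing the change in score when switching from model_a to model_b. Whether a positive difference is an improvement depends on the metric: it is for higher-is-better metrics such as accuracy, but not for error metrics such as mean squared error.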
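The pairwise differences can be sketched with `itertools.combinations`, which yields each unordered model pair once, in input order. The input layout (model name mapped to a metric-to-score dictionary), the `"model_b - model_a"` key format, and the name `pairwise_differences` are assumptions for illustration only.

```python
from itertools import combinations
from typing import Dict


def pairwise_differences(
    scores: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Compute model_b - model_a for every model pair and metric.

    `scores` maps model name -> {metric name -> score} (assumed layout).
    """
    if not scores:
        raise ValueError("At least one model is required for comparison.")
    models = list(scores)
    metrics = list(scores[models[0]])  # assume all models share the same metrics
    diffs: Dict[str, Dict[str, float]] = {}
    for metric in metrics:
        diffs[metric] = {
            f"{b} - {a}": scores[b][metric] - scores[a][metric]
            for a, b in combinations(models, 2)
        }
    return diffs


print(pairwise_differences({"lr": {"Accuracy": 0.5}, "rf": {"Accuracy": 0.75}}))
# {'Accuracy': {'rf - lr': 0.25}}
```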
- evaluate(*models: BaseEstimator, X: DataFrame, y: Series, metrics: List[str], filename: str, calculate_diff: bool = False) → None
Compare multiple models using specified metrics.
Executes the complete model comparison workflow. This includes evaluating each model on the specified metrics and optionally calculating pairwise differences between models.
- Parameters:
- *models : base.BaseEstimator
Models to compare (variable number of arguments)
- X : pd.DataFrame
Input features for evaluation
- y : pd.Series
Target values for evaluation
- metrics : List[str]
Names of metrics to calculate
- filename : str
Name for output file (without extension)
- calculate_diff : bool, optional
Whether to calculate differences between models, by default False
- Returns:
- None
Notes
The method evaluates each model individually on the same dataset using the same metrics, ensuring fair comparison. If calculate_diff is True, it also calculates pairwise differences between all model pairs for each metric.
Results are saved with metadata for later analysis and reporting.
- log_results(results: Dict[str, float], filename: str) → None
Override the default logging for model comparison results.
Provides a custom logging format for model comparison results that shows each model’s performance on each metric.
- Parameters:
- results : Dict[str, float]
The results of the model comparison
- filename : str
The name of the file where results were saved
- Returns:
- None
Notes
The logging format shows each model’s performance on each metric, making it easy to compare model performance at a glance.
The differences and metadata keys are excluded from the logged results.
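A minimal sketch of the comparison log message, assuming the results map each model name to a metric-to-score dictionary and that the excluded keys are named `differences` and `_metadata` (both assumed names). The four-decimal formatting and the helper name `format_comparison` are also assumptions.

```python
from typing import Any, Dict

# Assumed names for the keys that are left out of the log.
EXCLUDED_KEYS = {"differences", "_metadata"}


def format_comparison(results: Dict[str, Any]) -> str:
    """One block per model, listing that model's score on each metric."""
    lines = []
    for model_name, scores in results.items():
        if model_name in EXCLUDED_KEYS:
            continue
        lines.append(f"{model_name}:")
        for metric, score in scores.items():
            lines.append(f"  {metric}: {score:.4f}")
    return "\n".join(lines)


print(format_comparison({
    "lr": {"Accuracy": 0.5},
    "rf": {"Accuracy": 0.75},
    "differences": {},
    "_metadata": {},
}))
# lr:
#   Accuracy: 0.5000
# rf:
#   Accuracy: 0.7500
```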
- report(results: Dict[str, Any]) → Tuple[List[str], List[List[Any]]]
Generate a report of the model comparison results.
Converts model comparison results into a tabular format suitable for reporting, with models as columns and metrics as rows.
- Parameters:
- results : Dict[str, Any]
The results of the model comparison
- Returns:
- Tuple[List[str], List[List[Any]]]
A tuple containing:
  - List of column headers: ["Metric", model1_name, model2_name, …]
  - Nested list of rows with metric names and scores for each model
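The models-as-columns layout can be sketched as below. It assumes the nested results structure (model name mapped to metric scores) and that the `differences` and `_metadata` keys, both assumed names, are skipped in the same way log_results skips them; the report's handling of those keys is not documented here. `comparison_report_sketch` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

EXCLUDED_KEYS = {"differences", "_metadata"}  # assumed key names


def comparison_report_sketch(
    results: Dict[str, Any],
) -> Tuple[List[str], List[List[Any]]]:
    """Metrics as rows, models as columns."""
    model_names = [name for name in results if name not in EXCLUDED_KEYS]
    if not model_names:
        return ["Metric"], []
    headers = ["Metric", *model_names]
    # Row order follows the first model's metric order.
    metrics = list(results[model_names[0]])
    rows = [[metric, *(results[m][metric] for m in model_names)] for metric in metrics]
    return headers, rows


headers, rows = comparison_report_sketch({
    "lr": {"Accuracy": 0.5, "F1": 0.4},
    "rf": {"Accuracy": 0.75, "F1": 0.7},
    "_metadata": {},
})
print(headers)  # ['Metric', 'lr', 'rf']
print(rows)     # [['Accuracy', 0.5, 0.75], ['F1', 0.4, 0.7]]
```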