Creating Custom Evaluators
You can create custom plots and evaluation methods beyond the built-in evaluators by
defining them in your project’s evaluators.py file. Custom evaluators integrate
with Brisk’s evaluation system and appear in the interactive report.
Built-in vs Custom Evaluators
Built-in Evaluators are provided by Brisk and include common plots (learning curves, feature importance, SHAP values) and evaluation methods (cross-validation, model comparison).
Custom Evaluators are evaluators you define to analyze your models beyond Brisk’s built-in evaluations. They let you create specialized visualizations or add any custom analysis logic your project needs.
Types of Custom Evaluators
Measure Evaluators (MeasureEvaluator):
Calculate numerical metrics and store results as JSON files.
Plot Evaluators (PlotEvaluator):
Generate visualizations and save them as image files.
Creating Custom Evaluators
Custom evaluators are defined in your project’s evaluators.py file. There are two types you can create:
Custom Measure Evaluators
To implement a custom measure evaluator, define a new class in evaluators.py
that subclasses MeasureEvaluator and implement the _calculate_measures method.
By default, this method receives the following arguments:
predictions: the model predictions
y_true: the true target values
metrics: the list of metric names to calculate
The method should return a dictionary of the calculated values.
Note
To get the scorer callable for a metric, call self.metric_config.get_metric(metric_name).
To get the metric’s display name, call self.metric_config.get_name(metric_name).
Here is an example of a custom measure evaluator:
# evaluators.py
from typing import Any, Dict
from brisk.evaluation.evaluators import MeasureEvaluator
class ExampleMeasureEvaluator(MeasureEvaluator):
    def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
        """Calculate each requested metric for the given predictions."""
        results = {}
        for metric_name in metrics:
            # Look up the scorer callable and its human-readable name
            scorer = self.metric_config.get_metric(metric_name)
            display_name = self.metric_config.get_name(metric_name)
            metric_value = scorer(y_true, predictions)
            results[display_name] = float(metric_value)
        return results
Note
If these arguments are not suitable for your evaluator, you can override the evaluate method.
The default evaluate method is:
def evaluate(self, model, X, y, metrics, filename):
    predictions = self._generate_prediction(model, X)
    results = self._calculate_measures(predictions, y, metrics)
    metadata = self._generate_metadata(model, X.attrs["is_test"])
    self._save_json(results, filename, metadata)
    self._log_results(results, filename)
This is the method you call in your workflow to use this evaluator. You can change any of its arguments by overriding it.
However, the flow of the evaluate method should be preserved. Specifically, the
following steps should be called exactly as shown here to avoid runtime errors:
metadata = self._generate_metadata(model, X.attrs["is_test"])
self._save_json(results, filename, metadata)
self._log_results(results, filename)
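As a sketch of such an override, the example below replaces the default metrics argument with a hypothetical threshold argument and keeps the three required calls at the end. So that it runs without Brisk, it subclasses RecordingMeasureBase, a hypothetical stub that only records which helper methods were called; in evaluators.py you would subclass MeasureEvaluator instead and drop the stub. FrameStub and IdentityModel are likewise hypothetical stand-ins for a pandas DataFrame carrying attrs["is_test"] and for a trained model.

```python
from typing import Any, Dict, List, Sequence


class RecordingMeasureBase:
    """Hypothetical stand-in for MeasureEvaluator so this runs without Brisk.

    It records calls to the helpers that the default evaluate() uses.
    """

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.saved: Dict[str, Any] = {}

    def _generate_prediction(self, model, X):
        return model.predict(X)

    def _generate_metadata(self, model, is_test: bool) -> Dict[str, Any]:
        self.calls.append("_generate_metadata")
        return {"is_test": is_test}

    def _save_json(self, results, filename, metadata) -> None:
        self.calls.append("_save_json")
        self.saved = {"results": results, "filename": filename, "metadata": metadata}

    def _log_results(self, results, filename) -> None:
        self.calls.append("_log_results")


class WithinThresholdEvaluator(RecordingMeasureBase):
    """Share of predictions within `threshold` of the true value."""

    def _calculate_measures(
        self, predictions: Sequence[float], y_true: Sequence[float], threshold: float
    ) -> Dict[str, Any]:
        hits = sum(abs(p - t) <= threshold for p, t in zip(predictions, y_true))
        return {"Within Threshold": hits / len(y_true)}

    # `threshold` replaces the default `metrics` argument
    def evaluate(self, model, X, y, threshold, filename):
        predictions = self._generate_prediction(model, X)
        results = self._calculate_measures(predictions, y, threshold)
        # Keep these three calls exactly as in the default evaluate()
        metadata = self._generate_metadata(model, X.attrs["is_test"])
        self._save_json(results, filename, metadata)
        self._log_results(results, filename)


class FrameStub:
    """Hypothetical stand-in for a DataFrame: rows plus a pandas-style attrs dict."""

    def __init__(self, rows: List[List[float]], is_test: bool) -> None:
        self.rows = rows
        self.attrs = {"is_test": is_test}


class IdentityModel:
    """Hypothetical model that returns the first feature of each row."""

    def predict(self, X: FrameStub) -> List[float]:
        return [row[0] for row in X.rows]


evaluator = WithinThresholdEvaluator()
X = FrameStub([[1.0], [2.0], [3.5]], is_test=True)
evaluator.evaluate(
    IdentityModel(), X, [1.2, 2.0, 3.0], threshold=0.25, filename="within_threshold"
)
print(evaluator.calls)
print(evaluator.saved["results"])
```

With the real MeasureEvaluator base class, the same three calls would save the results as JSON and log them instead of recording them.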
To integrate with the interactive report, implement the report method. It formats
the results dictionary returned by _calculate_measures into a form suitable for
generating the report table. The method takes the results dictionary as its argument
and returns a tuple of two lists:
A list of column headers
A nested list in which each inner list is one row of the table
Here is an example of a report method for the example measure evaluator:
def report(self, results: Dict[str, Any]):
    """Report the evaluation results."""
    columns = [key for key in results.keys() if key != "_metadata"]
    row = []
    for col in columns:
        row.append(results[col])
    return columns, [row]
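The single-row layout above gives each metric its own column. If you would rather show one row per metric, report can return a two-column table instead. The sketch below is written as a plain function so it runs on its own; inside your evaluator it would be the body of report(self, results). The "Metric" and "Score" headers are arbitrary choices, and the sample results dictionary is made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def report_long(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """Return (columns, rows) with one row per metric, skipping _metadata."""
    columns = ["Metric", "Score"]
    rows = [
        [name, value]
        for name, value in results.items()
        if name != "_metadata"  # metadata is not a metric, so leave it out
    ]
    return columns, rows


# Hypothetical results dict; the "_metadata" key mirrors the one filtered above
sample = {"Mean Absolute Error": 0.42, "R2 Score": 0.87, "_metadata": {"is_test": True}}
columns, rows = report_long(sample)
print(columns)
print(rows)
```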
Our complete custom evaluator looks like this:
# evaluators.py
from typing import Any, Dict
from brisk.evaluation.evaluators import MeasureEvaluator
class ExampleMeasureEvaluator(MeasureEvaluator):
    def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
        """Calculate each requested metric for the given predictions."""
        results = {}
        for metric_name in metrics:
            # Look up the scorer callable and its human-readable name
            scorer = self.metric_config.get_metric(metric_name)
            display_name = self.metric_config.get_name(metric_name)
            metric_value = scorer(y_true, predictions)
            results[display_name] = float(metric_value)
        return results
    def report(self, results: Dict[str, Any]):
        """Report the evaluation results."""
        # Skip the metadata entry; every other key becomes a column
        columns = [key for key in results.keys() if key != "_metadata"]
        row = []
        for col in columns:
            row.append(results[col])
        return columns, [row]
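_calculate_measures does not have to go through metric_config. The sketch below computes a hypothetical residual summary directly and ignores the metrics argument. It is written as a standalone function so it can be run without Brisk; in evaluators.py you would place the body inside _calculate_measures on a MeasureEvaluator subclass. The result keys are illustrative. Because the values are plain floats, the report method above can display them unchanged.

```python
from statistics import mean
from typing import Any, Dict, List, Optional, Sequence


def residual_summary(
    predictions: Sequence[float],
    y_true: Sequence[float],
    metrics: Optional[List[str]] = None,  # unused; kept to match the default signature
) -> Dict[str, Any]:
    """Summarize residuals (y_true - prediction) without using metric_config."""
    residuals = [t - p for p, t in zip(predictions, y_true)]
    return {
        "Mean Residual": float(mean(residuals)),
        "Max Absolute Residual": float(max(abs(r) for r in residuals)),
        # A negative residual means the model predicted above the true value
        "Share Overpredicted": sum(r < 0 for r in residuals) / len(residuals),
    }


summary = residual_summary([2.0, 3.0, 5.0], [2.5, 3.0, 4.0])
print(summary)
```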
Custom Plot Evaluators
As with measure evaluators, you create a custom plot evaluator by defining a new class
in evaluators.py that subclasses PlotEvaluator and implementing the _generate_plot_data and
_create_plot methods. _generate_plot_data returns the data needed to build the plot (the
example below returns a pandas DataFrame), and _create_plot takes that data and implements
the plot creation logic.
Note
Brisk supports several plotting libraries including plotnine, matplotlib, seaborn, and plotly.
The default parameters for _generate_plot_data are:
model: the trained model
X: the input data
y: the true target values
Here is an example of a custom plot evaluator:
# evaluators.py
import pandas as pd
import plotnine as pn
from brisk.evaluation.evaluators import PlotEvaluator
class PlotErrorHistogram(PlotEvaluator):
    def _generate_plot_data(self, model, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
        """Generate data for the error histogram plot."""
        y_pred = self._generate_prediction(model, X)
        # Residuals: positive values mean the model under-predicted
        errors = y - y_pred
        return pd.DataFrame({
            'errors': errors,
            'abs_errors': abs(errors)
        })
    def _create_plot(self, plot_data: pd.DataFrame, display_name: str = 'Model'):
        """Create an error histogram plot."""
        # display_name has a default because the default plot() method
        # calls self._create_plot(plot_data) with a single argument
        plot = (pn.ggplot(plot_data, pn.aes(x='errors')) +
                pn.geom_histogram(bins=30, fill='skyblue', alpha=0.7) +
                pn.labs(title=f'Prediction Error Distribution - {display_name}',
                        x='Prediction Error',
                        y='Frequency') +
                self.theme)
        return plot
In _create_plot, adding self.theme to a plotnine plot applies the same styling as
the built-in plots. This is optional; you are free to implement your own styling.
Note
If the default arguments of _generate_plot_data are not suitable for your evaluator,
you can override the plot method. The default plot method is:
def plot(self, model, X, y, filename):
    plot_data = self._generate_plot_data(model, X, y)
    plot = self._create_plot(plot_data)
    metadata = self._generate_metadata(model, X.attrs["is_test"])
    self._save_plot(filename, metadata, plot=plot)
    self._log_results(self.method_name, filename)
This is the method you call in your workflow to use this evaluator. You can change any of its arguments by overriding it.
However, the flow of the plot method should be preserved. Specifically, the
following steps should be called exactly as shown here to avoid runtime errors:
plot = self._create_plot(plot_data)
metadata = self._generate_metadata(model, X.attrs["is_test"])
self._save_plot(filename, metadata, plot=plot)
self._log_results(self.method_name, filename)
No other methods are needed to implement a custom plot evaluator.
Registering Custom Evaluators
After defining your custom evaluator classes, you must register them with Brisk by adding a register_custom_evaluators() function to your evaluators.py file.
Inside this function, pass an instance of each evaluator to registry.register(). Each
instance is constructed with a name, used to access the evaluator, and a description,
which is displayed in the report.
from brisk.evaluation.evaluators.registry import EvaluatorRegistry
def register_custom_evaluators(registry: EvaluatorRegistry, theme) -> None:
    """Register custom evaluators with Brisk.
    Parameters
    ----------
    registry : EvaluatorRegistry
        The evaluator registry to register with
    theme : plotnine theme
        The plotting theme for plot evaluators
    """
    # Register custom measure evaluators (no theme needed)
    registry.register(ExampleMeasureEvaluator(
        "evaluate_prediction",
        "Display evaluation results"
    ))
    # Register custom plot evaluators (theme is required)
    registry.register(PlotErrorHistogram(
        "plot_error_histogram",
        "Plot prediction error distribution",
        theme
    ))
Important
For PlotEvaluator subclasses, you must pass the theme to the constructor. The theme
also provides information about how to save the images.
Calling Custom Evaluators in Workflows
Once registered, you can call your custom evaluators in workflows through their .evaluate() or .plot() methods. There are two ways to do this:
Wrapper Methods (recommended for cleaner code):
Registering the evaluators in the previous step adds them to the evaluation manager,
so you can retrieve them with the self.evaluation_manager.get_evaluator() method and wrap each call in a method on your workflow.
# workflows/my_workflow.py
from brisk.training.workflow import Workflow
class MyWorkflow(Workflow):
    def evaluate_prediction(self, model, X, y, filename):
        """Wrapper method for custom prediction summary evaluator."""
        evaluator = self.evaluation_manager.get_evaluator("evaluate_prediction")
        return evaluator.evaluate(model, X, y, ["MSE", "R2"], filename=filename)
    def plot_error_histogram(self, model, X, y, filename):
        """Wrapper method for custom error histogram plot."""
        evaluator = self.evaluation_manager.get_evaluator("plot_error_histogram")
        # The default plot() method takes (model, X, y, filename)
        return evaluator.plot(model, X, y, filename=filename)
    def workflow(self, X_train, X_test, y_train, y_test, output_dir, feature_names):
        # Fit the model
        self.model.fit(X_train, y_train)
        # Use built-in methods
        self.evaluate_model(
            self.model, X_test, y_test,
            ["mean_absolute_error"], "model_score"
        )
        # Use custom wrapper methods
        self.evaluate_prediction(self.model, X_test, y_test, "prediction_summary")
        self.plot_error_histogram(self.model, X_test, y_test, "error_histogram")
Direct Calling:
You can also retrieve an evaluator with the self.evaluation_manager.get_evaluator() method and call it directly, without a wrapper method.
# workflows/my_workflow.py
from brisk.training.workflow import Workflow
class MyWorkflow(Workflow):
    def workflow(self):
        # Fit the model
        self.model.fit(self.X_train, self.y_train)
        # Direct calls to custom evaluators
        custom_measure = self.evaluation_manager.get_evaluator("evaluate_prediction")
        custom_measure.evaluate(
            self.model, self.X_test, self.y_test, ["MAE", "R2"],
            filename="prediction_summary"
        )
        custom_plot = self.evaluation_manager.get_evaluator("plot_error_histogram")
        custom_plot.plot(self.model, self.X_test, self.y_test, filename="Error Analysis")
Note
Use descriptive names for evaluators (e.g., “plot_error_histogram” rather than “custom_plot”).
Your custom evaluators will appear alongside built-in evaluators in the final interactive report.