Creating Custom Evaluators

You can create custom plots and evaluation methods beyond the built-in evaluators by defining them in your project’s evaluators.py file. Custom evaluators integrate with Brisk’s evaluation system and appear in the interactive report.

Built-in vs Custom Evaluators

Built-in Evaluators are provided by Brisk and include common plots (learning curves, feature importance, SHAP values) and evaluation methods (cross-validation, model comparison).

Custom Evaluators are evaluators you define to analyze your models beyond Brisk’s built-in evaluations. They let you create specialized visualizations or add any custom analysis logic your project needs.

Types of Custom Evaluators

Measure Evaluators (MeasureEvaluator): Calculate numerical metrics and store results as JSON files.

Plot Evaluators (PlotEvaluator): Generate visualizations and save them as image files.

Creating Custom Evaluators

Custom evaluators are defined in your project’s evaluators.py file. There are two types you can create:

Custom Measure Evaluators

To implement a custom measure evaluator, define a new class in evaluators.py that subclasses MeasureEvaluator and implement its _calculate_measures method. This method receives the following arguments by default:

  • predictions: the model predictions

  • y_true: the true target values

  • metrics: the list of metric names to calculate

The method should return a dictionary of the calculated values.

Note

To access the scorer callable for a metric, call self.metric_config.get_metric(metric_name).

To access the metric's display name, call self.metric_config.get_name(metric_name).
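To check the calculation logic outside a Brisk run, you can exercise the same loop against a small stand-in for self.metric_config. This is a minimal sketch under that assumption: StubMetricConfig, its metric table, and calculate_measures are hypothetical helpers written for illustration and are not part of Brisk. Only the get_metric and get_name method names mirror the calls described above.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class StubMetricConfig:
    """Hypothetical stand-in for self.metric_config (illustration only)."""

    def __init__(self) -> None:
        # metric name -> (display name, scorer callable)
        self._metrics: Dict[str, Tuple[str, Callable[..., float]]] = {
            "mean_absolute_error": ("Mean Absolute Error", mean_absolute_error),
        }

    def get_metric(self, metric_name: str) -> Callable[..., float]:
        return self._metrics[metric_name][1]

    def get_name(self, metric_name: str) -> str:
        return self._metrics[metric_name][0]


def calculate_measures(metric_config, predictions, y_true, metrics: List[str]) -> Dict[str, Any]:
    """Same loop as _calculate_measures: one entry per metric, keyed by display name."""
    results: Dict[str, Any] = {}
    for metric_name in metrics:
        scorer = metric_config.get_metric(metric_name)
        display_name = metric_config.get_name(metric_name)
        # Scorers are called as scorer(y_true, predictions), the order used in this guide.
        results[display_name] = float(scorer(y_true, predictions))
    return results


results = calculate_measures(
    StubMetricConfig(), predictions=[2.0, 3.0], y_true=[1.0, 3.0],
    metrics=["mean_absolute_error"],
)
print(results)  # {'Mean Absolute Error': 0.5}
```

Keying the dictionary by display name means the report shows human-readable column headers rather than internal metric identifiers.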

Here is an example of a custom measure evaluator:

# evaluators.py
from typing import Any, Dict

from brisk.evaluation.evaluators import MeasureEvaluator


class ExampleMeasureEvaluator(MeasureEvaluator):
    def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
        """Calculate each requested metric, keyed by its display name."""
        results = {}
        for metric_name in metrics:
            # Look up the scorer callable and the human-readable metric name
            scorer = self.metric_config.get_metric(metric_name)
            display_name = self.metric_config.get_name(metric_name)
            metric_value = scorer(y_true, predictions)
            results[display_name] = float(metric_value)
        return results

Note

If these arguments are not suitable for your evaluator, you can override the evaluate method. The default evaluate method is:

def evaluate(self, model, X, y, metrics, filename):
    predictions = self._generate_prediction(model, X)
    results = self._calculate_measures(predictions, y, metrics)
    metadata = self._generate_metadata(model, X.attrs["is_test"])
    self._save_json(results, filename, metadata)
    self._log_results(results, filename)

This is the method you call in your workflow to use the evaluator. You can change any of its arguments by overriding it.

However, the flow of the evaluate method should be preserved. Specifically, the following steps should be called exactly as shown here to avoid errors at runtime:

metadata = self._generate_metadata(model, X.attrs["is_test"])
self._save_json(results, filename, metadata)
self._log_results(results, filename)
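As an illustration of an override that changes the arguments but keeps the required steps, the sketch below adds a hypothetical threshold argument that turns scores into 0/1 labels before scoring. To keep it runnable without Brisk, StubMeasureEvaluator is a hypothetical stand-in whose helpers only record that they were called; in a real project you would subclass MeasureEvaluator and drop the stub. ThresholdAccuracyEvaluator and its accuracy calculation are likewise illustrative.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class StubMeasureEvaluator:
    """Hypothetical stand-in for MeasureEvaluator; helpers only record calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def _generate_prediction(self, model, X):
        return model.predict(X)

    def _generate_metadata(self, model, is_test: bool) -> Dict[str, Any]:
        self.calls.append("metadata")
        return {"is_test": is_test}

    def _save_json(self, results, filename, metadata) -> None:
        self.calls.append("save_json")

    def _log_results(self, results, filename) -> None:
        self.calls.append("log_results")


class ThresholdAccuracyEvaluator(StubMeasureEvaluator):
    def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
        correct = sum(int(p == t) for p, t in zip(predictions, y_true))
        return {"Accuracy": correct / len(y_true)}

    def evaluate(self, model, X, y, metrics, filename, threshold: float = 0.5):
        """Override that adds a hypothetical threshold argument."""
        scores = self._generate_prediction(model, X)
        # Custom step: convert scores to 0/1 labels at the chosen threshold.
        predictions = [1 if s >= threshold else 0 for s in scores]
        results = self._calculate_measures(predictions, y, metrics)
        # Required steps, kept exactly as in the default evaluate method.
        metadata = self._generate_metadata(model, X.attrs["is_test"])
        self._save_json(results, filename, metadata)
        self._log_results(results, filename)
        return results  # returned only so the sketch is easy to inspect


model = SimpleNamespace(predict=lambda X: [0.2, 0.7, 0.9, 0.4])
X = SimpleNamespace(attrs={"is_test": True})  # stands in for a DataFrame's .attrs
evaluator = ThresholdAccuracyEvaluator()
results = evaluator.evaluate(model, X, [0, 1, 0, 0], ["accuracy"], "threshold_accuracy")
print(results)          # {'Accuracy': 0.75}
print(evaluator.calls)  # ['metadata', 'save_json', 'log_results']
```

Only the steps before the metadata call change; the metadata, save, and log calls run in the same order as in the default method.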

To integrate with the interactive report, implement the report method. It converts the results dictionary returned by _calculate_measures into a format suitable for generating the report table. The method takes the results dictionary as its argument and returns a tuple of two lists:

  • List of column headers

  • Nested list where each list is a row of the table

Here is an example of a report method for the example measure evaluator:

def report(self, results: Dict[str, Any]):
    """Report the evaluation results."""
    # Skip the metadata entry; every other key becomes a column
    columns = [key for key in results.keys() if key != "_metadata"]
    row = []
    for col in columns:
        row.append(results[col])
    return columns, [row]
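The single-row layout above suits a handful of scalar metrics. If you prefer one row per metric, a variant like the following sketch would also fit the (columns, rows) shape described above. The Metric and Value column names are illustrative choices, and the sketch assumes every non-metadata value in results is a scalar; inside the evaluator class its body would go in report(self, results).

```python
from typing import Any, Dict, List, Tuple


def report_long(results: Dict[str, Any]) -> Tuple[List[str], List[List[Any]]]:
    """Return one table row per metric instead of one row for all metrics."""
    columns = ["Metric", "Value"]
    rows = [
        [name, value]
        for name, value in results.items()
        if name != "_metadata"  # skip the metadata entry, as the example above does
    ]
    return columns, rows


example = {"Mean Absolute Error": 0.5, "R2": 0.9, "_metadata": {"is_test": True}}
print(report_long(example))
# (['Metric', 'Value'], [['Mean Absolute Error', 0.5], ['R2', 0.9]])
```

A long layout keeps the table narrow when an evaluator reports many metrics, at the cost of one row per value.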

Our complete custom evaluator looks like this:

# evaluators.py
from typing import Any, Dict

from brisk.evaluation.evaluators import MeasureEvaluator


class ExampleMeasureEvaluator(MeasureEvaluator):
    def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
        """Calculate each requested metric, keyed by its display name."""
        results = {}
        for metric_name in metrics:
            scorer = self.metric_config.get_metric(metric_name)
            display_name = self.metric_config.get_name(metric_name)
            metric_value = scorer(y_true, predictions)
            results[display_name] = float(metric_value)
        return results

    def report(self, results: Dict[str, Any]):
        """Format the results as a single-row table for the report."""
        columns = [key for key in results.keys() if key != "_metadata"]
        row = []
        for col in columns:
            row.append(results[col])
        return columns, [row]

Custom Plot Evaluators

As with measure evaluators, you create a custom plot evaluator by defining a new class in evaluators.py and implementing two methods, _generate_plot_data and _create_plot. _generate_plot_data returns the data needed to build the plot (a pandas DataFrame in the example below), and _create_plot takes that data and implements the plot creation logic.

Note

Brisk supports several plotting libraries, including plotnine, matplotlib, seaborn, and plotly.

The default parameters for _generate_plot_data are:

  • model: the trained model

  • X: the input data

  • y: the true target values
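Because _generate_plot_data only turns a model and data into plot-ready values, you can check it without rendering anything. The sketch below mirrors the error computation in the histogram example that follows, using plain Python lists instead of pandas and a hypothetical ConstantModel with a scikit-learn-style predict method in place of a trained model and Brisk's _generate_prediction helper.

```python
from typing import Dict, List, Sequence


class ConstantModel:
    """Hypothetical model that predicts the same value for every row."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, X: Sequence) -> List[float]:
        return [self.value for _ in X]


def generate_error_data(model, X: Sequence, y: Sequence[float]) -> Dict[str, List[float]]:
    """Compute signed and absolute prediction errors (y - y_pred)."""
    y_pred = model.predict(X)
    errors = [t - p for t, p in zip(y, y_pred)]
    return {"errors": errors, "abs_errors": [abs(e) for e in errors]}


plot_data = generate_error_data(ConstantModel(2.0), X=[[0], [1], [2]], y=[1.0, 2.0, 4.0])
print(plot_data)
# {'errors': [-1.0, 0.0, 2.0], 'abs_errors': [1.0, 0.0, 2.0]}
```

In an actual evaluator you would return a pandas DataFrame built from these columns, as the example does, so that plotnine can map the errors column.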

Here is an example of a custom plot evaluator:

# evaluators.py
import pandas as pd
import plotnine as pn

from brisk.evaluation.evaluators import PlotEvaluator


class PlotErrorHistogram(PlotEvaluator):
    def _generate_plot_data(self, model, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
        """Generate data for the error histogram plot."""
        y_pred = self._generate_prediction(model, X)
        errors = y - y_pred

        return pd.DataFrame({
            'errors': errors,
            'abs_errors': abs(errors)
        })

    def _create_plot(self, plot_data: pd.DataFrame, display_name: str = ""):
        """Create an error histogram plot."""
        # The default plot method calls self._create_plot(plot_data), so
        # display_name needs a default for this evaluator to work unchanged.
        title = 'Prediction Error Distribution'
        if display_name:
            title = f'{title} - {display_name}'
        plot = (pn.ggplot(plot_data, pn.aes(x='errors')) +
                pn.geom_histogram(bins=30, fill='skyblue', alpha=0.7) +
                pn.labs(title=title, x='Prediction Error', y='Frequency') +
                self.theme)
        return plot

If you create plots with plotnine, you can add self.theme in _create_plot to apply the same styling as the built-in plots. This is optional; you are free to implement your own styling.

Note

If the _generate_plot_data method is not suitable for your evaluator, you can override the plot method. The default plot method is:

def plot(self, model, X, y, filename):
    plot_data = self._generate_plot_data(model, X, y)
    plot = self._create_plot(plot_data)
    metadata = self._generate_metadata(model, X.attrs["is_test"])
    self._save_plot(filename, metadata, plot=plot)
    self._log_results(self.method_name, filename)

This is the method you call in your workflow to use the evaluator. You can change any of its arguments by overriding it.

However, the flow of the plot method should be preserved. Specifically, the following steps should be called exactly as shown here to avoid errors at runtime:

plot = self._create_plot(plot_data)
metadata = self._generate_metadata(model, X.attrs["is_test"])
self._save_plot(filename, metadata, plot=plot)
self._log_results(self.method_name, filename)
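For example, an override might add an argument that only affects data generation while leaving the listed steps untouched. The sketch below adds a hypothetical absolute flag that plots absolute rather than signed errors. So that it runs on its own, StubPlotEvaluator is a hypothetical stand-in for PlotEvaluator whose helpers only record calls, method_name is a placeholder, and _create_plot returns a tuple instead of a real figure; with Brisk you would subclass PlotEvaluator and remove the stub.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class StubPlotEvaluator:
    """Hypothetical stand-in for PlotEvaluator; helpers only record calls."""

    method_name = "plot_error_histogram"  # placeholder value for the sketch

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.saved_plot: Any = None

    def _generate_prediction(self, model, X):
        return model.predict(X)

    def _generate_metadata(self, model, is_test: bool) -> Dict[str, Any]:
        self.calls.append("metadata")
        return {"is_test": is_test}

    def _save_plot(self, filename, metadata, plot=None) -> None:
        self.calls.append("save_plot")
        self.saved_plot = plot

    def _log_results(self, method_name, filename) -> None:
        self.calls.append("log_results")


class ErrorPlot(StubPlotEvaluator):
    def _generate_plot_data(self, model, X, y, absolute: bool = False) -> Dict[str, List[float]]:
        y_pred = self._generate_prediction(model, X)
        errors = [t - p for t, p in zip(y, y_pred)]
        return {"errors": [abs(e) for e in errors] if absolute else errors}

    def _create_plot(self, plot_data):
        # A real evaluator would build a plotnine or matplotlib figure here;
        # a tuple stands in for the figure in this sketch.
        return ("histogram", plot_data["errors"])

    def plot(self, model, X, y, filename, absolute: bool = False):
        """Override that adds a hypothetical 'absolute' argument."""
        plot_data = self._generate_plot_data(model, X, y, absolute)
        # Required steps, kept exactly as in the default plot method.
        plot = self._create_plot(plot_data)
        metadata = self._generate_metadata(model, X.attrs["is_test"])
        self._save_plot(filename, metadata, plot=plot)
        self._log_results(self.method_name, filename)


model = SimpleNamespace(predict=lambda X: [2.0, 2.0])
X = SimpleNamespace(attrs={"is_test": True})  # stands in for a DataFrame's .attrs
evaluator = ErrorPlot()
evaluator.plot(model, X, [1.0, 3.0], filename="abs_error_histogram", absolute=True)
print(evaluator.saved_plot)  # ('histogram', [1.0, 1.0])
print(evaluator.calls)       # ['metadata', 'save_plot', 'log_results']
```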

No other methods are needed to implement a custom plot evaluator.

Registering Custom Evaluators

After defining your custom evaluator classes, you must register them with Brisk by adding a register_custom_evaluators() function to your evaluators.py file. Inside this function, call registry.register() with an instance of each evaluator. Each constructor takes a name, used to access the evaluator, and a description, which is displayed in the report.

from brisk.evaluation.evaluators.registry import EvaluatorRegistry

def register_custom_evaluators(registry: EvaluatorRegistry, theme) -> None:
    """Register custom evaluators with Brisk.

    Parameters
    ----------
    registry : EvaluatorRegistry
        The evaluator registry to register with
    theme : plotnine theme
        The plotting theme for plot evaluators
    """
    # Register custom measure evaluators (no theme needed)
    registry.register(ExampleMeasureEvaluator(
        "evaluate_prediction",
        "Display evaluation results"
    ))

    # Register custom plot evaluators (theme is required)
    registry.register(PlotErrorHistogram(
        "plot_error_histogram",
        "Plot prediction error distribution",
        theme
    ))
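The name passed as the first constructor argument is the same string you later pass to get_evaluator() in a workflow, so the two must match exactly. The toy below illustrates only that lookup-by-name idea; ToyRegistry and ToyEvaluator are hypothetical and do not reproduce how Brisk's EvaluatorRegistry is implemented. The toy rejects duplicate names, which is a design choice of the sketch rather than documented Brisk behavior.

```python
from typing import Dict


class ToyEvaluator:
    """Hypothetical evaluator holding only a name and a description."""

    def __init__(self, method_name: str, description: str) -> None:
        self.method_name = method_name
        self.description = description


class ToyRegistry:
    """Toy name -> evaluator lookup; not Brisk's EvaluatorRegistry."""

    def __init__(self) -> None:
        self._evaluators: Dict[str, ToyEvaluator] = {}

    def register(self, evaluator: ToyEvaluator) -> None:
        # Toy behavior: refuse to silently replace an existing entry.
        if evaluator.method_name in self._evaluators:
            raise ValueError(f"duplicate evaluator name: {evaluator.method_name}")
        self._evaluators[evaluator.method_name] = evaluator

    def get(self, name: str) -> ToyEvaluator:
        return self._evaluators[name]


registry = ToyRegistry()
registry.register(ToyEvaluator("evaluate_prediction", "Display evaluation results"))
print(registry.get("evaluate_prediction").description)  # Display evaluation results
```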

Important

For plot evaluators, you must pass the theme to the constructor. It provides information about how to save the images.

Calling Custom Evaluators in Workflows

Once registered, you can call your custom evaluators in workflows using the .evaluate() or .plot() methods. You can do this in two ways:

Wrapper Methods (recommended for cleaner code):

Registering the evaluators as shown above adds them to the evaluation manager, so you can retrieve them with the self.evaluation_manager.get_evaluator() method.

# workflows/my_workflow.py
from brisk.training.workflow import Workflow

class MyWorkflow(Workflow):
    def evaluate_prediction(self, model, X, y, filename):
        """Wrapper method for custom prediction summary evaluator."""
        evaluator = self.evaluation_manager.get_evaluator("evaluate_prediction")
        return evaluator.evaluate(model, X, y, ["MSE", "R2"], filename=filename)

    def plot_error_histogram(self, model, X, y, filename):
        """Wrapper method for custom error histogram plot."""
        evaluator = self.evaluation_manager.get_evaluator("plot_error_histogram")
        # The default plot method takes a filename, not a display name
        return evaluator.plot(model, X, y, filename=filename)

    def workflow(self, X_train, X_test, y_train, y_test, output_dir, feature_names):
        # Fit the model
        self.model.fit(X_train, y_train)

        # Use built-in methods
        self.evaluate_model(
            self.model, X_test, y_test,
            ["mean_absolute_error"], "model_score"
        )

        # Use custom wrapper methods
        self.evaluate_prediction(self.model, X_test, y_test, "prediction_summary")
        self.plot_error_histogram(self.model, X_test, y_test, "error_histogram")

Direct Calling:

You may also access the evaluators directly using the self.evaluation_manager.get_evaluator() method.

# workflows/my_workflow.py
from brisk.training.workflow import Workflow

class MyWorkflow(Workflow):
    def workflow(self, X_train, X_test, y_train, y_test, output_dir, feature_names):
        # Fit the model
        self.model.fit(X_train, y_train)

        # Direct calls to custom evaluators
        custom_measure = self.evaluation_manager.get_evaluator("evaluate_prediction")
        custom_measure.evaluate(
            self.model, X_test, y_test, ["MAE", "R2"], filename="prediction_summary"
        )

        custom_plot = self.evaluation_manager.get_evaluator("plot_error_histogram")
        custom_plot.plot(self.model, X_test, y_test, filename="error_histogram")

Note

Use descriptive names for evaluators (e.g., “plot_error_histogram” rather than “custom_plot”).

Your custom evaluators will appear alongside built-in evaluators in the final interactive report.