.. _custom_evaluators:

Creating Custom Evaluators
===========================

You can go beyond the built-in evaluators by defining your own plots and evaluation methods in your project's ``evaluators.py`` file. Custom evaluators integrate with Brisk's evaluation system and appear in the interactive report.

Built-in vs Custom Evaluators
------------------------------

**Built-in Evaluators** are provided by Brisk and include common plots (learning curves, feature importance, SHAP values) and evaluation methods (cross-validation, model comparison).

**Custom Evaluators** are evaluators you add to analyze your models beyond Brisk's built-in evaluations. They let you create specialized visualizations or add any custom analysis logic your project needs.

Types of Custom Evaluators
---------------------------

**Measure Evaluators** (``MeasureEvaluator``): Calculate numerical metrics and store the results as JSON files.

**Plot Evaluators** (``PlotEvaluator``): Generate visualizations and save them as image files.

Creating Custom Evaluators
---------------------------

Custom evaluators are defined in your project's ``evaluators.py`` file. You can create two types, each described in its own section.

Custom Measure Evaluators
-------------------------

To implement a custom measure evaluator, define a new class in ``evaluators.py`` that subclasses ``MeasureEvaluator`` and implements the ``_calculate_measures`` method. The default arguments for this method are:

- ``predictions``: the model predictions
- ``y_true``: the true target values
- ``metrics``: the list of metric names to calculate

The method should return a dictionary of the calculated values.

.. note::
    To access the scorer callable for a metric, call ``self.metric_config.get_metric(metric_name)``. To access the metric's display name, call ``self.metric_config.get_name(metric_name)``.

Here is an example of a custom measure evaluator:

.. code-block:: python

    # evaluators.py
    from typing import Any, Dict

    from brisk.evaluation.evaluators import MeasureEvaluator

    class ExampleMeasureEvaluator(MeasureEvaluator):
        def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
            """Score the predictions with each requested metric."""
            results = {}
            for metric_name in metrics:
                # Look up the scorer callable and the name shown in the report
                scorer = self.metric_config.get_metric(metric_name)
                display_name = self.metric_config.get_name(metric_name)
                metric_value = scorer(y_true, predictions)
                results[display_name] = float(metric_value)
            return results

.. note::
    If these arguments are not suitable for your evaluator, you can override the ``evaluate`` method. The default ``evaluate`` method is:

    .. code-block:: python

        def evaluate(self, model, X, y, metrics, filename):
            predictions = self._generate_prediction(model, X)
            results = self._calculate_measures(predictions, y, metrics)
            metadata = self._generate_metadata(model, X.attrs["is_test"])
            self._save_json(results, filename, metadata)
            self._log_results(results, filename)

    This is the method you call in your workflow to use this evaluator. You can change any of its arguments by overriding it, but the flow of the ``evaluate`` method should be preserved. In particular, the following steps should be called exactly as shown here to avoid errors at runtime:

    .. code-block:: python

        metadata = self._generate_metadata(model, X.attrs["is_test"])
        self._save_json(results, filename, metadata)
        self._log_results(results, filename)

To integrate with the interactive report, implement the ``report`` method. It converts the results dictionary returned by ``_calculate_measures`` into the format used to generate the report table. The method takes the results dictionary as an argument and returns a tuple of two lists:

- A list of column headers
- A nested list in which each inner list is one row of the table

Here is an example of a ``report`` method for the example measure evaluator:

.. code-block:: python

    def report(self, results: Dict[str, Any]):
        """Format the evaluation results as a one-row table."""
        # Skip the "_metadata" key so only metric values become columns
        columns = [key for key in results if key != "_metadata"]
        row = [results[col] for col in columns]
        return columns, [row]

Our complete custom evaluator looks like this:

.. code-block:: python

    from typing import Any, Dict

    from brisk.evaluation.evaluators import MeasureEvaluator

    class ExampleMeasureEvaluator(MeasureEvaluator):
        def _calculate_measures(self, predictions, y_true, metrics) -> Dict[str, Any]:
            """Score the predictions with each requested metric."""
            results = {}
            for metric_name in metrics:
                # Look up the scorer callable and the name shown in the report
                scorer = self.metric_config.get_metric(metric_name)
                display_name = self.metric_config.get_name(metric_name)
                metric_value = scorer(y_true, predictions)
                results[display_name] = float(metric_value)
            return results

        def report(self, results: Dict[str, Any]):
            """Format the evaluation results as a one-row table."""
            # Skip the "_metadata" key so only metric values become columns
            columns = [key for key in results if key != "_metadata"]
            row = [results[col] for col in columns]
            return columns, [row]

Custom Plot Evaluators
----------------------

As with measure evaluators, you create a custom plot evaluator by defining a new class in ``evaluators.py``, this time subclassing ``PlotEvaluator`` and implementing the ``_generate_plot_data`` and ``_create_plot`` methods. ``_generate_plot_data`` returns the values used to create the plot (a DataFrame in the example in this section). ``_create_plot`` takes that data and implements the plot creation logic.

.. note::
    Brisk supports several plotting libraries including plotnine, matplotlib, seaborn, and plotly.

The default parameters for ``_generate_plot_data`` are:

- ``model``: the trained model
- ``X``: the input data
- ``y``: the true target values

Here is an example of a custom plot evaluator:

.. code-block:: python

    import pandas as pd
    import plotnine as pn

    from brisk.evaluation.evaluators import PlotEvaluator

    class PlotErrorHistogram(PlotEvaluator):
        def _generate_plot_data(self, model, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
            """Generate data for the error histogram plot."""
            y_pred = self._generate_prediction(model, X)
            errors = y - y_pred
            return pd.DataFrame({
                'errors': errors,
                'abs_errors': abs(errors)
            })

        def _create_plot(self, plot_data: pd.DataFrame):
            """Create an error histogram plot."""
            # Takes only plot_data, matching the call in the default plot method
            plot = (pn.ggplot(plot_data, pn.aes(x='errors')) +
                    pn.geom_histogram(bins=30, fill='skyblue', alpha=0.7) +
                    pn.labs(title='Prediction Error Distribution',
                            x='Prediction Error',
                            y='Frequency') +
                    self.theme)
            return plot

If you create plots with plotnine, you can add ``self.theme`` in ``_create_plot`` to apply the same styling as the built-in plots. This is not required, and you are free to implement your own styling.

.. note::
    If the ``_generate_plot_data`` method is not suitable for your evaluator, you can override the ``plot`` method. The default ``plot`` method is:

    .. code-block:: python

        def plot(self, model, X, y, filename):
            plot_data = self._generate_plot_data(model, X, y)
            plot = self._create_plot(plot_data)
            metadata = self._generate_metadata(model, X.attrs["is_test"])
            self._save_plot(filename, metadata, plot=plot)
            self._log_results(self.method_name, filename)

    This is the method you call in your workflow to use this evaluator. You can change any of its arguments by overriding it, but the flow of the ``plot`` method should be preserved. In particular, the following steps should be called exactly as shown here to avoid errors at runtime:

    .. code-block:: python

        plot = self._create_plot(plot_data)
        metadata = self._generate_metadata(model, X.attrs["is_test"])
        self._save_plot(filename, metadata, plot=plot)
        self._log_results(self.method_name, filename)

No other methods are needed to implement a custom plot evaluator.

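
The override pattern for ``plot`` can be tried out without Brisk installed. The following is a minimal sketch under stated assumptions: ``StandInPlotEvaluator`` is a hypothetical stand-in, not Brisk's ``PlotEvaluator``, and it only mimics the call order of the default ``plot`` method shown above. The ``bins`` and ``is_test`` arguments, the ``calls`` list, the returned plot, and the callable "model" are illustrative additions.

.. code-block:: python

    # Stand-in sketch: StandInPlotEvaluator is NOT Brisk's PlotEvaluator.
    # It only records the order in which the documented steps are called.
    class StandInPlotEvaluator:
        def __init__(self, method_name):
            self.method_name = method_name
            self.calls = []  # names of the steps, in the order they ran

        def _generate_metadata(self, model, is_test):
            self.calls.append("metadata")
            return {"method": self.method_name, "is_test": is_test}

        def _save_plot(self, filename, metadata, plot=None):
            self.calls.append("save")

        def _log_results(self, method_name, filename):
            self.calls.append("log")


    class ErrorHistogram(StandInPlotEvaluator):
        def _generate_plot_data(self, model, X, y):
            # Hypothetical model: any callable mapping one input to a prediction
            return [target - model(row) for row, target in zip(X, y)]

        def _create_plot(self, plot_data, bins):
            self.calls.append("create")
            # Placeholder "plot": a plain dict instead of a real figure
            return {"errors": plot_data, "bins": bins}

        def plot(self, model, X, y, filename, bins=30, is_test=True):
            # Changed part: an extra ``bins`` argument reaches _create_plot
            plot_data = self._generate_plot_data(model, X, y)
            plot = self._create_plot(plot_data, bins)
            # Preserved part: metadata, save, and log in the documented order
            metadata = self._generate_metadata(model, is_test)
            self._save_plot(filename, metadata, plot=plot)
            self._log_results(self.method_name, filename)
            return plot  # returned only so this sketch can be inspected


    evaluator = ErrorHistogram("plot_error_histogram")
    result = evaluator.plot(
        lambda row: 2 * row, X=[1, 2, 3], y=[2, 5, 6],
        filename="error_histogram", bins=10
    )

The subclass changes what happens before the plot is created, while the closing metadata, save, and log calls keep their order. In a real evaluator, ``is_test`` would come from ``X.attrs["is_test"]`` as in the default method.
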
Registering Custom Evaluators
------------------------------

After defining your custom evaluator classes, you must register them with Brisk by adding a ``register_custom_evaluators()`` function to your ``evaluators.py`` file. Inside this function, call ``registry.register()`` for each evaluator. You provide a name used to access the evaluator and a description that will be displayed in the report.

.. code-block:: python

    from brisk.evaluation.evaluators.registry import EvaluatorRegistry

    def register_custom_evaluators(registry: EvaluatorRegistry, theme) -> None:
        """Register custom evaluators with Brisk.

        Parameters
        ----------
        registry : EvaluatorRegistry
            The evaluator registry to register with
        theme : plotnine theme
            The plotting theme for plot evaluators
        """
        # Register custom measure evaluators (no theme needed)
        registry.register(ExampleMeasureEvaluator(
            "evaluate_prediction",
            "Display evaluation results"
        ))

        # Register custom plot evaluators (theme is required)
        registry.register(PlotErrorHistogram(
            "plot_error_histogram",
            "Plot prediction error distribution",
            theme
        ))

.. important::
    For ``PlotEvaluator`` subclasses you must pass the ``theme`` to the constructor. This provides information about how to save the images.

Calling Custom Evaluators in Workflows
---------------------------------------

Once registered, you can call your custom evaluators in workflows using the ``.evaluate()`` or ``.plot()`` methods. You can do this in two ways:

**Wrapper Methods** (recommended for cleaner code): Registering the evaluators in the step above adds them to the evaluation manager. You can then retrieve them with the ``self.evaluation_manager.get_evaluator()`` method and wrap each call in a workflow method.

.. code-block:: python

    # workflows/my_workflow.py
    from brisk.training.workflow import Workflow

    class MyWorkflow(Workflow):
        def evaluate_prediction(self, model, X, y, filename):
            """Wrapper method for custom prediction summary evaluator."""
            evaluator = self.evaluation_manager.get_evaluator("evaluate_prediction")
            return evaluator.evaluate(model, X, y, ["MSE", "R2"], filename=filename)

        def plot_error_histogram(self, model, X, y, filename):
            """Wrapper method for custom error histogram plot."""
            evaluator = self.evaluation_manager.get_evaluator("plot_error_histogram")
            return evaluator.plot(model, X, y, filename=filename)

        def workflow(self, X_train, X_test, y_train, y_test, output_dir, feature_names):
            # Fit the model
            self.model.fit(X_train, y_train)

            # Use built-in methods
            self.evaluate_model(
                self.model, X_test, y_test, ["mean_absolute_error"], "model_score"
            )

            # Use custom wrapper methods
            self.evaluate_prediction(self.model, X_test, y_test, "prediction_summary")
            self.plot_error_histogram(self.model, X_test, y_test, "error_histogram")

**Direct Calling**: You may also retrieve the evaluators with ``self.evaluation_manager.get_evaluator()`` and call them directly inside ``workflow``.

.. code-block:: python

    # workflows/my_workflow.py
    from brisk.training.workflow import Workflow

    class MyWorkflow(Workflow):
        def workflow(self, X_train, X_test, y_train, y_test, output_dir, feature_names):
            # Fit the model
            self.model.fit(X_train, y_train)

            # Direct calls to custom evaluators
            custom_measure = self.evaluation_manager.get_evaluator("evaluate_prediction")
            custom_measure.evaluate(
                self.model, X_test, y_test, ["MAE", "R2"], filename="prediction_summary"
            )

            custom_plot = self.evaluation_manager.get_evaluator("plot_error_histogram")
            custom_plot.plot(self.model, X_test, y_test, filename="error_histogram")

.. note::
    Use descriptive names for evaluators (e.g., ``"plot_error_histogram"`` rather than ``"custom_plot"``).

Your custom evaluators will appear alongside built-in evaluators in the final interactive report.
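
The link between the name given at registration and the name passed to ``get_evaluator()`` can be illustrated with a small stand-in. This is a sketch under stated assumptions, not Brisk's implementation: ``StandInEvaluator`` and ``StandInRegistry`` are hypothetical classes, the duplicate-name check is an assumption, and the lookup method is placed on the registry here for brevity, whereas Brisk exposes it through the evaluation manager.

.. code-block:: python

    # Stand-in sketch: these classes are NOT Brisk's registry or evaluation
    # manager. They only illustrate name-based registration and lookup.
    class StandInEvaluator:
        def __init__(self, method_name, description):
            self.method_name = method_name
            self.description = description


    class StandInRegistry:
        def __init__(self):
            self._evaluators = {}

        def register(self, evaluator):
            # Assumption for this sketch: names must be unique
            if evaluator.method_name in self._evaluators:
                raise ValueError(
                    f"Evaluator '{evaluator.method_name}' is already registered"
                )
            self._evaluators[evaluator.method_name] = evaluator

        def get_evaluator(self, name):
            # The lookup key is exactly the name passed at registration
            if name not in self._evaluators:
                raise KeyError(
                    f"No evaluator named '{name}'; "
                    f"registered: {sorted(self._evaluators)}"
                )
            return self._evaluators[name]


    registry = StandInRegistry()
    registry.register(StandInEvaluator(
        "plot_error_histogram", "Plot prediction error distribution"
    ))
    found = registry.get_evaluator("plot_error_histogram")

    try:
        registry.get_evaluator("custom_plot")  # never registered
        lookup_failed = False
    except KeyError:
        lookup_failed = True

Because the string is the only handle on the evaluator, a descriptive name makes both the workflow code and any lookup error easier to read.
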