Common Plots
- class PlotLearningCurve(method_name: str, description: str, plot_settings)
Bases: PlotEvaluator
Plot learning curves showing model performance vs training size.
This evaluator creates learning curve plots that show how model performance changes as the training set size increases. Learning curves are useful for diagnosing bias vs variance problems and determining if more training data would improve performance.
- Parameters:
- method_name : str
The name of the evaluator (inherited from PlotEvaluator)
- description : str
The description of the evaluator output (inherited from PlotEvaluator)
- plot_settings : PlotSettings
The plot settings containing theme and color configuration
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- theme : Any
The plot theme for styling plots (inherited from PlotEvaluator)
- primary_color : str
Primary color for plots (inherited from PlotEvaluator)
- secondary_color : str
Secondary color for plots (inherited from PlotEvaluator)
- accent_color : str
Accent color for plots (inherited from PlotEvaluator)
Notes
Learning curves show both training and validation scores across different training set sizes. A large gap between the training and validation scores suggests overfitting, while low scores on both suggest underfitting.
The evaluator uses cross-validation to get robust estimates of performance at each training size, with error bars showing the standard deviation across folds.
Examples
- Use the learning curve evaluator:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_plot_learning_curve")
>>> evaluator.plot(model, X, y, "learning_curve", cv=5)
- plot(model: BaseEstimator, X: DataFrame, y: Series, filename: str = 'learning_curve', cv: int = 5, num_repeats: int = 1, n_jobs: int = -1, metric: str = 'neg_mean_absolute_error') -> None
Plot learning curves showing model performance vs training size.
Executes the complete learning curve plotting workflow. This includes calculating learning curve data using cross-validation, creating the plot, and saving the results.
- Parameters:
- model : base.BaseEstimator
Model to evaluate
- X : pd.DataFrame
Training features
- y : pd.Series
Training target values
- filename : str, optional
Name for output file, by default “learning_curve”
- cv : int, optional
Number of cross-validation folds, by default 5
- num_repeats : int, optional
Number of times to repeat CV, by default 1
- n_jobs : int, optional
Number of parallel jobs, by default -1
- metric : str, optional
Scoring metric to use, by default “neg_mean_absolute_error”
- Returns:
- None
Notes
The learning curve is calculated using scikit-learn’s learning_curve function with the specified cross-validation strategy. The plot shows both training and validation scores with error bars representing the standard deviation across folds.
The training sizes are automatically determined as fractions of the total dataset size, ranging from 10% to 100%.
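The calculation described above can be sketched directly with scikit-learn. This is a minimal illustration rather than the evaluator's own code: the synthetic data, the Ridge model, and the choice of five evenly spaced training sizes are assumptions. Only the 10% to 100% range, the fold count, and the default metric are taken from the description.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

# Synthetic regression data stands in for the real X and y (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    Ridge(),
    X,
    y,
    cv=5,
    # Assumed grid: five fractions from 10% to 100% of the training folds.
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="neg_mean_absolute_error",
    n_jobs=1,
)

# Each score array has one row per training size and one column per fold,
# so reducing over axis=1 gives the curve and its error bars.
train_mean = train_scores.mean(axis=1)
train_std = train_scores.std(axis=1)
val_mean = val_scores.mean(axis=1)
val_std = val_scores.std(axis=1)

for size, t, v, s in zip(train_sizes, train_mean, val_mean, val_std):
    print(f"n={int(size):4d}  train={t:8.3f}  validation={v:8.3f} +/- {s:.3f}")
```

The means would be drawn as the two curves and the standard deviations as the error bars. Note that the sizes are fractions of the largest training fold, not of the full dataset, because each fold holds out part of the data for validation.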
- class PlotFeatureImportance(method_name: str, description: str, plot_settings)
Bases: PlotEvaluator
Plot the feature importance for the model.
This evaluator creates feature importance plots using either built-in feature importance (for tree-based models) or permutation importance (for any model). Feature importance helps identify which features contribute most to the model’s predictions.
- Parameters:
- method_name : str
The name of the evaluator (inherited from PlotEvaluator)
- description : str
The description of the evaluator output (inherited from PlotEvaluator)
- plot_settings : PlotSettings
The plot settings containing theme and color configuration
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- theme : Any
The plot theme for styling plots (inherited from PlotEvaluator)
- primary_color : str
Primary color for plots (inherited from PlotEvaluator)
- secondary_color : str
Secondary color for plots (inherited from PlotEvaluator)
- accent_color : str
Accent color for plots (inherited from PlotEvaluator)
Notes
For tree-based models (DecisionTree, RandomForest, GradientBoosting), the evaluator uses the built-in feature_importances_ attribute. For other models, it uses permutation importance, which measures the decrease in model performance when a feature is randomly shuffled.
The threshold parameter can be either an integer (number of top features) or a float (proportion of features to keep).
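The two interpretations of the threshold can be illustrated with a small helper. The function name `select_top_features` is hypothetical, and rounding a fractional count up (with a minimum of one feature) is an assumption; the source does not state how the proportion is rounded.

```python
import math
from typing import List, Sequence, Tuple, Union


def select_top_features(
    importances: Sequence[float],
    feature_names: Sequence[str],
    threshold: Union[int, float],
) -> List[Tuple[str, float]]:
    """Return the highest-ranked (name, importance) pairs.

    Hypothetical helper: an int keeps the top N features, a float keeps
    that proportion of the features (rounded up, at least one).
    """
    if len(importances) != len(feature_names):
        raise ValueError("importances and feature_names must have equal length")
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float, not a bool")

    if isinstance(threshold, int):
        n_keep = threshold
    else:
        n_keep = math.ceil(threshold * len(feature_names))
    # Clamp so that at least one and at most all features are kept.
    n_keep = max(1, min(n_keep, len(feature_names)))

    ranked = sorted(
        zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:n_keep]


names = ["age", "income", "height", "weight"]
scores = [0.10, 0.55, 0.05, 0.30]

print(select_top_features(scores, names, 2))    # int: top 2 features
print(select_top_features(scores, names, 0.5))  # float: top 50% of features
```

With four features, both calls keep the same two entries, since 50% of four is two.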
Examples
- Use the feature importance evaluator:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_plot_feature_importance")
>>> evaluator.plot(
...     model, X, y, threshold=10, feature_names=feature_names,
...     filename="importance", metric="accuracy", num_rep=5
... )
- plot(model: BaseEstimator, X: DataFrame, y: Series, threshold: int | float, feature_names: List[str], filename: str, metric: str, num_rep: int) -> None
Plot the feature importance for the model and save the plot.
Executes the complete feature importance plotting workflow. This includes calculating feature importance, filtering features, creating the plot, and saving the results.
- Parameters:
- model : base.BaseEstimator
The model to evaluate
- X : pd.DataFrame
The input features
- y : pd.Series
The target data
- threshold : Union[int, float]
How many features to show, ranked by importance. If an int, the top N features are shown. If a float, that proportion of the features is shown.
- feature_names : List[str]
A list of feature names corresponding to the columns in X
- filename : str
The name of the output file (without extension)
- metric : str
The metric to use for evaluation (used for permutation importance)
- num_rep : int
The number of repetitions for calculating importance
- Returns:
- None
Notes
The method automatically chooses between built-in feature importance (for tree-based models) and permutation importance (for other models) based on the model type.
The plot dimensions are automatically adjusted based on the number of features to ensure readability.
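The choice between the two importance sources can be sketched as follows. The helper `compute_importance` is hypothetical, and it checks for a `feature_importances_` attribute rather than the model type, which is an assumption made to keep the example short; the evaluator's actual dispatch may differ. The data and models are synthetic stand-ins.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression


def compute_importance(model, X, y, metric, num_rep):
    """Hypothetical helper: use built-in importances if present, else permute."""
    if hasattr(model, "feature_importances_"):
        # Fitted tree-based models expose importances directly.
        return np.asarray(model.feature_importances_)
    # Otherwise measure the score drop when each feature is shuffled.
    result = permutation_importance(
        model, X, y, scoring=metric, n_repeats=num_rep, random_state=0
    )
    return result.importances_mean


X, y = make_regression(n_samples=150, n_features=4, n_informative=2, random_state=0)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
linear = LinearRegression().fit(X, y)

forest_imp = compute_importance(forest, X, y, "neg_mean_absolute_error", 5)
linear_imp = compute_importance(linear, X, y, "neg_mean_absolute_error", 5)

print("forest (built-in):   ", np.round(forest_imp, 3))
print("linear (permutation):", np.round(linear_imp, 3))
```

The two results are on different scales: built-in importances are normalized to sum to one, while permutation importances are expressed in units of the chosen metric.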
- class PlotModelComparison(method_name: str, description: str, plot_settings)
Bases: PlotEvaluator
Plot a comparison of multiple models based on the specified measure.
This evaluator creates bar charts comparing the performance of multiple models on a single metric. This is useful for model selection and performance comparison across different algorithms.
- Parameters:
- method_name : str
The name of the evaluator (inherited from PlotEvaluator)
- description : str
The description of the evaluator output (inherited from PlotEvaluator)
- plot_settings : PlotSettings
The plot settings containing theme and color configuration
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- theme : Any
The plot theme for styling plots (inherited from PlotEvaluator)
- primary_color : str
Primary color for plots (inherited from PlotEvaluator)
- secondary_color : str
Secondary color for plots (inherited from PlotEvaluator)
- accent_color : str
Accent color for plots (inherited from PlotEvaluator)
Notes
This evaluator is particularly useful for model selection and performance comparison. It evaluates all models on the same dataset using the same metric, ensuring fair comparison.
The plot shows model names on the x-axis and scores on the y-axis, with score values displayed on top of each bar for easy reading.
Examples
- Compare multiple models:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_plot_model_comparison")
>>> evaluator.plot(model1, model2, model3, X=X, y=y,
...                metric="accuracy", filename="comparison")
- plot(*models: BaseEstimator, X: DataFrame, y: Series, metric: str, filename: str) -> None
Plot a comparison of multiple models based on the specified metric.
Executes the complete model comparison plotting workflow. This includes evaluating each model on the specified metric, creating the comparison plot, and saving the results.
- Parameters:
- *models : base.BaseEstimator
A variable number of model instances to evaluate
- X : pd.DataFrame
The input features for evaluation
- y : pd.Series
The target data for evaluation
- metric : str
The metric to evaluate and plot
- filename : str
The name of the output file (without extension)
- Returns:
- None
Notes
The method evaluates each model individually on the same dataset using the same metric, ensuring fair comparison. If any model fails to evaluate, the plotting process is aborted.
Results are saved with metadata for later analysis and reporting.
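The per-model evaluation step can be approximated with scikit-learn alone. This sketch assumes scikit-learn scorer names stand in for the metric names the evaluator accepts, and it uses synthetic data and two arbitrary classifiers; it only computes the scores that would become the bar heights and does not draw or save a plot.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data stands in for the real X and y (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

models = [
    LogisticRegression(max_iter=1000).fit(X, y),
    DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y),
]

# One scorer, one dataset, applied to every model in turn.
scorer = get_scorer("accuracy")
scores = {type(model).__name__: float(scorer(model, X, y)) for model in models}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Each key would label a bar on the x-axis and each value would set its height and the number printed above it.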
- class PlotShapleyValues(method_name: str, description: str, plot_settings)
Bases: PlotEvaluator
Plot SHAP (SHapley Additive exPlanations) values for feature importance.
This evaluator creates SHAP value plots for model interpretability. SHAP values provide a unified framework for explaining model predictions by quantifying the contribution of each feature to individual predictions.
- Parameters:
- method_name : str
The name of the evaluator (inherited from PlotEvaluator)
- description : str
The description of the evaluator output (inherited from PlotEvaluator)
- plot_settings : PlotSettings
The plot settings containing theme and color configuration
- Attributes:
- method_name : str
The name of the evaluator
- description : str
The description of the evaluator output
- theme : Any
The plot theme for styling plots (inherited from PlotEvaluator)
- primary_color : str
Primary color for plots (inherited from PlotEvaluator)
- secondary_color : str
Secondary color for plots (inherited from PlotEvaluator)
- accent_color : str
Accent color for plots (inherited from PlotEvaluator)
Notes
SHAP values provide model-agnostic explanations that are consistent and locally accurate. The evaluator supports multiple plot types:
- bar: Mean absolute SHAP values across all samples
- waterfall: Feature contributions for a single sample
- violin: Distribution of SHAP values across samples
- beeswarm: Scatter plot of SHAP values with jitter
The evaluator automatically chooses the appropriate SHAP explainer based on the model type (TreeExplainer, LinearExplainer, or KernelExplainer).
Examples
- Use the SHAP values evaluator:
>>> from brisk.evaluation.evaluators import registry
>>> evaluator = registry.get("brisk_plot_shapley_values")
>>> evaluator.plot(model, X, y, "shap_values", plot_type="bar")
- plot(model: BaseEstimator, X: DataFrame, y: Series, filename: str = 'shap_values', plot_type: str = 'bar') -> None
Generate SHAP value plots for feature importance.
Executes the complete SHAP plotting workflow. This includes generating SHAP values, creating the specified plot type(s), and saving the results.
- Parameters:
- model : base.BaseEstimator
Trained model to explain
- X : pd.DataFrame
Feature data for generating explanations
- y : pd.Series
Target data (not used for SHAP but required by interface)
- filename : str, optional
Base output filename, by default “shap_values”
- plot_type : str, optional
Type of SHAP plot: “bar”, “waterfall”, “violin”, or “beeswarm”. Several types can be given as a comma-separated string such as “bar,waterfall” to generate one plot per type. By default “bar”
- Returns:
- None
Notes
The method supports multiple plot types in a single call by specifying comma-separated plot types (e.g., “bar,waterfall”). Each plot type is saved as a separate file.
If SHAP is not installed, the method will log a warning and skip SHAP plot generation.
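The comma-separated plot types and the optional SHAP dependency can be handled along these lines. Both function names are hypothetical, and so are the `<filename>_<type>` naming scheme, the whitespace and case normalization, and the error raised for unknown types; the source only states that each plot type is saved to a separate file and that a missing SHAP install is logged and skipped.

```python
import importlib.util
import logging
from typing import List

logger = logging.getLogger(__name__)

# Plot types listed in the documentation above.
SUPPORTED_PLOT_TYPES = ("bar", "waterfall", "violin", "beeswarm")


def parse_plot_types(plot_type: str) -> List[str]:
    """Split a string such as 'bar,waterfall' into validated plot types."""
    requested = [part.strip().lower() for part in plot_type.split(",") if part.strip()]
    unknown = [part for part in requested if part not in SUPPORTED_PLOT_TYPES]
    if unknown:
        # Assumption: unknown types are rejected rather than silently ignored.
        raise ValueError(f"Unsupported plot type(s): {unknown}")
    return requested


def output_filenames(filename: str, plot_type: str) -> List[str]:
    """Hypothetical naming scheme: one '<filename>_<type>' entry per plot."""
    if importlib.util.find_spec("shap") is None:
        # Mirror the documented behavior: warn and skip when SHAP is missing.
        logger.warning("shap is not installed; skipping SHAP plot generation")
        return []
    return [f"{filename}_{kind}" for kind in parse_plot_types(plot_type)]


print(parse_plot_types("bar, waterfall"))
print(output_filenames("shap_values", "bar,violin"))
```

Checking for the package with `importlib.util.find_spec` avoids importing SHAP just to discover that it is absent.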