Dataset Plots

class Histogram(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot histogram and boxplot visualizations for dataset features.

This evaluator creates side-by-side histograms comparing the distribution of a feature between the training and test datasets. It uses Sturges' rule to choose the number of bins and applies consistent styling across all visualizations.

Attributes:
name : str

The name of the evaluator, set to “histogram”
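Sturges' rule sets the bin count to k = ⌈log₂ n⌉ + 1 for n observations. The sketch below only illustrates that formula. The helper name `sturges_bins` is hypothetical, and this page does not say whether the evaluator applies the rule to the training data, the test data, or both combined.

```python
import math


def sturges_bins(n: int) -> int:
    """Hypothetical helper: number of histogram bins by Sturges' rule."""
    if n < 1:
        raise ValueError("Sturges' rule needs at least one observation")
    # k = ceil(log2(n)) + 1
    return math.ceil(math.log2(n)) + 1


print(sturges_bins(100))  # 8 bins for 100 observations
```

The rule grows only logarithmically with n, so it tends to produce few bins for large samples.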

plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None

Plot a histogram and boxplot for a single dataset feature.

Creates side-by-side histogram visualizations comparing the distribution of a feature between training and test datasets. The plot includes faceted histograms with consistent binning and styling.

Parameters:
train_data : pd.Series

The training data for the feature to plot

test_data : pd.Series

The test data for the feature to plot

feature_name : str

The name of the feature being plotted

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename
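One way to get "consistent binning" is to count both splits against a single shared set of equal-width bin edges, so the train and test facets line up bar for bar. This is a minimal sketch under assumptions this page does not confirm: the edges span the combined train and test range, and the bin count comes from Sturges' rule on the combined sample size. `shared_histograms` is a hypothetical name, and plain lists stand in for `pd.Series`.

```python
import math
from typing import List, Sequence, Tuple


def shared_histograms(
    train: Sequence[float], test: Sequence[float]
) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical helper: count train and test values in the same bins."""
    combined = list(train) + list(test)
    if not combined:
        raise ValueError("no data to bin")
    # Assumption: Sturges' rule applied to the combined sample size.
    n_bins = math.ceil(math.log2(len(combined))) + 1
    lo, hi = min(combined), max(combined)
    # Equal-width bins over the combined range; width 1 if all values are equal.
    width = (hi - lo) / n_bins or 1.0
    edges = [lo + i * width for i in range(n_bins + 1)]

    def count(values: Sequence[float]) -> List[int]:
        counts = [0] * n_bins
        for v in values:
            # Clamp so the maximum value lands in the last (closed) bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return counts

    return edges, count(train), count(test)


edges, train_counts, test_counts = shared_histograms([0, 1, 2, 3], [4, 5, 6, 7])
print(train_counts, test_counts)  # [2, 2, 0, 0] [0, 0, 2, 2]
```

Because both count lists refer to the same `edges`, differences between the facets reflect the data rather than the binning.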

class BarPlot(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot bar chart visualizations for categorical dataset features.

This evaluator creates grouped bar charts comparing the proportions of categorical values between training and test datasets. It provides a clear visual comparison of class distributions across data splits.

Attributes:
name : str

The name of the evaluator, set to “bar_plot”
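Comparing proportions across splits requires normalizing each split by its own size and aligning both on the union of categories, so a value that appears in only one split still gets a zero-height bar in the other. A minimal sketch of that computation follows; `category_proportions` is a hypothetical name, and plain iterables stand in for `pd.Series`.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def category_proportions(
    train: Iterable[Hashable], test: Iterable[Hashable]
) -> Dict[Hashable, Tuple[float, float]]:
    """Hypothetical helper: map each category to (train share, test share)."""
    train_counts, test_counts = Counter(train), Counter(test)
    n_train = sum(train_counts.values())
    n_test = sum(test_counts.values())
    # Union of categories, so a value seen in only one split still gets a 0.0 bar.
    categories = sorted(set(train_counts) | set(test_counts), key=str)
    return {
        cat: (
            train_counts[cat] / n_train if n_train else 0.0,
            test_counts[cat] / n_test if n_test else 0.0,
        )
        for cat in categories
    }


print(category_proportions(["a", "a", "b", "c"], ["a", "b", "b", "d"]))
# {'a': (0.5, 0.25), 'b': (0.25, 0.5), 'c': (0.25, 0.0), 'd': (0.0, 0.25)}
```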

plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None

Plot a bar chart for categorical feature proportions.

Creates a grouped bar chart comparing the proportions of categorical values between training and test datasets. The chart shows both absolute counts and normalized proportions.

Parameters:
train_data : pd.Series

The training data for the categorical feature

test_data : pd.Series

The test data for the categorical feature

feature_name : str

The name of the categorical feature being plotted

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename

class CorrelationMatrix(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot correlation matrix heatmaps for continuous features.

This evaluator creates correlation matrix heatmaps showing the relationships between continuous features in the dataset. The heatmap uses a color gradient to represent correlation strength and includes correlation values as text annotations.

Attributes:
name : str

The name of the evaluator, set to “correlation_matrix”

plot(train_data: DataFrame, continuous_features: List[str], filename: str, dataset_name: str, group_name: str) -> None

Plot a correlation matrix for continuous features.

Creates a correlation matrix heatmap showing the relationships between all continuous features in the dataset. The plot includes correlation values as text annotations and uses a color gradient to represent correlation strength.

Parameters:
train_data : pd.DataFrame

The training data containing continuous features

continuous_features : List[str]

List of continuous feature names to include in the correlation matrix

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename
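The two visual encodings described above can be sketched independently of any plotting library: a diverging colour gradient for the cell background and a short numeric label for the annotation. The blue-white-red scheme, the two-decimal label format, and the names `diverging_color` and `annotation_labels` are assumptions for illustration. This page does not specify the colormap or number format the evaluator actually uses.

```python
from typing import List, Tuple


def diverging_color(r: float) -> Tuple[int, int, int]:
    """Hypothetical blue-white-red mapping of a correlation in [-1, 1] to RGB."""
    r = max(-1.0, min(1.0, r))  # clamp tiny floating-point overshoots
    if r >= 0:
        fade = round(255 * (1 - r))  # 0 -> white, +1 -> pure red
        return (255, fade, fade)
    fade = round(255 * (1 + r))  # 0 -> white, -1 -> pure blue
    return (fade, fade, 255)


def annotation_labels(matrix: List[List[float]]) -> List[List[str]]:
    """Text drawn in each heatmap cell: the coefficient to two decimals."""
    return [[f"{value:.2f}" for value in row] for row in matrix]


print(diverging_color(1.0), diverging_color(0.0), diverging_color(-1.0))
# (255, 0, 0) (255, 255, 255) (0, 0, 255)
print(annotation_labels([[1.0, -0.25], [-0.25, 1.0]]))
# [['1.00', '-0.25'], ['-0.25', '1.00']]
```

Centring the gradient on white at zero makes the sign of each correlation readable at a glance, while the annotation text preserves the exact magnitude.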