Dataset Plots

class Histogram(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot histogram and boxplot visualizations for dataset features.

This evaluator creates side-by-side histogram plots comparing the distribution of features between training and test datasets. It uses Sturges' rule to choose the number of bins and applies consistent styling across all visualizations.
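For reference, Sturges' rule suggests ceil(log2(n)) + 1 bins for n observations. The sketch below is a standalone illustration of that formula only; the helper name `sturges_bin_count` is hypothetical, and how the evaluator applies the rule (for example, whether n comes from the training data, the test data, or both) is not stated here.

```python
import math


def sturges_bin_count(n_observations: int) -> int:
    """Return the bin count suggested by Sturges' rule: ceil(log2(n)) + 1.

    Hypothetical helper for illustration; not part of the documented API.
    """
    if n_observations < 1:
        raise ValueError("Sturges' rule needs at least one observation")
    return math.ceil(math.log2(n_observations)) + 1


# 100 observations: log2(100) is about 6.64, so ceil gives 7, plus 1 is 8 bins.
bins_for_100 = sturges_bin_count(100)
print(bins_for_100)
```

Because the bin count grows only logarithmically with sample size, the rule tends to give coarse histograms for very large or strongly skewed features.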

Attributes:
name : str

The name of the evaluator, set to “histogram”

generate_plot_data(train_data: Series, test_data: Series, feature_name: str) -> Dict[str, Any]

Generate the plot data for histogram visualization.

Prepares the data needed for creating histogram plots by organizing training and test data along with feature metadata.

Parameters:
train_data : pd.Series

The training data for the feature

test_data : pd.Series

The test data for the feature

feature_name : str

The name of the feature being plotted

Returns:
Dict[str, Any]

Dictionary containing plot data with keys:

- train_series: training data series
- test_series: test data series
- feature_name: name of the feature

plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None

Plot a histogram and boxplot for a dataset.

Creates side-by-side histogram visualizations comparing the distribution of a feature between training and test datasets. The plot includes faceted histograms with consistent binning and styling.

Parameters:
train_data : pd.Series

The training data for the feature to plot

test_data : pd.Series

The test data for the feature to plot

feature_name : str

The name of the feature being plotted

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename

class BarPlot(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot bar chart visualizations for categorical dataset features.

This evaluator creates grouped bar charts comparing the proportions of categorical values between training and test datasets. It provides a clear visual comparison of class distributions across data splits.

Attributes:
name : str

The name of the evaluator, set to “bar_plot”

generate_plot_data(train_data: Series, test_data: Series, feature_name: str) -> Dict[str, Any]

Generate the plot data for bar chart visualization.

Prepares the data needed for creating bar chart plots by calculating value counts and proportions for both training and test datasets.

Parameters:
train_data : pd.Series

The training data for the categorical feature

test_data : pd.Series

The test data for the categorical feature

feature_name : str

The name of the feature being plotted

Returns:
Dict[str, Any]

Dictionary containing plot data with keys:

- train_value_counts: value counts for training data
- test_value_counts: value counts for test data
- feature_name: name of the feature
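As a rough illustration of the value-count and proportion step described above, the following sketch computes per-category proportions for each split with pandas. The helper `proportions_by_split` and the sample data are hypothetical, and the evaluator's actual implementation may differ, for instance in how it treats categories that appear in only one split.

```python
import pandas as pd


def proportions_by_split(train_data: pd.Series, test_data: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: per-category proportions for train and test.

    Not part of the documented API; shown only to illustrate the idea.
    """
    # normalize=True turns raw counts into proportions that sum to 1 per split.
    train_props = train_data.value_counts(normalize=True)
    test_props = test_data.value_counts(normalize=True)
    # Align both splits on category; a category absent from a split gets 0.
    table = pd.concat({"train": train_props, "test": test_props}, axis=1).fillna(0.0)
    return table.sort_index()


# Made-up sample data: category "c" appears only in the training split.
train = pd.Series(["a", "a", "b", "c"])
test = pd.Series(["a", "b", "b", "b"])
props = proportions_by_split(train, test)
print(props)
```

Comparing proportions rather than raw counts keeps the two splits on the same scale even when the training set is much larger than the test set.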

plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None

Plot a bar chart for categorical feature proportions.

Creates a grouped bar chart comparing the proportions of categorical values between training and test datasets. The chart shows both absolute counts and normalized proportions.

Parameters:
train_data : pd.Series

The training data for the categorical feature

test_data : pd.Series

The test data for the categorical feature

feature_name : str

The name of the categorical feature being plotted

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename

class CorrelationMatrix(method_name: str, description: str, plot_settings)

Bases: DatasetPlotEvaluator

Plot correlation matrix heatmaps for continuous features.

This evaluator creates correlation matrix heatmaps showing the relationships between continuous features in the dataset. The heatmap uses a color gradient to represent correlation strength and includes correlation values as text annotations.

Attributes:
name : str

The name of the evaluator, set to “correlation_matrix”

generate_plot_data(train_data: DataFrame, continuous_features: List[str]) -> Dict[str, Any]

Generate the plot data for correlation matrix visualization.

Prepares the data needed for creating correlation matrix plots by calculating correlations and determining appropriate plot dimensions.

Parameters:
train_data : pd.DataFrame

The training data containing continuous features

continuous_features : List[str]

List of continuous feature names to include in the matrix

Returns:
Dict[str, Any]

Dictionary containing plot data with keys:

- correlation_matrix: pandas correlation matrix
- width: calculated plot width based on number of features
- height: calculated plot height based on number of features
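The shape of this return value can be sketched with plain pandas. The function `correlation_plot_data` below is a hypothetical stand-in, not the evaluator's code: it uses `DataFrame.corr()`, which computes Pearson correlation by default, and the sizing formula is an assumption made up for illustration, since the source does not say how width and height are calculated.

```python
from typing import Any, Dict, List

import pandas as pd


def correlation_plot_data(
    train_data: pd.DataFrame, continuous_features: List[str]
) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the documented dictionary keys."""
    # Restrict to the requested columns, then compute pairwise correlations.
    correlation_matrix = train_data[continuous_features].corr()
    # ASSUMPTION: size grows linearly with the feature count, with a floor of 6.
    size = max(6.0, 0.8 * len(continuous_features))
    return {
        "correlation_matrix": correlation_matrix,
        "width": size,
        "height": size,
    }


# Made-up sample data: y is a multiple of x, z decreases as x increases.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],
        "z": [4, 3, 2, 1],
        "label": ["a", "b", "a", "b"],
    }
)
plot_data = correlation_plot_data(df, ["x", "y", "z"])
print(plot_data["correlation_matrix"])
```

Passing the feature list explicitly keeps non-numeric columns such as `label` out of the calculation.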

plot(train_data: DataFrame, continuous_features: List[str], filename: str, dataset_name: str, group_name: str) -> None

Plot a correlation matrix for continuous features.

Creates a correlation matrix heatmap showing the relationships between all continuous features in the dataset. The plot includes correlation values as text annotations and uses a color gradient to represent correlation strength.

Parameters:
train_data : pd.DataFrame

The training data containing continuous features

continuous_features : List[str]

List of continuous feature names to include in the correlation matrix

filename : str

The filename to save the plot to

dataset_name : str

The name of the dataset being analyzed

group_name : str

The name of the experiment group

Returns:
None

The plot is saved to the specified filename