Dataset Plots
- class Histogram(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot histogram and boxplot visualizations for dataset features.
This evaluator creates side-by-side histogram plots comparing the distribution of features between training and test datasets. It uses Sturges' rule to determine the bin count and applies consistent styling across all visualizations.
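As a rough illustration of the binning rule mentioned above: Sturges' rule suggests ceil(log2(n)) + 1 bins for n observations. The helper name `sturges_bins` below is hypothetical and not part of this API, and the text does not say whether the evaluator derives n from the training data, the test data, or both.

```python
import math


def sturges_bins(n_samples: int) -> int:
    """Bin count suggested by Sturges' rule: ceil(log2(n)) + 1.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return math.ceil(math.log2(n_samples)) + 1


# Example: the bin count grows slowly (logarithmically) with sample size.
bins_small = sturges_bins(100)
bins_large = sturges_bins(1000)
```

Because the count grows only logarithmically, using one shared bin count for both splits keeps the two histograms visually comparable even when the splits differ in size.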
- Attributes:
- name : str
The name of the evaluator, set to “histogram”
- generate_plot_data(train_data: Series, test_data: Series, feature_name: str) -> Dict[str, Any]
Generate the plot data for histogram visualization.
Prepares the data needed for creating histogram plots by organizing training and test data along with feature metadata.
- Parameters:
- train_data : pd.Series
The training data for the feature
- test_data : pd.Series
The test data for the feature
- feature_name : str
The name of the feature being plotted
- Returns:
- Dict[str, Any]
Dictionary containing plot data with keys:
- train_series: training data series
- test_series: test data series
- feature_name: name of the feature
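The shape of the returned dictionary can be sketched as follows. This is only an assumption-based stand-in built from the keys listed above: the function name `sketch_histogram_plot_data` is hypothetical, and the real method may add preprocessing that is not described here.

```python
from typing import Any, Dict

import pandas as pd


def sketch_histogram_plot_data(
    train_data: pd.Series, test_data: pd.Series, feature_name: str
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the documented return keys."""
    return {
        "train_series": train_data,
        "test_series": test_data,
        "feature_name": feature_name,
    }


# Example input: a numeric feature split into train and test portions.
train = pd.Series([1.0, 2.0, 3.0, 4.0])
test = pd.Series([1.5, 2.5])
plot_data = sketch_histogram_plot_data(train, test, "age")
```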
- plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None
Plot a histogram and boxplot for a dataset.
Creates side-by-side histogram visualizations comparing the distribution of a feature between training and test datasets. The plot includes faceted histograms with consistent binning and styling.
- Parameters:
- train_data : pd.Series
The training data for the feature to plot
- test_data : pd.Series
The test data for the feature to plot
- feature_name : str
The name of the feature being plotted
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename
- class BarPlot(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot bar chart visualizations for categorical dataset features.
This evaluator creates grouped bar charts comparing the proportions of categorical values between training and test datasets. It provides a clear visual comparison of class distributions across data splits.
- Attributes:
- name : str
The name of the evaluator, set to “bar_plot”
- generate_plot_data(train_data: Series, test_data: Series, feature_name: str) -> Dict[str, Any]
Generate the plot data for bar chart visualization.
Prepares the data needed for creating bar chart plots by calculating value counts and proportions for both training and test datasets.
- Parameters:
- train_data : pd.Series
The training data for the categorical feature
- test_data : pd.Series
The test data for the categorical feature
- feature_name : str
The name of the feature being plotted
- Returns:
- Dict[str, Any]
Dictionary containing plot data with keys:
- train_value_counts: value counts for training data
- test_value_counts: value counts for test data
- feature_name: name of the feature
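A minimal sketch of how such value counts and proportions might be computed with pandas is shown below. The names `sketch_bar_plot_data` and `to_proportions` are hypothetical, and whether the real method stores raw counts or already-normalized proportions under these keys is not stated here; this sketch assumes raw counts.

```python
from typing import Any, Dict

import pandas as pd


def sketch_bar_plot_data(
    train_data: pd.Series, test_data: pd.Series, feature_name: str
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the documented return keys."""
    return {
        # Assumption: raw counts per category (not normalized).
        "train_value_counts": train_data.value_counts(),
        "test_value_counts": test_data.value_counts(),
        "feature_name": feature_name,
    }


def to_proportions(value_counts: pd.Series) -> pd.Series:
    """Normalize counts so that they sum to 1."""
    return value_counts / value_counts.sum()


# Example input: a categorical feature with different class balance per split.
train = pd.Series(["a", "a", "b", "c"])
test = pd.Series(["a", "b", "b"])
plot_data = sketch_bar_plot_data(train, test, "category")
train_props = to_proportions(plot_data["train_value_counts"])
test_props = to_proportions(plot_data["test_value_counts"])
```

Comparing proportions rather than raw counts keeps the two splits on the same scale even when the training set is much larger than the test set.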
- plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None
Plot a bar chart for categorical feature proportions.
Creates a grouped bar chart comparing the proportions of categorical values between training and test datasets. The chart shows both absolute counts and normalized proportions.
- Parameters:
- train_data : pd.Series
The training data for the categorical feature
- test_data : pd.Series
The test data for the categorical feature
- feature_name : str
The name of the categorical feature being plotted
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename
- class CorrelationMatrix(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot correlation matrix heatmaps for continuous features.
This evaluator creates correlation matrix heatmaps showing the relationships between continuous features in the dataset. The heatmap uses a color gradient to represent correlation strength and includes correlation values as text annotations.
- Attributes:
- name : str
The name of the evaluator, set to “correlation_matrix”
- generate_plot_data(train_data: DataFrame, continuous_features: List[str]) -> Dict[str, Any]
Generate the plot data for correlation matrix visualization.
Prepares the data needed for creating correlation matrix plots by calculating correlations and determining appropriate plot dimensions.
- Parameters:
- train_data : pd.DataFrame
The training data containing continuous features
- continuous_features : List[str]
List of continuous feature names to include in the matrix
- Returns:
- Dict[str, Any]
Dictionary containing plot data with keys:
- correlation_matrix: pandas correlation matrix
- width: calculated plot width based on number of features
- height: calculated plot height based on number of features
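One way such a dictionary could be assembled is sketched below using `DataFrame.corr()`. The function name `sketch_correlation_plot_data` and the sizing heuristic (`size_per_feature`, `min_size`) are assumptions made for illustration; the actual formula used to derive width and height is not given here.

```python
from typing import Any, Dict, List

import pandas as pd


def sketch_correlation_plot_data(
    train_data: pd.DataFrame,
    continuous_features: List[str],
    size_per_feature: float = 0.75,  # hypothetical scaling factor
    min_size: float = 6.0,           # hypothetical lower bound
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the documented return keys."""
    # Pairwise Pearson correlation (the pandas default) between selected columns.
    correlation_matrix = train_data[continuous_features].corr()
    # Assumed heuristic: grow the square figure with the number of features.
    size = max(min_size, size_per_feature * len(continuous_features))
    return {
        "correlation_matrix": correlation_matrix,
        "width": size,
        "height": size,
    }


# Example input: y rises with x, z falls with x.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1], "label": list("abab")}
)
plot_data = sketch_correlation_plot_data(df, ["x", "y", "z"])
corr = plot_data["correlation_matrix"]
```

Selecting only the listed continuous columns before calling `corr()` keeps non-numeric columns such as `label` out of the matrix.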
- plot(train_data: DataFrame, continuous_features: List[str], filename: str, dataset_name: str, group_name: str) -> None
Plot a correlation matrix for continuous features.
Creates a correlation matrix heatmap showing the relationships between all continuous features in the dataset. The plot includes correlation values as text annotations and uses a color gradient to represent correlation strength.
- Parameters:
- train_data : pd.DataFrame
The training data containing continuous features
- continuous_features : List[str]
List of continuous feature names to include in the correlation matrix
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename