Dataset Plots
- class Histogram(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot histogram and boxplot visualizations for dataset features.
This evaluator creates side-by-side histogram plots comparing the distribution of a feature between the training and test datasets. It uses Sturges' rule to choose the number of bins and applies consistent styling across all visualizations.
- Attributes:
- name : str
The name of the evaluator, set to “histogram”
- plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None
Plot a histogram and boxplot for a dataset.
Creates side-by-side histogram visualizations comparing the distribution of a feature between training and test datasets. The plot includes faceted histograms with consistent binning and styling.
- Parameters:
- train_data : pd.Series
The training data for the feature to plot
- test_data : pd.Series
The test data for the feature to plot
- feature_name : str
The name of the feature being plotted
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename
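As an illustration of the binning rule named in the Histogram class description, Sturges' rule derives the bin count from the number of observations n as ceil(log2(n)) + 1. The sketch below is not this library's implementation: the helper name sturges_bins is hypothetical, and how the evaluator chooses n when the training and test sets differ in size is not stated here.

```python
import math


def sturges_bins(n_observations: int) -> int:
    """Return the bin count suggested by Sturges' rule: ceil(log2(n)) + 1.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    if n_observations < 1:
        raise ValueError("Sturges' rule needs at least one observation")
    return math.ceil(math.log2(n_observations)) + 1


# Bin counts grow slowly (logarithmically) with the sample size
for n in (10, 100, 1000):
    print(n, sturges_bins(n))
```

Because the count grows only with log2(n), 100 observations give 8 bins, which is one reason the rule tends to under-bin very large or strongly skewed samples.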
- class BarPlot(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot bar chart visualizations for categorical dataset features.
This evaluator creates grouped bar charts comparing the proportions of categorical values between training and test datasets. It provides a clear visual comparison of class distributions across data splits.
- Attributes:
- name : str
The name of the evaluator, set to “bar_plot”
- plot(train_data: Series, test_data: Series, feature_name: str, filename: str, dataset_name: str, group_name: str) -> None
Plot a bar chart for categorical feature proportions.
Creates a grouped bar chart comparing the proportions of categorical values between training and test datasets. The chart shows both absolute counts and normalized proportions.
- Parameters:
- train_data : pd.Series
The training data for the categorical feature
- test_data : pd.Series
The test data for the categorical feature
- feature_name : str
The name of the categorical feature being plotted
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename
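The normalized proportions that a grouped bar chart like this compares can be sketched with pandas. This is a minimal illustration under stated assumptions, not the evaluator's actual code: the function name split_proportions is hypothetical, and filling categories missing from one split with 0 is a choice made here for the example.

```python
import pandas as pd


def split_proportions(train_data: pd.Series, test_data: pd.Series) -> pd.DataFrame:
    """Return per-category proportions with one column per data split.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    proportions = pd.DataFrame(
        {
            # normalize=True converts raw counts into fractions summing to 1
            "train": train_data.value_counts(normalize=True),
            "test": test_data.value_counts(normalize=True),
        }
    )
    # A category absent from one split is aligned as NaN; report it as 0
    return proportions.fillna(0.0).sort_index()


train = pd.Series(["a", "a", "b", "c"])
test = pd.Series(["a", "b", "b", "b"])
print(split_proportions(train, test))
```

Comparing proportions rather than raw counts keeps the two splits on the same scale even when the training set is much larger than the test set.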
- class CorrelationMatrix(method_name: str, description: str, plot_settings)
Bases: DatasetPlotEvaluator
Plot correlation matrix heatmaps for continuous features.
This evaluator creates correlation matrix heatmaps showing the relationships between continuous features in the dataset. The heatmap uses a color gradient to represent correlation strength and includes correlation values as text annotations.
- Attributes:
- name : str
The name of the evaluator, set to “correlation_matrix”
- plot(train_data: DataFrame, continuous_features: List[str], filename: str, dataset_name: str, group_name: str) -> None
Plot a correlation matrix for continuous features.
Creates a correlation matrix heatmap showing the relationships between all continuous features in the dataset. The plot includes correlation values as text annotations and uses a color gradient to represent correlation strength.
- Parameters:
- train_data : pd.DataFrame
The training data containing continuous features
- continuous_features : List[str]
List of continuous feature names to include in the correlation matrix
- filename : str
The filename to save the plot to
- dataset_name : str
The name of the dataset being analyzed
- group_name : str
The name of the experiment group
- Returns:
- None
The plot is saved to the specified filename
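The values annotated on such a heatmap are pairwise correlations between the selected columns. The sketch below shows one way to compute them with pandas. It assumes Pearson correlation (the pandas default), which this page does not confirm, and the function name correlation_table is hypothetical.

```python
from typing import List

import pandas as pd


def correlation_table(train_data: pd.DataFrame, continuous_features: List[str]) -> pd.DataFrame:
    """Return the pairwise correlation matrix of the selected columns.

    Hypothetical helper for illustration only; Pearson is an assumption.
    """
    # Restrict to the listed columns so non-numeric features are ignored
    return train_data[continuous_features].corr(method="pearson")


data = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],   # rises linearly with x
        "z": [4.0, 3.0, 2.0, 1.0],   # falls linearly with x
        "label": ["a", "b", "a", "b"],
    }
)
print(correlation_table(data, ["x", "y", "z"]))
```

The result is a square, symmetric table with ones on the diagonal, so each off-diagonal cell maps directly to one annotated heatmap square.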