Report Data Models

class ReportData(*, navbar: Navbar, datasets: Dict[str, Dataset] = <factory>, experiments: Dict[str, Experiment] = <factory>, experiment_groups: List[ExperimentGroup] = <factory>, data_managers: Dict[str, DataManager] = <factory>)

Bases: RoundedModel

Represents the entire machine learning report.

This is the root model. It contains all data for a complete machine learning report: navigation information, datasets, experiments, experiment groups, and data managers.

Attributes:
navbar : Navbar

Navigation bar data with version and timestamp information

datasets : Dict[str, Dataset]

Map of dataset IDs to Dataset instances

experiments : Dict[str, Experiment]

Map of experiment IDs to Experiment instances

experiment_groups : List[ExperimentGroup]

List of experiment groups for organizing related experiments

data_managers : Dict[str, DataManager]

Map of data manager IDs to DataManager instances

Examples

>>> report = ReportData(
...     navbar=Navbar(brisk_version="1.0.0", timestamp="2024-01-15"),
...     datasets={"dataset_1": Dataset(...)},
...     experiments={"exp_1": Experiment(...)},
...     experiment_groups=[ExperimentGroup(...)],
...     data_managers={"dm_1": DataManager(...)}
... )
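
Because experiment groups and datasets refer to other entries by ID, it can be useful to confirm that every referenced ID exists in the root maps. The following is a minimal sketch, assuming the report is available as plain dictionaries that use the field names documented above. `find_dangling_references` is a hypothetical helper for illustration and is not part of Brisk.

```python
from typing import Any, Dict, List


def find_dangling_references(report: Dict[str, Any]) -> List[str]:
    """Return one message for every ID that points at a missing entry."""
    problems: List[str] = []
    datasets = report.get("datasets", {})
    experiments = report.get("experiments", {})
    data_managers = report.get("data_managers", {})

    # Each group lists dataset and experiment IDs that should exist at the root.
    for group in report.get("experiment_groups", []):
        for dataset_id in group.get("datasets", []):
            if dataset_id not in datasets:
                problems.append(f"{group['name']}: unknown dataset {dataset_id}")
        for experiment_id in group.get("experiments", []):
            if experiment_id not in experiments:
                problems.append(f"{group['name']}: unknown experiment {experiment_id}")

    # Each dataset records the ID of its associated data manager.
    for dataset_id, dataset in datasets.items():
        manager_id = dataset.get("data_manager_id")
        if manager_id not in data_managers:
            problems.append(f"{dataset_id}: unknown data manager {manager_id}")
    return problems


# Illustrative data only: "dataset_2" is referenced by the group but never defined.
report = {
    "datasets": {"dataset_1": {"data_manager_id": "dm_1"}},
    "experiments": {"exp_1": {}},
    "experiment_groups": [
        {"name": "group_a", "datasets": ["dataset_1", "dataset_2"], "experiments": ["exp_1"]}
    ],
    "data_managers": {"dm_1": {}},
}
problems = find_dangling_references(report)
print(problems)
```

An empty list means every group and dataset reference resolves.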
data_managers: Dict[str, DataManager]
datasets: Dict[str, Dataset]
experiment_groups: List[ExperimentGroup]
experiments: Dict[str, Experiment]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

navbar: Navbar
class RoundedModel

Bases: BaseModel

Base Pydantic model that enforces rounding of all numbers.

This model automatically rounds all numerical values to 3 decimal places before validation. It uses the _deep_round function to handle nested data structures and special string formats.

Attributes:
All numeric attribute values are automatically rounded to 3 decimal places

Notes

This class should be used as a base class for all models that need consistent numerical rounding for display purposes.

Examples

>>> from typing import List
>>> class MyModel(RoundedModel):
...     value: float
...     scores: List[float]
>>> model = MyModel(value=1.234567, scores=[0.1, 0.234567])
>>> model.value
1.235
>>> model.scores
[0.1, 0.235]
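
The recursive rounding described above can be illustrated without Pydantic. This is a minimal sketch, assuming only that floats nested inside dictionaries, lists, and tuples are rounded to 3 decimal places. `deep_round_sketch` is a hypothetical stand-in, not Brisk's `_deep_round`, and it does not attempt the special string formats mentioned above.

```python
from typing import Any


def deep_round_sketch(value: Any, ndigits: int = 3) -> Any:
    """Recursively round floats inside nested containers (illustrative only)."""
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {key: deep_round_sketch(item, ndigits) for key, item in value.items()}
    if isinstance(value, list):
        return [deep_round_sketch(item, ndigits) for item in value]
    if isinstance(value, tuple):
        return tuple(deep_round_sketch(item, ndigits) for item in value)
    # Integers, strings, None, and anything else pass through unchanged.
    return value


raw = {"value": 1.234567, "scores": [0.1, 0.234567], "label": "rf", "n": 5}
rounded = deep_round_sketch(raw)
print(rounded)
```

In a Pydantic model, a helper of this kind would typically run before field validation, which matches the "before validation" behaviour described for RoundedModel.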
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class TableData(*, name: str, description: str | None = None, columns: List[str], rows: List[List[str]])

Bases: RoundedModel

Represents tabular data with columns and rows.

This model is used to structure tabular data for display in reports. It includes metadata like name and description along with the actual table structure.

Attributes:
name : str

The name/title of the table

description : Optional[str]

Optional description text displayed below the table

columns : List[str]

List of column headers

rows : List[List[str]]

List of rows; each row is a list of cell values

Examples

>>> table = TableData(
...     name="Model Performance",
...     description="Cross-validation results",
...     columns=["Algorithm", "Accuracy", "Precision"],
...     rows=[["Random Forest", "0.95", "0.92"], ["SVM", "0.93", "0.89"]]
... )
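
Since each row is a list of cell values that lines up with `columns`, a consumer may want to pair them by header. The sketch below assumes every row should have exactly one cell per column; `rows_as_records` is a hypothetical helper, and nothing here implies that TableData itself enforces this.

```python
from typing import Dict, List


def rows_as_records(columns: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each cell with its column header, rejecting rows of the wrong width."""
    records = []
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {index} has {len(row)} cells, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records


columns = ["Algorithm", "Accuracy", "Precision"]
rows = [["Random Forest", "0.95", "0.92"], ["SVM", "0.93", "0.89"]]
records = rows_as_records(columns, rows)
print(records[0]["Accuracy"])
```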
columns: List[str]
description: str | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
rows: List[List[str]]
class PlotData(*, name: str, description: str, image: str)

Bases: RoundedModel

Structure for all plots in the report.

This model represents plot data including metadata and the actual plot content (typically as SVG or base64 encoded image data).

Attributes:
name : str

The name/title of the plot

description : str

Description of what the plot shows

image : str

The plot content, typically as an SVG string or a base64-encoded image

Examples

>>> plot = PlotData(
...     name="Feature Importance",
...     description="Shows the importance of each feature",
...     image="<svg>...</svg>"
... )
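
For raster images, the `image` string would hold base64 text rather than SVG markup. The following sketch shows one way to produce such a string with the standard library. `encode_image_bytes` is a hypothetical helper, and whether Brisk expects a bare base64 string or a data-URI prefix is not stated here, so treat the output format as an assumption.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


# The 8-byte PNG signature stands in for a real image file's contents.
png_signature = b"\x89PNG\r\n\x1a\n"
image_field = encode_image_bytes(png_signature)
print(image_field)
```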
description: str
image: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
class FeatureDistribution(*, ID: str, tables: List[TableData], plot: PlotData)

Bases: RoundedModel

Distribution of a feature across train and test splits.

This model represents the distribution analysis of a single feature across different data splits, including both tabular statistics and visual plots.

Attributes:
ID : str

Unique identifier for the feature

tables : List[TableData]

List of tables containing distribution statistics

plot : PlotData

Plot showing the feature distribution

Examples

>>> feature_dist = FeatureDistribution(
...     ID="feature_1",
...     tables=[TableData(...)],
...     plot=PlotData(...)
... )
ID: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

plot: PlotData
tables: List[TableData]
class DataManager(*, ID: str, test_size: float, n_splits: int, split_method: str, group_column: str, stratified: str, random_state: int | None)

Bases: RoundedModel

Represents a DataManager instance configuration.

This model stores the configuration parameters used for data splitting and management in machine learning experiments.

Attributes:
ID : str

Unique identifier for the data manager

test_size : float

Proportion of data to use for testing (0.0 to 1.0)

n_splits : int

Number of cross-validation splits

split_method : str

Method used for splitting data (e.g., 'random', 'stratified')

group_column : str

Column name used for group-based splitting

stratified : str

Whether stratification is used ('True' or 'False')

random_state : int | None

Random seed for reproducible splits, None if not set

Examples

>>> data_mgr = DataManager(
...     ID="dm_1",
...     test_size=0.2,
...     n_splits=5,
...     split_method="stratified",
...     group_column="group_id",
...     stratified="True",
...     random_state=42
... )
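
The documented ranges can be checked before building a DataManager record. This is a minimal sketch that covers only the constraints stated above: `test_size` between 0.0 and 1.0, `stratified` as the string 'True' or 'False', and `random_state` as an integer or None. `check_data_manager_fields` is a hypothetical helper; the model itself may validate differently.

```python
from typing import Any, Dict, List


def check_data_manager_fields(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the fields look valid."""
    errors: List[str] = []
    if not 0.0 <= config["test_size"] <= 1.0:
        errors.append("test_size must be between 0.0 and 1.0")
    # The flag is stored as a string, not a bool, per the attribute description.
    if config["stratified"] not in ("True", "False"):
        errors.append("stratified must be the string 'True' or 'False'")
    seed = config["random_state"]
    if seed is not None and not isinstance(seed, int):
        errors.append("random_state must be an int or None")
    return errors


good = {"test_size": 0.2, "stratified": "True", "random_state": 42}
bad = {"test_size": 1.5, "stratified": "yes", "random_state": None}
print(check_data_manager_fields(good))
print(check_data_manager_fields(bad))
```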
ID: str
group_column: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_splits: int
random_state: int | None
split_method: str
stratified: str
test_size: float
class Navbar(*, brisk_version: str, timestamp: str)

Bases: RoundedModel

Data for the navigation bar.

This model contains metadata displayed in the report’s navigation bar, typically including version information and timestamps.

Attributes:
brisk_version : str

Version of the Brisk library used to generate the report

timestamp : str

Timestamp when the report was generated

Examples

>>> navbar = Navbar(
...     brisk_version="1.0.0",
...     timestamp="2024-01-15 10:30:00"
... )
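
Because `timestamp` is a plain string, the caller is responsible for formatting it. The sketch below produces the same layout as the example above, assuming a `YYYY-MM-DD HH:MM:SS` format; the field accepts any string, so this format is an illustration rather than a requirement, and `format_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def format_timestamp(moment: datetime) -> str:
    """Format a datetime in the layout used by the Navbar example."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


timestamp = format_timestamp(datetime(2024, 1, 15, 10, 30, 0))
print(timestamp)
```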
brisk_version: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

timestamp: str
class ExperimentGroup(*, name: str, description: str, datasets: List[str] = <factory>, experiments: List[str] = <factory>, data_split_scores: Dict[str, List[Tuple[str, str | None, str, str | None]]] = <factory>, test_scores: Dict[str, TableData] = <factory>)

Bases: RoundedModel

Data for an ExperimentGroup card on the home page.

This model represents a group of related experiments that are displayed together on the report’s home page. It includes metadata about the group and references to datasets and experiments within the group.

Attributes:
name : str

Name of the experiment group

description : str

Description of what the experiment group contains

datasets : List[str]

List of dataset IDs included in this group

experiments : List[str]

List of experiment IDs included in this group

data_split_scores : Dict[str, List[Tuple[str, str | None, str, str | None]]]

Best algorithm and score for each data split, keyed by dataset name

test_scores : Dict[str, TableData]

Test data scores, keyed by dataset name and split number

Examples

>>> group = ExperimentGroup(
...     name="Classification Experiments",
...     description="Binary classification on various datasets",
...     datasets=["dataset_1", "dataset_2"],
...     experiments=["exp_1", "exp_2"],
...     data_split_scores={"dataset_1": [("XTree", "0.95", "0.92", None)]},
...     test_scores={"dataset_1": TableData(...)}
... )
data_split_scores: Dict[str, List[Tuple[str, str | None, str, str | None]]]
datasets: List[str]
description: str
experiments: List[str]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
test_scores: Dict[str, TableData]
class Experiment(*, ID: str, dataset: str, algorithm: List[str] = <factory>, tuned_params: Dict[str, Any] = <factory>, hyperparam_grid: Dict[str, Any] = <factory>, tables: List[TableData] = <factory>, plots: List[PlotData] = <factory>)

Bases: RoundedModel

Results of a single machine learning experiment.

This model represents the complete results of a single experiment, including algorithm information, hyperparameters, and all associated tables and plots.

Attributes:
ID : str

Unique identifier for the experiment

dataset : str

Name of the dataset used in this experiment

algorithm : List[str]

Display names of algorithms used in the experiment

tuned_params : Dict[str, Any]

Tuned hyperparameter names and values

hyperparam_grid : Dict[str, Any]

Hyperparameter grid used for tuning

tables : List[TableData]

List of tables containing experiment results

plots : List[PlotData]

List of plots visualizing experiment results

Examples

>>> experiment = Experiment(
...     ID="exp_1",
...     dataset="iris",
...     algorithm=["Random Forest", "SVM"],
...     tuned_params={"n_estimators": 100, "max_depth": 10},
...     hyperparam_grid={"n_estimators": [50, 100, 200]},
...     tables=[TableData(...)],
...     plots=[PlotData(...)]
... )
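
To show how a grid such as `hyperparam_grid` relates to a single `tuned_params` result, the sketch below enumerates the candidate combinations a grid implies. It assumes the grid maps each parameter name to a list of candidate values, as in the example above, and that the search is exhaustive; how Brisk actually tunes is not described here. `iter_grid` is a hypothetical helper, and the `max_depth` candidates are made up for illustration.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def iter_grid(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one parameter dictionary per combination of candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative grid: 3 candidates x 2 candidates gives 6 combinations.
grid = {"n_estimators": [50, 100, 200], "max_depth": [5, 10]}
candidates = list(iter_grid(grid))
print(len(candidates))
```

Under these assumptions, a `tuned_params` value such as `{"n_estimators": 100, "max_depth": 10}` would be one of the yielded candidates.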
ID: str
algorithm: List[str]
dataset: str
hyperparam_grid: Dict[str, Any]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

plots: List[PlotData]
tables: List[TableData]
tuned_params: Dict[str, Any]
class Dataset(*, ID: str, splits: List[str] = <factory>, split_sizes: Dict[str, Dict[str, int]] = <factory>, split_target_stats: Dict[str, Dict[str, float | dict]] = <factory>, split_corr_matrices: Dict[str, PlotData] = <factory>, data_manager_id: str, features: List[str] = <factory>, split_feature_distributions: Dict[str, List[FeatureDistribution]] = <factory>)

Bases: RoundedModel

Represents a dataset within an ExperimentGroup.

This model contains comprehensive information about a dataset including its splits, feature information, and various statistical analyses.

Attributes:
ID : str

Unique identifier for the dataset

splits : List[str]

List of data split indexes (e.g., ["0", "1", "2"])

split_sizes : Dict[str, Dict[str, int]]

Total dataset size and train/test sizes for each split

split_target_stats : Dict[str, Dict[str, Union[float, dict]]]

Target feature statistics per split

split_corr_matrices : Dict[str, PlotData]

Correlation matrix plots per split

data_manager_id : str

ID of the associated DataManager

features : List[str]

List of feature names in the dataset

split_feature_distributions : Dict[str, List[FeatureDistribution]]

Feature distribution analyses per split

Examples

>>> dataset = Dataset(
...     ID="dataset_1",
...     splits=["0", "1", "2"],
...     split_sizes={"0": {"total": 1000, "train": 800, "test": 200}},
...     split_target_stats={"0": {"mean": 0.5, "std": 0.1}},
...     split_corr_matrices={"0": PlotData(...)},
...     data_manager_id="dm_1",
...     features=["feature_1", "feature_2"],
...     split_feature_distributions={"0": [FeatureDistribution(...)]}
... )
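
A quick sanity check on `split_sizes` is that the train and test counts add up to the total for every split. The sketch below assumes the inner keys are "total", "train", and "test", as in the example above; `inconsistent_splits` is a hypothetical helper and not part of Brisk.

```python
from typing import Dict, List


def inconsistent_splits(split_sizes: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the split indexes whose train and test sizes do not sum to the total."""
    return [
        split
        for split, sizes in split_sizes.items()
        if sizes["train"] + sizes["test"] != sizes["total"]
    ]


# Illustrative data: split "1" is deliberately off by ten rows.
split_sizes = {
    "0": {"total": 1000, "train": 800, "test": 200},
    "1": {"total": 1000, "train": 790, "test": 200},
}
bad_splits = inconsistent_splits(split_sizes)
print(bad_splits)
```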
ID: str
data_manager_id: str
features: List[str]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

split_corr_matrices: Dict[str, PlotData]
split_feature_distributions: Dict[str, List[FeatureDistribution]]
split_sizes: Dict[str, Dict[str, int]]
split_target_stats: Dict[str, Dict[str, float | dict]]
splits: List[str]