Creates a bar plot showing the number of samples in each dataset split (train, test, inference), grouped and colored by stratification variables. Each stratification column is shown in a separate facet for clear comparison.

Usage

plot_stratified_sets(MX, MY, MR, g_col, title = NULL, point_size = 0.5)

Arguments

MX

A data.frame containing metadata for the test set (usually output from `split_stratified_ancestry_sets()`).

MY

A data.frame containing metadata for the inference set.

MR

A data.frame containing metadata for the train set.

g_col

Character vector of column names in the metadata to stratify and plot by (e.g., c("ancestry", "sex")).

title

Optional character string to use as the plot title.

point_size

Numeric value for point/label size. Currently it is not used directly in the plot.

Value

A ggplot2 object showing counts per stratum and dataset split, faceted by stratification variable.

Details

The plot is useful for visually checking balance or representation across strata (e.g., ancestry or condition) within each split.
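
The counts behind such a plot can be approximated with a short sketch. It is not the package's implementation, only an illustration under stated assumptions: the toy data frames and their values are made up, the helper `count_strata()` is hypothetical, and the split labels follow the argument descriptions above (MX = test, MY = inference, MR = train). The plotting step is optional and assumes ggplot2 is installed.

```r
# Toy metadata for the three splits (hypothetical values, for illustration only).
MX <- data.frame(ancestry = c("EUR", "EUR", "AFR"), sex = c("F", "M", "F"))
MY <- data.frame(ancestry = c("AFR", "EUR"), sex = c("M", "F"))
MR <- data.frame(ancestry = c("EUR", "EUR", "EUR", "AFR"),
                 sex = c("F", "F", "M", "M"))
g_col <- c("ancestry", "sex")

# Label the splits as described in the Arguments section.
splits <- list(test = MX, inference = MY, train = MR)

# Hypothetical helper: count samples per split, stratification variable, and level.
count_strata <- function(splits, g_col) {
  rows <- list()
  for (set in names(splits)) {
    for (col in g_col) {
      tab <- table(splits[[set]][[col]])
      rows[[length(rows) + 1]] <- data.frame(
        set = set,
        variable = col,
        level = names(tab),
        n = as.integer(tab),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

counts <- count_strata(splits, g_col)

# Optional: one facet per stratification variable, bars grouped by split.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = set, y = n, fill = level)) +
    ggplot2::geom_col(position = "dodge") +
    ggplot2::facet_wrap(~ variable, scales = "free") +
    ggplot2::labs(x = "Dataset split", y = "Number of samples", fill = "Stratum")
}
```

With real metadata, the equivalent call would presumably be `plot_stratified_sets(MX, MY, MR, g_col = c("ancestry", "sex"))`, which returns the ggplot2 object described under Value.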