Plot Sample Counts by Stratified Variable and Dataset Split — plot_stratified

Creates a bar plot of the number of samples in each dataset split (train, test, inference), grouped and colored by the levels of each stratification variable. Each stratification column is drawn in its own facet so the splits can be compared directly.

Usage

plot_stratified_sets(MX, MY, MR, g_col, title = NULL, point_size = 0.5)

Arguments

MX: A data.frame containing metadata for the test set (usually output from `split_stratified_ancestry_sets()`).
MY: A data.frame containing metadata for the inference set.
MR: A data.frame containing metadata for the train set.
g_col: Character vector of column names in the metadata to stratify and plot by (e.g., c("ancestry", "sex")).
title: Optional character string to use as the plot title.
point_size: Numeric value controlling point/label size (currently not used directly in plotting).

Value

A ggplot2 object showing counts per stratum and dataset split, faceted by stratification variable.
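
To make the shape of the returned plot concrete, the following is a minimal sketch of how a comparable figure could be assembled by hand. It is not the package's implementation: the toy metadata, the helper `make_meta()`, and the choice of dodged bars with free facet scales are all assumptions made for illustration. Only base R is used for the counting step, and the ggplot2 step is skipped if ggplot2 is not installed.

```r
# Hypothetical toy metadata; column names and values are illustrative only.
set.seed(1)
make_meta <- function(n) {
  data.frame(
    ancestry = sample(c("AFR", "EUR", "EAS"), n, replace = TRUE),
    sex      = sample(c("F", "M"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}
MX <- make_meta(40)  # test set
MY <- make_meta(30)  # inference set
MR <- make_meta(80)  # train set
g_col <- c("ancestry", "sex")

# With the package loaded, the documented call would be:
# plot_stratified_sets(MX, MY, MR, g_col)

# Reshape to one row per (sample, stratification variable), tagged by split.
splits <- list(test = MX, inference = MY, train = MR)
long <- do.call(rbind, lapply(names(splits), function(s) {
  do.call(rbind, lapply(g_col, function(v) {
    data.frame(
      split    = s,
      variable = v,
      level    = as.character(splits[[s]][[v]]),
      n        = 1L,
      stringsAsFactors = FALSE
    )
  }))
}))

# Count samples per split, stratification variable, and level.
counts <- aggregate(n ~ split + variable + level, data = long, FUN = sum)

# Draw one facet per stratification variable (assumed layout).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = split, y = n, fill = level)) +
    ggplot2::geom_col(position = "dodge") +
    ggplot2::facet_wrap(~ variable, scales = "free") +
    ggplot2::labs(x = "Dataset split", y = "Number of samples", fill = "Stratum")
}
```

Because every sample contributes one row per stratification column, the counts within each facet sum to the size of the corresponding split.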

Details

The plot is useful for visually checking whether strata (e.g., ancestry or condition) are balanced or adequately represented within each split.
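
A visual check can be paired with a numeric one. The sketch below, which is independent of this function and uses made-up data, compares the proportion of each stratum across two splits and flags strata absent from one of them. The helper `split_props()` and the data frames `train_meta` and `test_meta` are hypothetical names introduced only for this example.

```r
# Hypothetical metadata for two splits (illustrative values only).
train_meta <- data.frame(ancestry = c("AFR", "AFR", "EUR", "EUR", "EUR", "EAS"))
test_meta  <- data.frame(ancestry = c("AFR", "EUR", "EUR"))

# Use a shared set of levels so strata missing from a split show up as 0.
lv <- union(train_meta$ancestry, test_meta$ancestry)

# Proportion of samples per stratum within one split.
split_props <- function(x, levels) {
  tab <- table(factor(x, levels = levels))
  setNames(as.numeric(tab) / sum(tab), levels)
}

props <- rbind(
  train = split_props(train_meta$ancestry, lv),
  test  = split_props(test_meta$ancestry, lv)
)
print(round(props, 2))

# Strata present overall but absent from the test split.
missing_in_test <- colnames(props)[props["test", ] == 0]
print(missing_in_test)
```

Large differences between rows of `props`, or any entry in `missing_in_test`, indicate strata that the split does not represent well.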