Split Expression and Metadata into Reference (R), Subset (X), and Inference (Y) Sets
Source: R/split_stratified_ancestry_sets.R
split_stratified_ancestry_sets.Rd
Performs stratified sampling of an overrepresented group (X, e.g., EUR) so that its distribution over a grouping variable (e.g., condition) matches that of an underrepresented group (Y, e.g., AFR).
Arguments
- X
Numeric matrix or data.frame of features for cohort X; rows are samples and must align with MX.
- Y
Numeric matrix or data.frame of features for cohort Y; rows are samples and must align with MY.
- MX
Data.frame with metadata for X.
- MY
Data.frame with metadata for Y.
- g_col
Name of the metadata column holding the stratification label.
- a_col
Name of the metadata column holding the ancestry label.
- seed
Optional numeric seed for reproducibility of sampling.
- verbose
Logical; whether to print messages.
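The idea behind the stratified matching can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name `sketch_stratified_match`, the toy data, and the rule of drawing from each stratum of MX exactly as many rows as MY has in that stratum (capped at what MX has available) are all assumptions made for illustration. The actual function may match proportions differently and returns its own structure.

```r
# Hypothetical helper (not part of the package): for every level of the
# stratification column, sample as many rows from MX as MY contains.
sketch_stratified_match <- function(MX, MY, g_col, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  picked <- integer(0)
  for (g in unique(as.character(MY[[g_col]]))) {
    n_target <- sum(MY[[g_col]] == g)     # rows of Y in this stratum
    pool <- which(MX[[g_col]] == g)       # candidate rows of X in this stratum
    n <- min(length(pool), n_target)      # cannot draw more than X has
    if (n == 0) next
    # sample.int() on the pool size avoids sample()'s 1:n behaviour
    # when the pool contains a single index.
    picked <- c(picked, pool[sample.int(length(pool), n)])
  }
  sort(picked)
}

# Toy inputs (made-up data): 100 EUR samples, 10 AFR samples.
MX <- data.frame(condition = rep(c("case", "control"), times = c(30, 70)),
                 ancestry = "EUR")
MY <- data.frame(condition = rep(c("case", "control"), times = c(6, 4)),
                 ancestry = "AFR")
X <- matrix(rnorm(100 * 5), nrow = 100)   # rows of X are in the same order as MX

idx <- sketch_stratified_match(MX, MY, g_col = "condition", seed = 1)
X_sub <- X[idx, , drop = FALSE]           # subset of X matched to Y
MX_sub <- MX[idx, , drop = FALSE]         # metadata rows kept in sync with X_sub
table(MX_sub$condition)                   # 6 case, 4 control, mirroring MY
```

Subsetting X and MX with the same index vector keeps the feature rows and metadata rows in the same order, which is the alignment the X and MX arguments require.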