Simulate RNA-seq Expression Data for Four Groups (Two Ancestries × Two Conditions)
Source: R/sim_4group_expression.R
Generate simulated RNA-seq count matrices for two ancestries, each with two
conditions, based on parameters estimated from real data. This function calls
sim_2group_expression twice (once per ancestry) and then calculates the true
interaction effects between ancestries for each gene.
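To make the idea of a two-condition simulation concrete, here is a minimal base-R sketch that draws negative binomial counts from per-gene means and dispersions. It is only an illustration under stated assumptions: the function sim_two_conditions, its arguments, the randomly signed fold changes, and the output column names are all hypothetical, and nothing here is taken from the actual implementation of sim_2group_expression.

```r
# Hypothetical sketch, not the package's sim_2group_expression():
# simulate negative binomial counts for two conditions.
sim_two_conditions <- function(mu, disp, n_samples, n_degs, log2FC) {
  n_genes <- length(mu)
  genes <- paste0("gene", seq_len(n_genes))

  # Choose the DE genes and give each a randomly signed fold change
  # (the random sign is an assumption of this sketch).
  de_idx <- sample(n_genes, n_degs)
  true_log2FC <- numeric(n_genes)
  true_log2FC[de_idx] <- log2FC * sample(c(-1, 1), n_degs, replace = TRUE)

  # Draw an n_samples x n_genes matrix for a vector of per-gene means.
  # rnbinom() is parameterised with size = 1 / dispersion.
  draw <- function(m) {
    matrix(
      rnbinom(n_samples * n_genes,
              mu = rep(m, each = n_samples),
              size = rep(1 / disp, each = n_samples)),
      nrow = n_samples, dimnames = list(NULL, genes)
    )
  }

  list(
    # Condition 1 uses the baseline means; condition 2 shifts the DE genes.
    counts = rbind(draw(mu), draw(mu * 2^true_log2FC)),
    meta = data.frame(condition = rep(c("control", "case"), each = n_samples)),
    features = data.frame(gene = genes,
                          is_DE = seq_len(n_genes) %in% de_idx,
                          true_log2FC = true_log2FC)
  )
}

set.seed(1)
sim <- sim_two_conditions(mu = rep(100, 20), disp = rep(0.1, 20),
                          n_samples = 5, n_degs = 4, log2FC = 1)
dim(sim$counts)  # 10 samples (rows) by 20 genes (columns)
```

Running such a routine once per ancestry, with ancestry-specific means and dispersions, would give the four groups described above.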
Arguments
- estimates_X
Optional list of parameter estimates from estimate_params for ancestry X. If NULL, parameters are estimated from X.
- estimates_Y
Optional list of parameter estimates from estimate_params for ancestry Y. If NULL, parameters are estimated from Y.
- ancestry_X
Character scalar giving the ancestry label for X.
- ancestry_Y
Character scalar giving the ancestry label for Y.
- n_samples_X
Integer, number of samples to simulate per condition for ancestry X.
- n_samples_Y
Integer, number of samples to simulate per condition for ancestry Y.
- n_degs_X
Integer, number of differentially expressed genes to simulate in ancestry X.
- n_degs_Y
Integer, number of differentially expressed genes to simulate in ancestry Y.
- log2FC_X
Numeric, log2 fold-change magnitude for DEGs in ancestry X.
- log2FC_Y
Numeric, log2 fold-change magnitude for DEGs in ancestry Y.
- mean_method
Character string, method to use for mean estimates in both ancestries. One of "mle", "map", "libnorm_mle", "libnorm_map".
- disp_method
Character string, method to use for dispersion estimates in both ancestries. One of "mle", "map".
- seed
Optional integer random seed for reproducibility. The simulation for ancestry Y uses seed + 1000 so that the two ancestries get different DE gene sets.
- X
Numeric matrix or data frame of counts for the first ancestry (samples in rows, genes in columns).
- Y
Numeric matrix or data frame of counts for the second ancestry (samples in rows, genes in columns).
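The seed offset described for the seed argument can be illustrated with a small base-R sketch. How the function consumes the seed internally is not documented here, so the helper pick_degs and the gene counts below are hypothetical; the sketch only shows why offsetting the seed yields a reproducible but different draw for the second ancestry.

```r
# Hypothetical helper: choose which genes are DE for a given seed.
pick_degs <- function(n_genes, n_degs, seed) {
  set.seed(seed)
  sort(sample(n_genes, n_degs))
}

seed <- 42
degs_X <- pick_degs(n_genes = 1000, n_degs = 50, seed = seed)
# Ancestry Y uses the offset seed, as stated for the seed argument.
degs_Y <- pick_degs(n_genes = 1000, n_degs = 50, seed = seed + 1000)

# The same seed reproduces the same DE set ...
same_again <- identical(degs_X, pick_degs(1000, 50, seed))
# ... while the offset seed gives an independent draw, so the two
# sets will generally overlap only partially.
n_shared <- length(intersect(degs_X, degs_Y))
```

Reusing the same seed for both ancestries would, in a sketch like this, select identical DE genes in X and Y, which would leave little room for ancestry-specific effects.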
Value
A list with the following elements:
X
Simulated count matrix (samples x genes) for ancestry X.
Y
Simulated count matrix (samples x genes) for ancestry Y.
MX
Sample metadata for ancestry X.
MY
Sample metadata for ancestry Y.
fX
Gene-level features (DE status, true log2FC) for ancestry X.
fY
Gene-level features (DE status, true log2FC) for ancestry Y.
pX
List of ggplot objects comparing means and dispersions for ancestry X.
pY
List of ggplot objects comparing means and dispersions for ancestry Y.
fI
Data frame of interaction effects, with DE status and true interaction log2FC.
Details
The interaction effect for each gene is defined as: $$\mathrm{Interaction\ log2FC} = \mathrm{log2FC}_Y - \mathrm{log2FC}_X$$ where \(\mathrm{log2FC}_X\) and \(\mathrm{log2FC}_Y\) are the true log2 fold-changes simulated for ancestries X and Y, respectively. A gene with the same true log2 fold-change in both ancestries therefore has an interaction log2FC of zero.
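As a worked example of this definition, the following base-R sketch computes the interaction log2FC from two small gene-level tables. The tables and their column names (gene, true_log2FC, is_DE) are hypothetical stand-ins for fX, fY, and fI, and treating any non-zero interaction as DE is an assumption of the sketch, not a documented rule.

```r
# Hypothetical gene-level feature tables for the two ancestries.
fX <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                 true_log2FC = c(1, 0, -1, 1))
fY <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                 true_log2FC = c(1, 1, 0, -1))

# Both tables must describe the same genes in the same order.
stopifnot(identical(fX$gene, fY$gene))

# Interaction log2FC = log2FC_Y - log2FC_X, gene by gene.
fI <- data.frame(gene = fX$gene,
                 true_log2FC = fY$true_log2FC - fX$true_log2FC)

# Assumption: a gene counts as an interaction DEG when the difference is non-zero.
fI$is_DE <- fI$true_log2FC != 0

fI$true_log2FC  # 0 1 1 -2
```

Gene g1 is differentially expressed in both ancestries but with the same effect, so its interaction is zero; g4 changes in opposite directions and has the largest interaction magnitude.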