Simulate RNA-seq Expression Data for Four Groups (Two Ancestries × Two Conditions)
Source: R/sim_4group_expression.R
sim_4group_expression.Rd
Generate simulated RNA-seq count matrices for two ancestries, each with two
conditions, based on parameters estimated from real data. This function calls
sim_2group_expression twice (once per ancestry) and then calculates
the true interaction effects between ancestries for each gene.
Arguments
- estimates_X
Optional list of parameter estimates from
estimate_params for ancestry X. If NULL, parameters are estimated from X.
- estimates_Y
Optional list of parameter estimates from
estimate_params for ancestry Y. If NULL, parameters are estimated from Y.
- ancestry_X
Character scalar giving the ancestry label for X.
- ancestry_Y
Character scalar giving the ancestry label for Y.
- n_samples_X
Integer, number of samples to simulate per condition for ancestry X.
- n_samples_Y
Integer, number of samples to simulate per condition for ancestry Y.
- n_degs_X
Integer, number of differentially expressed genes to simulate in ancestry X.
- n_degs_Y
Integer, number of differentially expressed genes to simulate in ancestry Y.
- log2FC_X
Numeric, log2 fold-change magnitude for DEGs in ancestry X.
- log2FC_Y
Numeric, log2 fold-change magnitude for DEGs in ancestry Y.
- mean_method
Character string, method to use for mean estimates in both ancestries. One of
"mle", "map", "libnorm_mle", "libnorm_map".
- disp_method
Character string, method to use for dispersion estimates in both ancestries. One of
"mle", "map".
- seed
Optional integer random seed for reproducibility. The simulation for ancestry Y will use
seed + 1000 to ensure different DE gene sets.
- X
Numeric matrix or data frame of counts for the first ancestry (samples in rows, genes in columns).
- Y
Numeric matrix or data frame of counts for the second ancestry (samples in rows, genes in columns).
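The input orientation described above (samples in rows, genes in columns) can be sketched with toy data. The following is a minimal illustration, not taken from the package: the helper make_counts, the matrix sizes, the negative-binomial parameters, and the ancestry labels "EUR" and "AFR" are all assumptions made for the example. The call itself uses only the arguments documented above and is guarded so the snippet runs even when the package is not loaded.

```r
# Toy negative-binomial counts with samples in rows and genes in columns.
# make_counts is a hypothetical helper; sizes and parameters are illustrative.
set.seed(1)
make_counts <- function(n_samples, n_genes, mu) {
  matrix(
    rnbinom(n_samples * n_genes, mu = mu, size = 2),
    nrow = n_samples, ncol = n_genes,
    dimnames = list(paste0("sample", seq_len(n_samples)),
                    paste0("gene", seq_len(n_genes)))
  )
}
X <- make_counts(n_samples = 6, n_genes = 50, mu = 100)  # first ancestry
Y <- make_counts(n_samples = 6, n_genes = 50, mu = 120)  # second ancestry

# Call the function only if it is available in the current session.
if (exists("sim_4group_expression")) {
  sim <- sim_4group_expression(
    X = X, Y = Y,
    estimates_X = NULL, estimates_Y = NULL,  # estimate parameters from X and Y
    ancestry_X = "EUR", ancestry_Y = "AFR",  # illustrative labels
    n_samples_X = 10, n_samples_Y = 10,      # samples per condition
    n_degs_X = 5, n_degs_Y = 5,
    log2FC_X = 1, log2FC_Y = 1.5,
    mean_method = "mle", disp_method = "mle",
    seed = 42
  )
}
```

Passing estimates_X and estimates_Y explicitly as NULL makes it visible that the parameters are expected to come from the count matrices rather than from a previous estimate_params run.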
Value
A list with the following elements:
- X
Simulated count matrix (samples x genes) for ancestry X.
- Y
Simulated count matrix (samples x genes) for ancestry Y.
- MX
Sample metadata for ancestry X.
- MY
Sample metadata for ancestry Y.
- fX
Gene-level features (DE status, true log2FC) for ancestry X.
- fY
Gene-level features (DE status, true log2FC) for ancestry Y.
- pX
List of ggplot objects comparing means and dispersions for ancestry X.
- pY
List of ggplot objects comparing means and dispersions for ancestry Y.
- fI
Data frame of interaction effects, with DE status and true interaction log2FC.
Details
The interaction effect for each gene is defined as: $$\mathrm{Interaction\ log2FC} = \mathrm{log2FC}_Y - \mathrm{log2FC}_X$$ where \(\mathrm{log2FC}_X\) and \(\mathrm{log2FC}_Y\) are the true log2 fold-changes from the simulated data for ancestry X and Y respectively.
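The definition above, together with the seed offset described under Arguments, can be illustrated with a small base-R sketch. It is not the package's implementation: the helper pick_degs, the column names gene, is_DE, true_log2FC, and interaction_log2FC, the choice to up-regulate every DE gene, and the rule that a gene counts as an interaction gene whenever the two fold-changes differ are all assumptions made for the example.

```r
# Hypothetical helper: choose DE genes at random and give them a fixed log2FC.
# The real per-ancestry simulation may select genes and signs differently.
pick_degs <- function(genes, n_degs, log2FC, seed) {
  set.seed(seed)
  de_genes <- sample(genes, n_degs)
  is_de <- genes %in% de_genes
  data.frame(
    gene = genes,
    is_DE = is_de,
    true_log2FC = ifelse(is_de, log2FC, 0)
  )
}

genes <- paste0("gene", 1:20)
seed <- 42

# Ancestry X uses the seed as given; ancestry Y uses seed + 1000.
fX <- pick_degs(genes, n_degs = 4, log2FC = 1.0, seed = seed)
fY <- pick_degs(genes, n_degs = 4, log2FC = 1.5, seed = seed + 1000)

# Interaction log2FC = log2FC_Y - log2FC_X, computed gene by gene.
fI <- data.frame(
  gene = genes,
  interaction_log2FC = fY$true_log2FC - fX$true_log2FC
)

# Assumed rule: a gene has an interaction when its fold-changes differ.
fI$is_DE <- fI$interaction_log2FC != 0

print(fI[fI$is_DE, ])
```

Under these assumptions a gene that is DE only in X gets an interaction log2FC of -1, a gene that is DE only in Y gets 1.5, and a gene that is DE in both gets 0.5, so shared DE genes still carry an interaction whenever log2FC_X and log2FC_Y differ.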