Generate simulated RNA-seq count matrices for two ancestries, each with two conditions, based on parameters estimated from real data. This function calls sim_2group_expression twice (once per ancestry) and then calculates the true interaction effects between ancestries for each gene.

Usage

sim_4group_expression(
  estimates_X = NULL,
  estimates_Y = NULL,
  ancestry_X,
  ancestry_Y,
  n_samples_X,
  n_samples_Y,
  n_degs_X,
  n_degs_Y,
  log2FC_X,
  log2FC_Y,
  mean_method = c("mle", "map", "libnorm_mle", "libnorm_map"),
  disp_method = c("mle", "map"),
  seed = NULL
)

Arguments

estimates_X

Optional list of parameter estimates from estimate_params for ancestry X. If NULL, parameters are estimated from the count data in X.

estimates_Y

Optional list of parameter estimates from estimate_params for ancestry Y. If NULL, parameters are estimated from the count data in Y.

ancestry_X

Character scalar giving the ancestry label for X.

ancestry_Y

Character scalar giving the ancestry label for Y.

n_samples_X

Integer, number of samples to simulate per condition for ancestry X.

n_samples_Y

Integer, number of samples to simulate per condition for ancestry Y.

n_degs_X

Integer, number of differentially expressed genes to simulate in ancestry X.

n_degs_Y

Integer, number of differentially expressed genes to simulate in ancestry Y.

log2FC_X

Numeric, log2 fold-change magnitude for DEGs in ancestry X.

log2FC_Y

Numeric, log2 fold-change magnitude for DEGs in ancestry Y.

mean_method

Character string, method to use for mean estimates in both ancestries. One of "mle", "map", "libnorm_mle", "libnorm_map".

disp_method

Character string, method to use for dispersion estimates in both ancestries. One of "mle", "map".

seed

Optional integer random seed for reproducibility. When a seed is supplied, the simulation for ancestry Y uses seed + 1000 so that the two ancestries draw different DE gene sets.

X

Numeric matrix or data frame of counts for the first ancestry (samples in rows, genes in columns).

Y

Numeric matrix or data frame of counts for the second ancestry (samples in rows, genes in columns).

Value

A list with the following elements:

X

Simulated count matrix (samples x genes) for ancestry X.

Y

Simulated count matrix (samples x genes) for ancestry Y.

MX

Sample metadata for ancestry X.

MY

Sample metadata for ancestry Y.

fX

Gene-level features (DE status, true log2FC) for ancestry X.

fY

Gene-level features (DE status, true log2FC) for ancestry Y.

pX

List of ggplot objects comparing means and dispersions for ancestry X.

pY

List of ggplot objects comparing means and dispersions for ancestry Y.

fI

Data frame of interaction effects, with DE status and true interaction log2FC.

Details

The interaction effect for each gene is defined as: $$\mathrm{Interaction\ log2FC} = \mathrm{log2FC}_Y - \mathrm{log2FC}_X$$ where \(\mathrm{log2FC}_X\) and \(\mathrm{log2FC}_Y\) are the true log2 fold-changes from the simulated data for ancestry X and Y respectively.
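
As a rough illustration of the definition above, the following base R sketch computes the interaction log2FC from two small, hand-made gene tables. The tables only stand in for the fX and fY elements; the column names (gene, is_DE, true_log2FC) and the rule that a gene counts as an interaction DEG whenever its interaction log2FC is nonzero are assumptions made for this example, not the package's documented behaviour.

```r
# Hypothetical gene-level feature tables standing in for fX and fY.
# Column names are assumptions chosen for this sketch.
fX <- data.frame(
  gene        = c("g1", "g2", "g3", "g4"),
  is_DE       = c(TRUE, FALSE, TRUE, FALSE),
  true_log2FC = c(1, 0, -1, 0)
)
fY <- data.frame(
  gene        = c("g1", "g2", "g3", "g4"),
  is_DE       = c(TRUE, TRUE, FALSE, FALSE),
  true_log2FC = c(1, 2, 0, 0)
)

# Align the two tables by gene before subtracting.
fY <- fY[match(fX$gene, fY$gene), ]

# Interaction log2FC = log2FC_Y - log2FC_X, computed per gene.
interaction_log2FC <- fY$true_log2FC - fX$true_log2FC

# Assumed rule: a gene is an interaction DEG if the two ancestries differ.
fI <- data.frame(
  gene        = fX$gene,
  is_DE       = interaction_log2FC != 0,
  true_log2FC = interaction_log2FC
)

print(fI)
```

In this toy case g1 is differentially expressed in both ancestries with the same log2FC, so its interaction effect is zero, whereas g2 and g3 are differentially expressed in only one ancestry and therefore carry a nonzero interaction effect.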