
Generate a simulated RNA-seq count matrix for two groups using parameters estimated from real data and compcodeR::generateSyntheticData. The simulated data retain mean–variance characteristics similar to those of the input and contain a specified number of differentially expressed genes (DEGs).
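
To make the idea concrete, the following base-R sketch draws two-group negative binomial counts with a fixed number of DEGs. It is only an illustration of the general approach, not the internals of compcodeR::generateSyntheticData or of this function. The per-gene means and dispersions are made-up stand-ins for estimates from real data, and the random sign assigned to each DEG is an assumption.

```r
set.seed(1)

n_genes   <- 200   # hypothetical number of genes
n_samples <- 10    # samples per condition
n_degs    <- 20    # number of differentially expressed genes
log2FC    <- 1     # fold-change magnitude on the log2 scale

# Made-up per-gene parameters standing in for estimates from real data
mu   <- rgamma(n_genes, shape = 2, rate = 0.02)  # baseline means
disp <- rep(0.2, n_genes)                        # NB dispersions

# Pick the DEGs and give each a random sign (assumption of this sketch)
is_DE       <- seq_len(n_genes) %in% sample(n_genes, n_degs)
signs       <- sample(c(-1, 1), n_genes, replace = TRUE)
true_log2FC <- ifelse(is_DE, signs * log2FC, 0)

# Draw a samples-by-genes count matrix for one condition.
# With size = 1 / disp, the variance is mu + disp * mu^2.
draw_counts <- function(m) {
  matrix(
    rnbinom(n_samples * n_genes,
            mu   = rep(m, each = n_samples),
            size = rep(1 / disp, each = n_samples)),
    nrow = n_samples
  )
}

# Condition 1 uses the baseline means, condition 2 the shifted means
X <- rbind(draw_counts(mu), draw_counts(mu * 2^true_log2FC))
f <- data.frame(is_DE = is_DE, true_log2FC = true_log2FC)
```

Here X has samples in rows and genes in columns, and f records which genes were perturbed.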

Usage

sim_2group_expression(
  estimates,
  ancestry,
  n_samples,
  n_degs,
  log2FC,
  mean_method = c("mle", "map", "libnorm_mle", "libnorm_map"),
  disp_method = c("mle", "map"),
  seed = NULL
)
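
A call might look like the sketch below. It assumes `est` already holds the list returned by estimate_params. The wrapper name `simulate_once`, the ancestry label "EUR", and all numeric values are hypothetical placeholders rather than recommended settings. Wrapping the call in a function keeps the snippet loadable even when the package is not attached.

```r
# Hypothetical wrapper; `est` is assumed to be the output of estimate_params.
simulate_once <- function(est, seed = 1L) {
  sim_2group_expression(
    estimates   = est,
    ancestry    = "EUR",   # hypothetical ancestry label
    n_samples   = 50L,     # samples per condition
    n_degs      = 100L,    # number of DEGs to simulate
    log2FC      = 1,       # fold-change magnitude on the log2 scale
    mean_method = "mle",
    disp_method = "mle",
    seed        = seed     # fix the seed for reproducibility
  )
}
```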

Arguments

estimates

Optional list of pre-computed parameter estimates from estimate_params. If NULL, parameters are estimated from X.

ancestry

Character scalar giving the ancestry label for this simulation.

n_samples

Integer, number of samples to simulate per condition.

n_degs

Integer, number of differentially expressed genes to simulate.

log2FC

Numeric, log2 fold-change magnitude for DEGs.

mean_method

Character string, method to use for mean estimates. One of "mle", "map", "libnorm_mle", "libnorm_map".

disp_method

Character string, method to use for dispersion estimates. One of "mle", "map".
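
Vector-valued defaults such as those of mean_method and disp_method are conventionally resolved with match.arg, which picks the first choice when the argument is left unset and rejects values outside the list. Whether sim_2group_expression does this internally is an assumption. The helper `pick_methods` below is hypothetical and only demonstrates the convention.

```r
# Hypothetical helper mirroring the documented defaults
pick_methods <- function(mean_method = c("mle", "map", "libnorm_mle", "libnorm_map"),
                         disp_method = c("mle", "map")) {
  mean_method <- match.arg(mean_method)  # first choice if not supplied
  disp_method <- match.arg(disp_method)
  c(mean = mean_method, disp = disp_method)
}

defaults <- pick_methods()
custom   <- pick_methods(mean_method = "libnorm_map", disp_method = "map")

# An unknown method name raises an error instead of passing through silently
bad <- try(pick_methods(mean_method = "median"), silent = TRUE)
```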

seed

Optional integer random seed for reproducibility.

X

Numeric matrix or data frame of counts from the real data (samples in rows, genes in columns).

Value

A list with the following elements:

X

Simulated count matrix (samples in rows, genes in columns).

M

Data frame of sample metadata.

f

Data frame of gene-level features: DE status (is_DE) and true log2FC (true_log2FC).

input_params

Parameter estimates from the real data.

output_params

Parameter estimates from the simulated data.

in_out_plots

List of ggplot objects comparing means and dispersions between real and simulated data.
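
The returned list can be checked with a few lines of base R. The sketch below uses a small mock object with the element names documented above, because running the real function requires parameter estimates. The helper `summarise_sim`, the mock values, and the `sample` column in M are hypothetical. Converting is_DE with as.logical is an assumption that accepts either a logical or a 0/1 coding.

```r
# Hypothetical helper: basic sanity summary of a simulation result
summarise_sim <- function(sim) {
  stopifnot(is.list(sim), all(c("X", "M", "f") %in% names(sim)))
  de <- as.logical(sim$f$is_DE)
  list(
    n_samples     = nrow(sim$X),   # samples are in rows
    n_genes       = ncol(sim$X),   # genes are in columns
    n_de          = sum(de),
    log2FC_values = sort(unique(abs(sim$f$true_log2FC[de])))
  )
}

# Mock object with the documented element names (values are made up)
mock <- list(
  X = matrix(rpois(6 * 4, lambda = 10), nrow = 6, ncol = 4),
  M = data.frame(sample = paste0("s", 1:6)),
  f = data.frame(is_DE       = c(TRUE, FALSE, TRUE, FALSE),
                 true_log2FC = c(1, 0, -1, 0))
)

res <- summarise_sim(mock)
```

On a real result, n_de should equal the requested n_degs and log2FC_values should contain only the requested log2FC magnitude.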