Simulate RNA-seq Expression Data for Two Groups
Source: R/sim_2group_expression.R
Generate a simulated RNA-seq count matrix for two groups using parameters
estimated from real data and compcodeR::generateSyntheticData.
The simulated data retain mean–variance characteristics similar to those of the input
and contain a specified number of differentially expressed genes (DEGs).
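To make the idea concrete, the following base R sketch draws two-group negative binomial counts with a fixed number of DEGs. It is an illustration under stated assumptions, not this function's implementation: the per-gene means and dispersions are invented rather than estimated from real data, the random sign of each fold change is an assumption, and every object name is hypothetical.

```r
# Minimal sketch: two-group negative binomial counts in base R.
# All names and parameter values are illustrative, not taken from the package.
set.seed(1)

n_genes   <- 200   # total number of genes
n_samples <- 10    # samples per condition
n_degs    <- 20    # number of differentially expressed genes
log2FC    <- 1     # fold-change magnitude on the log2 scale

# Hypothetical per-gene parameters; a real workflow would estimate these
# from an observed count matrix.
base_mean  <- rgamma(n_genes, shape = 2, rate = 0.01)
dispersion <- rep(0.2, n_genes)

# Choose the DE genes and give each a randomly signed fold change (assumption).
is_DE       <- seq_len(n_genes) %in% sample(n_genes, n_degs)
signs       <- sample(c(-1, 1), n_genes, replace = TRUE)
true_log2FC <- ifelse(is_DE, signs * log2FC, 0)

# Group 1 uses the baseline mean; group 2 applies the fold change.
mu <- rbind(
  matrix(base_mean, nrow = n_samples, ncol = n_genes, byrow = TRUE),
  matrix(base_mean * 2^true_log2FC, nrow = n_samples, ncol = n_genes, byrow = TRUE)
)

# Negative binomial with variance mu + dispersion * mu^2, i.e. size = 1 / dispersion.
size  <- matrix(1 / dispersion, nrow = 2 * n_samples, ncol = n_genes, byrow = TRUE)
X_sim <- matrix(rnbinom(length(mu), mu = mu, size = size), nrow = nrow(mu))

# Sample-level condition labels (samples in rows, genes in columns).
condition <- rep(c("group1", "group2"), each = n_samples)
```

Non-DE genes share the same mean in both groups, so any difference between groups for those genes is sampling noise.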
Arguments
- estimates
Optional list of pre-computed parameter estimates from estimate_params. If NULL, parameters are estimated from X.
- ancestry
Character scalar giving the ancestry label for this simulation.
- n_samples
Integer, number of samples to simulate per condition.
- n_degs
Integer, number of differentially expressed genes to simulate.
- log2FC
Numeric, log2 fold-change magnitude for DEGs.
- mean_method
Character string, method to use for mean estimates. One of "mle", "map", "libnorm_mle", "libnorm_map".
- disp_method
Character string, method to use for dispersion estimates. One of "mle", "map".
- seed
Optional integer random seed for reproducibility.
- X
Numeric matrix or data frame of counts from the real data (samples in rows, genes in columns).
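A call using the arguments above might look like the sketch below. The toy input matrix, the ancestry label, and all argument values are illustrative assumptions, and it is not confirmed that such a small input is enough for the estimation step. The call is guarded so that it only runs once the package providing sim_2group_expression has been loaded.

```r
# Toy input: 12 samples (rows) by 50 genes (columns), matching the layout
# described for X. The values are random and purely illustrative.
set.seed(42)
X_toy <- matrix(
  rnbinom(12 * 50, mu = 100, size = 5),
  nrow = 12, ncol = 50,
  dimnames = list(paste0("sample", 1:12), paste0("gene", 1:50))
)

# Only attempt the call when the function is available in the session.
if (exists("sim_2group_expression", mode = "function")) {
  sim <- sim_2group_expression(
    X           = X_toy,
    estimates   = NULL,    # estimate parameters from X
    ancestry    = "EUR",   # hypothetical label
    n_samples   = 6,
    n_degs      = 5,
    log2FC      = 1,
    mean_method = "mle",
    disp_method = "mle",
    seed        = 123
  )
  # Show the top-level elements of the returned list.
  str(sim, max.level = 1)
}
```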
Value
A list with the following elements:
X
Simulated count matrix (samples x genes).
M
Data frame of sample metadata.
f
Data frame of gene-level features: DE status (is_DE) and true log2FC (true_log2FC).
input_params
Parameter estimates from the real data.
output_params
Parameter estimates from the simulated data.
in_out_plots
List of ggplot objects comparing means and dispersions between real and simulated data.
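The comparison between input and output parameters can be illustrated with a rough base R sketch. It uses simple method-of-moments estimates rather than the "mle" or "map" estimators named above, and the helper mom_params and all data are hypothetical. The structure of input_params and output_params is not documented here, so the sketch computes its own estimates instead of reading those elements.

```r
# Method-of-moments estimates per gene (genes in columns).
# Illustrative helper only; not the estimator used by the package.
mom_params <- function(counts) {
  m <- colMeans(counts)
  v <- apply(counts, 2, var)
  # Negative binomial variance: v = m + disp * m^2, so disp = (v - m) / m^2.
  # Negative estimates are floored at zero.
  disp <- pmax((v - m) / m^2, 0)
  data.frame(mean = m, dispersion = disp)
}

set.seed(7)
n_s <- 30    # samples
n_g <- 100   # genes

# Stand-in for "real" data: each gene (column) has its own true mean.
true_mean <- seq(20, 500, length.out = n_g)
real <- matrix(rnbinom(n_s * n_g, mu = rep(true_mean, each = n_s), size = 4),
               nrow = n_s)

# Estimate parameters, simulate from them, then re-estimate on the simulation.
p_in <- mom_params(real)
sim  <- matrix(
  rnbinom(n_s * n_g,
          mu   = rep(p_in$mean, each = n_s),
          size = rep(1 / pmax(p_in$dispersion, 1e-8), each = n_s)),
  nrow = n_s
)
p_out <- mom_params(sim)

# Agreement between input and output means on the log scale.
mean_cor <- cor(log(p_in$mean), log(p_out$mean))
```

A correlation close to 1 suggests the simulated data reproduce the per-gene means of the input. Dispersion estimates are noisier and are better inspected graphically, as the in_out_plots element does.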