
Estimate gene-level mean expression and dispersion parameters from RNA-seq count data using edgeR. The function fits an intercept-only GLM and returns maximum likelihood (MLE) and maximum a posteriori (MAP) estimates of the means and dispersions, along with raw mean counts and mean log2 CPM values.

Usage

estimate_params(X, seed = NULL)

Arguments

X

Numeric matrix of raw RNA-seq counts with samples in rows and genes in columns.

Value

A list with the following elements:

mains

List with basic dataset information: n_samples (integer), n_features (integer), and features (gene IDs).

means

List of mean expression estimates: raw (raw mean counts), logcpm (mean log2 CPM), mle (fitted values from MLE dispersion), map (fitted values from MAP dispersion), libnorm_mle (MLE fitted means normalized by effective library size), libnorm_map (MAP fitted means normalized by effective library size).

disps

List of dispersion estimates: common (common dispersion), trend (trended dispersion), mle (tagwise dispersion without prior), map (tagwise dispersion with prior).

libsize

Numeric scalar giving the mean effective library size.

Details

The function uses an intercept-only design matrix to estimate baseline mean expression and dispersion parameters across all samples. The effective library size is computed as the product of the raw library size and the normalization factor estimated by calcNormFactors.
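
The effective library size calculation described above can be sketched with edgeR directly. This is a minimal illustration rather than the function's actual source: the simulated matrix X, the variable names, and the use of calcNormFactors with its default method (TMM) are assumptions.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 6 samples (rows) x 50 genes (columns)
X <- matrix(rnbinom(6 * 50, mu = 100, size = 5), nrow = 6,
            dimnames = list(paste0("s", 1:6), paste0("g", 1:50)))

# edgeR expects genes in rows, so transpose before building the DGEList
y <- DGEList(counts = t(X))
y <- calcNormFactors(y)  # default method is TMM

# Effective library size = raw library size * normalization factor
eff_libsize <- y$samples$lib.size * y$samples$norm.factors

# A single summary value, analogous to the libsize element described above
mean_eff_libsize <- mean(eff_libsize)
```

Here eff_libsize holds one value per sample, and its mean corresponds to the kind of scalar reported in libsize.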

Important: The input X must be a matrix with samples in rows and genes in columns. Internally, the function transposes X to match the gene-by-sample format expected by edgeR.
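
Because many pipelines export counts with genes in rows, it can help to check the orientation before calling the function. The sketch below uses base R only; the toy matrix and all object names are hypothetical.

```r
set.seed(42)
n_samples <- 4
n_genes <- 10

# Hypothetical counts stored genes-by-samples, as many pipelines export them
counts_gs <- matrix(rpois(n_genes * n_samples, lambda = 20),
                    nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)),
                                    paste0("sample", seq_len(n_samples))))

# Transpose to the samples-by-genes layout that X must have
X <- t(counts_gs)
stopifnot(nrow(X) == n_samples, ncol(X) == n_genes)

# Transposing X again recovers the gene-by-sample layout used by edgeR
identical(t(X), counts_gs)
```

If the dimension check fails, the matrix is most likely still in gene-by-sample form and needs to be transposed before it is passed as X.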