Estimate RNA-seq Model Parameters from Count Data

Estimate gene-level mean expression and dispersion parameters from RNA-seq count data using edgeR. The function fits an intercept-only generalized linear model (GLM) and returns dispersion estimates obtained by maximum likelihood (MLE) and by maximum a posteriori (MAP) estimation, the fitted means under each set of dispersions, and the raw and logCPM means.

Usage

estimate_params(X, seed = NULL)

Arguments

X: Numeric matrix of raw RNA-seq counts with samples in rows and genes in columns.

Value

A list with the following elements:

mains: List with basic dataset information: n_samples (integer), n_features (integer), and features (gene IDs).
means: List of gene-level mean estimates:
    raw: raw mean counts
    logcpm: mean log2 counts per million (CPM)
    mle: fitted means using the MLE dispersions
    map: fitted means using the MAP dispersions
    libnorm_mle: MLE fitted means normalized by the effective library size
    libnorm_map: MAP fitted means normalized by the effective library size
disps: List of dispersion estimates:
    common: common dispersion
    trend: trended dispersions
    mle: tagwise dispersions estimated without a prior (MLE)
    map: tagwise dispersions estimated with a prior (MAP)
libsize: Numeric scalar giving the mean effective library size.
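The means and disps elements are easiest to read together. In edgeR's negative binomial model, a gene with mean mu and dispersion phi has variance mu + phi * mu^2. The minimal base-R sketch below illustrates that relationship. nb_variance and simulate_counts are hypothetical helpers that are not part of this package, and the input values are made up rather than taken from estimate_params output. R's rnbinom() expresses the same model with size = 1/phi.

```r
# Hypothetical helper: NB variance implied by a mean and a dispersion,
# using the edgeR parameterization var = mu + phi * mu^2
nb_variance <- function(mu, phi) {
  mu + phi * mu^2
}

# Hypothetical helper: draw one count per gene from NB(mu, phi).
# rnbinom() uses size = 1/phi, which yields the same variance as above.
simulate_counts <- function(mu, phi, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  rnbinom(length(mu), mu = mu, size = 1 / phi)
}

# Made-up per-gene values (not output of estimate_params)
mu  <- c(geneA = 10, geneB = 200)
phi <- c(geneA = 0.1, geneB = 0.05)

nb_variance(mu, phi)           # geneA = 20, geneB = 2200
simulate_counts(mu, phi, seed = 1)
```

In practice, mu and phi could be, for example, res$means$map and res$disps$map from res <- estimate_params(X). This assumes both are per-gene vectors of the same length.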

Details

The function uses an intercept-only design matrix to estimate baseline mean expression and dispersion parameters across all samples. The effective library size of each sample is the product of its raw library size and its normalization factor, as estimated by edgeR's calcNormFactors.
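The effective library size calculation can be reproduced in base R once normalization factors are available. In this sketch the counts are a small made-up gene-by-sample matrix. The raw library size is taken as the column sums, which is DGEList's default. norm_factors is a hypothetical stand-in for the values calcNormFactors would return. The final mean() corresponds to the scalar reported as libsize, assuming the average is taken over samples.

```r
# Made-up counts: 3 genes (rows) x 4 samples (columns), edgeR orientation
counts <- matrix(
  c(10, 20, 30,
    15, 25, 35,
     5, 10, 20,
    40, 50, 60),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:4))
)

# Raw library size: total counts per sample
lib_size <- colSums(counts)

# Hypothetical normalization factors (in practice from calcNormFactors)
norm_factors <- c(s1 = 1.05, s2 = 0.95, s3 = 1.10, s4 = 0.90)

# Effective library size = raw library size x normalization factor
eff_lib_size <- lib_size * norm_factors

# Mean effective library size across samples
mean(eff_lib_size)
```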

Important: The input X must be a matrix with samples in rows and genes in columns. Internally, the function transposes X to match the gene-by-sample format expected by edgeR.
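Count matrices in edgeR and Bioconductor workflows are usually stored with genes in rows. Such a matrix must be transposed before it is passed to estimate_params. The sketch below checks the orientation on a made-up matrix. counts_gxs is a hypothetical name, and the call is left commented out because it needs the package that provides estimate_params.

```r
set.seed(42)

# Made-up counts stored gene-by-sample (the orientation edgeR uses)
counts_gxs <- matrix(
  rpois(5 * 3, lambda = 20),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# estimate_params() expects samples in rows and genes in columns
X <- t(counts_gxs)
stopifnot(nrow(X) == 3, ncol(X) == 5)

# Gene IDs now sit in the column names of X
colnames(X)

# res <- estimate_params(X)
# res$mains$n_samples   # expected to be 3 if the orientation is right
```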
