Empirical Bayes Moderated F-Statistic
Source: R/4-biostats-test-functions.R
This function implements a common workflow to fit linear models and compute the moderated F-statistic, in which the standard errors are moderated towards a global value by empirical Bayes.
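The moderation step can be sketched for a single feature in base R. This is a minimal illustration, not the package's implementation. The helper `moderated_f` is hypothetical, and the prior degrees of freedom `d0` and prior variance `s0sq` are supplied by hand here, whereas eBayes estimates them from all features. The residual variance is shrunk towards `s0sq`, and the between-group mean square is divided by the shrunken variance, with `d + d0` denominator degrees of freedom.

```r
# Hypothetical helper (not part of the package): moderated one-way ANOVA F
# for one feature, given prior df `d0` and prior variance `s0sq` (assumed known here).
moderated_f <- function(values, group, d0, s0sq) {
  group <- factor(group)
  k <- nlevels(group)                   # number of groups
  d <- length(values) - k               # residual degrees of freedom
  group_means <- ave(values, group)     # each observation replaced by its group mean
  s2 <- sum((values - group_means)^2) / d         # ordinary residual variance
  s2_tilde <- (d0 * s0sq + d * s2) / (d0 + d)     # variance shrunk towards s0sq
  ms_between <- sum((group_means - mean(values))^2) / (k - 1)
  statistic <- ms_between / s2_tilde
  significance <- pf(statistic, df1 = k - 1, df2 = d + d0, lower.tail = FALSE)
  list(s2 = s2, s2_tilde = s2_tilde, statistic = statistic, significance = significance)
}

set.seed(1010)
vals <- rnorm(12)
grp <- rep(c("I", "II", "III"), each = 4)
res <- moderated_f(vals, grp, d0 = 4, s0sq = 1)
```

With `d0 = 0` no moderation takes place and `statistic` reduces to the ordinary one-way ANOVA F-statistic; a larger `d0` pulls `s2_tilde` closer to `s0sq` and adds `d0` degrees of freedom to the denominator.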
If the input data was generated with microarray technology, the following steps are executed:
- linear model fitting for each feature: lmFit
- computation of coefficients and standard errors for a given set of contrasts: contrasts.fit
- computation of the moderated F-statistic: eBayes
- summary of the linear model fit: topTable
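The microarray steps above can be sketched with limma directly. This is an illustration under assumptions, not the package's internal code: it assumes limma is installed, uses a cell-means design built from a three-level factor, and tests all between-group contrasts jointly, so that the moderated F-statistic corresponds to a one-way ANOVA. The object names and the contrast matrix `cm` are illustrative. The statistic and its p-value are read from the `F` and `F.p.value` components returned by eBayes.

```r
library(limma)

set.seed(1010)
x <- matrix(rnorm(20 * 12), nrow = 20,
            dimnames = list(paste0("g", 1:20), paste0("S", 1:12)))
group <- factor(rep(c("I", "II", "III"), each = 4))

# Cell-means design: one coefficient per group mean
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Two contrasts spanning all group differences; testing them jointly
# is equivalent to the one-way ANOVA null hypothesis (all means equal)
cm <- cbind(II_vs_I = c(-1, 1, 0), III_vs_I = c(-1, 0, 1))
rownames(cm) <- colnames(design)

fit <- lmFit(x, design)          # per-feature linear models
fit2 <- contrasts.fit(fit, cm)   # coefficients and standard errors for the contrasts
fit2 <- eBayes(fit2)             # moderated statistics, including F

res <- list(statistic = fit2$F, significance = fit2$F.p.value)
tt <- topTable(fit2, number = Inf, sort.by = "none", adjust.method = "BH")
```

Because `sort.by = "none"`, the rows of `tt` stay in input order, and its `F` and `P.Value` columns should agree with `res`.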
If the input data was generated with RNA-seq technology, the following steps are executed:
- (optional) transformation of count data for linear modelling: voom
- linear model fitting for each feature: lmFit
- computation of coefficients and standard errors for a given set of contrasts: contrasts.fit
- computation of the moderated F-statistic: eBayes
- summary of the linear model fit: topTable
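For count data, a comparable sketch adds voom before the linear model fit. It again assumes limma is installed, and uses simulated negative-binomial counts, an illustrative cell-means design with joint contrasts, and voom's `span` argument, which sets the lowess smoothing window of its mean-variance trend. How the package maps `mean.variance = "ebayes"` onto limma is not stated on this page; limma's own alternative to voom weights is `eBayes(fit, trend = TRUE)` on log-counts, which is only presumed to correspond.

```r
library(limma)

set.seed(1010)
counts <- matrix(rnbinom(20 * 12, mu = 100, size = 10), nrow = 20,
                 dimnames = list(paste0("g", 1:20), paste0("S", 1:12)))
group <- factor(rep(c("I", "II", "III"), each = 4))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
cm <- cbind(II_vs_I = c(-1, 1, 0), III_vs_I = c(-1, 0, 1))
rownames(cm) <- colnames(design)

# voom: log2-CPM values plus observation-level precision weights
# derived from a lowess fit of the mean-variance trend
v <- voom(counts, design, span = 0.5)

fit <- lmFit(v, design)                 # the weights stored in v are used automatically
fit2 <- eBayes(contrasts.fit(fit, cm))  # moderated F for the joint contrasts
res <- list(statistic = fit2$F, significance = fit2$F.p.value)
```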
For complete details of each step, see the manual pages of the respective limma functions.
Usage
rowModeratedOneWayAnova(
x,
y = NULL,
observations = NULL,
technology = c("array", "seq"),
is.logged = TRUE,
mean.variance = c("ebayes", "weights"),
span = 0.5,
method = c("ls", "robust"),
design = NULL,
weights = NULL,
ndups = NULL,
spacing = NULL,
block = NULL,
correlation = NULL,
contrasts = NULL,
proportion = 0.01,
stdev.coef.lim = c(0.1, 4),
robust = FALSE,
winsor.tail.p = c(0.05, 0.1),
coef = NULL,
adjust.method = "BH",
logger = NULL,
...
)
Arguments
- x
a matrix-like data object with rows corresponding to genes and columns to observations.
- y
a vector, factor or matrix. It is used to create a design matrix if not explicitly provided via the design argument.
- observations
(optional) integer vector, the indices of observations to keep.
- technology
character string, the technology used to generate the data. Available options are:
- array
data generated with microarray technology
- seq
data generated with RNA-seq technology
- is.logged
logical, whether the original data is already logged. If is.logged = FALSE, the data is internally transformed.
- mean.variance
character string indicating whether the mean-variance relationship should be modeled with precision weights (mean.variance = "weights") or with an empirical Bayes prior trend (mean.variance = "ebayes").
- span
width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1.
- method
fitting method: "ls" for least squares or "robust" for robust regression.
- design
the design matrix of the microarray experiment, with rows corresponding to samples and columns to coefficients to be estimated. Defaults to object$design if that is non-NULL, otherwise to the unit vector, meaning that all samples will be treated as replicates of a single treatment group.
- weights
non-negative precision weights. Can be a numeric matrix of individual weights of the same size as the expression matrix, a numeric vector of array weights with length equal to ncol of the expression matrix, or a numeric vector of gene weights with length equal to nrow of the expression matrix.
- ndups
positive integer giving the number of times each distinct probe is printed on each array.
- spacing
positive integer giving the spacing between duplicate occurrences of the same probe; spacing=1 for consecutive rows.
- block
vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be NULL if ndups>2.
- correlation
the inter-duplicate or inter-technical replicate correlation
- contrasts
numeric matrix with rows corresponding to coefficients of the linear model fit and columns containing contrasts. May be a vector if there is only one contrast. NAs are not allowed.
- proportion
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
- stdev.coef.lim
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
- robust
logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?
- winsor.tail.p
numeric vector of length 1 or 2, giving left and right tail proportions of the feature-wise sample variances to Winsorize. Used only when robust=TRUE.
- coef
column number or column name specifying which coefficient or contrast of the linear model is of interest. For topTable, it can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts.
- adjust.method
method used to adjust the p-values for multiple testing. Options, in increasing conservatism, include "none", "BH", "BY" and "holm". See p.adjust for the complete list of options. A NULL value will result in the default adjustment method, which is "BH".
- logger
a Logger object.
- ...
further arguments to lmFit.
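The adjust.method options are those of stats::p.adjust. A small base-R illustration with made-up p-values (chosen only to show the arithmetic) compares the default "BH" with the more conservative "holm":

```r
p <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg: sorted p_(i) * n / i, then a running minimum from the largest p down
bh <- p.adjust(p, method = "BH")      # 0.04, 0.0533, 0.0533, 0.20 (rounded)

# Holm: sorted p_(i) * (n - i + 1), then a running maximum from the smallest p up
holm <- p.adjust(p, method = "holm")  # 0.04, 0.09, 0.09, 0.20
```

Every Holm-adjusted value here is at least as large as its BH counterpart, consistent with the ordering of conservatism given for adjust.method.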
Value
A list containing two elements:
- statistic
A numeric vector, the values of the test statistic
- significance
A numeric vector, the p-values of the selected test
Examples
#Seed
set.seed(1010)
#Define row/col size
nr = 20
nc = 20
#Data
x = matrix(
data = stats::rnorm(n = nr*nc),
nrow = nr,
ncol = nc,
dimnames = list(
paste0("g",seq(nr)),
paste0("S",seq(nc))
)
)
#Categorical output vector (multinomial)
y = sample(x = c("I","II","III"),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))
rowModeratedOneWayAnova(x=x,y=y)
#> $statistic
#> [1] 0.3141577 0.2767571 0.4573615 0.1812941 1.7104698 0.7240608 2.0080201
#> [8] 0.7933097 0.3971676 0.4476769 0.1849767 0.5861202 0.8748885 1.5961187
#> [15] 1.1681319 0.9266135 0.3693172 1.6185753 0.7700783 2.5076014
#>
#> $significance
#> [1] 0.81515997 0.84219337 0.71216352 0.90911667 0.16330960 0.53775239
#> [7] 0.11132285 0.49775168 0.75507606 0.71900680 0.90663455 0.62421406
#> [13] 0.45362805 0.18883625 0.32087593 0.42732238 0.77517234 0.18354502
#> [19] 0.51091086 0.05772779
#>