This function implements a common workflow to fit linear models and compute moderated t-statistics and moderated F-statistic by empirical Bayes moderation of the standard errors towards a global value.
If the input data was generated with microarray technology, the following steps are executed:
linear model fitting for each feature:
lmFit
compute coefficients and standard errors for a given set of contrasts:
contrasts.fit
compute moderated t-statistics and moderated F-statistic:
eBayes
summarise the linear model fit:
topTable
If the input data was generated with RNA-seq technology, the following steps are executed:
(optional) transform count data for linear modelling:
voom
linear model fitting for each feature:
lmFit
compute coefficients and standard errors for a given set of contrasts:
contrasts.fit
compute moderated t-statistics and moderated F-statistic:
eBayes
summarise the linear model fit:
topTable
For complete details of each step see the manual pages of the respective functions.
See the Details section below for further information.
Usage
rowEBayesStatistics(
x,
y = NULL,
observations = NULL,
statistic = c("moderated.F", "moderated.t"),
technology = c("array", "seq"),
is.logged = TRUE,
mean.variance = c("ebayes", "weights"),
span = 0.5,
method = c("ls", "robust"),
design = NULL,
weights = NULL,
ndups = NULL,
spacing = NULL,
block = NULL,
correlation = NULL,
contrasts = NULL,
proportion = 0.01,
stdev.coef.lim = c(0.1, 4),
robust = FALSE,
winsor.tail.p = c(0.05, 0.1),
coef = NULL,
adjust.method = "BH",
logger = NULL,
...
)
Arguments
- x
a matrix-like data object with rows corresponding to genes and columns to observations.
- y
a vector, factor or matrix. It is used to create a design matrix if not explicitly provided via the
design
argument.- observations
(optional) integer vector, the indices of observations to keep.
- statistic
character string, indicating the moderated statistic to return.
- technology
character string, the technology used to generate the data. Available options are:
- array
data generated with microarray technology
- seq
data generated with RNA-seq technology
- is.logged
logical, whether the original data is already logged. If
is.logged = FALSE
, the data is internally transformed.- mean.variance
character string indicating whether the mean-variance relationship should be modeled with precision weights (
mean.variance = "weights"
) or with an empirical Bayes prior trend (mean.variance = "ebayes"
).- span
width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1.
- method
fitting method;
"ls"
for least squares or"robust"
for robust regression- design
the design matrix of the microarray experiment, with rows corresponding to samples and columns to coefficients to be estimated. Defaults to
object$design
if that is non-NULL, otherwise to the unit vector meaning that all samples will be treated as replicates of a single treatment group.- weights
non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to
ncol
of the expression matrix, or a numeric vector of gene weights with length equal tonrow
of the expression matrix.- ndups
positive integer giving the number of times each distinct probe is printed on each array.
- spacing
positive integer giving the spacing between duplicate occurrences of the same probe,
spacing=1
for consecutive rows.- block
vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be
NULL
ifndups>2
.- correlation
the inter-duplicate or inter-technical replicate correlation
- contrasts
numeric matrix with rows corresponding to coefficients in
fit
and columns containing contrasts. May be a vector if there is only one contrast.NA
s are not allowed.- proportion
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
- stdev.coef.lim
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
- robust
logical, should the estimation of
df.prior
andvar.prior
be robustified against outlier sample variances?- winsor.tail.p
numeric vector of length 1 or 2, giving left and right tail proportions of
x
to Winsorize. Used only whenrobust=TRUE
.- coef
column number or column name specifying which coefficient or contrast of the linear model is of interest. For
topTable
, can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts.- adjust.method
method used to adjust the p-values for multiple testing. Options, in increasing conservatism, include
"none"
,"BH"
,"BY"
and"holm"
. Seep.adjust
for the complete list of options. ANULL
value will result in the default adjustment method, which is"BH"
.- logger
a
Logger
object.- ...
further arguments to
lmFit
.
Value
A list containing two elements:
- statistic
A numeric vector, the values of the test statistic
- significance
A numeric vector, the p-values of the selected test
Examples
#Seed
set.seed(1010)
#Define row/col size
nr = 20
nc = 20
#Data
x = matrix(
data = stats::rnorm(n = nr*nc),
nrow = nr,
ncol = nc,
dimnames = list(
paste0("g",seq(nr)),
paste0("S",seq(nc))
)
)
#Categorical output vector (binomial)
y = c(rep("wt",nc/2), rep("mut",nc/2))
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y)
#> $statistic
#> [1] 1.48314728 0.26127447 0.17149946 2.23891417 0.82910469 0.24320947
#> [7] 2.81785282 1.14802510 0.44767056 0.04352112 0.02997531 0.92414587
#> [13] 1.60206921 1.45753831 0.21663719 0.55468754 0.92130594 0.62506081
#> [19] 0.02202450 1.13335883
#>
#> $significance
#> [1] 0.22692238 0.77006953 0.84240072 0.10657416 0.43643986 0.78410725
#> [7] 0.05973406 0.31726271 0.63911520 0.95741234 0.97046949 0.39687025
#> [13] 0.20147918 0.23280867 0.80522206 0.57425166 0.39799894 0.53522888
#> [19] 0.97821627 0.32195006
#>
#Categorical output vector (multinomial)
y = sample(x = c("I","II","III"),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y)
#> $statistic
#> [1] 0.3141577 0.2767571 0.4573615 0.1812941 1.7104698 0.7240608 2.0080201
#> [8] 0.7933097 0.3971676 0.4476769 0.1849767 0.5861202 0.8748885 1.5961187
#> [15] 1.1681319 0.9266135 0.3693172 1.6185753 0.7700783 2.5076014
#>
#> $significance
#> [1] 0.81515997 0.84219337 0.71216352 0.90911667 0.16330960 0.53775239
#> [7] 0.11132285 0.49775168 0.75507606 0.71900680 0.90663455 0.62421406
#> [13] 0.45362805 0.18883625 0.32087593 0.42732238 0.77517234 0.18354502
#> [19] 0.51091086 0.05772779
#>
#Numerical output vector
y = sample(x = seq(from=20,to=80),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y,statistic='moderated.t')
#> $statistic
#> [1] 0.52687270 -0.57261137 0.38656922 -0.31476380 -1.57504242 0.65374814
#> [7] 2.23496136 -1.06505656 -1.01418096 0.48217447 0.13606901 -0.66270033
#> [13] -0.57141521 1.98487512 0.76474499 -1.34074280 -0.48437799 -0.09490638
#> [19] -0.28749240 -1.04544828
#>
#> $significance
#> [1] 0.59858942 0.56724649 0.69929152 0.75311376 0.11607880 0.51366948
#> [7] 0.02600044 0.28752660 0.31114172 0.62995962 0.89183878 0.50792424
#> [13] 0.56805603 0.04787704 0.44489805 0.18080490 0.62839669 0.92443922
#> [19] 0.77389204 0.29648039
#>
#Design matrix
y = matrix(
data = c(
c(rep(1,nc/2), rep(0,nc/2)),
c(rep(0,nc/2), rep(1,nc/2))
),
nrow = nc,
ncol = 2,
dimnames = list(
paste0("S",seq(nc)),
c("wt", "mut")
)
)
rowEBayesStatistics(x=x,y=y)
#> $statistic
#> [1] 1.48314728 0.26127447 0.17149946 2.23891417 0.82910469 0.24320947
#> [7] 2.81785282 1.14802510 0.44767056 0.04352112 0.02997531 0.92414587
#> [13] 1.60206921 1.45753831 0.21663719 0.55468754 0.92130594 0.62506081
#> [19] 0.02202450 1.13335883
#>
#> $significance
#> [1] 0.22692238 0.77006953 0.84240072 0.10657416 0.43643986 0.78410725
#> [7] 0.05973406 0.31726271 0.63911520 0.95741234 0.97046949 0.39687025
#> [13] 0.20147918 0.23280867 0.80522206 0.57425166 0.39799894 0.53522888
#> [19] 0.97821627 0.32195006
#>
#Numerical matrix
y = matrix(
data = c(
sample(x = seq(from=20,to=80),size=nc,replace=TRUE),
sample(x = c(0,1),size=nc,replace=TRUE)
),
nrow = nc,
ncol = 2,
dimnames = list(
paste0("S",seq(nc)),
c("age", "p53mut")
)
)
rowEBayesStatistics(x=x,y=y)
#> $statistic
#> [1] 0.17361441 0.03270602 2.44564690 0.05525935 1.06210275 1.39121561
#> [7] 4.87511651 3.91450475 0.50455740 0.22729336 0.64457767 0.19686732
#> [13] 2.03564259 2.98884274 1.73522912 0.91429137 0.14726936 0.94040632
#> [19] 0.53077892 2.11076330
#>
#> $significance
#> [1] 0.840620973 0.967823042 0.086670050 0.946239713 0.345728065 0.248772711
#> [7] 0.007634205 0.019950427 0.603772746 0.796687032 0.524884170 0.821299595
#> [13] 0.130596535 0.050345666 0.176359789 0.400800546 0.863061475 0.390469148
#> [19] 0.588146673 0.121145461
#>