Empirical Bayes Moderated Statistics — rowEBayesStatistics • featscreen

This function implements a common workflow to fit linear models and compute moderated t-statistics and moderated F-statistic by empirical Bayes moderation of the standard errors towards a global value.

If the input data was generated with microarray technology, the following steps are executed:

linear model fitting for each feature: lmFit
compute coefficients and standard errors for a given set of contrasts: contrasts.fit
compute moderated t-statistics and moderated F-statistic: eBayes
summarise the linear model fit: topTable

If the input data was generated with RNA-seq technology, the following steps are executed:

(optional) transform count data for linear modelling: voom
linear model fitting for each feature: lmFit
compute coefficients and standard errors for a given set of contrasts: contrasts.fit
compute moderated t-statistics and moderated F-statistic: eBayes
summarise the linear model fit: topTable

For complete details of each step see the manual pages of the respective functions.

See the Details section below for further information.

Usage

rowEBayesStatistics(
  x,
  y = NULL,
  observations = NULL,
  statistic = c("moderated.F", "moderated.t"),
  technology = c("array", "seq"),
  is.logged = TRUE,
  mean.variance = c("ebayes", "weights"),
  span = 0.5,
  method = c("ls", "robust"),
  design = NULL,
  weights = NULL,
  ndups = NULL,
  spacing = NULL,
  block = NULL,
  correlation = NULL,
  contrasts = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  coef = NULL,
  adjust.method = "BH",
  logger = NULL,
  ...
)

Arguments

x

a matrix-like data object with rows corresponding to genes and columns to observations.

y

a vector, factor or matrix. It is used to create a design matrix if not explicitly provided via the design argument.

observations

(optional) integer vector, the indices of observations to keep.

statistic

character string, indicating the moderated statistic to return.

technology

character string, the technology used to generate the data. Available options are:

array: data generated with microarray technology
seq: data generated with RNA-seq technology

is.logged

logical, whether the original data is already logged. If is.logged = FALSE, the data is internally transformed.

mean.variance

character string indicating whether the mean-variance relationship should be modeled with precision weights (mean.variance = "weights") or with an empirical Bayes prior trend (mean.variance = "ebayes").

span

width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1.

method

fitting method; "ls" for least squares or "robust" for robust regression

design

the design matrix of the microarray experiment, with rows corresponding to samples and columns to coefficients to be estimated. Defaults to object$design if that is non-NULL, otherwise to the unit vector meaning that all samples will be treated as replicates of a single treatment group.

weights

non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to ncol of the expression matrix, or a numeric vector of gene weights with length equal to nrow of the expression matrix.

ndups

positive integer giving the number of times each distinct probe is printed on each array.

spacing

positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows.

block

vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be NULL if ndups>2.

correlation

the inter-duplicate or inter-technical replicate correlation

contrasts

numeric matrix with rows corresponding to coefficients in fit and columns containing contrasts. May be a vector if there is only one contrast. NAs are not allowed.

proportion

numeric value between 0 and 1, assumed proportion of genes which are differentially expressed

stdev.coef.lim

numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes

robust

logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

winsor.tail.p

numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.

coef

column number or column name specifying which coefficient or contrast of the linear model is of interest. For topTable, can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts.

adjust.method

method used to adjust the p-values for multiple testing. Options, in increasing conservatism, include "none", "BH", "BY" and "holm". See p.adjust for the complete list of options. A NULL value will result in the default adjustment method, which is "BH".

logger

a Logger object.

...

further arguments to lmFit.

Value

A list containing two elements:

statistic: A numeric vector, the values of the test statistic
significance: A numeric vector, the p-values of the selected test

Author

Alessandro Barberis

Examples

#Seed
set.seed(1010)

#Define row/col size
nr = 20
nc = 20

#Data
x = matrix(
 data = stats::rnorm(n = nr*nc),
 nrow = nr,
 ncol = nc,
 dimnames = list(
   paste0("g",seq(nr)),
   paste0("S",seq(nc))
 )
)

#Categorical output vector (binomial)
y = c(rep("wt",nc/2), rep("mut",nc/2))
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y)
#> $statistic
#>  [1] 1.48314728 0.26127447 0.17149946 2.23891417 0.82910469 0.24320947
#>  [7] 2.81785282 1.14802510 0.44767056 0.04352112 0.02997531 0.92414587
#> [13] 1.60206921 1.45753831 0.21663719 0.55468754 0.92130594 0.62506081
#> [19] 0.02202450 1.13335883
#> 
#> $significance
#>  [1] 0.22692238 0.77006953 0.84240072 0.10657416 0.43643986 0.78410725
#>  [7] 0.05973406 0.31726271 0.63911520 0.95741234 0.97046949 0.39687025
#> [13] 0.20147918 0.23280867 0.80522206 0.57425166 0.39799894 0.53522888
#> [19] 0.97821627 0.32195006
#> 


#Categorical output vector (multinomial)
y = sample(x = c("I","II","III"),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y)
#> $statistic
#>  [1] 0.3141577 0.2767571 0.4573615 0.1812941 1.7104698 0.7240608 2.0080201
#>  [8] 0.7933097 0.3971676 0.4476769 0.1849767 0.5861202 0.8748885 1.5961187
#> [15] 1.1681319 0.9266135 0.3693172 1.6185753 0.7700783 2.5076014
#> 
#> $significance
#>  [1] 0.81515997 0.84219337 0.71216352 0.90911667 0.16330960 0.53775239
#>  [7] 0.11132285 0.49775168 0.75507606 0.71900680 0.90663455 0.62421406
#> [13] 0.45362805 0.18883625 0.32087593 0.42732238 0.77517234 0.18354502
#> [19] 0.51091086 0.05772779
#> 

#Numerical output vector
y = sample(x = seq(from=20,to=80),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))
rowEBayesStatistics(x=x,y=y,statistic='moderated.t')
#> $statistic
#>  [1]  0.52687270 -0.57261137  0.38656922 -0.31476380 -1.57504242  0.65374814
#>  [7]  2.23496136 -1.06505656 -1.01418096  0.48217447  0.13606901 -0.66270033
#> [13] -0.57141521  1.98487512  0.76474499 -1.34074280 -0.48437799 -0.09490638
#> [19] -0.28749240 -1.04544828
#> 
#> $significance
#>  [1] 0.59858942 0.56724649 0.69929152 0.75311376 0.11607880 0.51366948
#>  [7] 0.02600044 0.28752660 0.31114172 0.62995962 0.89183878 0.50792424
#> [13] 0.56805603 0.04787704 0.44489805 0.18080490 0.62839669 0.92443922
#> [19] 0.77389204 0.29648039
#> 

#Design matrix
y = matrix(
 data = c(
    c(rep(1,nc/2), rep(0,nc/2)),
    c(rep(0,nc/2), rep(1,nc/2))
 ),
 nrow = nc,
 ncol = 2,
 dimnames = list(
   paste0("S",seq(nc)),
   c("wt", "mut")
 )
)
rowEBayesStatistics(x=x,y=y)
#> $statistic
#>  [1] 1.48314728 0.26127447 0.17149946 2.23891417 0.82910469 0.24320947
#>  [7] 2.81785282 1.14802510 0.44767056 0.04352112 0.02997531 0.92414587
#> [13] 1.60206921 1.45753831 0.21663719 0.55468754 0.92130594 0.62506081
#> [19] 0.02202450 1.13335883
#> 
#> $significance
#>  [1] 0.22692238 0.77006953 0.84240072 0.10657416 0.43643986 0.78410725
#>  [7] 0.05973406 0.31726271 0.63911520 0.95741234 0.97046949 0.39687025
#> [13] 0.20147918 0.23280867 0.80522206 0.57425166 0.39799894 0.53522888
#> [19] 0.97821627 0.32195006
#> 

#Numerical matrix
y = matrix(
 data = c(
    sample(x = seq(from=20,to=80),size=nc,replace=TRUE),
    sample(x = c(0,1),size=nc,replace=TRUE)
 ),
 nrow = nc,
 ncol = 2,
 dimnames = list(
   paste0("S",seq(nc)),
   c("age", "p53mut")
 )
)
rowEBayesStatistics(x=x,y=y)
#> $statistic
#>  [1] 0.17361441 0.03270602 2.44564690 0.05525935 1.06210275 1.39121561
#>  [7] 4.87511651 3.91450475 0.50455740 0.22729336 0.64457767 0.19686732
#> [13] 2.03564259 2.98884274 1.73522912 0.91429137 0.14726936 0.94040632
#> [19] 0.53077892 2.11076330
#> 
#> $significance
#>  [1] 0.840620973 0.967823042 0.086670050 0.946239713 0.345728065 0.248772711
#>  [7] 0.007634205 0.019950427 0.603772746 0.796687032 0.524884170 0.821299595
#> [13] 0.130596535 0.050345666 0.176359789 0.400800546 0.863061475 0.390469148
#> [19] 0.588146673 0.121145461
#>