This function implements a common SAM (Significance Analysis of Microarrays) workflow: it repeatedly permutes the input data to determine whether any features are significantly related to the response.
The following steps are executed:
- correlate each feature with the outcome variable: samr
- compute thresholds, cutpoints, and false discovery rates for the SAM analysis: samr.compute.delta.table
- compute SAM statistics and significance: samr.compute.siggenes.table
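The three steps listed above can be sketched directly with the samr package. This is a minimal illustration under stated assumptions, not necessarily how rowSamStatistics is implemented internally: the toy data, the variable names (sam_data, sam_fit, delta_table, sig_table), and the choice of del = 0 together with all.genes = TRUE are all made up for the example, and the code only runs if samr is installed.

```r
# Sketch only: assumes the samr package is installed; skipped otherwise.
if (requireNamespace("samr", quietly = TRUE)) {
  set.seed(1010)

  # Toy data (assumption): 10 features by 20 samples, two classes coded 1 and 2
  x <- matrix(rnorm(10 * 20), nrow = 10, ncol = 20)
  y <- c(rep(1, 10), rep(2, 10))

  # samr expects the inputs bundled in a list
  sam_data <- list(
    x = x,
    y = y,
    geneid = as.character(seq_len(nrow(x))),
    genenames = paste0("f", seq_len(nrow(x))),
    logged2 = FALSE
  )

  # Step 1: correlate each feature with the outcome variable
  sam_fit <- samr::samr(sam_data, resp.type = "Two class unpaired", nperms = 100)

  # Step 2: thresholds, cutpoints, and false discovery rates
  delta_table <- samr::samr.compute.delta.table(sam_fit, nvals = 50)

  # Step 3: per-feature statistics and q-values
  # (del = 0 with all.genes = TRUE is an illustrative choice that lists every feature)
  sig_table <- samr::samr.compute.siggenes.table(
    sam_fit,
    del = 0,
    data = sam_data,
    delta.table = delta_table,
    all.genes = TRUE
  )
}
```

In this sketch the per-feature scores are available as sam_fit$tt, and sig_table holds the tables of up- and down-regulated features with their q-values.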
Usage
rowSamStatistics(
x,
y = NULL,
observations = NULL,
technology = c("array", "seq"),
geneid = NULL,
genenames = NULL,
censoring.status = NULL,
logged2 = FALSE,
eigengene.number = 1,
resp.type = c("Quantitative", "Two class unpaired", "Survival", "Multiclass",
"One class", "Two class paired", "Two class unpaired timecourse",
"One class timecourse", "Two class paired timecourse", "Pattern discovery"),
s0 = NULL,
s0.perc = NULL,
nperms = 100,
center.arrays = FALSE,
testStatistic = c("standard", "wilcoxon"),
time.summary.type = c("slope", "signed.area"),
regression.method = c("standard", "ranks"),
knn.neighbors = 10,
random.seed = NULL,
nresamp = 20,
nresamp.perm = NULL,
dels = NULL,
nvals = 50,
fdr.output = 0.2,
logger = NULL
)
Arguments
- x
Feature matrix: p (number of features) by n (number of samples), one observation per column (missing values allowed)
- y
n-vector of outcome measurements
- observations
(optional) integer vector, the indices of observations to keep.
- technology
character string, the technology used to generate the data. Available options are:
- array
data generated with microarray technology
- seq
data generated with RNA-seq technology
- geneid
Optional character vector of gene IDs for output.
- genenames
Optional character vector of gene names for output.
- censoring.status
n-vector of censoring status (1 = died or event occurred, 0 = survived or event was censored); needed for a censored survival outcome
- logged2
Has the data been transformed by log (base 2)? This information is used only for computing fold changes
- eigengene.number
Eigengene to be used (just for resp.type="Pattern discovery")
- resp.type
Problem type: "Quantitative" for a continuous outcome (array and sequencing data); "Two class unpaired" (array and sequencing data); "Survival" for a censored survival outcome (array and sequencing data); "Multiclass" for more than two groups (array and sequencing data); "One class" for a single group (array data only); "Two class paired" for two classes with paired observations (array and sequencing data); "Two class unpaired timecourse", "One class timecourse", "Two class paired timecourse", and "Pattern discovery" (all array data only)
- s0
Exchangeability factor for the denominator of the test statistic; default is automatic choice. Only used for array data.
- s0.perc
Percentile of standard deviation values to use for s0; default is automatic choice. -1 means s0 = 0 (different from s0.perc = 0, which sets s0 to the zeroth percentile of the standard deviation values, i.e. the minimum standard deviation). Only used for array data.
- nperms
Number of permutations used to estimate false discovery rates
- center.arrays
Should the data for each sample (array) be median centered at the outset? Default is FALSE. Only used for array data.
- testStatistic
Test statistic to use in the two class unpaired case. Either "standard" (t-statistic) or "wilcoxon" (two-sample Wilcoxon or Mann-Whitney test). Only used for array data.
- time.summary.type
Summary measure for each time course: "slope" or "signed.area". Only used for array data.
- regression.method
Regression method for the quantitative case: "standard" (linear least squares) or "ranks" (linear least squares on ranked data). Only used for array data.
- knn.neighbors
Number of nearest neighbors to use for imputation of missing feature values. Only used for array data.
- random.seed
Optional initial seed for random number generator (integer)
- nresamp
For technology = "seq", number of resamples used to construct the test statistic. Default is 20. Only used for sequencing data.
- nresamp.perm
For technology = "seq", number of resamples used to construct the test statistic for permutations. Default is equal to nresamp; the value must be at most nresamp. Only used for sequencing data.
- dels
Vector of delta values used. Delta is the vertical distance from the 45 degree line to the upper and lower parallel lines that define the SAM threshold rule. By default, for array data, 50 values are chosen in the relevant operating range for delta. For sequencing data, the maximum number of effective delta values is chosen automatically according to the data.
- nvals
Number of delta values used. For array data, the default value is 50. For sequencing data, the value will be chosen automatically.
- fdr.output
(Approximate) False Discovery Rate cutoff for output in significant genes table
- logger
A Logger object.
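To make the center.arrays option concrete, the following base R sketch shows what per-sample median centering of a features-by-samples matrix looks like. It is an illustration only: the toy matrix and the names col_medians and x_centered are assumptions, and the centering performed internally when center.arrays = TRUE may differ in detail.

```r
set.seed(1)

# Toy matrix (assumption): 5 features by 4 samples, with one missing value
x <- matrix(rnorm(5 * 4, mean = 3), nrow = 5, ncol = 4)
x[2, 3] <- NA

# Median of each sample (column), ignoring missing values
col_medians <- apply(x, 2, median, na.rm = TRUE)

# Subtract each sample's median from every value in that sample
x_centered <- sweep(x, 2, col_medians, FUN = "-")

# After centering, every sample has a median of (numerically) zero
apply(x_centered, 2, median, na.rm = TRUE)
```

Missing values stay missing after centering, so any imputation step would still be needed afterwards.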
Value
A list containing two elements:
- statistic
A numeric vector, the values of the test statistic
- significance
A numeric vector, the q-values of the selected test
Examples
#Seed
set.seed(1010)
#Define row/col size
nr = 10
nc = 20
#Data
x = matrix(
data = stats::rnorm(n = nr*nc),
nrow = nr,
ncol = nc,
dimnames = list(
paste0("f",seq(nr)),
paste0("S",seq(nc))
)
)
#Categorical output vector (binomial)
y = c(rep(1,nc/2), rep(2,nc/2))
names(y) = paste0("S",seq(nc))
rowSamStatistics(x=x,y=y)
#> $statistic
#> [1] -0.181 0.234 -0.021 0.747 0.415 1.120 -0.020 0.482 0.070 0.600
#>
#> $significance
#> [1] 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4
#>
#Categorical output vector (multinomial)
y = c(rep(1,nc/4), rep(2,nc/4), rep(3,nc/2))
names(y) = paste0("S",seq(nc))
rowSamStatistics(x=x, y=y, resp.type = "Multiclass")
#> $statistic
#> [1] 0.500 0.219 0.048 0.345 0.194 0.778 0.089 0.286 0.272 0.271
#>
#> $significance
#> [1] 0.6 0.6 0.6 0.6 0.6 0.0 0.6 0.6 0.6 0.6
#>
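As a follow-up to the multiclass example, the returned list can be filtered by q-value. The sketch below rebuilds the result by hand from the output printed above so that it runs without the package. The names res, feature_names, q_cutoff, and selected are illustrative, and labelling the values f1 to f10 assumes that the output follows the row order of x.

```r
# Result copied from the printed multiclass output above
res <- list(
  statistic = c(0.500, 0.219, 0.048, 0.345, 0.194, 0.778, 0.089, 0.286, 0.272, 0.271),
  significance = c(0.6, 0.6, 0.6, 0.6, 0.6, 0.0, 0.6, 0.6, 0.6, 0.6)
)

# Assumption: values are returned in the row order of x (f1, ..., f10)
feature_names <- paste0("f", seq_along(res$statistic))

# Keep features whose q-value is at or below the chosen cutoff
q_cutoff <- 0.2
selected <- which(res$significance <= q_cutoff)

data.frame(
  feature = feature_names[selected],
  statistic = res$statistic[selected],
  q_value = res$significance[selected]
)
```

With these values only the sixth feature passes the cutoff, since it is the only one with a q-value of 0.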