This function computes a statistical measure for each feature in x. For multi-response data, the screening statistics are then combined as specified by the multi argument. Finally, the features to keep are obtained via the selection method indicated by select.by.
See the Details section below for further information.
Usage
screen(
x,
y = NULL,
g = NULL,
method = c("cor.test", "pearson", "spearman", "kendall", "t.test", "t.test.equal",
"t.test.unequal", "t.test.paired", "w.test", "w.test.ranksum", "w.test.paired",
"anova", "anova.equal", "anova.unequal", "kruskal.wallis", "chisq.test", "coxph",
"moderated.t", "moderated.F", "sam.test", "missing.value", "above.median",
"above.minimum", "median", "variability"),
...,
multi = c("max", "min", "avg", "sum", "idx"),
idx = NULL,
select.by = c("cutoff", "rank", "percentile", "fpr", "fdr"),
select.args = NULL
)
Arguments
- x
matrix or data.frame, where rows are features and columns are observations.
- y
numeric vector of data values having the same length as ncol(x), or data.frame with two columns, time and status.
- g
(optional) vector or factor giving the group of each column (observation) of x.
- method
character string, one of the supported screening techniques.
- ...
further arguments passed to the screening function.
- multi
character string indicating how the statistics are combined in the case of multi-response data. Available options are:
"max": the maximum value across responses is kept
"min": the minimum value across responses is kept
"avg": values are averaged
"sum": values are summed up
"idx": the column indicated by idx is returned
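The combination rules above can be illustrated with a small base R sketch. It assumes the per-response statistics arrive as a features x responses matrix; combine_stats is a hypothetical helper written for illustration, not the package's multiresponse implementation.

```r
# Hypothetical helper illustrating the multi options; assumes `stats` is a
# features x responses matrix (rows = features, columns = responses).
combine_stats <- function(stats, multi = c("max", "min", "avg", "sum", "idx"),
                          idx = NULL) {
  multi <- match.arg(multi)
  switch(multi,
    max = apply(stats, 1, max),   # largest value across responses
    min = apply(stats, 1, min),   # smallest value across responses
    avg = rowMeans(stats),        # mean across responses
    sum = rowSums(stats),         # total across responses
    idx = {
      # keep a single response column, selected by position or name
      if (is.null(idx)) stop("idx must be supplied when multi = \"idx\"")
      stats[, idx]
    }
  )
}

stats <- matrix(c(0.2, 0.8, 0.5,
                  0.6, 0.1, 0.3),
                nrow = 3,
                dimnames = list(c("f1", "f2", "f3"), c("resp1", "resp2")))
combine_stats(stats, "max")                # f1 0.6, f2 0.8, f3 0.5
combine_stats(stats, "idx", idx = "resp2") # f1 0.6, f2 0.1, f3 0.3
```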
- idx
(optional) integer value or character string indicating the column (i.e., the response) to keep when multi = "idx".
- select.by
character string indicating the selection method. Available options are:
"cutoff": selection by cutoff
"rank": selection by ranking
"percentile": selection by top percentile
"fpr": selection by false positive rate
"fdr": selection by false discovery rate
- select.args
(optional) named list of arguments passed to the selection function.
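A rough base R sketch of the "cutoff", "rank" and "percentile" strategies follows. The helper names and their arguments (cutoff, n, p, lower) are hypothetical and only illustrate the idea; they are not the package's selection functions or the keys expected in select.args. lower = TRUE assumes that smaller statistics (e.g. p-values) are better.

```r
# Hypothetical selection helpers; `stat` is a named numeric vector with one
# value per feature. lower = TRUE means smaller values are better.
select_cutoff <- function(stat, cutoff, lower = TRUE) {
  # keep features whose statistic is below (or above) the cutoff;
  # which() drops NA statistics instead of returning NA names
  names(stat)[which(if (lower) stat < cutoff else stat > cutoff)]
}

select_rank <- function(stat, n, lower = TRUE) {
  # keep the n best-ranked features, best first
  ord <- order(stat, decreasing = !lower)
  names(stat)[head(ord, n)]
}

select_percentile <- function(stat, p, lower = TRUE) {
  # keep the top fraction p (0 < p <= 1) of features, rounding up
  select_rank(stat, n = ceiling(p * length(stat)), lower = lower)
}

stat <- c(f1 = 0.04, f2 = 0.30, f3 = 0.01, f4 = 0.75, f5 = 0.12)
select_cutoff(stat, cutoff = 0.05)  # "f1" "f3"
select_rank(stat, n = 2)            # "f3" "f1"
select_percentile(stat, p = 0.5)    # "f3" "f1" "f5"
```

Rounding up in the percentile rule guarantees at least one feature is kept whenever p > 0; a real implementation may round differently.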
Details
This function uses the selected screening technique to compute a statistical measure for each feature.
See the following functions for each specific implementation:
"cor.test"
"pearson"
"spearman"
"kendall"
"t.test"
"t.test.equal"
"t.test.unequal"
"t.test.paired"
"w.test"
"w.test.ranksum"
"w.test.paired"
"anova"
"anova.equal"
"anova.unequal"
"kruskal.wallis"
"chisq.test"
"coxph"
"moderated.t"
"moderated.F"
"sam.test"
"missing.value"
"above.median"
"above.minimum"
"median"
"variability"
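To make "a statistical measure for each feature" concrete, the sketch below computes one two-sample t-test p-value per row of x in base R. It only loosely mirrors method = "t.test" with var = "equal" from the Examples; screen_ttest is a hypothetical helper, and the package's own screening function may return a different statistic.

```r
# Hypothetical row-wise screening: one pooled (equal-variance) two-sample
# t-test per feature, with features in rows and observations in columns.
screen_ttest <- function(x, g, var.equal = TRUE) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2, length(g) == ncol(x))
  apply(x, 1, function(row) {
    t.test(row[g == levels(g)[1]], row[g == levels(g)[2]],
           var.equal = var.equal)$p.value
  })
}

set.seed(1)
x <- matrix(rnorm(5 * 10), nrow = 5,
            dimnames = list(paste0("f", 1:5), paste0("S", 1:10)))
g <- factor(rep(c("a", "b"), each = 5))
x[2, g == "b"] <- x[2, g == "b"] + 10  # shift f2 in group "b" so it clearly differs

pvals <- screen_ttest(x, g)
pvals  # named vector of p-values; f2 should be far below the others
```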
For multi-response data, the screening statistics are then combined by the multiresponse function.
Finally, the features to keep are obtained via the selection method indicated by select.by.
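As an illustration of the multi-response case, the base R sketch below correlates every feature with each of two responses (Spearman) and then keeps the largest absolute correlation per feature, analogous to multi = "max". This is an assumption-laden sketch, not the code path of screen() or multiresponse.

```r
# Sketch: Spearman correlation of each feature with several responses,
# combined per feature by the maximum absolute value (like multi = "max").
set.seed(2)
x <- matrix(rnorm(4 * 20), nrow = 4,
            dimnames = list(paste0("f", 1:4), NULL))
y <- cbind(r1 = rnorm(20), r2 = rnorm(20))    # two responses, one row per observation
y[, "r1"] <- x["f3", ] + rnorm(20, sd = 0.1)  # make f3 drive response r1

# cor() correlates columns, so transpose x to put features in columns
rho <- abs(cor(t(x), y, method = "spearman"))  # features x responses matrix
combined <- apply(rho, 1, max)                 # one value per feature
combined
```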
See the following functions for each specific implementation:
"cutoff"
"rank"
"percentile"
"fpr"
"fdr"
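The difference between "fpr" and "fdr" selection can be sketched with p-values. Here it is assumed, without confirmation from this page, that "fpr" thresholds raw p-values and "fdr" thresholds Benjamini-Hochberg adjusted p-values via base R's p.adjust; the helpers are hypothetical.

```r
# Assumed semantics (hypothetical helpers):
#   fpr: keep features whose raw p-value is below alpha
#   fdr: keep features whose Benjamini-Hochberg adjusted p-value is below alpha
select_fpr <- function(pvals, alpha = 0.05) {
  names(pvals)[which(pvals < alpha)]
}

select_fdr <- function(pvals, alpha = 0.05) {
  adj <- p.adjust(pvals, method = "BH")
  names(pvals)[which(adj < alpha)]
}

pvals <- c(f1 = 0.001, f2 = 0.015, f3 = 0.040, f4 = 0.200, f5 = 0.900)
select_fpr(pvals)  # "f1" "f2" "f3"
select_fdr(pvals)  # "f1" "f2"
```

Feature f3 passes the raw threshold but not the adjusted one (its BH-adjusted p-value is about 0.067), which is the extra stringency that FDR control buys when many features are screened.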
See also
Use listAvailableScreeningMethods to list the available built-in screening methods.
Use listAvailableSelectionFunctions to list the available built-in selection functions.
Examples
#Seed
set.seed(1010)
#Define row/col size
nr = 5
nc = 10
# Unsupervised Screening
#Data
x = matrix(
data = sample(x = c(1,2), size = nr*nc, replace = TRUE),
nrow = nr,
ncol = nc,
dimnames = list(
paste0("f",seq(nr)),
paste0("S",seq(nc))
)
)
#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))
#Force 1st feature to have 40% of missing values
x[1,seq(nc*0.4)] = NA
#Filter out features with more than 50% missing values
screen(
x = x,
method = "missing.value",
select.args = list(cutoff = 0.5)
)
#>
#> 5 out of 5 features selected by a cutoff (< 0.5) on the missing value
#> ratio.
#>
#> Top 5 ranked features: f2, f3, f4, f5, f1
#>
# Supervised Screening
#Filter by two-sample t-test (cutoff on t statistic)
screen(
x = x,
g = g,
method = "t.test",
var = "equal",
select.args = list(cutoff = 0.5)
)
#>
#> 4 out of 5 features selected by a cutoff (< 0.5) on the two-sample
#> Student's pooled t-test.
#>
#> Top 5 ranked features: f2, f5, f4, f1, f3
#>
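The missing-value example can be cross-checked in base R: the per-feature ratio of missing values is simply the row mean of is.na(x). This sketch reproduces only that ratio and the 0.5 cutoff, not the ranking or the printed summary of screen().

```r
# Rebuild the example data and compute the fraction of NAs per feature (row)
set.seed(1010)
nr <- 5
nc <- 10
x <- matrix(sample(c(1, 2), size = nr * nc, replace = TRUE),
            nrow = nr, ncol = nc,
            dimnames = list(paste0("f", seq(nr)), paste0("S", seq(nc))))
x[1, seq(nc * 0.4)] <- NA  # 4 of 10 values missing in f1

na_ratio <- rowMeans(is.na(x))
na_ratio                         # f1 = 0.4, all other features 0
names(na_ratio)[na_ratio < 0.5]  # all five features pass the cutoff
```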