
This function flags the rows of the input matrix x to keep, based on the features' intensity. A feature is flagged for removal if its value is above the provided minimum (min.expr) in fewer than the proportion of samples defined by min.prop.

See the Details section below for further information.

Usage

rowFilterByAboveMinRatio(x, g = NULL, min.expr = 0, min.prop = 0.5)

Arguments

x

matrix or data.frame, where rows are features and columns are observations.

g

(optional) vector or factor object giving the group of each observation (column) of x.

min.expr

numerical value indicating the minimum expression that a feature must exceed in at least a proportion min.prop of the samples. Defaults to 0.

min.prop

numerical value in the range \([0, 1]\). Minimum proportion of samples in which the feature expression must be above min.expr. Defaults to 0.5.

Value

A logical vector of length nrow(x) indicating which rows of x passed the filter.
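
As a brief illustration of how such a logical vector is typically used, the sketch below subsets a toy matrix with it. The vector keep is a hand-written, hypothetical stand-in for the function's return value, and x.filtered is an illustrative name, not part of the package.

```r
# Toy matrix: 3 features (rows) x 4 observations (columns)
x <- matrix(
  data = 1:12,
  nrow = 3,
  dimnames = list(paste0("f", 1:3), paste0("S", 1:4))
)

# Hypothetical filter result (assumption: stands in for the returned vector)
keep <- c(f1 = TRUE, f2 = FALSE, f3 = TRUE)

# Keep only the flagged rows; drop = FALSE preserves the matrix
# structure even when a single row survives the filter
x.filtered <- x[keep, , drop = FALSE]

dim(x.filtered)      # 2 rows, 4 columns
rownames(x.filtered) # "f1" "f3"
```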

Details

For each feature, the above-minimum frequency ratio (AMFR) is computed as:

$$\text{AMFR} = \frac{\text{number of samples where expression is greater than the provided minimum}}{\text{total number of observations}}$$

Then, the i-th feature is kept if \(\text{AMFR}_{i} \geq \text{min.prop}\).

If g is provided, the above-minimum frequency ratio (\(\text{AMFR}_{ij}\)) is computed separately within each group \(j\), using only the observations in that group. The i-th feature is kept if \(\text{AMFR}_{ij} \geq \text{min.prop}\) for at least one group.
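
The rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: amfr_keep is a hypothetical name, the per-group denominator is taken to be the number of observations in that group, and missing values are assumed to be absent.

```r
# Hypothetical re-implementation of the AMFR rule (not the package's code)
amfr_keep <- function(x, g = NULL, min.expr = 0, min.prop = 0.5) {
  x <- as.matrix(x)
  # Without a grouping variable, treat all observations as one group
  if (is.null(g)) g <- rep("all", ncol(x))
  g <- factor(g) # also drops unused levels
  # AMFR per group: share of samples with expression > min.expr
  amfr <- sapply(levels(g), function(lv) {
    rowMeans(x[, g == lv, drop = FALSE] > min.expr)
  })
  # Force a features-by-groups matrix, even for a single feature or group
  amfr <- matrix(amfr, nrow = nrow(x),
                 dimnames = list(rownames(x), levels(g)))
  # Keep a feature if at least one group reaches min.prop
  rowSums(amfr >= min.prop) > 0
}

# Toy data: 3 features x 6 observations, two groups of 3
x <- rbind(
  f1 = c(0, 0, 0, 0, 20, 30),
  f2 = c(15, 25, 30, 0, 0, 0),
  f3 = c(0, 12, 0, 0, 0, 40)
)
colnames(x) <- paste0("S", 1:6)
g <- rep(c("a", "b"), each = 3)

# Overall AMFR is 2/6, 3/6, 2/6: only f2 is kept
amfr_keep(x, min.expr = 10)

# Per group, f1 reaches 2/3 in "b" and f2 reaches 3/3 in "a";
# f3 reaches only 1/3 in both groups, so f1 and f2 are kept
amfr_keep(x, g = g, min.expr = 10)
```

Feature f1 shows why grouping matters: it is expressed in only one group, so it fails the overall threshold but passes within group "b".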

Author

Alessandro Barberis

Examples

#Seed
set.seed(1010)

#Define row/col size
nr = 5
nc = 10

#Data
x = matrix(
 data = sample.int(n = 100, size = nr*nc, replace = TRUE),
 nrow = nr,
 ncol = nc,
 dimnames = list(
   paste0("f",seq(nr)),
   paste0("S",seq(nc))
 )
)

#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))

#Filter
rowFilterByAboveMinRatio(x)
#>   f1   f2   f3   f4   f5 
#> TRUE TRUE TRUE TRUE TRUE 

#Filter by group
rowFilterByAboveMinRatio(x = x, g = g)
#>   f1   f2   f3   f4   f5 
#> TRUE TRUE TRUE TRUE TRUE 

#Set 1st feature to 0 in the first 6 of 10 observations (about 2/3)
x[1,seq(2*nc/3)] = 0

#Set 3rd feature to 0 in the first 3 of 5 observations of both class "a" and class "b"
x[3,seq(2*nc/6)] = 0
x[3,(seq(2*nc/6)+nc/2)] = 0

#Filter (1st and 3rd features should be flagged to be removed)
rowFilterByAboveMinRatio(x = x, min.expr = 10)
#>    f1    f2    f3    f4    f5 
#> FALSE  TRUE FALSE  TRUE  TRUE 

#Filter by group (3rd feature should be flagged to be removed)
rowFilterByAboveMinRatio(x = x, min.expr = 10, g = g)
#>    f1    f2    f3    f4    f5 
#>  TRUE  TRUE FALSE  TRUE  TRUE