
This function evaluates the features (rows) of the input matrix x by intensity. A feature is flagged for removal if its value exceeds the per-sample median, computed across features, in fewer than a proportion min.prop of samples.

See the Details section below for further information.

Usage

rowFilterByAboveMedianRatio(x, g = NULL, min.prop = 0.5)

Arguments

x

matrix or data.frame, where rows are features and columns are observations.

g

(optional) vector or factor object giving the group of each observation (column) of x.

min.prop

numerical value in the range \([0, 1]\). Minimum proportion of samples in which the feature value must be above the sample median for the feature to be kept. Defaults to 0.5.

Value

A logical vector of length nrow(x) indicating which rows of x passed the filter.

Details

For each observation, the median across \(n\) features is computed as:

$$\mathrm{Median} = x_{\frac{n+1}{2}}$$

where \(x\) is a vector of \(n\) elements sorted in ascending order, and \(n\) is odd. If \(n\) is even, then the median is computed as:

$$\mathrm{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$

If g = NULL, for each feature we define an above-median frequency ratio (AMFR) as the number of observations in which the feature value is greater than the sample median, divided by the total number of observations:

$$\mathrm{AMFR} = \frac{\text{number of samples where the feature is above the sample median}}{\text{total number of observations}}$$

Finally, the \(i\)-th feature is kept if \(\mathrm{AMFR}_{i} \geq \text{min.prop}\).

If g is provided, the above-median frequency ratio (\(\mathrm{AMFR}_{ij}\)) is computed for each group \(j\). The \(i\)-th feature is kept if \(\mathrm{AMFR}_{ij} \geq \text{min.prop}\) for at least one group.
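
The rule described above can be sketched in a few lines of base R. This is an illustration of the definition only, not the package's implementation: the helper name amfr_sketch is hypothetical, and the sketch assumes that "above" means strictly greater than the sample median and that, when g is given, each group's ratio is taken over the observations in that group.

```r
# Hypothetical helper (not part of the package): illustrates the AMFR rule.
amfr_sketch <- function(x, g = NULL, min.prop = 0.5) {
  x <- as.matrix(x)
  # Median across features, one value per observation (column)
  med <- apply(x, 2, stats::median)
  # TRUE where a feature value is strictly greater than its sample median
  # (assumption: ties with the median do not count as "above")
  above <- sweep(x, 2, med, FUN = ">")
  if (is.null(g)) {
    # AMFR over all observations
    return(rowMeans(above) >= min.prop)
  }
  # One AMFR column per group
  # (assumption: the denominator is the number of observations in the group)
  amfr <- do.call(cbind, lapply(split(seq_len(ncol(x)), g), function(idx) {
    rowMeans(above[, idx, drop = FALSE])
  }))
  # Keep a feature if it passes the threshold in at least one group
  rowSums(amfr >= min.prop) > 0
}

# Toy data: every sample median equals 2
x_toy <- matrix(
  c(1, 2, 3,  3, 2, 1,  1, 3, 2,  2, 1, 3),
  nrow = 3,
  dimnames = list(paste0("f", 1:3), paste0("S", 1:4))
)

amfr_sketch(x_toy)
#>    f1    f2    f3
#> FALSE FALSE  TRUE

amfr_sketch(x_toy, g = c("a", "a", "b", "b"))
#>   f1   f2   f3
#> TRUE TRUE TRUE
```

In the toy data, f1 and f2 are above the median in only 1 of 4 samples overall (AMFR = 0.25), but each reaches 0.5 within one of the two groups, which is why supplying g keeps them.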

Author

Alessandro Barberis

Examples

#Seed
set.seed(1010)

#Define row/col size
nr = 5
nc = 10

#Data
x = matrix(
 data = sample.int(n = 100, size = nr*nc, replace = TRUE),
 nrow = nr,
 ncol = nc,
 dimnames = list(
   paste0("f",seq(nr)),
   paste0("S",seq(nc))
 )
)

#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))

#Filter
rowFilterByAboveMedianRatio(x)
#>    f1    f2    f3    f4    f5 
#>  TRUE FALSE  TRUE  TRUE FALSE 

#Filter by group
rowFilterByAboveMedianRatio(x = x, g = g)
#>   f1   f2   f3   f4   f5 
#> TRUE TRUE TRUE TRUE TRUE