This function filters the rows of the input matrix x by feature intensity. A feature is kept only if its value is above the per-sample median (computed across all features) in at least a proportion min.prop of the samples; otherwise it is removed.

See the Details section below for further information.

Usage

rowFilterByAboveMedianRatio(x, g = NULL, min.prop = 0.5)

Arguments

x

matrix or data.frame, where rows are features and columns are observations.

g

(optional) vector or factor giving the group of each observation (column) of x.

min.prop

numeric value in the range [0, 1]. Minimum proportion of samples in which the feature value must be above the sample median for the feature to be kept. Defaults to 0.5.

Value

A logical vector of length nrow(x) indicating which rows of x passed the filter.
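
As a small illustration of how such a logical vector is typically used, the sketch below subsets a toy matrix by row. The vector keep is written by hand here as a stand-in for the function's return value; it is an assumption for illustration, not actual output.

```r
# Toy matrix: 3 features (rows) x 2 observations (columns)
x_toy <- matrix(
  data = 1:6,
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("S1", "S2"))
)

# Hand-written stand-in for the logical vector returned by the filter
keep <- c(f1 = TRUE, f2 = FALSE, f3 = TRUE)

# Keep only the rows that passed; drop = FALSE preserves the matrix shape
x_filtered <- x_toy[keep, , drop = FALSE]
```

Using drop = FALSE avoids the result collapsing to a plain vector when only one feature passes the filter.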

Details

For each observation, the median across n features is computed as:

$$\mathrm{Median} = x_{(n+1)/2}$$

where x is the vector of n feature values sorted in ascending order, and n is odd. If n is even, then the median is computed as:

$$\mathrm{Median} = \frac{x_{n/2} + x_{(n/2)+1}}{2}$$
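
The two cases can be checked by hand in base R. This is a minimal sketch with made-up vectors (v_odd and v_even are illustrative names), compared against stats::median:

```r
# Odd n: the median is the middle element of the sorted vector
v_odd <- c(7, 1, 3)
s_odd <- sort(v_odd)
n_odd <- length(s_odd)
med_odd <- s_odd[(n_odd + 1) / 2]

# Even n: the median is the mean of the two middle elements
v_even <- c(7, 1, 3, 5)
s_even <- sort(v_even)
n_even <- length(s_even)
med_even <- (s_even[n_even / 2] + s_even[n_even / 2 + 1]) / 2

# Both agree with the built-in median
med_odd == stats::median(v_odd)
med_even == stats::median(v_even)
```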

If g = NULL, for each feature we define an above-median frequency ratio (AMFR) as the number of times the feature value is greater than the sample median divided by the total number of observations:

$$\text{Above Median Frequency Ratio (AMFR)} = \frac{\text{Number of samples where the feature is above the sample median}}{\text{Total number of observations}}$$

Finally, the i-th feature is kept if $\mathrm{AMFR}_i \geq$ min.prop.

If g is provided, the above-median frequency ratio ($\mathrm{AMFR}_{ij}$) is computed for each group j. The i-th feature is kept if $\mathrm{AMFR}_{ij} \geq$ min.prop for at least one group.
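
The rule described above can be sketched in base R as follows. This is not the package's implementation: amfr_filter_sketch is a hypothetical helper written only to illustrate the logic. It assumes that g has one entry per column of x and that values equal to the sample median do not count as above it.

```r
# Hypothetical helper (NOT the package implementation): base-R sketch of the
# above-median frequency ratio filter described in Details.
amfr_filter_sketch <- function(x, g = NULL, min.prop = 0.5) {
  x <- as.matrix(x)

  # Median of each observation (column), computed across features (rows)
  col_med <- apply(x, 2, stats::median)

  # TRUE where a feature value is strictly greater than its sample median
  above <- sweep(x, 2, col_med, FUN = ">")

  if (is.null(g)) {
    # AMFR_i: fraction of all observations where feature i is above the median
    return(rowMeans(above) >= min.prop)
  }

  # Assumption: g has one entry per observation (column of x)
  stopifnot(length(g) == ncol(x))

  # AMFR_ij: the same ratio, computed within each group j
  amfr <- sapply(split(seq_len(ncol(x)), g), function(idx) {
    rowMeans(above[, idx, drop = FALSE])
  })
  amfr <- matrix(amfr, nrow = nrow(x))

  # Keep feature i if AMFR_ij >= min.prop in at least one group
  keep <- rowSums(amfr >= min.prop) > 0
  names(keep) <- rownames(x)
  keep
}

# Toy data: every column has median 5
x_toy <- matrix(
  data = c(9, 1, 5,  8, 2, 5,  1, 9, 5,  2, 8, 5),
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), paste0("S", 1:4))
)
g_toy <- c("a", "a", "b", "b")

keep_all     <- amfr_filter_sketch(x_toy)                              # f1 and f2 pass
keep_strict  <- amfr_filter_sketch(x_toy, min.prop = 0.75)             # nothing passes
keep_grouped <- amfr_filter_sketch(x_toy, g = g_toy, min.prop = 0.75)  # f1 and f2 pass
```

In the toy data, f1 is above the median only in S1 and S2, and f2 only in S3 and S4, so each has an overall ratio of 0.5 but a ratio of 1 within one group. This shows how grouping can retain features that are consistently high in a single group even when the overall ratio falls below min.prop.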

Author

Alessandro Barberis

Examples

#Seed
set.seed(1010)

#Define row/col size
nr = 5
nc = 10

#Data
x = matrix(
 data = sample.int(n = 100, size = nr*nc, replace = TRUE),
 nrow = nr,
 ncol = nc,
 dimnames = list(
   paste0("f",seq(nr)),
   paste0("S",seq(nc))
 )
)

#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))

#Filter
rowFilterByAboveMedianRatio(x)
#>    f1    f2    f3    f4    f5 
#>  TRUE FALSE  TRUE  TRUE FALSE 

#Filter by group
rowFilterByAboveMedianRatio(x = x, g = g)
#>   f1   f2   f3   f4   f5 
#> TRUE TRUE TRUE TRUE TRUE