
This function computes, for each feature, the proportion of observations in which the feature value is above the sample median.

See the Details section below for further information.

Usage

rowAboveMedianFreqRatio(x, g = NULL)

Arguments

x

matrix or data.frame, where rows are features and columns are observations.

g

(optional) vector or factor giving the group of each observation (column) of x.

Value

A vector of length nrow(x) containing the computed ratios. If g is provided, a matrix is returned instead, with one row per feature and one column per group.

Details

For each observation (column of x), the median across features is computed via colMedians.

Remember that the median across \(n\) elements is defined as:

$$Median = x_{\frac{n+1}{2}}$$

where \(x\) is a vector of \(n\) elements sorted in ascending order, and \(n\) is odd. If \(n\) is even, the median is computed as:

$$Median = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$
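
As a quick check of the two cases, consider two small sorted vectors chosen here purely for illustration. With \(n = 5\) the median is the third element, while with \(n = 4\) it is the mean of the second and third elements:

$$x = (2, 4, 7, 9, 12),\ n = 5: \quad Median = x_{3} = 7$$

$$x = (2, 4, 7, 9),\ n = 4: \quad Median = \frac{x_{2} + x_{3}}{2} = \frac{4 + 7}{2} = 5.5$$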

If g = NULL, for each feature we define an above-median frequency ratio (AMFR) as the number of times the feature value is greater than the sample median divided by the total number of observations:

$$\text{Above-Median Frequency Ratio (AMFR)} = \frac{\text{Number of samples where the feature is above the sample median}}{\text{Total number of observations}}$$

If g is provided, the above-median frequency ratio (\(AMFR_{ij}\)) is computed for each feature \(i\) and each group \(j\) as:

$$AMFR_{ij} = \frac{\text{Number of samples in the } j\text{-th group where feature } i \text{ is above the sample median}}{\text{Number of observations in the } j\text{-th group}}$$
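
The definitions above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `amfr_sketch`, its `strict` argument, and `x_demo` are hypothetical names, and `apply(x, 2, stats::median)` stands in for colMedians. The default follows the written definition (strictly greater than the sample median). Note that in the example output on this page every sample contributes three of its five features, which a strict comparison cannot produce, so the packaged function may also count values equal to the median; `strict = FALSE` mimics that behaviour.

```r
# Hypothetical helper for illustration only; not the package's implementation.
amfr_sketch <- function(x, g = NULL, strict = TRUE) {
  x <- as.matrix(x)
  # Median of each observation (column); base-R stand-in for colMedians
  med <- apply(x, 2, stats::median)
  # Compare every value with the median of its own column
  above <- sweep(x, 2, med, FUN = if (strict) ">" else ">=")
  # No grouping: share of observations above the median, per feature
  if (is.null(g)) return(rowMeans(above))
  # Grouping: the same share, computed within each group of columns
  g <- as.factor(g)
  vapply(
    levels(g),
    function(lv) rowMeans(above[, g == lv, drop = FALSE]),
    FUN.VALUE = numeric(nrow(x))
  )
}

# Small made-up matrix: 3 features (rows) x 3 samples (columns)
x_demo <- matrix(
  c(1, 5, 9,
    2, 4, 6,
    3, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("f", 1:3), paste0("S", 1:3))
)

# Sample medians are 2, 4 and 6, so f1 = 2/3, f2 = 0, f3 = 1/3
amfr_sketch(x_demo)

# One column of ratios per group
amfr_sketch(x_demo, g = c("a", "a", "b"))
```

Because the comparison is done once on the whole matrix, the grouped case only needs to average the same logical matrix over each group's columns.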

Author

Alessandro Barberis

Examples

#Seed
set.seed(1010)

#Define row/col size
nr = 5
nc = 10

#Data
x = matrix(
 data = sample.int(n = 100, size = nr*nc, replace = TRUE),
 nrow = nr,
 ncol = nc,
 dimnames = list(
   paste0("f",seq(nr)),
   paste0("S",seq(nc))
 )
)

#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))

#AMFR
rowAboveMedianFreqRatio(x)
#>  f1  f2  f3  f4  f5 
#> 0.7 0.4 0.9 0.7 0.3 

#AMFR by group
rowAboveMedianFreqRatio(x = x, g = g)
#>      a   b
#> f1 1.0 0.4
#> f2 0.2 0.6
#> f3 0.8 1.0
#> f4 1.0 0.4
#> f5 0.0 0.6