This function computes, for each feature, the ratio of observations in which the feature value is above the sample median.
See the Details section below for further information.
Arguments
- x
matrix or data.frame, where rows are features and columns are observations.
- g
(optional) vector or factor object giving the group of each observation (column) of x.
Value
A vector of length nrow(x) containing the computed ratios.
If g is provided, a matrix with the ratios for each class as column vectors is returned.
Details
For each observation, the median is computed via colMedians.
Remember that the median across \(n\) elements is defined as:
$$\mathrm{Median} = x_{\frac{n+1}{2}}$$
where \(x\) is a vector of \(n\) elements sorted in ascending order, and \(n\) is odd. If \(n\) is even, then the median is computed as:
$$\mathrm{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$
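The two cases of the definition can be checked with a few lines of base R. This is only an illustrative sketch: the helper name `median_by_definition` is hypothetical and is not part of the package, which relies on colMedians.

```r
# Hypothetical helper (not part of the package): median from sorted values
median_by_definition <- function(v) {
  s <- sort(v)          # ascending order
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd n: the middle element
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: mean of the two middle elements
  }
}

m_odd  <- median_by_definition(c(7, 1, 5))     # sorted: 1 5 7   -> 5
m_even <- median_by_definition(c(7, 1, 5, 3))  # sorted: 1 3 5 7 -> (3 + 5) / 2 = 4
```

Both results should agree with `stats::median()` applied to the same vectors.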
If g = NULL, for each feature \(i\) we define an above-median frequency ratio (\(\mathrm{AMFR}_i\)) as the
number of observations in which the feature value is greater than the sample median, divided
by the total number of observations:
$$\mathrm{AMFR}_i = \frac{\text{number of observations where feature } i \text{ is above the sample median}}{\text{total number of observations}}$$
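The ungrouped ratio can be reproduced on toy data with base R alone. The sketch below is an assumption about how the definition translates to code, not the package's actual implementation: it uses `apply(x, 2, median)` in place of colMedians, and the toy matrix and the variable names `med`, `above`, and `amfr` are made up for illustration.

```r
# Toy data: 3 features (rows) x 4 observations (columns)
x <- matrix(
  data = c(1, 2, 3,  4, 5, 6,  9, 8, 7,  2, 9, 4),
  nrow = 3,
  dimnames = list(paste0("f", 1:3), paste0("S", 1:4))
)

# Median of each observation (column); base R stand-in for colMedians
med <- apply(x, 2, median)            # S1 = 2, S2 = 5, S3 = 8, S4 = 4

# TRUE where a feature value is strictly greater than its sample median
above <- sweep(x, 2, med, FUN = ">")

# Fraction of observations in which each feature is above the median
amfr <- rowMeans(above)               # f1 = 0.25, f2 = 0.25, f3 = 0.50
```

Note that values equal to the sample median do not count as above it, because the comparison is strict.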
If g is provided, the above-median frequency ratio (\(\mathrm{AMFR}_{ij}\)) of feature \(i\) is
computed for each group \(j\) as:
$$\mathrm{AMFR}_{ij} = \frac{\text{number of observations in the } j\text{-th group where feature } i \text{ is above the sample median}}{\text{number of observations in the } j\text{-th group}}$$
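The grouped ratio can be sketched the same way. Again, this is a base R illustration under assumptions, not the package's code: the toy matrix, the grouping `g`, and the name `amfr_by_group` are hypothetical. The sample medians are still taken over all features of each observation; only the averaging is restricted to the observations of one group.

```r
# Toy data: 3 features (rows) x 4 observations (columns)
x <- matrix(
  data = c(1, 2, 3,  4, 5, 6,  9, 8, 7,  2, 9, 4),
  nrow = 3,
  dimnames = list(paste0("f", 1:3), paste0("S", 1:4))
)
# Hypothetical grouping: first two observations in "a", last two in "b"
g <- factor(c("a", "a", "b", "b"))

# Per-observation medians and the strict "above median" indicator
med   <- apply(x, 2, median)
above <- sweep(x, 2, med, FUN = ">")

# One column of ratios per group: average the indicator within each group
amfr_by_group <- sapply(levels(g), function(lv) {
  rowMeans(above[, g == lv, drop = FALSE])
})
# Column "a": f1 = 0.0, f2 = 0.0, f3 = 1.0
# Column "b": f1 = 0.5, f2 = 0.5, f3 = 0.0
```

`drop = FALSE` keeps the subset a matrix even when a group contains a single observation, so `rowMeans()` still works.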
Examples
#Seed
set.seed(1010)
#Define row/col size
nr = 5
nc = 10
#Data
x = matrix(
  data = sample.int(n = 100, size = nr*nc, replace = TRUE),
  nrow = nr,
  ncol = nc,
  dimnames = list(
    paste0("f",seq(nr)),
    paste0("S",seq(nc))
  )
)
#Grouping variable
g = c(rep("a", nc/2), rep("b", nc/2))
#AMFR
rowAboveMedianFreqRatio(x)
#> f1 f2 f3 f4 f5
#> 0.7 0.4 0.9 0.7 0.3
#AMFR by group
rowAboveMedianFreqRatio(x = x, g = g)
#> a b
#> f1 1.0 0.4
#> f2 0.2 0.6
#> f3 0.8 1.0
#> f4 1.0 0.4
#> f5 0.0 0.6