
Introduction

Supervised feature screening is a pivotal step in data analysis, guiding the selection of relevant features to enhance predictive modeling and classification accuracy.

featscreen empowers analysts and data scientists by providing a comprehensive suite of ready-to-use functions tailored for supervised feature screening.

From traditional correlation-based methods to sophisticated statistical approaches such as Cox Proportional-Hazard models for survival data, featscreen aims to bridge the gap between complex methodologies and user-friendly implementation.

In this article we explore the different supervised feature screening methods supported by featscreen.

Setup

First, we load the featscreen package:

library(featscreen)
#> 
#> Attaching package: 'featscreen'
#> The following objects are masked from 'package:stats':
#> 
#>     mad, sd
#> The following object is masked from 'package:graphics':
#> 
#>     screen

Seed

Now we want to set a seed for random number generation (RNG). By default, each R session derives its seed from the current time and process ID, so different sessions produce different simulation results. Fixing the seed ensures that the results of this vignette can be reproduced. We can specify a seed by calling ?set.seed.

#Set a seed for RNG
set.seed(
  #A seed
  seed = 5381L,                   #a randomly chosen integer value
  #The kind of RNG to use
  kind = "Mersenne-Twister",      #we make explicit the current R default value
  #The kind of Normal generation
  normal.kind = "Inversion"       #we make explicit the current R default value
)

Screening Methods

To view the list of currently supported supervised screening methods, use the ?listAvailableScreeningMethods function with the x parameter set to 'supervised':

#list screening methods
screening.methods = listAvailableScreeningMethods(x = 'supervised')

#print in table
knitr::kable(x = screening.methods, align = 'rc')
id              name
--------------  -------------------------------------------------------------------------
pearson         Pearson’s product moment correlation coefficient t-test
spearman        Spearman’s rank correlation coefficient t-test
kendall         Kendall’s rank correlation coefficient t-test
t.test.equal    two-sample Student’s pooled t-test
t.test.unequal  two-sample t-test with the Welch modification to the degrees of freedom
t.test.paired   paired two-sample Student’s t-test
w.test.ranksum  two-sample Mann-Whitney U-test
w.test.paired   paired two-sample Wilcoxon signed-rank test
anova.equal     one-way analysis of variance F-test
anova.unequal   one-way analysis of variance F-test with Welch correction
kruskal.wallis  Kruskal-Wallis H-test
chisq.test      Pearson’s χ²-test
coxph           Cox PH regression coefficient z-test
moderated.t     empirical Bayes moderated t-test
moderated.F     empirical Bayes moderated F-test
sam.test        significant analysis of microarrays permutation test

Correlation

Correlation analysis plays a fundamental role in understanding relationships between variables in the context of supervised feature screening. It provides a quantitative measure of the strength and direction of associations, helping identify features that may contribute significantly to predictive models.

featscreen supports various correlation methods, each catering to different data characteristics and analytical needs.

Pearson’s Correlation

Pearson’s correlation is a widely-used measure that assesses linear relationships between features and the target variable. Represented by the correlation coefficient r, this method quantifies the extent to which two variables change together. The coefficient ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 denotes a perfect positive linear relationship, and 0 signifies no linear correlation. Pearson’s correlation is sensitive to outliers and assumes a linear relationship between variables.
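
To make the ‘t-test’ part of the name concrete: under the null hypothesis of zero correlation, the test statistic is t = r*sqrt(n - 2)/sqrt(1 - r^2), which follows a Student’s t distribution with n - 2 degrees of freedom. Below is a minimal base-R illustration of this relationship (not featscreen code; the variable names are arbitrary):

#Simulate one feature and a continuous outcome
n = 20
feature = rnorm(n)
outcome = rnorm(n)

#Pearson's correlation coefficient
r = cor(feature, outcome, method = "pearson")

#t-statistic and two-sided p-value under H0: rho = 0
t.stat = r * sqrt(n - 2) / sqrt(1 - r^2)
p.value = 2 * pt(abs(t.stat), df = n - 2, lower.tail = FALSE)

#Same values returned by cor.test(feature, outcome)
c(statistic = t.stat, significance = p.value)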

Pearson’s correlation is computed in featscreen by calling the function ?rowPearsonCor. From the documentation, we can see that this is a wrapper function that internally calls ?row_cor_pearson from the matrixTests package.

Let’s explore a practical example to illustrate the application of Pearson’s correlation.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
y = rnorm(20)

#Compute
rowPearsonCor(x = x, y = y)
#> $statistic
#>  [1]  1.55424671 -3.21135119 -0.48824958 -0.62940963  0.01709722 -1.30914381
#>  [7] -1.07986577  1.11138904 -0.42259796  0.39229005
#> 
#> $significance
#>  [1] 0.137531731 0.004840161 0.631267786 0.536988472 0.986547148 0.206952421
#>  [7] 0.294459519 0.281026013 0.677593887 0.699450166

Spearman’s Correlation

Spearman’s correlation is a non-parametric measure that evaluates monotonic relationships between variables. By ranking the data values and assessing the correlation of the ranks, Spearman’s correlation provides a robust alternative to Pearson’s correlation, especially in the presence of outliers or non-linear patterns. The resulting coefficient, denoted by rho, ranges between -1 and 1, where extreme values indicate strong monotonic relationships.

In featscreen, Spearman’s correlation is computed by calling the function ?rowSpearmanCor. From the documentation, we can see that this is a wrapper function that internally calls ?cor.test from the stats package.

Let's see an example.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
y = rnorm(20)

#Compute
rowSpearmanCor(x = x, y = y)
#> $statistic
#>  [1] 2004 1418 1582 1022 1114 1254 1376  770 1502 1542
#> 
#> $significance
#>  [1] 0.02410399 0.78197196 0.42194872 0.32438799 0.49229843 0.81146975
#>  [7] 0.88623825 0.06578794 0.58563682 0.50046206
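
Note that the statistic reported above is not rho itself: since ?rowSpearmanCor wraps ?cor.test, the value appears to be cor.test’s S statistic (the sum of squared rank differences). With no ties, rho can be recovered as rho = 1 - 6*S/(n^3 - n). A quick check, reusing the x and y defined in the chunk above (an illustration, not part of featscreen’s API):

#Recover rho for the first feature from its S statistic
n = length(y)
S = rowSpearmanCor(x = x, y = y)$statistic[1]
1 - 6 * S / (n^3 - n)

#Should match the coefficient computed by base R
cor(x[1, ], y, method = "spearman")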

Kendall’s Correlation

Kendall’s correlation is another non-parametric method for assessing correlation. Similar to Spearman’s correlation, it is based on the ranking of data. It measures the similarity in the ordering of data pairs between two variables, making it suitable for ordinal data and less sensitive to outliers. Kendall’s coefficient, denoted by tau, ranges from -1 to 1, with negative values indicating a negative association, positive values signifying a positive association, and 0 representing no association.
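
Conceptually, tau compares concordant and discordant pairs: tau = (C - D) / (n(n - 1)/2), where C and D count the pairs of observations ordered the same way and the opposite way in the two variables. A small base-R sketch on tie-free data (variable names are arbitrary):

#Simulate a tie-free feature and outcome
n = 20
feature = rnorm(n)
outcome = rnorm(n)

#Count concordant (C) and discordant (D) pairs
pairs = combn(n, 2)
signs = sign(feature[pairs[1, ]] - feature[pairs[2, ]]) *
  sign(outcome[pairs[1, ]] - outcome[pairs[2, ]])
C = sum(signs > 0)
D = sum(signs < 0)

#Kendall's tau by pair counting and via base R
c(by.hand = (C - D) / (n * (n - 1) / 2),
  base.r  = cor(feature, outcome, method = "kendall"))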

In featscreen, Kendall’s correlation is computed by calling the function ?rowKendallCor. From the documentation, we can see that this is a wrapper function that internally calls ?cor.test from the stats package.

Let's see an example.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
y = rnorm(20)

#Compute
rowKendallCor(x = x, y = y)
#> $statistic
#>  [1]  88  81  73 121  77 112  74 117  63  96
#> 
#> $significance
#>  [1] 0.67710824 0.38585715 0.16497641 0.09833022 0.25983999 0.28837820
#>  [7] 0.18588304 0.16497641 0.03976214 0.97446690

Two-Sample Tests

Two-sample tests are indispensable tools in supervised feature screening, offering insights into the differences between two groups of data.

featscreen incorporates a range of two-sample tests, each catering to diverse scenarios and assumptions. Whether examining mean differences with parametric tests or assessing variations in medians with non-parametric tests, these methods play a pivotal role in uncovering significant distinctions between groups, contributing to the identification of influential features within the feature space.

Two-Sample Student’s t-Test

The Two-Sample Student’s t-Test is a parametric method used to assess whether the means of two groups are significantly different. It is applicable when the data follow a normal distribution. The t-test results in a t-statistic and a p-value, helping analysts determine the statistical significance of observed differences.

Two-sample t-tests can be further divided into:

  1. Unpaired two-sample t-tests
  2. Paired two-sample t-tests

Unpaired Samples

T-tests provide invaluable insights into differences between two independent groups.

featscreen supports unpaired two-sample t-tests, offering flexibility to handle scenarios with both equal and unequal variances. This class of t-tests is applicable when comparing means between two groups in which the observations in one group are independent of those in the other.

Equal Variance T-Test

The Equal Variance T-Test, often referred to as the traditional or pooled t-test, assumes that the variances of the two groups being compared are equal. This test is suitable when the data in both groups follow a normal distribution and exhibit homogeneity of variances.

featscreen supports the Equal Variance T-Test through the ?rowEqualVarT function.

From the documentation, we can see that this is a wrapper function that internally calls ?row_t_equalvar from the matrixTests package.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
# Assuming 'group1' and 'group2' indicate independent measurements
g = c(rep('group1',10),rep('group2',10))

#Compute
rowEqualVarT(x = x, g = g)
#> $statistic
#>  [1]  0.004667386 -0.600050691 -0.524350283 -0.158893662 -1.482498596
#>  [6] -0.461348953 -0.034249081 -0.471759890  0.923177678 -1.557724376
#> 
#> $significance
#>  [1] 0.9963273 0.5559521 0.6064326 0.8755219 0.1555043 0.6500745 0.9730555
#>  [8] 0.6427664 0.3681249 0.1367066
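
As a quick sanity check (reusing the x and g defined in the chunk above), the first feature can be tested with base R's ?t.test and var.equal = TRUE; the statistic and p-value should agree with the first entries above, possibly up to the sign of the statistic depending on group ordering:

#Pooled two-sample t-test on the first feature
t.test(x[1, g == 'group1'], x[1, g == 'group2'], var.equal = TRUE)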

Unequal Variance T-Test

The Unequal Variance T-Test, also known as Welch’s T-Test, is employed when the assumption of equal variances is not met. This test is robust in scenarios where variances in the two groups differ.

The Unequal Variance T-Test is available in featscreen through the ?rowUnequalVarT function.

From the documentation, we can see that this is a wrapper function that internally calls ?row_t_welch from the matrixTests package.

#Data
x = cbind(
  matrix(rnorm(n = 10 * 10, mean = 1, sd = 2), 10, 10),
  matrix(rnorm(n = 10 * 10, mean = 3, sd = 4), 10, 10)
)
# Assuming 'group1' and 'group2' indicate independent measurements
g = c(rep('group1',10),rep('group2',10))

#Compute
rowUnequalVarT(x = x, g = g)
#> $statistic
#>  [1] -2.2405783 -2.2538213  0.1452079  0.9806682 -2.9989363 -0.8862154
#>  [7] -3.6757116 -1.3743303 -2.4372740 -6.4725307
#> 
#> $significance
#>  [1] 4.587359e-02 4.293420e-02 8.870308e-01 3.473898e-01 1.174654e-02
#>  [6] 3.894469e-01 3.692485e-03 1.946094e-01 2.910688e-02 4.101379e-05

Paired Samples

The paired t-test variant is available for situations where measurements are paired, such as repeated measurements on the same subjects or matched pairs of subjects. This variant of the t-test assesses whether the mean difference between paired observations is statistically different from zero.

The paired t-test is particularly useful in experimental designs where each subject is subjected to two different conditions, treatments, or time points, and the goal is to determine whether there is a significant change within each pair. For instance, in clinical trials, the paired t-test may be applied to assess the efficacy of a treatment by comparing measurements taken before and after the treatment for the same group of individuals.

In featscreen, a paired t-test is computed by calling the function ?rowPairedT. From the documentation, we can see that this is a wrapper function that internally calls ?row_t_paired from the matrixTests package.

Let's see an example.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
# Assuming 'before' and 'after' indicate paired measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowPairedT(x = x, g = g)
#> $statistic
#>  [1] -0.70249807  0.24745281 -0.37887763 -0.26889905 -2.09181818  0.61294897
#>  [7]  0.04841629 -0.04692657 -0.34187150 -0.31344754
#> 
#> $significance
#>  [1] 0.50013315 0.81011060 0.71356227 0.79407000 0.06598845 0.55508069
#>  [7] 0.96244188 0.96359655 0.74029490 0.76108331

In this example, the paired t-test is applied to assess whether there is a significant difference in measurements taken before and after a treatment or intervention. Users can adapt this approach to various domains, providing a powerful means of exploring paired data within the supervised feature screening process.
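
A helpful way to think about the paired t-test is as a one-sample t-test on the within-pair differences. Reusing the x and g defined in the chunk above, and assuming for illustration that the i-th 'before' sample is paired with the i-th 'after' sample, the equivalence can be seen with base R:

#Within-pair differences for the first feature
d = x[1, g == 'before'] - x[1, g == 'after']

#One-sample t-test on the differences...
t.test(d)

#...is equivalent to the paired two-sample t-test
t.test(x[1, g == 'before'], x[1, g == 'after'], paired = TRUE)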

Two-Sample Wilcoxon’s Signed-Rank Test

The Two-Sample Wilcoxon’s Signed-Rank Test is a non-parametric alternative to the t-test, suitable for situations where the assumption of normality is violated. This test assesses whether the distribution of differences between paired observations differs significantly from zero. It is particularly robust against outliers and variations in the shape of the distribution, making it a valuable option for scenarios with non-normally distributed data.

In featscreen, the Wilcoxon signed-rank test is computed by calling the function ?rowPairedWilcoxonT.

From the documentation, we can see that this is a wrapper function that internally calls ?row_wilcoxon_paired from the matrixTests package.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
# Assuming 'before' and 'after' indicate paired measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowPairedWilcoxonT(x = x, g = g)
#> $statistic
#>  [1] 82 54 74 65 45 45 50 30 51 61
#> 
#> $significance
#>  [1] 0.01468964 0.79593626 0.07525601 0.27986101 0.73936435 0.73936435
#>  [7] 1.00000000 0.14314014 0.97051246 0.43587218

Two-Sample Mann-Whitney U-Test

The Two-Sample Mann-Whitney U-Test (also known as Wilcoxon Rank-Sum Test) extends the Wilcoxon test to independent samples, allowing analysts to evaluate whether the distributions of two groups differ significantly. This non-parametric test is suitable for situations where assumptions of normality are not met, providing an effective tool for feature screening in diverse datasets. The Mann-Whitney U-Test results in a U-statistic and a p-value, aiding in the determination of statistical significance between groups.

In featscreen, the Wilcoxon rank-sum test is computed by calling the function ?rowWilcoxonT.

From the documentation, we can see that this is a wrapper function that internally calls ?row_wilcoxon_twosample from the matrixTests package.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
# Assuming 'before' and 'after' indicate independent measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowWilcoxonT(x = x, g = g)
#> $statistic
#>  [1] 43 60 56 55 15 54 59 67 51 47
#> 
#> $significance
#>  [1] 0.630528914 0.481250947 0.684210526 0.739364351 0.006841456 0.795936262
#>  [7] 0.528848860 0.217562623 0.970512460 0.853428305
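
For comparison, the same test can be run on the first feature with base R's ?wilcox.test (reusing the x and g defined in the chunk above); its W statistic is the Mann-Whitney U statistic for the first group:

#Rank-sum test on the first feature
wilcox.test(x[1, g == 'before'], x[1, g == 'after'])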

Comparing Multiple Groups

Comparing multiple groups is a crucial aspect of supervised feature screening, allowing analysts to discern variations in feature behavior across different categorical or experimental conditions.

featscreen integrates diverse statistical methods tailored for such comparisons, accommodating scenarios with varying assumptions and data characteristics. Whether exploring mean differences with ANOVA, assessing median disparities with non-parametric tests, or examining categorical associations with the Chi-Square Test, featscreen provides a comprehensive suite of tools to unravel nuanced patterns within the feature space.

One-way Analysis of Variance F-Test

The One-way Analysis of Variance (ANOVA) F-Test stands as a fundamental parametric method designed for comparing means across two or more independent groups. This statistical test addresses the question of whether there are significant differences in the means of distinct groups, making it a versatile tool for analyzing experimental data with multiple categorical conditions or levels. ANOVA relies on the assumption of normality and homogeneity of variances among the groups.

The F-Test results in an F-statistic and a p-value, enabling analysts to make informed decisions about the statistical significance of observed differences in means.

featscreen supports ANOVA with equal or unequal variance.

Equal Variance ANOVA

The Equal Variance ANOVA is used to assess mean differences among multiple independent groups. This variant of ANOVA, also known as the traditional or pooled ANOVA, assumes homogeneity of variances and normality within each group.

In featscreen, the pooled one-way analysis of variance test is computed by calling the function ?rowEqualVarOneWayAnova.

From the documentation, we can see that this is a wrapper function that internally calls ?row_oneway_equalvar from the matrixTests package.

#Data
x = matrix(rnorm(10 * 20), 10, 20)
# Assuming 'before' and 'after' indicate independent measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowEqualVarOneWayAnova(x = x, g = g)
#> $statistic
#>  [1] 10.19316848  0.03225024  0.04230048  0.06552166  2.00635238  0.59021632
#>  [7]  0.01912477  0.42145581  0.14022777  0.11147536
#> 
#> $significance
#>  [1] 0.005043041 0.859485661 0.839356234 0.800874877 0.173715440 0.452297572
#>  [7] 0.891544743 0.524409445 0.712429816 0.742328966
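
Reusing the x and g defined in the chunk above, the first feature can be checked against base R's ?oneway.test with var.equal = TRUE. Note that with only two groups the F statistic is simply the square of the pooled t statistic:

#Pooled one-way ANOVA on the first feature
oneway.test(x[1, ] ~ factor(g), var.equal = TRUE)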

Unequal Variance ANOVA

The Unequal Variance ANOVA, also known as Welch’s ANOVA, accommodates scenarios where the variances among groups differ, offering a robust approach when homogeneity of variances cannot be assumed.

In featscreen, the one-way analysis of variance test with Welch correction is computed by calling the function ?rowUnequalVarOneWayAnova.

From the documentation, we can see that this is a wrapper function that internally calls ?row_oneway_welch from the matrixTests package.

#Data
x = cbind(
  matrix(rnorm(n = 10 * 10, mean = 1, sd = 2), 10, 10),
  matrix(rnorm(n = 10 * 10, mean = 3, sd = 4), 10, 10)
)
# Assuming 'before' and 'after' indicate independent measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowUnequalVarOneWayAnova(x = x, g = g)
#> $statistic
#>  [1] 0.006616130 4.551030128 3.854863607 0.708377597 1.118078432 0.224635260
#>  [7] 1.821935578 0.001472292 0.821963664 0.015927117
#> 
#> $significance
#>  [1] 0.93674061 0.05482032 0.06530365 0.41674366 0.31000755 0.64494599
#>  [7] 0.19982402 0.96992567 0.38033507 0.90118277
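
The analogous base-R check uses ?oneway.test with its default var.equal = FALSE, which applies the Welch correction (reusing the x and g defined in the chunk above):

#Welch one-way ANOVA on the first feature
oneway.test(x[1, ] ~ factor(g), var.equal = FALSE)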

Kruskal-Wallis Test

A non-parametric alternative to ANOVA, the Kruskal-Wallis test is applied when the assumption of normality is not met or when analyzing ordinal data.

This test evaluates whether the distributions of two or more independent groups differ significantly. The Kruskal-Wallis Test results in a chi-square statistic and a p-value, offering a robust approach for scenarios where parametric assumptions may be violated.
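
In base R the same test is available as ?kruskal.test; as a minimal sketch, here it is applied to a single simulated feature measured in three groups (the data are arbitrary):

#Simulate one feature measured in three groups
values = rnorm(30)
groups = factor(rep(c('a', 'b', 'c'), each = 10))

#Kruskal-Wallis rank sum test
kruskal.test(x = values, g = groups)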

In featscreen, the Kruskal-Wallis H-test is computed by calling the function ?rowKruskalWallis.

From the documentation, we can see that this is a wrapper function that internally calls ?row_kruskalwallis from the matrixTests package.

#Data
x = cbind(
  matrix(rnorm(n = 10 * 10, mean = 1, sd = 2), 10, 10),
  matrix(rnorm(n = 10 * 10, mean = 3, sd = 4), 10, 10)
)
# Assuming 'before' and 'after' indicate independent measurements
g = c(rep('before',10),rep('after',10))

#Compute
rowKruskalWallis(x = x, g = g)
#> $statistic
#>  [1]  1.071701  3.756477  1.446872  1.520712  0.582938  2.292538  1.062549
#>  [8]  5.109783  6.530679 16.376850
#> 
#> $significance
#>  [1] 0.325304467 0.073453691 0.253354235 0.239221847 0.456458177 0.160090799
#>  [7] 0.320451489 0.041018406 0.027682348 0.001356772

Chi-Square Test

The Chi-Square Test investigates associations between categorical variables, providing a statistical basis for feature selection in contingency tables.

In featscreen, the Pearson’s Chi-squared test is computed by calling the function ?rowPearsonChiSq.

From the documentation, we can see that this is a wrapper function that internally calls ?chisq.test from the stats package.

#Data
## Matrix with 2 features: 'mutation status' and 'sex'
x = rbind(
   matrix(sample(c("mut", "wt"),30,TRUE), 1, 30),
   matrix(sample(c("m", "f")   ,30,TRUE), 1, 30)
)
g = sample(c("a","b","c"), 30, replace = TRUE)

#Compute
rowPearsonChiSq(x = x, g = g, simulate.p.value = TRUE)
#> $statistic
#> [1] 1.574598 1.715343
#> 
#> $significance
#> [1] 0.5192404 0.4582709

The example showcases the application of the Chi-Square Test on a data matrix with two categorical features: ‘mutation status’ and ‘sex’. For each feature, the test returns a Chi-Square statistic and a p-value, providing analysts with insights into the significance of the association between that feature and the grouping variable.
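
To see what is tested for a single feature, here is the equivalent base-R call for the first row, reusing the x and g defined in the chunk above: the feature is cross-tabulated against the grouping variable and a chi-squared test of independence is applied (the simulated p-value will vary slightly between runs):

#Contingency table of 'mutation status' versus group
tab = table(x[1, ], g)
tab

#Chi-squared test of independence with simulated p-value
chisq.test(x = tab, simulate.p.value = TRUE)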

Survival Data

Survival data analysis occupies a distinctive realm within feature screening, focusing on time-to-event outcomes. In many fields, including medical research and epidemiology, understanding the duration until a particular event occurs is crucial.

Cox PH Regression Coefficient z-test

Survival analysis often employs Cox Proportional-Hazard (PH) models, a powerful statistical approach for understanding the impact of covariates on the time to an event.

In featscreen, the Cox PH Regression Coefficient Z-Test is used for assessing the significance of regression coefficients in the context of survival data.

Within featscreen, the Cox Proportional-Hazard regression coefficient z-test is executed through the ?rowCoxPH function. From the documentation, we can see that this function serves as a wrapper, internally calling ?coxph from the survival package.

#Data
x = matrix(rnorm(10 * 7), 10, 7)
y = data.frame(
   time = c(4,3,1,1,2,2,3),
   status = c(1,1,1,0,1,1,0)
)

#Compute
rowCoxPH(x = x, y = y)
#> $statistic
#>  [1] 4.6493152 0.8127091 1.5519850 2.2382765 0.6912241 0.4545558 2.7561108
#>  [8] 1.4729205 0.4579275 1.4901095
#> 
#> $significance
#>  [1] 0.05461149 0.77957911 0.44449927 0.16312546 0.46587119 0.17767696
#>  [7] 0.11668917 0.53995856 0.29448808 0.47456875

The example above showcases the application of the Cox PH Regression Coefficient Z-Test in featscreen, using simulated survival data. The results include z-statistics and p-values, aiding analysts in discerning the significance of individual features in influencing time-to-event outcomes.
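
For a single feature, the underlying model is a univariate Cox regression. As a sketch of what such a per-row fit looks like, we can use ?coxph from the survival package directly, reusing the x and y defined in the chunk above (an illustration, not necessarily featscreen’s exact call):

#Univariate Cox PH model for the first feature
library(survival)
fit = coxph(Surv(time = y$time, event = y$status) ~ x[1, ])

#z-statistic and p-value for the regression coefficient
summary(fit)$coefficients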

Bio-statistical Functions

In the pursuit of understanding complex biological data, there have been continuous efforts to refine statistical techniques, aiming to better model the intricacies inherent in bioinformatic datasets.

These bio-statistical functions can play a crucial role in feature screening.

Empirical Bayes Moderated Tests

These tests offer a robust approach to detecting differentially expressed features in gene expression data arising from microarray or RNA-seq technologies.

Particularly, here we take advantage of the limma R package. Renowned for its effectiveness in gene expression analysis, limma employs linear modeling to fit a model to the systematic part of the data, paving the way for robust statistical inference.

The basic statistic used for significance analysis is the moderated t-statistic, which has the same interpretation as an ordinary t-statistic except that the standard errors have been moderated across genes, i.e., squeezed towards a common value, using a simple Bayesian model. As reported in the limma user guide:

Moderated t-statistics lead to p-values in the same way that ordinary t-statistics do except that the degrees of freedom are increased, reflecting the greater reliability associated with the smoothed standard errors. The effectiveness of the moderated t approach has been demonstrated on test data sets for which the differential expression status of each probe is known.

In featscreen, the empirical Bayes moderated t-statistics are computed by calling the function ?rowEBayesStatistics.

From the documentation, we can see that this is a wrapper function that internally calls different functions from the limma package:

  1. voom: (optional) transform count data for linear modelling.
  2. lmFit: linear model fitting for each feature.
  3. contrasts.fit: compute coefficients and standard errors for a given set of contrasts.
  4. eBayes: compute moderated t-/F-statistics.
  5. topTable: summarise the linear model fit.
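
To make these steps concrete, here is a minimal sketch of the limma workflow for a simple two-group comparison on continuous data (no voom step and no explicit contrasts; this illustrates the general pipeline rather than featscreen’s exact implementation):

library(limma)

#Toy data: 20 features measured on 20 samples in two groups
x = matrix(rnorm(20 * 20), nrow = 20, ncol = 20)
group = factor(rep(c('A', 'B'), each = 10))

#Design matrix and per-feature linear model fit
design = model.matrix(~ group)
fit = lmFit(object = x, design = design)

#Empirical Bayes moderation of the standard errors
fit = eBayes(fit)

#Moderated t-statistics and p-values for the group effect
topTable(fit, coef = 2, number = Inf)[, c('t', 'P.Value')]

featscreen's ?rowEBayesStatistics wraps this kind of workflow for us, as in the example below:
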
#Define row/col size
nr = 20
nc = 20

#Data
x = matrix(
  data = stats::rnorm(n = nr*nc),
  nrow = nr,
  ncol = nc,
  dimnames = list(
    paste0("g",seq(nr)),
    paste0("S",seq(nc))
  )
)
#Categorical output vector (multinomial)
y = sample(x = c("I","II","III"),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))

#Compute
rowEBayesStatistics(x=x,y=y)
#> $statistic
#>  [1] 1.0137223 0.4925825 1.9615795 0.3904325 0.5573581 3.5581814 1.5763334
#>  [8] 0.4406250 0.2684814 1.2074018 0.4657751 0.7166232 0.2134995 0.3921002
#> [15] 1.4700849 0.8278437 0.5124011 1.1684151 0.6404147 0.5607559
#> 
#> $significance
#>  [1] 0.38532148 0.68741566 0.11735461 0.75989623 0.64316005 0.01362243
#>  [7] 0.19275114 0.72394529 0.84816445 0.30525490 0.70616183 0.54188990
#> [13] 0.88710173 0.75869384 0.22043643 0.47827473 0.67371081 0.32008245
#> [19] 0.58891259 0.64088326

SAM Permutation Test

The Significance Analysis of Microarrays (SAM) permutation test is a powerful tool for identifying features with significant expression changes in microarray data. It was proposed by Tusher, Tibshirani, and Chu (Tusher, Tibshirani, and Chu 2001). Here we take advantage of the samr R package.

As reported in the SAM user guide:

The input to SAM is gene expression measurements from a set of microarray experiments, as well as a response variable from each experiment. The response variable may be a grouping like untreated, treated (either unpaired or paired), a multiclass grouping (like breast cancer, lymphoma, colon cancer), a quantitative variable (like blood pressure) or a possibly censored survival time. SAM computes a statistic di for each gene i, measuring the strength of the relationship between gene expression and the response variable. It uses repeated permutations of the data to determine if the expression of any genes are significantly related to the response. The cutoff for significance is determined by a tuning parameter delta, chosen by the user based on the false positive rate. One can also choose a fold change parameter, to ensure that called genes change at least a pre-specified amount.

In featscreen, the SAM permutation test is computed by calling the function ?rowSamStatistics.

From the documentation, we can see that this is a wrapper function that internally calls three functions from the samr package:

  1. samr: correlate each feature with outcome variable.
  2. samr.compute.delta.table: compute thresholds, cutpoints, and false discovery rates for SAM analysis.
  3. samr.compute.siggenes.table: compute SAM statistics and significance.
#Define row/col size
nr = 20
nc = 20

#Data
x = matrix(
  data = stats::rnorm(n = nr*nc),
  nrow = nr,
  ncol = nc,
  dimnames = list(
    paste0("g",seq(nr)),
    paste0("S",seq(nc))
  )
)
#Categorical output vector (multinomial)
y = sample(x = c(1,2,3),size=nc,replace=TRUE)
names(y) = paste0("S",seq(nc))

#Compute
rowSamStatistics(x=x,y=y)
#> $statistic
#>  [1]  0.421  0.026 -0.085  0.533 -0.766 -1.037  0.341 -0.169  0.966  0.322
#> [11]  0.115 -0.227  0.005 -0.021 -0.760 -0.087  0.533 -0.045 -0.344  0.406
#> 
#> $significance
#>  [1] 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8
#> [20] 0.8

References

Tusher, Virginia Goss, Robert Tibshirani, and Gilbert Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America 98 (9): 5116–21. https://doi.org/10.1073/pnas.091062498.