
Assigns the population to k balanced folds and returns a random sample made of elements from k-1 folds. See the Details section below for further information.

Usage

balancedKm1Folds(k, strata, undersample = FALSE, prob = NULL, i = NULL)

Arguments

k

number of folds

strata

vector of stratification variables (the stratum label of each element). The population size is length(strata)

undersample

logical, whether to remove elements from the population in an attempt to obtain balanced folds

prob

(optional) vector of positive numeric values, the probability weights for sampling the strata elements. If provided, it must have the same length as strata

i

(optional) integer, the fold to be used as holdout data
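The page does not say how undersample = TRUE decides which elements to remove. As one possible illustration, the hypothetical helper undersampleSketch() below downsamples every stratum to the size of the smallest one. This rule is an assumption made for the sketch, not the package's documented behaviour, and it ignores prob.

```r
# Hypothetical undersampling step (assumption, not the package's code):
# keep a random subset of each stratum so that every stratum ends up
# with as many elements as the smallest stratum.
undersampleSketch <- function(strata) {
  counts <- table(strata)
  m <- min(counts)
  keep <- unlist(lapply(names(counts), function(s) {
    idx <- which(as.character(strata) == s)
    # draw m elements of stratum s without replacement
    idx[sample.int(length(idx), m)]
  }))
  sort(keep)
}

# Usage on an unbalanced population (6 elements of stratum 1, 12 of stratum 2)
set.seed(5381L)
strata <- c(rep(1, 6), rep(2, 12))
kept <- undersampleSketch(strata)
table(strata[kept])
```

Under this rule every stratum contributes 6 elements, so the retained population is split 50/50 before any fold assignment takes place.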

Value

A vector containing the indices of the sampled data.
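Because the returned vector holds the indices of the k - 1 retained folds, the holdout elements are simply its complement within seq_along(strata). The snippet below uses the indices printed in the first example on this page (12 elements, k = 3) and base R's setdiff().

```r
# Indices returned for a population of 12 elements with k = 3
# (values copied from the first example on this page)
strata  <- c(rep(1, 6), rep(2, 6))
sampled <- c(2, 3, 4, 6, 8, 9, 11, 12)

# The holdout fold is the complement of the returned indices
holdout <- setdiff(seq_along(strata), sampled)
holdout
#> [1]  1  5  7 10
```

Here strata[holdout] is c(1, 1, 2, 2), so the holdout fold keeps the same 50/50 stratum ratio as the sampled data.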

Details

Each element in the population is randomly assigned to one of the k folds so that each fold preserves, as closely as possible, the proportion of each stratum in the population (see the balancedKFolds function for further details). If provided, i is the index of the fold used as holdout data; if i is NULL (the default), one fold is selected at random to be the holdout data. A random sample is then generated by removing the holdout fold and merging the remaining k - 1 folds together.
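The procedure can be sketched in base R as follows. The function name km1FoldsSketch() is hypothetical, and the balancing rule (shuffle each stratum, then deal its elements to folds 1..k in turn) is an assumption chosen for illustration; the package's own assignment may differ, and prob and undersample are not modelled.

```r
# Hypothetical sketch of the k-1 folds procedure (not the package's code).
km1FoldsSketch <- function(k, strata, i = NULL) {
  n <- length(strata)
  folds <- integer(n)
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # shuffle the members of stratum s
    idx <- idx[sample.int(length(idx))]
    # deal them round-robin so each fold gets about 1/k of the stratum,
    # starting at a random fold so leftovers do not always land in fold 1
    start <- sample.int(k, 1L)
    folds[idx] <- ((seq_along(idx) + start - 2L) %% k) + 1L
  }
  # choose the holdout fold at random when i is not supplied
  if (is.null(i)) i <- sample.int(k, 1L)
  # return the indices of the elements in the remaining k - 1 folds
  which(folds != i)
}

# Usage: 12 elements, two strata of 6, k = 3
set.seed(5381L)
strata <- c(rep(1, 6), rep(2, 6))
idx <- km1FoldsSketch(k = 3, strata = strata)
table(strata[idx]) / length(idx)
```

With 6 elements per stratum and k = 3, every fold receives exactly 2 elements of each stratum, so the returned sample always contains 8 indices split 50/50, whichever fold is held out.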

See also

Author

Alessandro Barberis

Examples

#Set seed for reproducibility
set.seed(seed = 5381L)

#Define balanced strata
strata = c(rep(1,6),rep(2,6))

#Check ratio
table(strata)/length(strata)
#> strata
#>   1   2 
#> 0.5 0.5 

#Assign data to 3 folds
i = balancedKm1Folds(
 strata = strata,
 k = 3
)
#Check indices
i
#> [1]  2  3  4  6  8  9 11 12
#Check ratio in the samples made of k-1 folds
table(strata[i])/length(strata[i])
#> 
#>   1   2 
#> 0.5 0.5 

#Define unbalanced strata
strata = c(rep(1,6),rep(2,12))

#Check ratio
table(strata)/length(strata)
#> strata
#>         1         2 
#> 0.3333333 0.6666667 

#Assign data to 3 folds
i = balancedKm1Folds(
 strata = strata,
 k = 3,
 undersample = TRUE
)
#Check indices
i
#> [1]  1  3  4  6  9 13 16 18
#Check ratio in the samples made of k-1 folds
table(strata[i])/length(strata[i])
#> 
#>   1   2 
#> 0.5 0.5