Assigns the population to k balanced folds and returns a random sample made of elements from k-1 folds. See the Details section below for further information.
Arguments
- k
number of folds
- strata
vector of stratification variables. The population size is length(strata).
- undersample
logical, whether to remove elements from the population in order to try to obtain balanced folds
- prob
(optional) vector of positive numeric values, the probability weights for obtaining the strata elements. If provided, it must have the same length as strata.
- i
(optional) integer, fold to be used as holdout data
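The undersample and prob arguments can be pictured with a small sketch. It is only an illustration under stated assumptions: the helper undersample_strata is hypothetical, undersampling is assumed to shrink every stratum to the size of the smallest one (consistent with the 0.5/0.5 ratio obtained in the unbalanced example), and prob is assumed to act as the weights passed to base R's sample(). The package may do this differently.

```r
# Hypothetical helper (an assumption, not the package's code): undersample
# every stratum to the size of the smallest one, optionally weighting the
# draw with 'prob' as sample() weights.
undersample_strata <- function(strata, prob = NULL) {
  if (!is.null(prob) && length(prob) != length(strata)) {
    stop("'prob' must have the same length as 'strata'")
  }
  n_min <- min(table(strata))  # size of the smallest stratum
  keep <- integer(0)
  for (s in unique(strata)) {
    idx <- which(strata == s)  # positions of this stratum in the population
    w <- if (is.null(prob)) NULL else prob[idx]
    # sample(x) with a length-1 numeric x samples from 1:x, so guard that case
    drawn <- if (length(idx) == 1L) idx else sample(idx, size = n_min, prob = w)
    keep <- c(keep, drawn)
  }
  sort(keep)  # indices of the retained elements
}

# Usage: 6 elements in stratum 1, 12 in stratum 2
set.seed(5381L)
strata <- c(rep(1, 6), rep(2, 12))
kept <- undersample_strata(strata)
table(strata[kept])  # 6 elements left in each stratum
```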
Details
Each element in the population is randomly assigned to one of the k folds so that each fold preserves the percentage of each stratum in the population (see the balancedKFolds function for further details).
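A minimal sketch of one way to build such folds, assuming a shuffle-and-deal scheme: within each stratum the elements are shuffled and dealt in turn to folds 1, ..., k, and the deal continues across strata so that fold sizes stay even. The name assign_balanced_folds is hypothetical, and this is not necessarily how balancedKFolds works internally.

```r
# Hypothetical helper (an assumption, not balancedKFolds itself): shuffle the
# elements of each stratum and deal them to folds 1, ..., k in turn. The deal
# continues from stratum to stratum so that fold sizes differ by at most one.
assign_balanced_folds <- function(strata, k) {
  folds <- integer(length(strata))
  offset <- 0L
  for (s in unique(strata)) {
    idx <- which(strata == s)
    idx <- idx[sample.int(length(idx))]                     # random order within stratum
    folds[idx] <- (offset + seq_along(idx) - 1L) %% k + 1L  # deal 1..k, 1..k, ...
    offset <- offset + length(idx)
  }
  folds  # fold label of each element
}

# Usage: each of the 3 folds receives 2 elements of each stratum
set.seed(5381L)
strata <- c(rep(1, 6), rep(2, 6))
folds <- assign_balanced_folds(strata, k = 3)
table(strata, folds)
```

Dealing in turn, rather than cutting each shuffled stratum into blocks, keeps every stratum spread over all folds even when its size is not a multiple of k.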
If provided, i indicates the i-th fold to be considered as holdout data. If i is missing, one fold is randomly selected to be the holdout data.
A random sample is then generated by removing the i-th fold and merging the remaining k - 1 folds together.
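Once fold labels exist, the holdout step reduces to dropping one label. The hypothetical helper km1_sample takes a vector of fold labels, draws i at random when it is missing, and returns the indices of the elements in the other k - 1 folds, analogous to the index vectors printed in the Examples. It is a sketch of the described behaviour, not the package's implementation.

```r
# Hypothetical helper: return the indices of the elements that are NOT in the
# holdout fold i, i.e. the union of the remaining k - 1 folds.
km1_sample <- function(folds, k, i) {
  if (missing(i)) i <- sample.int(k, 1L)  # pick the holdout fold at random
  stopifnot(i >= 1, i <= k)
  which(folds != i)
}

# Usage: 12 elements already assigned to 3 folds, fold 2 held out
folds <- rep(1:3, times = 4)
km1_sample(folds, k = 3, i = 2)  # returns 1 3 4 6 7 9 10 12
```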
Examples
#Set seed for reproducibility
set.seed(seed = 5381L)
#Define balanced strata
strata = c(rep(1,6),rep(2,6))
#Check ratio
table(strata)/length(strata)
#> strata
#> 1 2
#> 0.5 0.5
#Assign data to 3 folds
i = balancedKm1Folds(
strata = strata,
k = 3
)
#Check indices
i
#> [1] 2 3 4 6 8 9 11 12
#Check ratio in the sample made of k-1 folds
table(strata[i])/length(strata[i])
#>
#> 1 2
#> 0.5 0.5
#Define unbalanced strata
strata = c(rep(1,6),rep(2,12))
#Check ratio
table(strata)/length(strata)
#> strata
#> 1 2
#> 0.3333333 0.6666667
#Assign data to 3 folds
i = balancedKm1Folds(
strata = strata,
k = 3,
undersample = TRUE
)
#Check indices
i
#> [1] 1 3 4 6 9 13 16 18
#Check ratio in the sample made of k-1 folds
table(strata[i])/length(strata[i])
#>
#> 1 2
#> 0.5 0.5