Takes repeated balanced samples from the population. See the Details section below for further information.
Arguments
- k
number of folds
- strata
vector of stratification variables. The population size is length(strata).
- undersample
logical, whether to remove elements from the population in order to try to obtain balanced folds
- prob
(optional) vector of positive numeric values, the probability weights for obtaining the strata elements. If provided, it must be the same length as strata.
Details
Each element in the population is randomly assigned to one of the k
folds so that the percentage of each stratum in the population is balanced
in each fold (see the balancedKFolds function for further details).
A list of length k is then created from these folds, so that the i-th
item of the list is a vector of indices generated by removing the i-th fold
and merging the remaining k - 1 folds together.
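The two steps above can be illustrated with a minimal base R sketch. It is not the package's implementation: assign_folds and km1_folds are hypothetical helpers written only for this illustration, the fold assignment is a simple per-stratum round robin standing in for balancedKFolds, and undersample and prob are not handled.

```r
# Hypothetical helper (an assumption, not the package's balancedKFolds):
# assign each element to one of k folds, stratum by stratum, so that every
# stratum is spread as evenly as possible across the folds.
assign_folds <- function(strata, k) {
  fold_id <- integer(length(strata))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Shuffle the positions of this stratum
    idx <- idx[sample.int(length(idx))]
    # Deal the shuffled positions to folds 1..k in turn
    fold_id[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold_id
}

# Hypothetical helper: the i-th item holds the indices left after removing
# the i-th fold, i.e. the remaining k - 1 folds merged together.
km1_folds <- function(fold_id, k) {
  lapply(seq_len(k), function(i) which(fold_id != i))
}

set.seed(1)
strata <- c(rep(1, 6), rep(2, 6))
fold_id <- assign_folds(strata, k = 3)
samples <- km1_folds(fold_id, k = 3)

# Each sample holds 8 of the 12 indices
lengths(samples)
#> [1] 8 8 8

# The stratum ratio of the population is preserved in each sample
table(strata[samples[[1]]]) / length(samples[[1]])
```

With k = 3, every index appears in exactly two of the three samples, since it is left out only when its own fold is removed.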
Examples
#Set seed for reproducibility
set.seed(seed = 5381L)
#Define balanced strata
strata = c(rep(1,6),rep(2,6))
#Check ratio
table(strata)/length(strata)
#> strata
#> 1 2
#> 0.5 0.5
#Assign data to 3 folds
i = repeatedBalancedKm1Folds(
  strata = strata,
  k = 3
)
#Check indices
i
#> [[1]]
#> [1] 1 4 5 6 7 8 10 11
#>
#> [[2]]
#> [1] 2 3 4 6 8 9 11 12
#>
#> [[3]]
#> [1] 1 2 3 5 7 9 10 12
#>
#Check ratio in the samples made of k-1 folds
table(strata[i[[1]]])/length(strata[i[[1]]])
#>
#> 1 2
#> 0.5 0.5
table(strata[i[[2]]])/length(strata[i[[2]]])
#>
#> 1 2
#> 0.5 0.5
table(strata[i[[3]]])/length(strata[i[[3]]])
#>
#> 1 2
#> 0.5 0.5
#Define unbalanced strata
strata = c(rep(1,6),rep(2,12))
#Check ratio
table(strata)/length(strata)
#> strata
#>         1         2
#> 0.3333333 0.6666667
#Assign data to 3 folds
i = repeatedBalancedKm1Folds(
  strata = strata,
  k = 3,
  undersample = TRUE
)
#Check indices
i
#> [[1]]
#> [1] 1 2 3 4 9 13 16 18
#>
#> [[2]]
#> [1] 2 3 5 6 9 12 15 16
#>
#> [[3]]
#> [1] 1 4 5 6 12 13 15 18
#>
#> attr(,"removed.data")
#> [1] 7 8 10 11 14 17
#Check ratio in the samples made of k-1 folds
table(strata[i[[1]]])/length(strata[i[[1]]])
#>
#> 1 2
#> 0.5 0.5
table(strata[i[[2]]])/length(strata[i[[2]]])
#>
#> 1 2
#> 0.5 0.5
table(strata[i[[3]]])/length(strata[i[[3]]])
#>
#> 1 2
#> 0.5 0.5