Takes repeated balanced samples from the population. See the Details section below for further information.

Usage

repeatedBalancedKm1Folds(k, strata, undersample = FALSE, prob = NULL)

Arguments

k

number of folds

strata

vector giving the stratum of each element in the population. The population size is length(strata)

undersample

logical, whether to remove elements from the population in order to try to obtain balanced folds

prob

(optional) vector of positive numeric values giving the probability weights for sampling the elements of strata. If provided, it must have the same length as strata

Value

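A list of length k in which each element is a vector containing the indices of the sampled data. In the Examples below, the call with undersample = TRUE also returns the indices of the removed elements in a removed.data attribute.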

Details

Each element in the population is randomly assigned to one of the k folds so that the percentage of each stratum in the population is balanced in each fold (see the balancedKFolds function for further details). A list of length k is then created from these folds: the i-th item of the list is the vector of indices obtained by removing the i-th fold and merging the remaining k - 1 folds.
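
The merging step can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name km1_from_folds and the simple per-stratum fold assignment are assumptions made for illustration only, and neither undersampling nor prob is handled.

```r
# Hypothetical helper (not part of the package): given a vector of fold
# labels 1..k, return for each fold j the indices of all other folds.
km1_from_folds <- function(fold_id) {
  k <- max(fold_id)
  lapply(seq_len(k), function(j) which(fold_id != j))
}

set.seed(1)
strata <- c(rep(1, 6), rep(2, 6))
k <- 3

# Assumed assignment scheme: within each stratum, shuffle an evenly
# repeated sequence of fold labels so every fold gets an equal share.
fold_id <- integer(length(strata))
for (s in unique(strata)) {
  idx <- which(strata == s)
  fold_id[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

# The j-th sample merges the k - 1 folds other than fold j
samples <- km1_from_folds(fold_id)
lengths(samples)
```

With 12 elements and k = 3, each fold holds 4 elements, so each merged sample holds 8 and keeps the 0.5/0.5 stratum ratio.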

Author

Alessandro Barberis

Examples

#Set seed for reproducibility
set.seed(seed = 5381L)

#Define balanced strata
strata = c(rep(1,6),rep(2,6))

#Check ratio
table(strata)/length(strata)
#> strata
#>   1   2 
#> 0.5 0.5 

#Assign data to 3 folds
i = repeatedBalancedKm1Folds(
 strata = strata,
 k = 3
)
#Check indices
i
#> [[1]]
#> [1]  1  4  5  6  7  8 10 11
#> 
#> [[2]]
#> [1]  2  3  4  6  8  9 11 12
#> 
#> [[3]]
#> [1]  1  2  3  5  7  9 10 12
#> 
#Check ratio in the samples made of k-1 folds
table(strata[i[[1]]])/length(strata[i[[1]]])
#> 
#>   1   2 
#> 0.5 0.5 
table(strata[i[[2]]])/length(strata[i[[2]]])
#> 
#>   1   2 
#> 0.5 0.5 
table(strata[i[[3]]])/length(strata[i[[3]]])
#> 
#>   1   2 
#> 0.5 0.5 

#Define unbalanced strata
strata = c(rep(1,6),rep(2,12))

#Check ratio
table(strata)/length(strata)
#> strata
#>         1         2 
#> 0.3333333 0.6666667 

#Assign data to 3 folds
i = repeatedBalancedKm1Folds(
 strata = strata,
 k = 3,
 undersample = TRUE
)
#Check indices
i
#> [[1]]
#> [1]  1  2  3  4  9 13 16 18
#> 
#> [[2]]
#> [1]  2  3  5  6  9 12 15 16
#> 
#> [[3]]
#> [1]  1  4  5  6 12 13 15 18
#> 
#> attr(,"removed.data")
#> [1]  7  8 10 11 14 17
#Check ratio in the samples made of k-1 folds
table(strata[i[[1]]])/length(strata[i[[1]]])
#> 
#>   1   2 
#> 0.5 0.5 
table(strata[i[[2]]])/length(strata[i[[2]]])
#> 
#>   1   2 
#> 0.5 0.5 
table(strata[i[[3]]])/length(strata[i[[3]]])
#> 
#>   1   2 
#> 0.5 0.5
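
Because each list element merges k - 1 folds, the held-out fold can presumably be recovered as the complement of the sampled indices. The sketch below uses only base R and copies the indices from the output printed above instead of calling the function. Treating the removed.data indices as belonging to no fold is an assumption based on that output.

```r
# Indices copied from the printed output of the balanced example
strata <- c(rep(1, 6), rep(2, 6))
i <- list(
  c(1, 4, 5, 6, 7, 8, 10, 11),
  c(2, 3, 4, 6, 8, 9, 11, 12),
  c(1, 2, 3, 5, 7, 9, 10, 12)
)

# Held-out fold = population indices that are not in the sample
held_out <- lapply(i, function(s) setdiff(seq_along(strata), s))
held_out

# Indices copied from the printed output of the undersampled example
strata_u <- c(rep(1, 6), rep(2, 12))
i_u <- list(
  c(1, 2, 3, 4, 9, 13, 16, 18),
  c(2, 3, 5, 6, 9, 12, 15, 16),
  c(1, 4, 5, 6, 12, 13, 15, 18)
)
removed <- c(7, 8, 10, 11, 14, 17)

# Assumption: removed elements are in no fold, so exclude them as well
held_out_u <- lapply(
  i_u,
  function(s) setdiff(seq_along(strata_u), c(s, removed))
)
held_out_u
```

In both cases the three held-out folds are disjoint, and together they cover every index that was not removed.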