
Takes a balanced sample with replacement from the population. See the Details section below for further information.

Usage

balancedSampleWithReplacement(strata, n, prob = NULL)

Arguments

strata

vector giving the stratum (group) of each element in the population. The population size is length(strata)

n

positive integer value, the sample size

prob

(optional) vector of positive numeric values, the probability weights for drawing each element of strata. If provided, it must have the same length as strata

Value

A vector of length n containing the indices of the randomly sampled observations.

Details

This function takes independent samples with replacement from each group in strata. Because sampling is with replacement, it works whether the number of elements per stratum (the sample size n divided by the number of groups in strata) is smaller or larger than the number of elements in the minority group.
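The per-group sampling described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: balanced_sample_sketch is a hypothetical name, n is assumed to be a multiple of the number of groups, prob is assumed to be applied within each group, and the final shuffle is an assumption suggested by the interleaved indices in the Examples output.

```r
# Hypothetical sketch, NOT the package's implementation
balanced_sample_sketch <- function(strata, n, prob = NULL) {
  groups <- unique(strata)
  # Elements to draw per stratum (assumes n is a multiple of the number of groups)
  per_group <- n %/% length(groups)
  idx <- unlist(lapply(groups, function(g) {
    members <- which(strata == g)
    # Assumption: weights are restricted to the members of the current group
    w <- if (is.null(prob)) NULL else prob[members]
    # Sample positions within `members`; this avoids the special behaviour
    # of sample() when given a single number
    members[sample.int(length(members), size = per_group, replace = TRUE, prob = w)]
  }))
  # Shuffle so that the strata are interleaved in the result
  idx[sample.int(length(idx))]
}

strata <- c(rep("a", 3), rep("b", 6))
i <- balanced_sample_sketch(strata, n = 8)
table(strata[i])
```

With n = 8 and two groups, each group contributes 4 indices, so group "a" (3 elements) is necessarily sampled with repeats while group "b" (6 elements) may or may not be.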

References

He and Garcia, Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering (2009)

Author

Alessandro Barberis

Examples

#Set seed for reproducibility
set.seed(seed = 5381L)

#Define strata
strata = c(rep("a", 3), rep("b", 6))

#Check ratio
table(strata)/length(strata)
#> strata
#>         a         b 
#> 0.3333333 0.6666667 

#Balanced random sample with replacement
i = balancedSampleWithReplacement(
  strata = strata,
  n = 8
)
#Check indices
i
#> [1] 1 3 4 2 1 6 7 7
#Check ratio in the sample
table(strata[i])/length(strata[i])
#> 
#>   a   b 
#> 0.5 0.5