Balanced Random Sample Without Replacement
Source: R/1-sampling-functions.R
balancedSampleWithoutReplacement.Rd
Takes a balanced sample without replacement from the population. See the Details section below for further information.
Arguments
- strata
vector of stratification variables. The population size is length(strata)
- n
positive integer value, the sample size
- prob
(optional) vector of positive numeric values, the probability weights for obtaining the strata elements. If provided, it must be the same length as strata
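To illustrate what per-element probability weights look like, here is a minimal sketch using base R's sample(). It does not call this package. The weight values are made up, and whether the function forwards prob to sample() in exactly this way is an assumption.

```r
# Illustrative only: weighted sampling without replacement in base R.
set.seed(1)

strata <- c(rep("a", 3), rep("b", 6))

# One positive weight per population element (assumed values),
# so prob has the same length as strata
prob <- c(rep(1, 3), rep(2, 6))
stopifnot(length(prob) == length(strata), all(prob > 0))

# Draw 4 distinct indices; elements with larger weights are more likely to be picked
i <- sample(seq_along(strata), size = 4, replace = FALSE, prob = prob)
i
```

sample() only uses the relative size of the weights, so they do not need to sum to one.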
Details
When the number of elements per stratum (given by the sample size n divided by the number of groups in strata) is less than or equal to the number of elements in the minority group in strata, this function implements so-called "random undersampling": the proportion of each stratum in the sample is adjusted by removing elements from the majority stratum, so that all groups are equally represented.
When the number of elements per stratum is greater than the number of elements in the minority group in strata, the function raises an error.
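The rule above can be sketched in base R as follows. This is not the package's implementation: undersampleSketch is a hypothetical helper, it ignores prob, and it assumes n is a multiple of the number of groups (integer division is used otherwise).

```r
# Hypothetical helper illustrating random undersampling; not the package's code.
undersampleSketch <- function(strata, n) {
  # Population indices grouped by stratum
  groups <- split(seq_along(strata), strata)
  # Number of elements to draw from each stratum
  perStratum <- n %/% length(groups)
  # Size of the minority stratum
  minSize <- min(lengths(groups))
  if (perStratum > minSize) {
    stop("elements per stratum exceed the size of the minority stratum")
  }
  # Draw perStratum indices from each stratum without replacement;
  # indexing via sample.int() avoids sample()'s special case for length-one input
  picked <- lapply(groups, function(idx) idx[sample.int(length(idx), perStratum)])
  # Shuffle so that the strata are interleaved in the result
  out <- unlist(picked, use.names = FALSE)
  out[sample.int(length(out))]
}

strata <- c(rep("a", 3), rep("b", 6))
i <- undersampleSketch(strata, n = 6)
table(strata[i])
```

With these inputs, 6 / 2 = 3 elements are taken from each stratum, which equals the size of the minority group "a". Asking for n = 8 would need 4 elements per stratum and therefore stops with an error.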
References
He and Garcia, Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering (2009)
Examples
#Set seed for reproducibility
set.seed(seed = 5381L)
#Define strata
strata = c(rep("a", 3), rep("b", 6))
#Check ratio
table(strata)/length(strata)
#> strata
#>         a         b
#> 0.3333333 0.6666667
#Balanced random sample
i = balancedSampleWithoutReplacement(
  strata = strata,
  n = 6
)
#Check indices
i
#> [1] 5 1 7 2 3 6
#Check ratio in the sample
table(strata[i])/length(strata[i])
#>
#>   a   b
#> 0.5 0.5