
Introduction

In statistics, sampling is the selection of a subset of elements from a population, here defined as a complete set of subjects of interest.

Since it is often too expensive or logistically impossible to collect data for every case in a population, sampling is used instead as a cheap and fast way to estimate the population's characteristics.

Different sampling schemes exist, but they can be grouped into two main categories: sampling with replacement and sampling without replacement.

  • sampling with replacement implies each element in the population may appear multiple times in one sample
  • in sampling without replacement, each member of the population can be chosen only once in one sample
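The practical difference between the two schemes can be seen with base R's sample() function, which is not part of resampling. This is only an illustrative sketch on a toy population of 5 elements; the object names are arbitrary.

```r
# Toy population: the integers 1 to 5
population <- 1:5

# With replacement: an element may be drawn more than once,
# so the sample size may even exceed the population size
with_repl <- sample(population, size = 8, replace = TRUE)

# Without replacement: each element is drawn at most once,
# so the sample size cannot exceed the population size
without_repl <- sample(population, size = 5, replace = FALSE)

with_repl
without_repl
```

Asking for sample(population, size = 8, replace = FALSE) instead would stop with an error, because a sample without replacement cannot be larger than the population.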

In this article, we show how to draw samples with replacement from a population using the functions implemented in the resampling package.

For further information on how to draw repeated samples with replacement, see Resampling with replacement.

Setup

Loading

First, we need to load the resampling R package:

#Load the package
library(resampling)

Seed

Then, we set a seed for random number generation (RNG). By default, each R session initializes its RNG from the current time and process ID, so different sessions produce different simulation results. Fixing a seed ensures that the results of this vignette can be reproduced. A seed can be specified by calling ?set.seed.

#Set a seed for RNG
set.seed(
  #A seed
  seed = 5381L,                   #a randomly chosen integer value
  #The kind of RNG to use
  kind = "Mersenne-Twister",      #we make explicit the current R default value
  #The kind of Normal generation
  normal.kind = "Inversion"       #we make explicit the current R default value
)
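To see what fixing the seed buys us, the following base R sketch resets the seed before two draws and checks that they coincide. The draws use sample.int() purely for illustration; the final call restores the seed so that the state of the generator matches the one set above.

```r
# First draw right after fixing the seed
set.seed(seed = 5381L)
first <- sample.int(10, size = 5, replace = TRUE)

# Resetting the same seed restarts the same random number stream...
set.seed(seed = 5381L)
second <- sample.int(10, size = 5, replace = TRUE)

# ...so the two draws are identical
identical(first, second)
#> [1] TRUE

# Reset the seed again so that the outputs shown below are unaffected
set.seed(seed = 5381L)
```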

Sampling With Replacement

As mentioned in the introduction, sampling with replacement implies that elements of a population can be chosen multiple times in one sample. Techniques of sampling with replacement include:

  • simple random sampling
  • random sampling with unequal probabilities
  • stratified sampling
  • balanced sampling (a special case of stratified sampling)

The available methods can be listed by calling ?listAvailableSamplingMethods with the input argument set to 'rswr'. The function returns a table with two columns:

  • id: the id of the sampling method, to be used in the function calls
  • name: the name of the sampling method

#list sampling methods
sampling.methods = listAvailableSamplingMethods(x = 'rswr')

#print in table
knitr::kable(x = sampling.methods)
id               name
rswr             random sampling with replacement
srswr            simple random sampling with replacement
stratified_rswr  stratified random sampling with replacement
balanced_rswr    balanced random sampling with replacement
bootstrap        ordinary bootstrap sampling

The names of the sampling functions can be retrieved by calling ?listSamplingFunctionNames.

#list sampling function names
sampling.function.names = listSamplingFunctionNames(x = 'rswr')

#print in table
knitr::kable(x = sampling.function.names)
id               name
rswr             sampleWithReplacement
srswr            simpleRandomSampleWithReplacement
stratified_rswr  stratifiedSampleWithReplacement
balanced_rswr    balancedSampleWithReplacement
bootstrap        bootstrapSample

Each function is documented, and the ? operator can be used to learn more about a specific method. For example, let’s look at the documentation of ?simpleRandomSampleWithReplacement.

#See documentation
?simpleRandomSampleWithReplacement

From the documentation, we can see that the function accepts two input arguments:

  • N: the population size
  • n: the sample size

Simple Random Sampling

Simple random sampling (SRS) is the simplest form of sampling with replacement. In SRS with replacement, each element of the population has the same probability of being selected at each draw.

#Simple random sampling with replacement
simpleRandomSampleWithReplacement(
  N = 10,
  n = 8
)
#> [1]  1  9 10  3  4  1  9  8
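Because every draw selects each of the N elements with probability 1/N, independently of the other draws, the probability that a given element appears at least once in a sample of size n is 1 - (1 - 1/N)^n. The worked example below is plain base R arithmetic (not a call into resampling) for the N = 10, n = 8 case above.

```r
N <- 10  # population size
n <- 8   # sample size

# Probability of selecting any given element at a single draw
p_draw <- 1 / N

# Probability that a given element appears at least once in the sample
p_included <- 1 - (1 - p_draw)^n
p_included
#> [1] 0.5695328

# Expected number of distinct elements in the sample
N * p_included
#> [1] 5.695328
```

The sample printed above contains 6 distinct elements (1, 3, 4, 8, 9 and 10), in line with this expectation.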

Random Sampling With Unequal Probability

The concept of random sampling with unequal probabilities was perhaps first introduced by Hansen and Hurwitz in the context of sampling with replacement (Hansen and Hurwitz 1943). Under this sampling design, elements of the population have different probabilities of being selected. We can use ?sampleWithReplacement to draw such a sample. For example, let’s assume our population of interest has 10 elements, and that the first 3 elements are three times as likely to be selected as the other 7.

#Random sampling with replacement
sampleWithReplacement(
  N = 10,
  n = 8,
  prob = c(rep(3,3), rep(1,7))
)
#> [1] 6 3 4 2 9 1 2 1
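The weights passed to prob do not sum to 1. Assuming they are treated as relative weights and normalized, as base R's sample() does (an assumption about sampleWithReplacement, not confirmed here), the per-draw selection probabilities can be computed directly:

```r
# Relative weights used in the call above
w <- c(rep(3, 3), rep(1, 7))

# Per-draw selection probabilities (sum(w) = 16)
p <- w / sum(w)
p
#>  [1] 0.1875 0.1875 0.1875 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625

# Expected number of times each element is drawn in n = 8 draws
8 * p
#>  [1] 1.5 1.5 1.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
```

Under this assumption the first three elements share 9/16 of the probability mass, so about 4.5 of the 8 draws are expected to come from them; the sample printed above takes 5 of its 8 draws from elements 1 to 3.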

Stratified Random Sampling

When a population can be partitioned into groups (called strata or subpopulations) whose members share certain properties, a stratified sampling approach can be used. This sampling design is adopted to ensure that every subgroup of the population is represented in the sample.

A stratified sample with replacement can be taken by using ?stratifiedSampleWithReplacement, which accepts a strata argument as input.

#Define strata
strata = c(rep("a", 3),rep("b", 6))

#Stratified sampling with replacement
stratifiedSampleWithReplacement(
  strata = strata,
  n = 9
)
#> [1] 5 2 7 8 8 3 3 8 5

?stratifiedSampleWithReplacement implements the so-called “proportionate allocation”, in which each stratum’s share of the sample equals its share of the population.

#Define strata
strata = c(rep("a", 3),rep("b", 6))

#Check ratio
table(strata)/length(strata)
#> strata
#>         a         b 
#> 0.3333333 0.6666667

#Stratified sampling with replacement
s = stratifiedSampleWithReplacement(
  strata = strata,
  n = 9
)

#Check ratio in the sample
table(strata[s])/length(strata[s])
#> 
#>         a         b 
#> 0.3333333 0.6666667

Balanced Random Sampling

Balanced sampling is a special case of stratified sampling used to ensure that subgroups of the population are equally represented in the sample, regardless of their size in the population.

#Define strata
strata = c(rep("a", 3),rep("b", 6))

#Check ratio
table(strata)/length(strata)
#> strata
#>         a         b 
#> 0.3333333 0.6666667

#Balanced sampling with replacement
s = balancedSampleWithReplacement(
  strata = strata,
  n = 8
)

#Check ratio in the sample
table(strata[s])/length(strata[s])
#> 
#>   a   b 
#> 0.5 0.5
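With equal allocation, each of the H strata receives n / H draws. The base R sketch below computes this allocation for the example above (it is not the package's implementation) and highlights a consequence: stratum "a" has only 3 elements but receives 4 draws, which is possible only because sampling is with replacement.

```r
strata <- c(rep("a", 3), rep("b", 6))
n <- 8

# Population size of each stratum, N_h
N_h <- lengths(split(strata, strata))

# Equal allocation: every stratum receives n / H draws, H = number of strata
H <- length(N_h)
n_h <- setNames(rep(n / H, H), names(N_h))
n_h
#> a b 
#> 4 4 

# Strata receiving more draws than they have elements
n_h > N_h
#>     a     b 
#>  TRUE FALSE 
```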

Bootstrap Sampling

Inspired by earlier work on the jackknife, Bradley Efron published the bootstrap method in 1979 (Efron 1979). An ordinary bootstrap sample is a special case of simple random sampling with replacement in which the sample size equals the population size.

A bootstrap sample can be taken by using ?bootstrapSample:

#Bootstrap sampling
bootstrapSample(
  N = 10
)
#>  [1] 10  3  3  4  3  9  8  2  2  5
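Because each of the N draws selects every element with probability 1/N, a bootstrap sample typically repeats some elements and omits others: in the output above, 3 appears three times while 1, 6 and 7 are missing. The expected fraction of distinct elements is 1 - (1 - 1/N)^N, which approaches 1 - 1/e ≈ 0.632 as N grows. The sketch below computes it for N = 10 and checks it by Monte Carlo simulation with base R's sample.int() rather than bootstrapSample(); the number of replicates B is an arbitrary choice.

```r
N <- 10

# Expected fraction of distinct population elements in a bootstrap sample
expected_unique <- 1 - (1 - 1 / N)^N
expected_unique
#> [1] 0.6513216

# Monte Carlo check: draw B bootstrap samples of the indices 1..N
B <- 10000
n_unique <- replicate(B, length(unique(sample.int(N, size = N, replace = TRUE))))
observed_unique <- mean(n_unique) / N
observed_unique
```

The sample printed above contains 7 distinct elements out of 10, close to the expected 6.5.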

References

Efron, Bradley. 1979. “Bootstrap Methods: Another Look at the Jackknife.” The Annals of Statistics 7 (1): 1–26. https://www.jstor.org/stable/2958830.
Hansen, Morris H., and William N. Hurwitz. 1943. “On the Theory of Sampling from Finite Populations.” The Annals of Mathematical Statistics 14 (4): 333–62.