Introduction

Sampling with replacement means that each element of the population may appear more than once in a sample. Techniques for sampling with replacement include:

  • simple random sampling
  • random sampling with unequal probabilities
  • stratified sampling
  • balanced sampling (a special case of stratified sampling)

In this article, we show how to draw repeated samples with replacement from a population by using the functions implemented in resampling.

For further information on the sampling techniques, see Sampling with replacement.

Setup

Loading

First, we load resampling and the other R packages we need:

#resampling
library(resampling)

#Packages for visualisation
require(ComplexHeatmap, quietly = TRUE)
require(grid, quietly = TRUE)
require(RColorBrewer, quietly = TRUE)

Seed

Then, we set a seed for random number generation (RNG). By default, each R session creates its seed from the current time and the process ID, so different sessions produce different simulation results. Fixing the seed ensures that the results of this vignette can be reproduced. We can specify a seed by calling ?set.seed.

#Set a seed for RNG
set.seed(
  #A seed
  seed = 5381L,                   #a randomly chosen integer value
  #The kind of RNG to use
  kind = "Mersenne-Twister",      #we make explicit the current R default value
  #The kind of Normal generation
  normal.kind = "Inversion"       #we make explicit the current R default value
)
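
As a quick check of what fixing the seed provides, the base R sketch below resets the same seed before two identical draws and compares them. It relies only on set.seed and sample.int from base R; the variable names are illustrative and not part of resampling.

```r
#Reset the seed before each draw so both draws start from the same RNG state
set.seed(seed = 5381L)
first.draw = sample.int(n = 10, size = 5, replace = TRUE)

set.seed(seed = 5381L)
second.draw = sample.int(n = 10, size = 5, replace = TRUE)

#TRUE: identical seeds give identical samples
identical(first.draw, second.draw)

#Reset the seed once more so the draws above do not affect later code
set.seed(seed = 5381L)
```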

Resampling With Replacement

The available methods for taking repeated samples with replacement can be listed through the ?listAvailableSamplingMethods function call, setting the input argument to 'rswr'. ?listAvailableSamplingMethods returns a table with two columns:

  • id: the id of the sampling method, to be used in the function calls
  • name: the name of the sampling method

#list sampling methods
sampling.methods = listAvailableSamplingMethods(x = 'rswr')

#print in table
knitr::kable(x = sampling.methods)

id               name
rswr             random sampling with replacement
srswr            simple random sampling with replacement
stratified_rswr  stratified random sampling with replacement
balanced_rswr    balanced random sampling with replacement
bootstrap        ordinary bootstrap sampling

The name of the resampling functions can be retrieved by calling ?listResamplingFunctionNames.

#list resampling function names
resampling.function.names = listResamplingFunctionNames(x = 'rswr')

#print in table
knitr::kable(x = resampling.function.names)

id               name
rswr             repeatedSampleWithReplacement
srswr            repeatedSimpleRandomSampleWithReplacement
stratified_rswr  repeatedStratifiedSampleWithReplacement
balanced_rswr    repeatedBalancedSampleWithReplacement
bootstrap        repeatedBootstrapSample

Each function is documented. To learn more about a specific method, use the ? operator. For example, let’s check the function ?repeatedSimpleRandomSampleWithReplacement.

#See documentation
?repeatedSimpleRandomSampleWithReplacement

From the documentation, we can see that the function accepts 3 arguments in input:

  • k: the number of repeated samples to generate
  • N: the population size
  • n: the sample size

Simple Random Sampling

In resampling via simple random sampling (SRS), simple random samples with replacement are repeatedly taken from the population.

The function implementing this sampling scheme is ?repeatedSimpleRandomSampleWithReplacement, which accepts 3 arguments:

  • k: the number of repeated samples to generate
  • N: the population size
  • n: the sample size

#Simple random sampling with replacement
repeatedSimpleRandomSampleWithReplacement(
  k = 2,
  N = 10,
  n = 8
)
#> [[1]]
#> [1]  1  9 10  3  4  1  9  8
#> 
#> [[2]]
#> [1] 1 7 6 5 2 3 5 4
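
To make the sampling scheme concrete, here is a minimal base R sketch of repeated simple random sampling with replacement. The helper sketchRepeatedSRSWR is a hypothetical name introduced for illustration; it mirrors the k, N and n arguments described above but is not the package's implementation. Because it draws random numbers, running it advances the RNG state, so the outputs printed elsewhere in this vignette would no longer be reproduced afterwards.

```r
#Base R sketch of repeated simple random sampling with replacement
#(illustrative helper, not the package implementation)
sketchRepeatedSRSWR = function(k, N, n){
  #Draw k independent samples of n indices from 1..N, with equal probability
  replicate(
    n        = k,
    expr     = sample.int(n = N, size = n, replace = TRUE),
    simplify = FALSE
  )
}

#Two samples of size 8 from a population of 10 elements
samples = sketchRepeatedSRSWR(k = 2, N = 10, n = 8)
```

The result is a list of k integer vectors, the same shape as the output shown above.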

Instead of using ?repeatedSimpleRandomSampleWithReplacement, we can take repeated samples by using the ?resample function.

#Simple random sampling with replacement
obj = resample(
  x = 10,
  n = 8,
  k = 2,
  method = 'srswr'
)

#Print
print(obj)
#> 
#> 2 samples taken from a population of 10 elements by using simple random
#> sampling with replacement.
#> 
#>   sampleNumber          sample sampleSize holdoutSize
#> 1            1 2, 2, 8, 9, ...          8           4
#> 2            2 9, 6, 1, 4, ...          8           4

#Samples
getSamples(obj)
#> [[1]]
#> [1]  2  2  8  9  6  7  7 10
#> 
#> [[2]]
#> [1]  9  6  1  4  9  8 10  1

#Holdout Data
getHoldOutSamples(obj)
#> [[1]]
#> [1] 1 3 4 5
#> 
#> [[2]]
#> [1] 2 3 5 7
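
The printed samples and holdout sets are consistent with the holdout being the population elements that were never drawn. Assuming that definition, the first holdout set above can be recomputed in base R from the first sample:

```r
#Population indices and the first sample printed above
population = seq_len(10)
first.sample = c(2, 2, 8, 9, 6, 7, 7, 10)

#Elements never drawn (assumed definition of the holdout set)
holdout = setdiff(x = population, y = first.sample)
holdout
#> [1] 1 3 4 5
```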

#Plot
plot(x = obj)

Random Sampling With Unequal Probability

We can use ?repeatedSampleWithReplacement to draw repeated samples with replacement with unequal probabilities. From the documentation, we can see that the function accepts 4 arguments in input:

  • k: the number of repeated samples to generate
  • N: the population size
  • n: the sample size
  • prob: an optional vector of probabilities for obtaining the population elements

For example, let’s assume our population of interest has 10 elements, and that the first 3 elements have a higher chance of being selected.

#Random sampling with replacement
repeatedSampleWithReplacement(
  k = 2,
  N = 10,
  n = 5,
  prob = c(rep(3,3), rep(1,7))
)
#> [[1]]
#> [1] 4 8 1 4 1
#> 
#> [[2]]
#> [1] 1 1 6 2 5
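
The prob vector in this example does not sum to one. Base R's sample.int treats such a vector as weights and normalises it internally; whether resampling handles prob in the same way is an assumption here. Under that assumption, the sketch below shows the selection probabilities implied by the weights, followed by one weighted draw in base R.

```r
#Weights as passed in the example: elements 1-3 weigh 3, the others 1
weights = c(rep(3, 3), rep(1, 7))

#Selection probabilities implied by the weights
probs = weights / sum(weights)
probs[1]   #3/16 = 0.1875
probs[10]  #1/16 = 0.0625

#One weighted draw with replacement in base R
s = sample.int(n = 10, size = 5, replace = TRUE, prob = weights)
```

Each of the first three elements is therefore three times as likely to be drawn as any of the remaining seven.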

We can take repeated samples by using the ?resample function and setting method = 'rswr'.

#Random sampling with replacement
obj = resample(
  x = 10,
  n = 5,
  k = 2,
  method = 'rswr',
  prob = c(rep(3,3),rep(1,7))
)

#Print
print(obj)
#> 
#> 2 samples taken from a population of 10 elements by using random
#> sampling with replacement.
#> 
#>   sampleNumber          sample sampleSize holdoutSize
#> 1            1 2, 1, 8, 1, ...          5           6
#> 2            2 1, 1, 3, 8, ...          5           7

#Samples
getSamples(obj)
#> [[1]]
#> [1] 2 1 8 1 4
#> 
#> [[2]]
#> [1] 1 1 3 8 3

#Holdout Data
getHoldOutSamples(obj)
#> [[1]]
#> [1]  3  5  6  7  9 10
#> 
#> [[2]]
#> [1]  2  4  5  6  7  9 10

#Plot
plot(x = obj)

Stratified Random Sampling

Repeated stratified samples with replacement can be taken by using ?repeatedStratifiedSampleWithReplacement, which accepts a strata argument.

#Define strata
strata = c(rep("a", 3),rep("b", 6))

#Stratified sampling with replacement
repeatedStratifiedSampleWithReplacement(
  k = 2,
  strata = strata,
  n = 6
)
#> [[1]]
#> [1] 5 7 1 2 6 6
#> 
#> [[2]]
#> [1] 8 5 1 5 5 1
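
Both samples above contain 2 elements from stratum "a" (indices 1 to 3) and 4 from stratum "b" (indices 4 to 9), which matches the strata proportions in the population. The base R sketch below reproduces that behaviour under the assumption of proportional allocation; it is an illustration, not the package's implementation, and all variable names are hypothetical.

```r
#Population strata, as defined above
strata = c(rep("a", 3), rep("b", 6))
n = 6

#Proportional allocation (assumed): each stratum gets n * (stratum share)
stratum.sizes = table(strata)
allocation = setNames(
  object = as.integer(round(n * as.vector(stratum.sizes) / length(strata))),
  nm     = names(stratum.sizes)
)

#Indices of the population elements in each stratum
index.by.stratum = split(x = seq_along(strata), f = strata)

#Draw with replacement within each stratum, then combine
s = unlist(
  lapply(
    X   = names(index.by.stratum),
    FUN = function(g){
      pool = index.by.stratum[[g]]
      pool[sample.int(n = length(pool), size = allocation[[g]], replace = TRUE)]
    }
  )
)

#Stratum counts in the sample
table(strata[s])
```

Rounding gives an exact split here (2 and 4). For other sizes, the rounded allocations may not add up to n, and a real implementation would need a rule for distributing the remainder.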

We can take repeated samples by using the ?resample function and setting method = 'stratified_rswr'.

#Stratified sampling with replacement
obj = resample(
  x = strata,
  n = 6,
  k = 2,
  method = 'stratified_rswr'
)

#Print
print(obj)
#> 
#> 2 samples taken from a population of 9 elements by using stratified
#> random sampling with replacement.
#> 
#>   sampleNumber          sample sampleSize holdoutSize
#> 1            1 1, 6, 9, 1, ...          6           5
#> 2            2 6, 6, 1, 8, ...          6           4

#Samples
getSamples(obj)
#> [[1]]
#> [1] 1 6 9 1 7 9
#> 
#> [[2]]
#> [1] 6 6 1 8 5 3

#Holdout Data
getHoldOutSamples(obj)
#> [[1]]
#> [1] 2 3 4 5 8
#> 
#> [[2]]
#> [1] 2 4 7 9

#Plot
plot(x = obj, strata = strata)

Balanced Random Sampling

Balanced sampling is a special case of stratified sampling used to ensure that subgroups of the population are equally represented in each sample.

Repeated balanced samples with replacement can be taken by using ?repeatedBalancedSampleWithReplacement:

#Define strata
strata = c(rep("a", 3),rep("b", 6))

#Check ratio
table(strata)/length(strata)
#> strata
#>         a         b 
#> 0.3333333 0.6666667

#Balanced sampling with replacement
s = repeatedBalancedSampleWithReplacement(
  k = 2,
  strata = strata,
  n = 6
)

#Check ratio in the samples
table(strata[s[[1]]])/length(strata[s[[1]]])
#> 
#>   a   b 
#> 0.5 0.5
table(strata[s[[2]]])/length(strata[s[[2]]])
#> 
#>   a   b 
#> 0.5 0.5
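
The ratios above suggest that the sample size is split evenly across the strata, regardless of their size in the population. A minimal base R sketch under that assumption (equal allocation, with hypothetical variable names, not the package's implementation) is:

```r
#Population strata, as defined above
strata = c(rep("a", 3), rep("b", 6))
n = 6

#Equal allocation (assumed): the sample size is split evenly across strata
index.by.stratum = split(x = seq_along(strata), f = strata)
per.stratum = n %/% length(index.by.stratum)

#Draw the same number of elements, with replacement, from each stratum
s = unlist(
  lapply(
    X   = index.by.stratum,
    FUN = function(pool){
      pool[sample.int(n = length(pool), size = per.stratum, replace = TRUE)]
    }
  ),
  use.names = FALSE
)

#Each stratum contributes half of the sample
table(strata[s]) / length(s)
```

Sampling with replacement is what allows the smaller stratum to contribute as many elements as the larger one.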

We can take repeated samples by using the ?resample function and setting method = 'balanced_rswr'.

#Balanced sampling with replacement
obj = resample(
  x = strata,
  n = 6,
  k = 2,
  method = 'balanced_rswr'
)

#Print
print(obj)
#> 
#> 2 samples taken from a population of 9 elements by using balanced
#> random sampling with replacement.
#> 
#>   sampleNumber          sample sampleSize holdoutSize
#> 1            1 9, 1, 3, 5, ...          6           3
#> 2            2 2, 9, 2, 4, ...          6           5

#Samples
getSamples(obj)
#> [[1]]
#> [1] 9 1 3 5 6 2
#> 
#> [[2]]
#> [1] 2 9 2 4 4 3

#Holdout Data
getHoldOutSamples(obj)
#> [[1]]
#> [1] 4 7 8
#> 
#> [[2]]
#> [1] 1 5 6 7 8

#Plot
plot(x = obj, strata = strata)

Bootstrap Sampling

We can take repeated bootstrap samples by using ?repeatedBootstrapSample:

#Bootstrap sampling
repeatedBootstrapSample(
  k = 2,
  N = 10
)
#> [[1]]
#>  [1]  9  7 10  9  2  4 10  7 10  3
#> 
#> [[2]]
#>  [1] 5 5 7 5 5 8 6 2 3 3
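
An ordinary bootstrap sample has the same size as the population, which is why no n argument is passed. In base R terms this amounts to N draws with replacement from 1..N, as in the hedged sketch below. It also computes the expected number of elements that never appear in a bootstrap sample, N * (1 - 1/N)^N, a standard result for this scheme. Variable names are illustrative.

```r
#Population size
N = 10

#An ordinary bootstrap sample: N draws with replacement from 1..N
s = sample.int(n = N, size = N, replace = TRUE)

#Elements left out of this particular sample
holdout = setdiff(x = seq_len(N), y = s)

#Expected number of left-out elements: N * (1 - 1/N)^N
expected.holdout = N * (1 - 1/N)^N
expected.holdout   #about 3.49 for N = 10
```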

We can also use the ?resample function by setting method = 'bootstrap'.

#Bootstrap sampling
obj = resample(
  x = 10,
  k = 2,
  method = 'bootstrap'
)

#Print
print(obj)
#> 
#> 2 samples taken from a population of 10 elements by using ordinary
#> bootstrap sampling.
#> 
#>   sampleNumber          sample sampleSize holdoutSize
#> 1            1 9, 5, 4, 5, ...         10           2
#> 2            2 5, 8, 7, 1, ...         10           3

#Samples
getSamples(obj)
#> [[1]]
#>  [1] 9 5 4 5 7 3 5 2 6 8
#> 
#> [[2]]
#>  [1]  5  8  7  1  9 10  8  8  8  6

#Holdout Data
getHoldOutSamples(obj)
#> [[1]]
#> [1]  1 10
#> 
#> [[2]]
#> [1] 2 3 4

#Plot
plot(x = obj)