
## Simple Random Samples With And Without Replacement In Python

Many machine learning models do not perform well when the data set is very large, because it may not fit in the computer's memory. One way around this is to pick a part of the data set that represents the whole data set. This process of picking a part of the data set is known as sampling. A sample is a part of the population that is selected with the expectation that it will represent the characteristics of the population, and sampling is the process of selecting such a representative sample from a given population.

In random sampling with replacement, the unit selected at random is returned to the population before the next unit is selected. In random sampling without replacement, the unit selected at random is not returned to the population before the next unit is selected.
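
To make the difference concrete, here is a small sketch using Python's standard `random` module. It is only an illustration of the two schemes and is not part of the method used in the steps below, which enumerate every possible sample instead of drawing a few at random. The seed value `0` is an arbitrary choice for reproducibility.

```python
import random

pop = [1, 2, 3, 4, 5, 6]
rng = random.Random(0)  # seeded so the draws are reproducible

# With replacement: each drawn unit goes back, so the same unit can appear twice
with_repl = rng.choices(pop, k=4)

# Without replacement: a drawn unit is removed, so all units are distinct
without_repl = rng.sample(pop, k=4)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

`choices` may repeat values, while `sample` never does, which mirrors whether the unit is returned to the population.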

If a simple random sample of size `n` is selected from a finite population of size `N`, the number of all possible samples is as follows.

If sampling is done with replacement:

$$N^n$$

If sampling is done without replacement:

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

Where `N` is the population size and `n` is the sample size.
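
The two counts can be checked with a few lines of Python. This is a sketch with a hypothetical helper `count_samples`; it assumes, as the later steps do, that samples drawn with replacement are ordered (as `itertools.product` produces them) and samples drawn without replacement are unordered (as `itertools.combinations` produces them).

```python
from math import comb


def count_samples(N: int, n: int, replace: bool) -> int:
    """Number of possible simple random samples of size n from a population of size N."""
    if replace:
        # N choices for each of the n draws, order matters
        return N ** n
    # Unordered selections of n distinct units: N choose n
    return comb(N, n)


print(count_samples(6, 2, replace=True))   # 36
print(count_samples(6, 2, replace=False))  # 15
```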

In this article, we will generate all possible simple random samples of size `n` from a population of size `N`, with and without replacement. We will then calculate the sample means, build their frequency distribution, calculate the mean and variance of this sampling distribution, and compare them with the population mean and variance, using the standard results for the sampling distribution of the mean that also underlie the central limit theorem.
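
For reference, these are the standard results being compared, where $\mu$ and $\sigma^2$ are the population mean and variance (the variance computed with divisor $N$) and $\bar{x}$ is the sample mean. Plugging in the example population used below ($\mu = 3.5$, $\sigma^2 = 35/12$, $N = 6$, $n = 2$) gives the values the sampling distribution should reproduce:

$$
\begin{aligned}
\mu_{\bar{x}} &= \mu = 3.5 \\
\sigma^2_{\bar{x}} &= \frac{\sigma^2}{n} = \frac{35/12}{2} = \frac{35}{24} \approx 1.4583 && \text{(with replacement)} \\
\sigma^2_{\bar{x}} &= \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{35}{24}\cdot\frac{4}{5} = \frac{7}{6} \approx 1.1667 && \text{(without replacement)}
\end{aligned}
$$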

## Get All Samples of Size 2

- Population: `pop = [1, 2, 3, 4, 5, 6]`
- Population size: `N = 6`
- Sample size: `n = 2`
- Number of all possible samples with replacement: `N x N = 6 x 6 = 36`

## Step 1: import libraries

- `numpy` for calculating the `mean`, `variance`, and `standard deviation`
- `pandas` for displaying samples in tabular format
- `product` and `combinations` from `itertools` for drawing samples with replacement and without replacement, respectively
- `plt` from `matplotlib.pyplot` for visualizing data
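
A minimal version of the imports for the steps below might look like this. The `matplotlib.pyplot` import is left out of this sketch because none of the following steps plot anything; add `import matplotlib.pyplot as plt` if you visualize the distribution.

```python
import numpy as np                          # mean, variance, standard deviation
import pandas as pd                         # tabular display of samples
from itertools import product, combinations  # with / without replacement

# Quick sanity check that the imports work
print(np.mean([1, 2, 3]), len(list(combinations([1, 2, 3], 2))))
```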

## Step 2: declare population

- create a Python list of the population and assign it to a variable `pop`
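
A sketch of this step; `N` and `n` are added only to mirror the article's notation for the population size and sample size.

```python
# Population from the example above
pop = [1, 2, 3, 4, 5, 6]

N = len(pop)  # population size
n = 2         # sample size
print(N, n)   # 6 2
```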

## Step 3: draw all `N x N` samples of size `n`

`product(pop, pop)` draws all ordered samples of size 2 from the population `pop`, with replacement, and `list(product(pop, pop))` converts these samples to a Python `list` of `tuples`.
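
Here is a short sketch of this step. `product(pop, repeat=2)` is the standard `itertools` shorthand for `product(pop, pop)` and generalizes to any sample size `n` via `repeat=n`.

```python
from itertools import product

pop = [1, 2, 3, 4, 5, 6]

# Every ordered pair (x1, x2); a unit may appear twice because it is replaced
samples = list(product(pop, pop))

print(samples[:3])  # [(1, 1), (1, 2), (1, 3)]
print(samples[-1])  # (6, 6)
```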

## Step 4: return the size of the list

`len(list(product(pop, pop)))` returns the size of the samples list, that is, the number of samples.
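
The sketch below checks both counts. The without-replacement line uses `combinations`, which the article imports for that case; the lengths match $6^2$ and $\binom{6}{2}$.

```python
from itertools import product, combinations

pop = [1, 2, 3, 4, 5, 6]

with_repl = list(product(pop, repeat=2))    # ordered, repeats allowed
without_repl = list(combinations(pop, 2))   # unordered, no repeats

print(len(with_repl))     # 36
print(len(without_repl))  # 15
```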

## Step 5: calculate the mean of all samples

`np.mean(list(product(pop, pop)), axis=1)` calculates the mean of each sample: the list of pairs becomes a 36 x 2 array, and `axis=1` averages across each row.
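
A minimal sketch of this step, printing the first few sample means:

```python
import numpy as np
from itertools import product

pop = [1, 2, 3, 4, 5, 6]
samples = list(product(pop, pop))

# Rows are samples, so axis=1 averages the two values within each sample
means = np.mean(samples, axis=1)

print(means.shape)         # (36,)
print(means[:4].tolist())  # [1.0, 1.5, 2.0, 2.5]
```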

## Step 6: Create a data frame

- create a data frame `df`
- create two columns, `samples` and `samples means`
- display the `df`
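
One way to build this data frame is sketched below. It stores each sample as a tuple in an object column, which is an assumption about how the samples are displayed; the column names follow the text.

```python
import numpy as np
import pandas as pd
from itertools import product

pop = [1, 2, 3, 4, 5, 6]
samples = list(product(pop, pop))

df = pd.DataFrame({
    "samples": samples,                         # each cell holds a tuple such as (1, 2)
    "samples means": np.mean(samples, axis=1),  # mean of that sample
})

print(df.head())
```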

## Step 7: Create a frequency distribution

`np.unique(np.mean(list(product(pop, pop)), axis=1), return_counts=True)` returns two vectors: the distinct sample means and their frequencies `f`.

- create a probability vector from `f/sum(f)` and assign it to `p`
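
A sketch of this step, naming the distinct means `x` (an illustrative choice) and the frequencies `f` as in the text. For the six-value population the frequencies rise and fall symmetrically around the mean of 3.5.

```python
import numpy as np
from itertools import product

pop = [1, 2, 3, 4, 5, 6]
means = np.mean(list(product(pop, pop)), axis=1)

# x: distinct sample means, f: how many samples share each mean
x, f = np.unique(means, return_counts=True)
p = f / f.sum()  # relative frequency of each mean, i.e. its probability

print(x.tolist())  # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
print(f.tolist())  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```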

## Step 8: Create a frequency distribution of samples means

- create a data frame of the frequency distribution of the sample means, where `x` is a sample mean and `p` its probability
- `Σxp` is the mean, `Σx²p - (Σxp)²` is the variance, and `sqrt(Σx²p - (Σxp)²)` is the standard deviation of the distribution
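
The formulas above translate directly into column sums. In this sketch the column names `x`, `f`, `p`, `xp`, and `x2p` are illustrative choices, not names taken from the original code.

```python
import numpy as np
import pandas as pd
from itertools import product

pop = [1, 2, 3, 4, 5, 6]
means = np.mean(list(product(pop, pop)), axis=1)
x, f = np.unique(means, return_counts=True)
p = f / f.sum()

freq = pd.DataFrame({"x": x, "f": f, "p": p, "xp": x * p, "x2p": x**2 * p})
print(freq)

mean = freq["xp"].sum()                 # Σxp
variance = freq["x2p"].sum() - mean**2  # Σx²p - (Σxp)²
std = np.sqrt(variance)                 # sqrt(Σx²p - (Σxp)²)

print(mean, variance, std)  # approximately 3.5, 1.4583, 1.2076
```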

## Step 9: Calculate population parameters

- calculate and print the `mean`, `variance`, and `standard deviation` of the sample means
- calculate and print the `mean`, `variance`, and `standard deviation` of the population
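
Finally, the comparison can be sketched for both schemes at once. This assumes every sample is equally likely, so `np.var` with its default `ddof=0` on the sample means equals `Σx²p - (Σxp)²`. The without-replacement case uses the finite population correction `(N - n) / (N - 1)`.

```python
import numpy as np
from itertools import product, combinations

pop = np.array([1, 2, 3, 4, 5, 6])
N, n = len(pop), 2

# Population parameters (np.var divides by N by default, ddof=0)
mu, sigma2 = pop.mean(), pop.var()
print("population:", mu, sigma2, np.sqrt(sigma2))

# Sampling distribution of the mean, with and without replacement
m_with = np.mean(list(product(pop, repeat=n)), axis=1)
m_without = np.mean(list(combinations(pop, n)), axis=1)

print("with replacement:   ", m_with.mean(), m_with.var(), sigma2 / n)
print("without replacement:", m_without.mean(), m_without.var(),
      sigma2 / n * (N - n) / (N - 1))
```

In both cases the mean of the sample means equals the population mean, and the variance shrinks by a factor of `n` (further reduced by the correction factor when sampling without replacement).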
