# Statistics for BioinformaticsPracticals - Sampling Distributions

## Tutorial

```
source('http://pedagogix-tagc.univ-mrs.fr/courses/statistics_bioinformatics/R-files/config.R')

## Generate a set of random numbers
p <- 100 ## Number of columns in the table
r <- 10000 ## Number of repetitions
x <- matrix(rnorm(r*p), nrow=r, ncol=p)
dim(x) ## Dimensions of the data table
x[1:10, 1:15] ## Display a subset of the matrix (10 rows, 15 columns in the top left of the matrix)

## Check the general distribution of the 10000 samples (altogether)

## Draw an histogram
hist(x, breaks=100)

## Draw an histogram and store the distribution in a variable called x.distrib
x.distrib <- hist(x, breaks=100)
names(x.distrib)
x.distrib\$breaks ## class boundaries

## Plot a frequency polygon
plot(x.distrib\$mids, x.distrib\$density, type="l",col="blue",panel.first=grid(col="black"),ylim=c(0,2), lwd=2,xlab="x", ylab="density")

## Plot the theoretical distribution as a dotted line
lines(x.distrib\$mids, dnorm(x.distrib\$mids),type="l", col="red",lty="dashed",lwd=2)

## Calculate the means for r=10000 samples of size n=3
my.colors <- c("darkgreen", "darkblue", "darkviolet", "violet", "purple", "brown")
i <- 0
for (n in c(3,10,100)) {

i <- i+1

## Select the n first columns of the matrix
sample <- x[,1:n] ## Select n columns of the matrix
print(sample[1:10, ]) ## display the first 10 rows of the sample

## Calculate the mean for each row of the sub-matrix (for each one of the r=10000 samples)
## You can call help(apply) to understand how the function apply() works
sample.means <- apply(sample, 1, mean) ## The second argument (1) indicates that we apply the operation (mean) on each row
## print(sample.means)

## Calculate the distribution of sample means
sample.mean.distrib <- hist(sample.means, breaks=100, plot=F)
##  print(sample.mean.distrib)

## Plot the sample mean distribution over the sample distribution
lines(sample.mean.distrib\$mids, sample.mean.distrib\$density, type="l",col=my.colors[i],lwd=2)

## Plot the theoretical distribution as a dotted line
lines(x.distrib\$mids, dnorm(x.distrib\$mids, sd=1/sqrt(n)),type="l", col="red",lty="dashed",lwd=2)

}

```

## Exercises

1. Adapt the script above to show the sampling distribution of the mean of n numbers drawn from a uniform distribution. Start with small values of n and progressively increase the sample size by a factor of 2 (1, 2, 4, 8, ..., 1024) .