Simulating and plotting data — an exercise

After our first brief introduction to the R environment, objects, and plotting, let’s continue in that vein. The goals of this exercise are to introduce you to the simulation and visualization of data.

R offers functions for drawing random samples from many different distributions. We will keep it simple and stick to the simulation of normally distributed data. Using the ‘rnorm()’ function, all we need to do is specify the number of observations, the mean, and the standard deviation of our distribution to simulate some data. Let’s generate 200 observations with a mean of 43 and SD of 4.

set1 <- rnorm(n=200, mean=43, sd=4) #generating two hundred observations with a mean of 43 and standard deviation of 4
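Because ‘rnorm()’ draws random numbers, every run gives slightly different values. A minimal sketch of how one might make the simulation reproducible and check that the sample roughly matches the requested parameters is shown below. The seed value of 1 is an arbitrary choice for illustration, not part of the original exercise.

```r
set.seed(1)  # arbitrary seed (an assumption); fixes the random draws so results repeat

set1 <- rnorm(n = 200, mean = 43, sd = 4)  # same settings as above

length(set1)  # number of simulated observations
mean(set1)    # sample mean, should be close to 43 but not exactly 43
sd(set1)      # sample standard deviation, should be close to 4
```

The sample mean and SD will not hit 43 and 4 exactly, since they are estimates from a finite sample.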

The next step is to visually explore this distribution, which can be done with a histogram. As usual, there are different ways to go about it (the ggplot2 package offers some very nice options), but we will stick to base R graphics.

hist(set1, breaks=12, xlim=c(0,100), main="Example Histogram [normal distribution]", xlab="Trait", col="lightblue", border="steelblue") #plotting a frequency distribution of the observations

An alternative way to illustrate the data is a boxplot. Boxplots are useful for describing variation in your samples: they show at a glance how widely spread the data are and whether they are skewed in a certain direction. The thick line is the median, and the box spans from the 1st to the 3rd quartile, so it contains the middle 50% of the observations.

boxplot(set1, ylim=c(0,100), main="Example Boxplot [normal distribution]", xlab="Trait", col="lightblue", border="steelblue", horizontal=TRUE) #summarizing the distribution with a boxplot
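The numbers behind the box can also be inspected directly. The sketch below uses an arbitrary seed (an assumption for reproducibility) and recreates set1 so that it runs on its own. Note that ‘boxplot()’ draws the box at the "hinges" returned by ‘boxplot.stats()’, which can differ slightly from the quartiles reported by ‘quantile()’.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
set1 <- rnorm(n = 200, mean = 43, sd = 4)

quantile(set1, probs = c(0.25, 0.5, 0.75))  # 1st quartile, median, 3rd quartile

box <- boxplot.stats(set1)$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
box

# share of observations that fall inside the box (between the two hinges)
inside <- mean(set1 >= box[2] & set1 <= box[4])
inside
```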

We can do the same with different settings. Let’s take a look at what that would look like. When modifying the mean and SD, we should see a shift of the distribution and a different spread, or dispersion, of the observations. Vary the numbers to get a feel for it.

set2 <- rnorm(n=200, mean=57, sd=12) #simulating a second normal distribution with a higher mean and a larger standard deviation
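The shift and the wider spread can also be checked numerically before plotting. The following is a small sketch, again with an arbitrary seed and with both datasets recreated so that it is self-contained. The helper name ‘describe’ is made up for this illustration.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
set1 <- rnorm(n = 200, mean = 43, sd = 4)
set2 <- rnorm(n = 200, mean = 57, sd = 12)

# hypothetical helper: a few summary statistics for one numeric vector
describe <- function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))

# one column per dataset, one row per statistic
stats <- sapply(list(set1 = set1, set2 = set2), describe)
stats
```

The second column should show a higher mean and a clearly larger SD and range than the first.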

par(mfrow=c(2,2)) #this creates 4 panels (2x2) for plotting

hist(set1, breaks=12, xlim=c(0,100), main="Example Histogram [Trait 1]", xlab="Trait 1", col="lightblue", border="steelblue") #this is the histogram of the first distribution (see above)

hist(set2, breaks=12, xlim=c(0,100), main="Example Histogram [Trait 2]", xlab="Trait 2", col="red2", border="red4") #and this is the histogram of the second normal distribution (higher mean, larger SD)

boxplot(set1, ylim=c(0,100), main="Example Boxplot [Trait 1]", xlab="Trait 1", col="lightblue", border="steelblue", horizontal=TRUE) #same as above

boxplot(set2, ylim=c(0,100), main="Example Boxplot [Trait 2]", xlab="Trait 2", col="red2", border="red4", horizontal=TRUE) #and this is the boxplot for the second normal distribution
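One thing to keep in mind is that the panel layout set with ‘par(mfrow=...)’ stays active for the current graphics device, so later plots would keep filling a 2x2 grid. A minimal sketch of one way to restore the previous layout is shown below. The variable name ‘old_par’ and the seed are choices made for this illustration.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
set1 <- rnorm(n = 200, mean = 43, sd = 4)

old_par <- par(mfrow = c(2, 2))  # par() returns the previous settings, store them

hist(set1, breaks = 12, xlim = c(0, 100), main = "Panel 1", xlab = "Trait 1")
# ... the remaining panels would be drawn here ...

par(old_par)  # restore the layout that was active before (usually a single panel)
```

Calling ‘par(mfrow=c(1,1))’ would work as well if you simply want to return to a single panel.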

Alright, now it’s time for you to do a little exercise. You will need to carefully read the help files (for example via ‘?hist’) and may also have to do some searching online on your own.

1) Generate 2 normal distributions with different means and/or different standard deviations.

2) Plot the frequency distributions of your two simulated datasets in the same diagram (overlapping!), and use transparent colors to enhance readability of the figure. Try to make the figure as aesthetically pleasing as possible.

3) Add a boxplot of your distributions to the same figure (as a separate panel).
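As a hint for step 2, and not a full solution: transparent colors in base R can be built in at least two ways, sketched below. The color choices and the 50% transparency are arbitrary. The resulting strings can be passed to the ‘col’ argument of a plotting function, and the help page ‘?hist’ describes the ‘add’ argument for drawing onto an existing plot.

```r
# take a named color and make it 50% transparent
col_a <- adjustcolor("lightblue", alpha.f = 0.5)

# or build a color from red/green/blue values (0 to 1) plus an alpha value
col_b <- rgb(1, 0, 0, alpha = 0.5)

# both are hex strings of the form "#RRGGBBAA", where the last two digits encode transparency
col_a
col_b
```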

Solutions can be found here.