Visualizing comparative data in a phylogenetic framework

As discussed earlier in the semester, we normally deal with two main categories of data: discrete and continuous. Discrete data may include, e.g., the presence or absence of a specific trait such as wings or a particular protein, while continuous data represent measurements of traits, e.g., body mass, body length, leg length, running speed, or brain volume.

Visualizing such comparative data is a very important step prior to any analysis, as visual inspection may reveal patterns that help form new or refined hypotheses. This tutorial provides a quick overview of some of the basic and most important techniques.

We will learn visualization techniques for comparative data using a real data set, which you can download from my website. These are the data used in Angielczyk and Schmitz (2014). You can also download the data from DRYAD.

#load required packages
library(ape)    #read.tree(), plotting of phylogenies, tiplabels()
library(geiger) #treedata()

After loading libraries, let’s get our data.

#getting tree
tree <- read.tree(url(""))
tree
## Phylogenetic tree with 188 tips and 187 internal nodes.
## Tip labels:
##  tip1, tip2, tip3, tip4, tip5, tip6, ...
## Rooted; includes branch lengths.

#getting data (we will call the data frame dat throughout)
dat <- read.csv(url(""), header=TRUE)
head(dat)
##   taxon    ol   ext  int     geomm       opt    groups
## 1  tip1  4.38  4.55 2.41  3.634970 0.2914396 nocturnal
## 2  tip2  9.66  7.98 6.35  7.881059 0.5230792 nocturnal
## 3  tip3  9.98  8.51 6.29  8.114036 0.4658447 nocturnal
## 4  tip4  8.26  6.78 5.06  6.568307 0.4571843 nocturnal
## 5  tip5  8.88  6.53 5.20  6.705685 0.4663162 nocturnal
## 6  tip6 12.04 11.09 8.30 10.348531 0.5159388 nocturnal

The first column contains taxon names, and columns 2 through 4 contain measurements of orbit length (ol), external scleral ring diameter (ext), and internal scleral ring diameter (int). The next column contains the geometric mean of the three previous traits (geomm), while the opt column contains the optical ratio, which is int^2/(ext*ol). Finally, the groups column provides information about the diel activity pattern of each species. We therefore have a mix of discrete (groups) and continuous traits (all others).
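The definitions of geomm and opt can be checked against the rows printed above. The following is a minimal sketch in base R that retypes the first three rows by hand; the object name check and the *_reported columns are made up for this illustration.

```r
# First three rows of the data as printed above (typed in by hand)
check <- data.frame(
  taxon = c("tip1", "tip2", "tip3"),
  ol    = c(4.38, 9.66, 9.98),
  ext   = c(4.55, 7.98, 8.51),
  int   = c(2.41, 6.35, 6.29),
  geomm_reported = c(3.634970, 7.881059, 8.114036),
  opt_reported   = c(0.2914396, 0.5230792, 0.4658447)
)

# Geometric mean of the three measurements
check$geomm <- (check$ol * check$ext * check$int)^(1/3)

# Optical ratio: int^2 / (ext * ol); note the parentheses in the denominator
check$opt <- check$int^2 / (check$ext * check$ol)

# Largest absolute differences from the reported columns (should be close to zero)
max(abs(check$geomm - check$geomm_reported))
max(abs(check$opt - check$opt_reported))
```

Without the parentheses, R would evaluate int^2/ext*ol as (int^2/ext)*ol, which is a different quantity.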

Whenever working with comparative data and phylogenies, one must make sure that the data exactly match the phylogeny. Not only must the names of the taxa be the same, but also their order. A useful function for this is treedata() from the geiger package.

#Let's first specify row names to allow matching of tree and data
rownames(dat) <- dat$taxon #dat is the data frame with the trait data

#Then we will run the comparison
compare <- treedata(tree, dat, sort=TRUE) #R will inform you of any inconsistencies!

#Now let's save updated versions of tree and data
tree <- compare$phy
dat <- as.data.frame(compare$data)
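To see what "matching names and order" amounts to, here is a small base-R sketch with made-up tip labels and a toy data frame. It is only an illustration of the idea, not a description of how treedata() works internally; all object names (tips, toy, shared, toy_matched) are hypothetical.

```r
# Hypothetical tip labels and data, for illustration only
tips <- c("tip1", "tip2", "tip3", "tip4")
toy <- data.frame(
  taxon = c("tip3", "tip1", "tip5", "tip2"),
  opt   = c(0.47, 0.29, 0.47, 0.52)
)
rownames(toy) <- toy$taxon

# Taxa present in the tree but missing from the data, and vice versa
in_tree_only <- setdiff(tips, rownames(toy))
in_data_only <- setdiff(rownames(toy), tips)

# Keep the shared taxa, ordered as in the tip labels
shared <- tips[tips %in% rownames(toy)]
toy_matched <- toy[shared, ]
toy_matched
```

Here tip4 has no data and tip5 is not in the tree, so both would have to be dropped before any analysis; the remaining rows end up in tip-label order.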

Visualizing discrete traits in a phylogenetic framework

Let’s illustrate this by looking at the phylogenetic distribution of diel activity patterns in the above example. We will first extract the information on diel activity pattern from our dataset and then add that information as colored circles next to the species names.

#retrieve info on diel activity pattern and adding names
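z <- as.factor(dat$groups); names(z) <- tree$tip.label #dat is the data frame with the trait data

#create a vector to be filled with colors for plotting purposes, matching the diel activity pattern
#(one color per activity pattern; the palette itself is an arbitrary choice)
mycol <- rainbow(nlevels(z))[as.integer(z)]

#make very narrow margins (the exact values are a matter of taste)
par(mar=c(1,1,1,1))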

#and now plot the phylogeny and the info on diel activity pattern
plot(tree, show.tip.label=FALSE, direction="upwards")
tiplabels(pch=21, col="black", bg=mycol, cex=1)

Color coding is usually a good way to illustrate different discrete traits (especially when your tree is quite large). However, sometimes it may be useful to visualize discrete traits by different symbols. How would you go about this? Please illustrate your solution by means of a small simulated phylogeny (10 tips) and an arbitrary discrete trait.
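One possible way to approach this, sketched under the assumption that the ape package is installed: simulate a tree with rtree(), invent a three-state trait, and pass a vector of symbol codes to the pch argument of tiplabels(). The state names, the seed, and the choice of symbols are arbitrary.

```r
library(ape)

set.seed(1)                       # arbitrary seed, for reproducibility
toy_tree <- rtree(10)             # random 10-tip phylogeny

# Arbitrary discrete trait with three hypothetical states
states <- factor(sample(c("A", "B", "C"), 10, replace = TRUE),
                 levels = c("A", "B", "C"))
names(states) <- toy_tree$tip.label

# One plotting symbol per state: circle (21), square (22), triangle (24)
symbols_by_state <- c(A = 21, B = 22, C = 24)
sym <- symbols_by_state[as.character(states)]

# Plot the tree and add one symbol per tip, in tip-label order
plot(toy_tree, label.offset = 0.1)
tiplabels(pch = sym, col = "black", bg = "grey", cex = 1.2)
legend("bottomleft", legend = names(symbols_by_state),
       pch = symbols_by_state, pt.bg = "grey", bty = "n")
```

Symbols and colors can of course be combined, e.g. by also passing a color vector to bg.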

Visualizing continuous traits in a phylogenetic framework

There are many, many different ways to illustrate continuous traits in a phylogenetic framework, which is obviously super awesome. Beyond working through this tutorial I encourage you to explore on your own.

We will begin by building on the above example, in which we used the tiplabels() function to add symbols. You may also have noticed that I added the cex argument (which wasn’t actually needed because the default is 1…). However, this is a hint at how we can add continuous traits to a phylogeny! How about we scale the size of the symbol to the trait value? Let’s do one example involving the optical ratio.

#define a vector with size information
trait <- as.vector(dat$opt) #dat is the data frame with the trait data
trait <- as.numeric(trait)

#you will notice that the range of values for the trait is quite small...
range(trait)
## [1] 0.06840709 0.60390952
#hence we scale the trait for better visualization
trait <- trait*3

#make very narrow margins (the exact values are a matter of taste)
par(mar=c(1,1,1,1))

#...and now plot!
plot(tree, show.tip.label=FALSE, direction="upwards")
tiplabels(pch=21, col="black", bg=mycol, cex=trait)
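Multiplying by a constant works, but the right constant differs for every trait. An alternative, sketched here with a hypothetical helper rescale_cex(), is to map the trait linearly onto a fixed range of symbol sizes; the default limits 0.5 and 3 are arbitrary choices.

```r
# Hypothetical helper: map x linearly onto [cex_min, cex_max]
rescale_cex <- function(x, cex_min = 0.5, cex_max = 3) {
  r <- range(x, na.rm = TRUE)
  cex_min + (x - r[1]) / (r[2] - r[1]) * (cex_max - cex_min)
}

# The two extremes of the optical ratio reported above, plus a value in between
opt_values <- c(0.06840709, 0.3361583, 0.60390952)
res <- rescale_cex(opt_values)
res
```

The smallest value is drawn at cex_min and the largest at cex_max. Note that, unlike simple multiplication, this rescaling means symbol sizes are no longer proportional to the trait values, so it is best suited for showing relative differences.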

Repeat the scaling of symbol size with all other continuous traits. You will need to adjust the symbol size until the plot looks aesthetically pleasing.

Let’s do one more example in this session. Instead of scaling the symbol relative to the trait values, let’s add a barplot next to the phylogeny. Here’s one way to get this done, inspired by Liam Revell’s blog post. Other very cool approaches are found in Sam Price’s tutorial from the 2015 Bodega phylogenetics workshop.

How about we try plotting the optical ratio already used above with Liam’s method? (Sam’s example will serve as great homework.)

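#We first define a split plotting panel: one area for the tree, the other for the barplot
#(the 70/30 split of the widths is an arbitrary choice)
layout(matrix(c(1,2), 1, 2), widths=c(0.7,0.3))

#make very narrow margins
par(mar=c(1,1,1,0))

#Then we'll plot our tree
plot(tree, show.tip.label=FALSE)
tiplabels(pch=21, col="black", bg=mycol, cex=1)

#...and then the barplot with the trait ordered by tip.label (just to make sure...)
names(trait) <- tree$tip.label

#same top and bottom margins as the tree panel, so that the bars line up with the tips
par(mar=c(1,0,1,1))
barplot(trait[tree$tip.label], horiz=TRUE, space=0, ylim=c(1,length(tree$tip.label))-0.5, names.arg="", col=mycol)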
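Bars and tips only line up if the bars are drawn in the same bottom-to-top order as the tips appear in the plot. The sketch below, assuming the ape package and using a simulated tree with a made-up trait, reads the vertical tip positions back from the last plot (ape stores them in last_plot.phylo) and orders the bars accordingly. The seed, the toy object names, and the panel widths are arbitrary.

```r
library(ape)

set.seed(2)                          # arbitrary seed
toy_tree <- rtree(10)                # random 10-tip phylogeny
toy_trait <- setNames(runif(10), toy_tree$tip.label)  # made-up trait values

layout(matrix(c(1, 2), 1, 2), widths = c(0.7, 0.3))

# Tree panel; top and bottom margins must match the barplot panel
par(mar = c(1, 1, 1, 0))
plot(toy_tree, show.tip.label = FALSE)

# Vertical position of each tip in the plot just drawn
pp <- get("last_plot.phylo", envir = .PlotPhyloEnv)
n <- Ntip(toy_tree)
plot_order <- order(pp$yy[seq_len(n)])

# Bars from bottom to top, in the same order as the tips
par(mar = c(1, 0, 1, 1))
barplot(toy_trait[toy_tree$tip.label[plot_order]], horiz = TRUE, space = 0,
        ylim = c(1, n) - 0.5, axisnames = FALSE)
layout(1)  # reset the panel layout
```

With space=0 each bar is one unit tall and centered at 0.5, 1.5, and so on, which is why ylim is shifted by 0.5 relative to the tip positions 1 to n in the tree panel.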

Repeat the above for all other continuous traits.