Visualizing comparative data in a phylogenetic framework, continued!

We learned in the last tutorial how to combine phylogenetic with discrte and continuous data, which is very useful for exploring evolutionary patterns and from or refine evolutionary hypotheses. This tutorial continues in this vein.

The first example is taken from Price et al. (2012, Evolution). The data associated with this paper are on DRYAD but I slightly reformatted and reduced the data file to make it easier to deal with for our plotting purposes. The tree-file remains unchanged. We will download the data files directly from my site. Please also refer to the tutorial from the Bodega workshop which heavily inspired the example below!

There are several things I want to point out before we start with the tutorial. While the overal plotting techniques is similar, we will encounter a function new to many of us (lapply()), which is a convenient alternative to for-loops in many situations. In addition, we will encounter some other hurdles that we will also face when dealing with data and trees for our research projects… It’s best to be prepared!

#load required packages
library(ape)
library(geiger)
library(phytools)
library(paleotree)

After loading libraries, let’s get our data.

#getting tree
haemulidTrees <- read.nexus(url("http://schmitzlab.info/Trees4dryad.nex"))
haemulidTrees
## 500 phylogenetic trees
#getting data
haemulids <- read.csv(url("http://schmitzlab.info/haemulids.csv"), header=TRUE)
head(haemulids)
##                     Taxon Standard_length Raker_Length Suction_Index
## 1     Anisotremus_caesius          158.00         2.35          0.43
## 2  Anisotremus_davidsonii          240.00         5.10          0.41
## 3       Anisotremus_dovii          175.67         3.13          0.50
## 4 Anisotremus_interruptus          172.67         3.47          0.37
## 5   Anisotremus_moricandi          127.00         3.10          0.40
## 6    Anisotremus_pacifici          139.00         2.50          0.82
##    Habitat
## 1     reef
## 2     reef
## 3 non-reef
## 4     reef
## 5     reef
## 6 non-reef

The tree-file contains a set of 500 time-calibrated trees, the associated data include standard length, gill raker length, suction index, and information about habitat (reef vs. non-reef).

As said before, whenever working with comparative data and phylogenies, one must absolutely make sure that the data exactly match the phylogeny, for both analytical and plotting purposes. Let’s take a look at the haemulid tree.

#Let's begin by plotting the first tree in the list
plot(haemulidTrees[[1]], cex=0.4)
add.scale.bar()

#OK, that can and should look much cleaner...

#Let's ladderize all trees in the distribution with lapply() and plot the first tree again
haemulidTreesLadderized <- lapply(haemulidTrees, ladderize)
plot(haemulidTreesLadderized[[1]], cex=0.4) #Plot the first ladderized tree
add.scale.bar()

#Looks better, doesn't it? 

#However, does the order of tips in the plot match the order of tiplabels in the tree object?

#Please compare
haemulidTreesLadderized[[1]]$tip.label
##  [1] "Haemulon_striatum"               "Haemulon_steindachneri_Pacific" 
##  [3] "Haemulon_macrsotomum"            "Anisotremus_caesius"            
##  [5] "Haemulon_carbonarium"            "Anisotremus_davidsonii"         
##  [7] "Anisotremus_dovii"               "Anisotremus_interruptus"        
##  [9] "Haemulon_chrysargyreum"          "Xenichthys_xantii"              
## [11] "Anisotremus_surinamensis"        "Haemulopsis_leuciscus"          
## [13] "Isacia_conceptionis"             "Haemulon_album"                 
## [15] "Haemulon_plumieri"               "Anisotremus_moricandi"          
## [17] "Haemulopsis_axillaris"           "Orthopristis_reddingi"          
## [19] "Haemulon_boschmae"               "Haemulon_spB"                   
## [21] "Xenistius_californiensis"        "Microlepidotus_brevipinnis"     
## [23] "Pomadasys_crocro"                "Conodon_serrifer"               
## [25] "Haemulon_vittata"                "Anisotremus_scapularis"         
## [27] "Haemulon_flavolineatum"          "Haemulopsis_elongatus"          
## [29] "Haemulon_sciurus"                "Anisotremus_pacifici"           
## [31] "Haemulon_maculicauda"            "Microlepidotus_inornatus"       
## [33] "Anisotremus_virginicus"          "Haemulon_steindachneri_Atlantic"
## [35] "Pomadasys_branickii"             "Pomadasys_panamensis"           
## [37] "Orthopristis_chrysoptera"        "Orthopristis_ruber"             
## [39] "Haemulopsis_nitidus"             "Haemulon_melanurum"             
## [41] "Haemulon_sexfasciatum"           "Pomadasys_corvinaeformis"       
## [43] "Haemulon_spA"                    "Conodon_nobilis"                
## [45] "Orthopristis_chalceus"           "Pomadasys_macracanthus"         
## [47] "Haemulon_aurolineatum"           "Haemulon_flaviguttatum"         
## [49] "Haemulon_scudderi"               "Haemulon_parra"

As you will see, by ladderizing the tree, the plotting order and the tiplabel order are no longer aligned. That’s a problem when plotting phylogenetic and phenotypic data side by side!

There is an easy fix for that, though. First we will compare the tree to the data (a necessary step we learned about previously), then save the ladderized tree and finally re-load it.

#Grab the tree from above
base.tree <- haemulidTreesLadderized[[1]]

#assign row names to the dataframe
rownames(haemulids) <- haemulids$Taxon

#Compare tree with data
compare <- treedata(base.tree, haemulids, sort=TRUE) #R will inform you of any inconsistencies and how they were solved!
## Warning in treedata(base.tree, haemulids, sort = TRUE): The following tips were not found in 'phy' and were dropped from 'data':
##  Emellichthyops_atlanticus
##  Haemulon_bonariense
#Extract vetted data and tree
haemulid.data <- as.data.frame(compare$data)
haemulid.data[,2:4] <- lapply(haemulid.data[,2:4], as.character)
haemulid.data[,2:4] <- lapply(haemulid.data[,2:4], as.numeric)

tree <- compare$phy

#Save the new tree and bring it back
write.tree(tree, "haemulid.tre")
tree <- read.tree("haemulid.tre")

After all this, we are finally ready to plot. We will plot the tree along with habitat info and bar plots of continuously valued data. The approach is slightly different from what we learned before, which goes to show how versatile R is.

#Retrieve info on habitat and adding names
z <- as.factor(haemulid.data$Habitat); names(z) <- tree$tip.label

#Create a vector to be filled with colors for plotting purposes, matching the diel activity pattern
mycol<-character(length(z))
mycol[haemulid.data$Habitat=="reef"]<-"grey"
mycol[haemulid.data$Habitat=="non-reef"]<-"black"

#Extract continuos trait data (standard length, raker length, suction index)
sl <- haemulid.data$Standard_length
names(sl) <- tree$tip.label

rl <- haemulid.data$Raker_Length
names(rl) <- tree$tip.label

SI <- haemulid.data$Suction_Index
names(SI) <- tree$tip.label

#make very narrow margins
par(mar=c(2,0,0,1))

#And finally we can plot :)
plot(tree, cex=0.4, y.lim=c(0,55), x.lim=c(0,64))

#Adding circles to represent habitat info plus label
points(rep(54, length(tree$tip.label)), 1:length(tree$tip.label), pch=21, bg=mycol, cex=1) 
text(54, length(tree$tip.label)+1.5, "Habitat", cex=0.8)

#Add some quantitative data and corresponding labels
segments(56, 1:nrow(haemulid.data), 56 + (sl/120), 1:nrow(haemulid.data), lwd=4, col="gray")
segments(56, 0, 56, 1:nrow(haemulid.data), col="black", lwd=1)
text(55.5, length(tree$tip.label)+1.5, "SL", pos=4, cex=0.8)

segments(59, 1:nrow(haemulid.data), 59 + (rl/5), 1:nrow(haemulid.data), lwd=4, col="gray")
segments(59, 0, 59, 1:nrow(haemulid.data), col="black", lwd=1)
text(58.5, length(tree$tip.label)+1.5, "RL", pos=4, cex=0.8)

points(rep(63, nrow(haemulid.data)), 1:nrow(haemulid.data), pch=21, bg="blue", col="white", lwd=0.25, cex=SI*4)
text(62, length(tree$tip.label)+1.5, "SI", pos=4, cex=0.8)

Please go through the script very, very carefully and modify some of the argument values to get a solid handling of how to generate this kind of plot.

We will need these visualization techniques for our research projects.

Phylomorphospace — adding phylogenetic information to scatterplots

So called phylomorphospaces are a very useful visualization technique to explore patterns of morphospace occupation from a phylogenetic perspective. It takes some time to get used to reading such plots so let’s get started.

We will use the dataset on scleral ring and eye socket dimensions to explain how one can generate such a plot and interpret it.

#Getting tree
tree <- read.tree(url("http://schmitzlab.info/visual.tree.tre"))
tree
## 
## Phylogenetic tree with 188 tips and 187 internal nodes.
## 
## Tip labels:
##  tip1, tip2, tip3, tip4, tip5, tip6, ...
## 
## Rooted; includes branch lengths.
#Getting data
visual.data <- read.csv(url("http://schmitzlab.info/visual.data.csv"), header=TRUE)
head(visual.data)
##   taxon    ol   ext  int     geomm       opt    groups
## 1  tip1  4.38  4.55 2.41  3.634970 0.2914396 nocturnal
## 2  tip2  9.66  7.98 6.35  7.881059 0.5230792 nocturnal
## 3  tip3  9.98  8.51 6.29  8.114036 0.4658447 nocturnal
## 4  tip4  8.26  6.78 5.06  6.568307 0.4571843 nocturnal
## 5  tip5  8.88  6.53 5.20  6.705685 0.4663162 nocturnal
## 6  tip6 12.04 11.09 8.30 10.348531 0.5159388 nocturnal
#Let's first specify row names to allow matching of tree and data
rownames(visual.data) <- visual.data$taxon

#Then we will run the comparison
compare <- treedata(tree, visual.data, sort=TRUE) #R will inform you of any inconsistencies!

#Now let's save updated versions of tree and data
tree <- compare$phy
visual.data <- as.data.frame(compare$data)

#We need to make sure the continuous traits are recognized as numbers
visual.data[,2:6] <- lapply(visual.data[,2:6], as.character)
visual.data[,2:6] <- lapply(visual.data[,2:6], as.numeric)

After these preliminaries, let’s select two different traits: the geometric mean of all three different eye traits (as a proxy for overall size), and the optical ratio.

Then we will generate a plot of the optical ratio against the geometric mean, color-code different diel activity patterns, and add phylogenetic information to generate a phylomorphospace.

#First, we'll define a vector with information on diel activity patterns.
dap <- as.vector(visual.data$groups)

#Then we select which data we want to use for the phylomorphospace. 

#These two traits are in columns 5 and 6, respectively.
#We will also log10-transform the geometric mean, and do some rounding
      
morpho <- visual.data[,5:6]
morpho[,1] <- log10(morpho[,1])
morpho <- signif(morpho, 3)

#Now let's define colors for plotting purposes.
#This can be done by creating an empty vector.
#The vector is subsequently filled with the desired colors for each group.
    
mycol<-character(length(dap))
mycol[dap=="diurnal"]<-"yellow"
mycol[dap=="cathemeral"]<-"blue"
mycol[dap=="nocturnal"]<-"red"
mycol[dap=="unknown"]<-"green" #These are the fossils.

#It's best to name the elements of the vecor by the matching tip label.
names(dap)<-tree$tip.label

#The final preparation step is to define the color object for all internal nodes of the phylogeny. 
cols <- c(mycol, rep("black", tree$Nnode))
names(cols) <- 1:(length(tree$tip)+tree$Nnode)

#And now we get to the main function, which uses the tree (tree) and the data (morpho).
phylomorphospace(tree, morpho, label="off", node.size=c(0.5, 2),                  control=list(col.node=cols), xlab="Eye Size Proxy", ylab="Optical Ratio"); box(lwd=2)
## Warning: closing unused connection 6 (http://schmitzlab.info/
## visual.tree.tre)

Alright! Looks pretty cool, even though I doubt my color choice… We will talk about the implications of the patterns in class.

Please generate phylomorphospaces for at least two other bi-variate combinations of the four continuous traits, and interpret the pattern.

Phylogenetic traitgrams

Yet another way top combine phylgoenetic data with traits are so-called phylogenetic traitgrams. They can be quite helpful in deciphering evolutionary patterns. Let’s generate such a phylogentic traitgram for the functional ratio we have been usin so far.

# We begin by selecting a trait that we would like to investigate. Let's stick to the functional ratio we used for the morphospace earlier (OPT).
trait <- as.vector(visual.data$opt)
names(trait) <- tree$tip.label

#There are multiple options to generate phylogenetic traitgrams. We will use the 'phytools' and 'paleotree' packages.

#Here's the 'phytools' version:
phenogram(tree, trait, col="blue", ftype="off", ylab="Trait")