More on phylogenetic trees in R

As I mentioned before, there are many, many more options to handle trees in R. Explore these options in Emmanuel Paradis’ book (Analysis of Phylogenetics and Evolution with R) and Liam Revell’s blog.

This tutorial largely focuses on the phytools and strap packages, with emphasis on simulating, plotting, and saving trees.

if(!require(phytools)){
    install.packages("phytools") #'phytools' will be installed if needed
    library(phytools) #loading the library
}
## Loading required package: phytools
## Loading required package: ape
## Loading required package: maps
if(!require(strap)){
    install.packages("strap") #'strap' will be installed if needed
    library(strap) #loading the library
}
## Loading required package: strap
## Loading required package: geoscale

Determining the working directory

Given that we will be saving R objects in this tutorial, we should tell R where it should actually store the files. We can tell R where to save files by setting the working directory. I would recommend to create a dedicated folder for our course, look up the path to that folder (using the Finder [Mac] or Exploer [PC]) and insert it in below’s example.

#setting your working directory PC-VERSION
setwd("C:/Users/lschmitz/Dropbox/BIOL182L_AP/code") #must change to your file path

#setting your working directory MAC-VERSION
setwd("/Users/lschmitz/Dropbox/BIOL182L_AP/code") #must change to your file path

#to verify your working directory:
getwd()

That’s all. Please always set your working directory to this folder from now on!

Simulating, plotting, and storing trees with the phytools package

We have explored tree simulations with the rtree() function in the ape pacakge last time, and we’ll do something similar in phytools. The pbtree() function is a bit more versatile, in that it allows to set both birth rate (b) and death rate (d), ie., you can simulate trees with varying degrees of speciation and extinctions. We’ll now explore what that means.

#simulating trees => please explore different values for b and d!
phy1 <- pbtree(b=1, d=0, n=10) #n specifies the desired number of species
phy2 <- pbtree(b=0.9, d=0.1, n=10)
phy3 <- pbtree(b=0.8, d=0.2, n=10)
phy4 <- pbtree(b=0.9, d=0.2, n=10)

#plotting trees with fancyTree(), highlighting extinct lineages
par(mfrow=c(2,2))
fancyTree(phy1, edge.width=2, extinction=TRUE); fancyTree(phy2, edge.width=2, extinction=TRUE); fancyTree(phy3, edge.width=2, extinction=TRUE); fancyTree(phy4, edge.width=2, extinction=TRUE)

How do these trees differ from each other? Let’s discuss as a group.

After we have discussed the implications of varying b and d let’s move on to some practical things. For example, what if you want to store your simulated trees? This can easily be done.

write.tree(phy3, "simulation.tre") #this will store your tree in your working directory

And if you want to load into R at some point, you can simlpy read it in:

stored.tree <- read.tree("simulation.tre") #this will load your tree from your working directory
fancyTree(stored.tree, edge.width=2, extinction=TRUE)

We will see later on how important it is to have a good handle of saving and loading objects into R…

The pbtree() function has another very convenient argument, which allows us to simulate many trees in one go. For example, let’s generate 100 trees with the same settings. The resulting set of trees is a multiPhylo object.

trees <- pbtree(b=0.9, d=0.1, n=20, nsim=100) #nsim specifies how many simulations are carried out
write.tree(trees, "100trees.tre") #store trees
trees
## 100 phylogenetic trees
class(trees)
## [1] "multiPhylo"

A multiPhylo object is a list of tree objects and it’s very straightforward to select specific trees from this set.

#Plotting a single tree from the multiPhylo object
tree67 <- trees[[67]] #retrieving tree #67
fancyTree(tree67, edge.width=2, extinction=TRUE)

#Plotting several trees from the multiPhylo object using a for-loop
par(mfrow=c(3,2)) #let's create a plot with 6 panels

#let's grab trees 1 through 6
for (i in 1:6) {
fancyTree(trees[[i]], edge.width=2, extinction=TRUE)
}

Let’s practice for-loops by selecting different trees!!! Try at least three different settings.

Plotting trees against the geologic time scale with the strap packge

If you ahve a tree with branch length in millions of years, the strap package offers a very neat plotting option. t plots your tree in alignment with the geologic timescale. Let’s take a look at this.

#let's use a tree from our multiPhylo object
tree <- trees[[1]]

#let's mulitply branches by 40 to cover more time! Try different factors!
tree$edge.length <- 40*tree$edge.length

#we also must specify the root time
tree$root.time <- max(nodeHeights(tree))

#now we can plot the tree against the geologic timescale
geoscalePhylo(tree, cex.ts=0.6, cex.tip=0.6)