Estimating net diversification (b-d) and extinction fraction (d/b)

We learned earlier (see solutions to practice problems for lineage diversification, part 1) that there is a substantial amount of inherent variability to birth-death processes. The number of species produced by given birth- and death rate varies considerably! Is it still possible to obtain some reasonable estimates from a time-calibrated phylogeny? We can use the laser package for this.

We will be simulating trees to explore the ability to estimate net diversification and extinction fraction, and for comparative purposes it would be convenient if all trees had evolved over the same time time span, especially when we are comparing multiple trees (see question below). In addition, we want the trees to be reasonable large (really doesn’t make much sense to estimate rates for 5 species…) so we will use a while-loop to produce trees with minimum tip-number.

#loading libraries
require(laser) #you may need to install this package!

#simulate tree with at least 50 tips
x <- 1 #starting parameter for the while-loop
while(x<50) {phy <- pbtree(b=0.6, d=0.1, t=5); x <- Ntip(phy)}

#extract branching times (required for calculation of net divesification rate)
btimes <- branching.times(phy)

#estimate net diversification and extinction fraction
estimates <- bd(btimes, ai=seq(0.01, 0.99, length.out=20))

#net diversification rate (b-d)
## [1] 0.6009513
#extinction fraction (d/b)
## [1] 0.09254117

How do these numbers compare to our chosen settings? Happy with that? Or is it causing you some headaches? Obviously this is only a single test of the reliability of parameter estimates. Please write a script that explores and visualizes this in more detail. How about we do 100 such simulations? While there are several approaches to solve this problem, let’s take the for-loop route. Before you start writing code, spell out all single steps.

Let’s also explore how incomplete sampling may influence estimates of diversification parameters. Similar to above, let’s first simulate a tree distribution and try estimating net diversification and extinction fraction. This time, though, please try these estimates first with the complete set of taxa, and then again after dropping a fixed fraction of taxa randomly. What happens to your ability to infer parameters?

Assessing extinction or slowing of speciation

The gamma parameter (Pybus&Harvey) is a useful statistic to assess the presence of exinction or a slow down of speciation in a clade. For a homogeneous speciation process in the absence of extinction, gamma should be zero. If extinction is present, the internal nodes are closer to recent time than expected, resulting in gamma>0. A negative lambda would indicate that many nodes are closer to the root, pointing towards a slow down of speciation.

Let’s begin with a simple example, again guaranteeing a minimum number of species for a fixed time interval.

#simulate tree with at least 50 tips
x <- 1
while(x<50){phy <- pbtree(b=0.7, d=0, t=12); x <- Ntip(phy)}

#ltt() function provides gamma-estimate
div <- ltt(phy)

div$gamma #gamma estimate
## [1] 0.3226966
div$p #2-tailed p-value
## [1] 0.746925

Given that extinction is set to zero, one would expect a gamma of zero, as well. Is that the case? Well, we are dealing with a stochastic process so we should probably repeat this a few times.

Write a script that calculates the gamma statistic for 100 simulated trees (b: fixed at a value of choice, d: fixed at zero), and summarize the gamma statistic as a histogram (or boxplot). PLease also plot the p-values of these gamma estimates.

After that, let’s look at simulated trees that have some extinction. How does that affect estimates of gamma? Let’s simulate a tree with extinction, drop all extinct tips and compare parameter estimates. Similar to above, we want to keep a fixed timestop butn have a minimum number of tips in the simulated tree.

#simulate tree with at least 100 tips, drop extinct lineages
x <- 1
while(x<100){phy.all <- pbtree(b=0.8, d=0.5, t=12); x <- Ntip(phy.all)}
phy.extant <- drop.tip(phy.all, getExtinct(phy.all))

#define plotting area
par(mfrow=c(2,2), mar=c(4,5,1,5))

#plot for illustrative purposes
plot(phy.all, show.tip.label=FALSE)
plot(phy.extant, show.tip.label=FALSE)
div.all <- ltt(phy.all, plot=TRUE, col="blue")
div.extant <- ltt(phy.extant, plot=TRUE, col="blue")

#extract gamma estimates for tree with all lineages
## [1] -1.494359
## [1] 0.1350818
#extract gamma estimates for tree with only extant lineages
## [1] 2.434159
## [1] 0.01492644

What do these results tell you? Shouldn’t we better loop this over a set of trees? Yes, we should. You don’t need to plot the trees or LTTs, we are just interested in the gamma estimates.

Alright, after all these simulations (and a better understanding of the potential pitfalls we may be encountering), I think it’s time to deal with some real data. I will provide you with a variety of different phylogenies. Pick one, generate a LTT plot, calculate net diversification rate, extinction fraction, and gamma statistic. Let’s do this as a homework assignment.

Solutions to problems can be found here!