We learned earlier (see solutions to practice problems for lineage
diversification, part 1) that there is a substantial amount of
inherent variability to birth-death processes. The number of species
produced by given birth- and death rate varies considerably! Is it
still possible to obtain some reasonable estimates from a
time-calibrated phylogeny? We can use the *laser* package for
this.

We will be simulating trees to explore the ability to estimate net diversification and extinction fraction, and for comparative purposes it would be convenient if all trees had evolved over the same time time span, especially when we are comparing multiple trees (see question below). In addition, we want the trees to be reasonable large (really doesn’t make much sense to estimate rates for 5 species…) so we will use a while-loop to produce trees with minimum tip-number.

```
#loading libraries
require(ape)
require(geiger)
require(phytools)
require(laser) #you may need to install this package!
#simulate tree with at least 50 tips
x <- 1 #starting parameter for the while-loop
while(x<50) {phy <- pbtree(b=0.6, d=0.1, t=5); x <- Ntip(phy)}
#extract branching times (required for calculation of net divesification rate)
btimes <- branching.times(phy)
#estimate net diversification and extinction fraction
estimates <- bd(btimes, ai=seq(0.01, 0.99, length.out=20))
#net diversification rate (b-d)
estimates$r1
```

`## [1] 0.6009513`

```
#extinction fraction (d/b)
estimates$a
```

`## [1] 0.09254117`

*How do these numbers compare to our chosen settings?
Happy with that? Or is it causing you some headaches? Obviously
this is only a single test of the reliability of parameter
estimates. Please write a script that explores and visualizes this
in more detail. How about we do 100 such simulations? While there
are several approaches to solve this problem, let’s take the
for-loop route. Before you start writing code, spell out all
single steps.*

*Let’s also explore how incomplete sampling may influence
estimates of diversification parameters. Similar to above, let’s
first simulate a tree distribution and try estimating net
diversification and extinction fraction. This time, though, please
try these estimates first with the complete set of taxa, and then
again after dropping a fixed fraction of taxa randomly. What
happens to your ability to infer parameters?*

The gamma parameter (Pybus&Harvey) is a useful statistic to assess the presence of exinction or a slow down of speciation in a clade. For a homogeneous speciation process in the absence of extinction, gamma should be zero. If extinction is present, the internal nodes are closer to recent time than expected, resulting in gamma>0. A negative lambda would indicate that many nodes are closer to the root, pointing towards a slow down of speciation.

Let’s begin with a simple example, again guaranteeing a minimum number of species for a fixed time interval.

```
#simulate tree with at least 50 tips
x <- 1
while(x<50){phy <- pbtree(b=0.7, d=0, t=12); x <- Ntip(phy)}
#ltt() function provides gamma-estimate
div <- ltt(phy)
```

`div$gamma #gamma estimate`

`## [1] 0.3226966`

`div$p #2-tailed p-value`

`## [1] 0.746925`

Given that extinction is set to zero, one would expect a gamma of zero, as well. Is that the case? Well, we are dealing with a stochastic process so we should probably repeat this a few times.

*Write a script that calculates the gamma statistic for
100 simulated trees (b: fixed at a value of choice, d: fixed at
zero), and summarize the gamma statistic as a histogram (or
boxplot). PLease also plot the p-values of these gamma estimates.*

After that, let’s look at simulated trees that have some extinction. How does that affect estimates of gamma? Let’s simulate a tree with extinction, drop all extinct tips and compare parameter estimates. Similar to above, we want to keep a fixed timestop butn have a minimum number of tips in the simulated tree.

```
#simulate tree with at least 100 tips, drop extinct lineages
x <- 1
while(x<100){phy.all <- pbtree(b=0.8, d=0.5, t=12); x <- Ntip(phy.all)}
phy.extant <- drop.tip(phy.all, getExtinct(phy.all))
#define plotting area
par(mfrow=c(2,2), mar=c(4,5,1,5))
#plot for illustrative purposes
plot(phy.all, show.tip.label=FALSE)
plot(phy.extant, show.tip.label=FALSE)
div.all <- ltt(phy.all, plot=TRUE, col="blue")
div.extant <- ltt(phy.extant, plot=TRUE, col="blue")
```

```
#extract gamma estimates for tree with all lineages
div.all$gamma
```

`## [1] -1.494359`

`div.all$p`

`## [1] 0.1350818`

```
#extract gamma estimates for tree with only extant lineages
div.extant$gamma
```

`## [1] 2.434159`

`div.extant$p`

`## [1] 0.01492644`

*What do these results tell you? Shouldn’t we better loop
this over a set of trees? Yes, we should. You don’t need to plot
the trees or LTTs, we are just interested in the gamma estimates.*

Alright, after all these simulations (and a better understanding of the potential pitfalls we may be encountering), I think it’s time to deal with some real data. I will provide you with a variety of different phylogenies. Pick one, generate a LTT plot, calculate net diversification rate, extinction fraction, and gamma statistic. Let’s do this as a homework assignment.