Fit models to explore trait evolution

An increasingly important aspect of comparative biology is fitting theoretical models to empirical data to ask whether specific patterns in the data suggest how evolution proceeds. For example, does the trait evolve in a Brownian fashion, or does it look more like an OU process? Something else entirely? In other words, what’s the mode of trait evolution?
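If it helps to see the difference before fitting anything, here is a small, purely illustrative side example (simulated data, not the tutorial dataset below) that contrasts the two modes with ape’s rTraitCont() on a random tree:

#Simulate one trait under BM and one under OU on the same random tree.
library(ape)
set.seed(1)
sim.tre <- rcoal(50) #random 50-tip coalescent tree
bm.trait <- rTraitCont(sim.tre, model="BM", sigma=0.1)
ou.trait <- rTraitCont(sim.tre, model="OU", sigma=0.1, alpha=2, theta=0)
c(BM=var(bm.trait), OU=var(ou.trait)) #OU traits are pulled toward theta, so they typically show less spread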

Let’s take a look at this with a tutorial that I wrote for the NESCent Academy back in 2014.

Let’s load the needed libraries first.

#Load libraries.
library(ape)
library(geiger)

Now let’s download an example tree and an example dataset.

#Phylogeny
tre <- read.tree(url("http://schmitzlab.info/Haemulidae4models.tre"))

#getting data
#contains data on standard length, eye width, and buccal width
dat <- read.csv(url("http://schmitzlab.info/Haemulidae4models.csv"), stringsAsFactors=FALSE, header=TRUE)
rownames(dat) <- dat[,1]

#tree and data are matched but we must make sure the order is correct
dat <- dat[match(tre$tip.label,rownames(dat)),]

#select a single trait for analysis
trait <- dat[,2] #selecting trait in the second column 
names(trait) <- tre$tip.label #assigning the appropriate tip labels to the trait object
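If you want to double-check that tree and data really match, geiger’s name.check() function compares the tip labels with the row names of the data (it returns "OK" when everything matches):

#Optional sanity check of tree-data matching
name.check(tre, dat)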

Fitting a Brownian motion model

Let’s start by fitting a Brownian motion model to the data. We can do this with the fitContinuous() function in geiger. We have to specify the phylogeny (tre), the trait (trait), and the type of model (BM = Brownian Motion).

#Fitting the BM model to the data and tree

BM.model <- fitContinuous(tre, trait, model="BM")

#Geiger provides us with lots of information on the fitted model.
#The output includes details on estimated model parameters, including sigma squared and the root state (z0) for BM.
#Model summaries (likelihood etc.) and convergence diagnostics are listed as well.

BM.model
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.000841
##  z0 = 2.239316
## 
##  model summary:
##  log-likelihood = 35.701613
##  AIC = -67.403226
##  AICc = -67.147907
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
#The output also tells you how to retrieve additional information, for example the bounds for the likelihood search.
#You can extract the bounds with the following line:

BM.model$bnd
##                  mn           mx
## sigsq 7.124576e-218 2.688117e+43
#Primary results of the model fit are stored in the "opt" list, which can be accessed like this:

BM.model$opt # The output includes info on sigma squared and the root state (z0)
## $sigsq
## [1] 0.000840795
## 
## $z0
## [1] 2.239316
## 
## $lnL
## [1] 35.70161
## 
## $method
## [1] "Brent"
## 
## $k
## [1] 2
## 
## $aic
## [1] -67.40323
## 
## $aicc
## [1] -67.14791
#If you need the maximum log-likelihood of the data given the model, you can type this command:

BM.model$opt$lnL
## [1] 35.70161
#Sample-size corrected AIC score:
BM.model$opt$aicc # You get the drift even though it's Brownian :)
## [1] -67.14791
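As a quick sanity check (a small sketch, not part of the original tutorial), you can recompute AIC and AICc by hand from the log-likelihood, the number of free parameters k, and the sample size n; the results should match the values geiger reports:

#AIC = -2*lnL + 2k; AICc adds a correction for small sample sizes.
k <- BM.model$opt$k #number of free parameters (2 for BM)
n <- length(trait) #number of species in the dataset
aic.by.hand <- -2 * BM.model$opt$lnL + 2 * k
aicc.by.hand <- aic.by.hand + (2 * k * (k + 1)) / (n - k - 1)
c(aic.by.hand, aicc.by.hand) #should match BM.model$opt$aic and BM.model$opt$aicc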

Comparing nested models with likelihood ratio tests

There are many models other than BM, of course. Do they fit better, though?

#Let's fit an OU model.

OU.model <- fitContinuous(tre, trait, model="OU")

#How does the likelihood of the data given an OU model compare to the BM model?

OU.model$opt$lnL
## [1] 47.1445

It looks like the data have a higher likelihood under an OU model… One way to assess the difference between these two models more formally is the likelihood ratio test (LRT), which tests the null hypothesis that the two models confer the same likelihood on the data.

BM is a special case of OU (with alpha = 0), so the models are nested and the LR test is appropriate.
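Before running the formal test, it can also help to look at the fitted attraction parameter itself. Assuming alpha is stored in OU.model$opt$alpha (as in geiger’s OU fit), you can convert it to a phylogenetic half-life:

#Phylogenetic half-life: the time it takes the expected trait value to move
#halfway back toward the optimum; a short half-life means strong attraction.
OU.model$opt$alpha #attraction strength estimated by fitContinuous
log(2) / OU.model$opt$alpha #half-life, in units of branch length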

Note that for thorough tests you should account for phylogenetic uncertainty and perform parametric bootstrapping.

#likelihood ratio test
delta.BM.OU <- 1-pchisq(2*(OU.model$opt$lnL - BM.model$opt$lnL),1)

delta.BM.OU #p-value
## [1] 1.719196e-06
#Which model do you prefer?
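For a more rigorous comparison, here is a minimal sketch of a parametric bootstrap of the LR statistic. It assumes geiger’s sim.char() for simulating data under the fitted BM (null) model; nsim is kept small for illustration and should be much larger in a real analysis.

#Parametric bootstrap (sketch): simulate under the null (BM), refit both models
#on each simulated dataset, and compare the observed LR statistic to the null distribution.
nsim <- 100
sims <- sim.char(tre, par=matrix(BM.model$opt$sigsq), nsim=nsim, model="BM", root=BM.model$opt$z0)
LR.null <- sapply(1:nsim, function(i) {
  x <- sims[, 1, i]
  2 * (fitContinuous(tre, x, model="OU")$opt$lnL - fitContinuous(tre, x, model="BM")$opt$lnL)
})
LR.obs <- 2 * (OU.model$opt$lnL - BM.model$opt$lnL)
mean(LR.null >= LR.obs) #bootstrap p-value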

Comparing models with AICc scores and Akaike weights

Another possibility for selecting the best model is the Akaike Information Criterion. Brian O’Meara has a very nice summary on his webpage.

#We don't have to calculate AIC and sample-size-corrected AIC scores (AICc) by hand, because 'geiger' does this for us:
BM.model$opt$aicc #Higher AICc score is worse!
## [1] -67.14791
OU.model$opt$aicc #Lower AICc score is better!
## [1] -87.76726
#AIC and AICc scores are usually presented as differences to the best model (lowest score).
all.aicc <- c(BM.model$opt$aicc, OU.model$opt$aicc)
delta.aicc <- all.aicc - min(all.aicc) 
#This may seem a bit roundabout for just two models, but it will pay off once more models are added.

delta.aicc # delta AIC or AICc scores >2 are usually considered to provide positive support
## [1] 20.61935  0.00000

AIC and AICc scores can also be expressed as Akaike weights. The relative likelihood of a model is exp(-0.5 * deltaAIC score for that model).

The Akaike weight for a given model is the relative likelihood of that model divided by the sum of the relative likelihoods across all models:

#Akaike weights
rel.L <- exp(-delta.aicc*0.5)
AK.weights <- rel.L/sum(rel.L)
AK.weights
## [1] 0.0000333081 0.9999666919
#Is this result similar to the result of the likelihood ratio test?
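Recent versions of geiger also ship a small helper, aicw(), that computes delta scores and Akaike weights in one step from a vector of AIC or AICc values (assuming your geiger version exports it):

#Naming the vector makes the output easier to read.
names(all.aicc) <- c("BM", "OU")
aicw(all.aicc)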

Exercise

Now it’s time to explore this topic in more depth.

Part 1: Does an Early Burst (EB) model fit better than BM or OU?

fitContinuous can fit more models than just BM and OU.

?fitContinuous

When fitting EB, you may run into a little bit of a problem with the bounds of the model parameters.

Remember that you can see the bounds used for the model fit with your.model$bnd. You can overwrite the default bounds by adding an extra argument to fitContinuous(). Let’s assume the parameter that must be changed is called ‘a’, and the new bounds are X and Y: bounds=list(a=c(X, Y)).

#Does the EB model fit better than BM or OU?
EB.model <- fitContinuous(tre, trait, model="EB", bounds=list(a=c(-3, 3)))
EB.model
## GEIGER-fitted comparative model of continuous data
##  fitted 'EB' model parameters:
##  a = 0.863023
##  sigsq = 0.000000
##  z0 = 2.242112
## 
##  model summary:
##  log-likelihood = 47.144500
##  AIC = -88.288999
##  AICc = -87.767260
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.02
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
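Once BM, OU, and EB have all been fitted, a natural next step (sketched here with the objects created above) is to compare all three models in one set of delta AICc scores and Akaike weights:

#Collect the AICc scores of all three models and recompute delta scores and weights.
all.aicc3 <- c(BM=BM.model$opt$aicc, OU=OU.model$opt$aicc, EB=EB.model$opt$aicc)
delta.aicc3 <- all.aicc3 - min(all.aicc3)
rel.L3 <- exp(-0.5 * delta.aicc3)
rel.L3 / sum(rel.L3) #Akaike weights for BM, OU, and EB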

Part 2: What’s up with other traits?

Pick another trait from the dataset and repeat all above steps.

Concluding note: for thorough analyses one should perform parametric bootstrapping and, if possible, repeat the analysis over a posterior distribution of trees.
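Here is a hypothetical sketch of how the second point could look. "posterior_trees.nex" is a placeholder file name for a posterior sample of trees; read.nexus() returns such a sample as a multiPhylo object that lapply() can iterate over.

#Hypothetical: repeat the BM fit over a posterior sample of trees.
post.trees <- read.nexus("posterior_trees.nex") #placeholder file name
bm.fits <- lapply(post.trees, function(one.tree) {
  one.trait <- trait[one.tree$tip.label] #re-order the data for this tree
  fitContinuous(one.tree, one.trait, model="BM")
})
range(sapply(bm.fits, function(f) f$opt$aicc)) #spread of AICc across trees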