PGLS is useful for testing evolutionary associations of traits

Probably one of the most frequently used phylogenetic comparative methods is phylogenetic generalized least squares (PGLS). PGLS allows for the analysis of evolutionary associations between traits and can also handle mutiple predictor variables. The method was first introduced by Grafen (1989), as a generalization of Felsenstein’s independent contrasts.

In this tutorial we will first explore some important underlying concepts such as the phylogenetic variance-covariance matrix and phylogenetic signal (measured by Pagal’s lambda), and then perform a basic PGLS computation.

More details and references can be found in Symonds & Blomberg (2014). That book chapter also comes with a very useful tutorial.

What is the phylogenetic variance-covariance matrix?

The phylogenetic VCV takes a central role for PGLS analyses as it determines the expected variance-covariance matrix based on the phylogenetic relatedness of the species in our datasets. Assuming Brownian motion, the expected covariance is directly proportional to the amount of shared evolutionary history.

The diagonal elemtns of the matrix (variance) represent the total branch length from root to the tip), while the off-diagonal elements represent only the branch length shared by respective species pairs.

Let’s take a look at this. Calling needed libraries first (including some for later!).

#Load libraries.
library(ape)
library(geiger)
library(phytools)
library(nlme)

And now on to a little simulation.

#Simulate a small tree
phy <- pbtree(n=5, tip.label=c("A", "B", "C", "D", "E")) 
brlength <- signif(phy$edge.length, 3) #branch length, rounded

#plotting tree
par(mar=c(0,1,0,1))
plot(phy, label.offset=0.05)
edgelabels(brlength, adj=c(0.5, -0.25)) #adding branch length abels

#displaying the corresponding phylogenetic VCV matrix
vcv.phylo(phy)
##          A        B        C        D        E
## A 2.017705 0.193924 0.193924 0.000000 0.000000
## B 0.193924 2.017705 1.122905 0.000000 0.000000
## C 0.193924 1.122905 2.017705 0.000000 0.000000
## D 0.000000 0.000000 0.000000 2.017705 1.375760
## E 0.000000 0.000000 0.000000 1.375760 2.017705
#total tree height
max(nodeHeights(phy)) #this should match the diagonals of the phylo VCV
## [1] 2.017705

What is phylogenetic signal?

Phylogenetic signal describes in how far trait values are related to phylogenetic covariance. A common way to estimate phylogenetic signal is Pagel’s lambda, which can be used to adjust the internal branch lengths of the tree by keeping all tips contemporaneous. Subsequently one can determin what tree transformation explains the traits the best, assuming Brownian motion. Lambda=1 is the original tree, lambda=0 is a star phylogeny.

Let’s simulate a tree, performa a few lambda-transformations and examine the phyolgenetic VCV’s.

phy <- pbtree(n=5, tip.label=c("A", "B", "C", "D", "E"))
brlength <- signif(phy$edge.length, 3) 

#lambda-transform the tree
phy0 <- rescale(phy, "lambda", 0)
brlength0 <- signif(phy0$edge.length, 3) 

phy0.5 <- rescale(phy, "lambda", 0.5)
brlength0.5 <- signif(phy0.5$edge.length, 3) 

#plot all three different versions of the tre
par(mfrow=c(1,3))
plot(phy, cex=1.5, label.offset=0.1, main="lambda=1")
edgelabels(brlength, adj=c(0.5, -0.25), frame="none")

plot(phy0.5, cex=1.5, label.offset=0.1, main="lambda=0.5")
edgelabels(brlength0.5, adj=c(0.5, -0.25), frame="none")

plot(phy0, cex=1.5, label.offset=0.1, main="lambda=0")
edgelabels(brlength0, adj=c(0.5, -0.25), frame="none")

#examine the associated VCV matrices
vcv.phylo(phy)
##           A         B          C          D          E
## A 1.1969129 0.7413064 0.00000000 0.00000000 0.00000000
## B 0.7413064 1.1969129 0.00000000 0.00000000 0.00000000
## C 0.0000000 0.0000000 1.19691287 0.06262467 0.06262467
## D 0.0000000 0.0000000 0.06262467 1.19691287 1.16444645
## E 0.0000000 0.0000000 0.06262467 1.16444645 1.19691287
vcv.phylo(phy0.5)
##           A         B          C          D          E
## A 1.1969129 0.3706532 0.00000000 0.00000000 0.00000000
## B 0.3706532 1.1969129 0.00000000 0.00000000 0.00000000
## C 0.0000000 0.0000000 1.19691287 0.03131234 0.03131234
## D 0.0000000 0.0000000 0.03131234 1.19691287 0.58222323
## E 0.0000000 0.0000000 0.03131234 0.58222323 1.19691287
vcv.phylo(phy0)
##          A        B        C        D        E
## A 1.196913 0.000000 0.000000 0.000000 0.000000
## B 0.000000 1.196913 0.000000 0.000000 0.000000
## C 0.000000 0.000000 1.196913 0.000000 0.000000
## D 0.000000 0.000000 0.000000 1.196913 0.000000
## E 0.000000 0.000000 0.000000 0.000000 1.196913

I hope that this little simulation both illustrated Pagel’s lambda as well as the phylogenetic variance-covariance matrix.

How is PGLS calculated?

We will not go into statistical details in this tutorial (but in our paper discussion) and place emphasis on how to actually carry out a basic PGLS. There are many differet ways to do this in R (and other software, nut we’ll use the gls() function in the nlme package.

Similar to our tutorial on independent contrasts we will simulate a pair of traits.

#simulate a tree
phy <- pbtree(n=100)

#Liam's fastBM() creates continuous trait under the BM model (plus BM with trend, and bounded BM)
x <- fastBM(phy, sig2=0.2) # sig2 is the the instantaneous rate variance of the BM process
y0 <- fastBM(phy, sig2=0.2) # y is here simulated completely independent of x
y <- 0.7*x + y0 # y is forced to be strongly correlated with x

sim.data <- as.data.frame(cbind(x=x, y=y))

#now we can calculate the PGLS model with Pagel's lambda fixed at 1
fit1 <- gls(y ~ x, data=sim.data, correlation=corPagel(1, phy, fixed=TRUE))

#we can also calculate the PGLS model with Pagel's lambda fixed at 0 (=OLS)
fit0 <- gls(y ~ x, data=sim.data, correlation=corPagel(0, phy, fixed=TRUE))


#The models are not equally good! (as expected, of course!)
anova(fit1, fit0)
##      Model df      AIC      BIC     logLik
## fit1     1  3 156.9105 164.6654  -75.45525
## fit0     2  3 293.3758 301.1307 -143.68788
#or jointly estimate the optimal lambda for the residual values of x and y with maximum likelihood
#fo this specific example lambda should be estimated righjt around 1
fitML <- gls(y ~ x, data=sim.data, correlation=corPagel(0.8, phy, fixed=FALSE), method="ML")
summary(fitML)
## Generalized least squares fit by maximum likelihood
##   Model: y ~ x 
##   Data: sim.data 
##        AIC      BIC    logLik
##   156.5642 166.9849 -74.28212
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
##    lambda 
## 0.9980386 
## 
## Coefficients:
##                  Value Std.Error   t-value p-value
## (Intercept) -0.2935848 0.4797309 -0.611978   0.542
## x            0.7509272 0.1062761  7.065813   0.000
## 
##  Correlation: 
##   (Intr)
## x 0.007 
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.8642727 -0.7142393 -0.0239590  0.8223155  2.2292207 
## 
## Residual standard error: 1.093183 
## Degrees of freedom: 100 total; 98 residual
#This should be pretty similar overall to PGLS with lambda forced to 1
summary(fit1)
## Generalized least squares fit by REML
##   Model: y ~ x 
##   Data: sim.data 
##        AIC      BIC    logLik
##   156.9105 164.6654 -75.45525
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
## lambda 
##      1 
## 
## Coefficients:
##                  Value Std.Error   t-value p-value
## (Intercept) -0.2935767 0.4865939 -0.603330  0.5477
## x            0.7476036 0.1063024  7.032804  0.0000
## 
##  Correlation: 
##   (Intr)
## x 0.007 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -1.82294894 -0.69763958 -0.02425915  0.80520368  2.17854812 
## 
## Residual standard error: 1.119105 
## Degrees of freedom: 100 total; 98 residual

*** What are the slopes and intercepts for the PGLS with lambda=1 and lambda optimized with ML? What slope value do you actually expect?***

OK, before we move on and take these estimates for granted we must evaluate if there are any deviations from the normal rgression assumptions. Two diagnostic plots will reveal if there are any problems.

Let’s do this for fit1.

#check for heteroscedasticity
plot(fit1, resid(., type="n")~fitted(.), col="blue", main="Normalized Residuals vs. Fitted Values",
abline=c(0,0))

#check for departures from normal distribution of residuals
res <- resid(fit1, type="n")
qqnorm(res, col="blue")
qqline(res, col="blue")

Another point for consideration is the danger of getting trapped on a local likelihood peak when trying to optimize lambda with ML. It’s best to calculate likelihood for a whole range of different lambda values between 0 and 1 to figure out how the lkelihood landscape may look like. Let’s give this a try.

#define the range for lambda and at which intervals we are sampling
lambda <- seq(0, 1, length.out = 100)

#Calculate likelihood over the entire range[modified from Symonds&Blombergs tutorial!]
lik <- sapply(lambda, function(lambda) logLik(gls(y ~ x, data=sim.data, correlation = corPagel(value=lambda, phy=phy, fixed = TRUE))))
plot(lik ~ lambda, type = "l", main = expression(paste("Likelihood Plot for ",lambda)), ylab = "Log Likelihood", xlab = expression(lambda))

#adding our ML estimate as a red line
abline(v=fitML$modelStruct, col = "red")

As penultimate step, let’s plot the simulated data and the PGLS line and also add the OLS regression line (which is false).

plot(x, y, pch=19, col="blue", xlab="X", ylab="Y", main="PGLS fit (blue) vs. OLS (red)")
abline(fit1, col="blue")
abline(fit0, col="red", lty=2)

Let’s conclude with an exercise in pairs:

Simulate a tree and subsequently lambda-transform branch length to a value of your choice. Then, simulate two traits with the rescaled phylogeny, similar to the example above. Send the original tree (NOT the lambda-transformed, rescaled tree) and the simulated data to your lab partner and let them figure out what lambda-transformation and slope value you chose to simulate the data.