***UPDATE***
I was recently introduced to a great tool for working with phylogenetic trees that can do something similar to what I describe below (and a whole lot more). Check it out at http://itol.embl.de/.
******
It seemed like a simple enough task at the time. I have several sets of classified 16S 454 reads from which I’ve tallied the number of members for some genera of interest. Id like a heatmap showing the abundance of these genera, with a phylogram showing the relationship between genera (rows). Since the heatmap and heatmap.2 (package gplots) commands in R both support row ordering by dendrograms I thought this would be easy.
The first step was to find a single representative sequence for each of 134 genera that I want in the heatmap. I used sequences from the RDP for this. I aligned and trimmed these sequences using standard methods in mothur (my favorite 16S analysis program) That’s the easy bit.
Attempt 1 – A dendrogram should never be confused with a phylogenetic tree. Despite knowing this I thought I could approximate a phylogenetic tree by building a distance matrix outside of R (in clustalo) and performing the clustering inside R (using hclust). The result was a nice looking dendrogram that had clear flaws in illustrating evolutionary relationships. This basic method may actually work, but for reasons that will become clear it didn’t in this case.
Attempt 2 – I decided to build a tree in fasttree (superb tree building program) and then, somehow, get the tree into R and trick R into thinking that it was a dendrogram. Using package ape this is pretty easy. Ape even has a method for converting the tree to a dendrogram. The problem is that the tree must be rooted, bifurcating, and ultrametric (each tip equidistant from the root). This took a substantial bit of tweaking but I was able to get it:
##### create tree as dendrogram for row order #####
library(ape)
rep_tree <- read.tree(‘../known_degrader_rep.tre’)
#For the root I selected the node between the two Archaea and Planctomyces
rep_tree_r <- root(rep_tree,
resolve.root = T,
interactive = T
)
#can’t have any branch lengths of zero or downstream commands will collapse those nodes…
rep_tree_r$edge.length[which(rep_tree_r$edge.length == 0)] <- 0.00001
rep_tree_um <- chronopl(rep_tree_r,
lambda = 0.1,
tol = 0)
rep_tree_d <- as.dendrogram(as.hclust.phylo(rep_tree_um))
plot(rep_tree_d)
Great! When I created the heatmap however, I found that there is a major hitch in this method – and any other method that performs clustering separate from heatmap creation. The heatmap and heatmap.2 commands don’t match up the rows and dendrogram tips by name (in my case by genera), but by the index of the the data as it was first imported into R. Since I imported my tree and the abundance table separately each genus was assigned a different index. Using the root() command prior to converting my tree to a dendrogram made it difficult to reconcile the two, because the index values of the dendrogram were no longer in order. Reordering my abundance matrix so that it matched the index values, but not their current order, took a bit of thought. Here’s the code I came up with, I’m sure there is a cleaner way…
#force row order so that it matches the order of leafs in rep_tree_d
clade_order <- order.dendrogram(rep_tree_d)
clade_name <- labels(rep_tree_d)
clade_position <- data.frame(clade_name,
clade_order
)
clade_position <- clade_position[order(clade_position$clade_order),]
new_order <- match(clade_position$clade_name,
row.names(combined_matrix)
)
combined_ordered_matrix <- combined_matrix[new_order,]
And that seems to work nicely – combined_ordered_matrix is in the same order as the dendrogram INDEX (not the dendrogram leaves). When the heatmap is constructed the accompanying phylogeny looks fine. Note that in the following code I’ve switched all zeros to NAs to differentiate between an abundance of 0 and a low abundance:
##### Generate heatmap #####
library(RColorBrewer)
library(colorRamps)
color <- colorRampPalette(c(‘white’,’blue’,’red’))(100)
row_col_table <- read.table(‘../genera_row_colors.txt’,
as.is = T)
row_col <- row_col_table$V1
combined_ordered_matrix[which(combined_ordered_matrix == 0)] <- NA
library(gplots)
heatmap.2(combined_ordered_matrix,
margins=c(7,10),
sepcolor=”white”,
sepwidth=c(0.01,0.01),
Rowv = rep_tree_d,
Colv = F,
dendrogram=’row’,
colsep=seq(1,10,1),
rowsep=seq(1,134,1),
key = TRUE,
trace=’none’,
col=color,
lwid = c(2,5,5,10),
lhei = c(1,50),
cexCol = 0.7,
cexRow = 0.7,
RowSideColors = row_col,
density.info=c(‘none’),
scale = ‘none’,
na.color = ‘yellow’
)
If you run this code you’ll notice that the margins don’t allow the key to show. Still have to fix that. Here’s the resulting heatmap and phylogeny with 0 values as yellow and the remaining relative abundances on a white-blue-red scale.
Hi Jeff.
I used your method to combine a phylogenetic tree and heatmap and it worked! However, there are some inconsistencies. For example, the order of tree tips was changed in the final tree + heatmap compared with the tree itself.
I found an easier way to draw the tree with a heatmap.
The ggtree and ggtreeExtra packages offer gheatmap() function and you can draw your tree+heatmap with only one line. ggtreeExtra is helpful with circular trees + heatmap + barplots ….
Great, thanks for sharing! This tutorial is pretty outdated now 🙂
use ggtree, , will make your life more easier.
That looks great, thanks for the tip!
Yes, there seems to be something strange in your input file – R won’t even read that tree in. It did read in, with a warning, to Archaeopteryx. Arbitrarily rerooting the tree in Archaeopteryx produced a tree that read fine into R (below). I suggest using the following and re-rooting within R. Good luck!
((((((Mt3.5v5,Mt4.0v1),Car),(((Pvu186,Pvu218),(Gma109,Gma189)),Cca))),(((Ppe139,Mdo196),Fve226),Csa122)),((((((((((Cre169,Vca199),Csu227),((Mpu228,Mpu229),Olu231)),Ppa152),Smo91),(((Sbi79,Zma181),(Sit164,Pvi202)),(Osa193,Bdi192))),Aco195),((Stu206,Sly225),Mgu140)),Vvi145),((((((((Ath167,Aly107),Cru183),(Bra197,Tha173)),Cpa113),(Gra221,Tca233)),(Csi154,(Ccl165,Ccl182))),((Mes147,Rco119),(Lus200,(Ptr156,Ptr210)))),Egr201)));
Hi, Jeff,
I do not know how do you made the rooted phylogenetic tree with R and how do you choose the node of root.(For the root I selected the node between the two Archaea and Planctomyces).
Use the root() command, and either specify a node or use the interactive option to select a node. Typically you want the tree to be rooted at the node from which the outgroup diverges from the rest of the sequences.
Hi John,
I am trying your step rep_tree_r mytree_brlen plot(tree,edge.width=3,lend=2)
Error in plot.phylo(tree, edge.width = 3, lend = 2) :
there are single (non-splitting) nodes in your tree; you may need to use collapse.singles()
So followed instructions suggested at http://blog.phytools.org/2013/06/robust-newick-tree-reader.html?showComment=1433380847966#c1946368120991641492
> tree=collapse.singles(tree)
> plot(tree,edge.width=3,lend=2)
But this tree looks completely mangled….
So I suspect the problem is with my .tre input file (copied and posted below)
Could you please help me identify the problem so that I can move ahead with my analysis?
The goal of this is to convert my species cladogam to a dendrogram that I can use as the column dendrogram for my heatmap
(((((((((((((Mt3.5v5, Mt4.0v1), Car), (((Pvu186, Pvu218), (Gma109, Gma189)), Cca))), (((Ppe139, Mdo196), Fve226), Csa122)), ((((((((Ath167, Aly107), Cru183), (Bra197, Tha173)), Cpa113), (Gra221, Tca233)), (Csi154, (Ccl165, Ccl182))), ((Mes147, Rco119),(Lus200, (Ptr156, Ptr210)))), Egr201)), Vvi145), ((Stu206, Sly225), Mgu140)), Aco195), (((Sbi79, Zma181),(Sit164, Pvi202)), (Osa193, Bdi192))), Smo91), Ppa152), (((Cre169, Vca199), Csu227), ((Mpu228, Mpu229), Olu231)));
Thank you!