***UPDATE***
I was recently introduced to a great tool for working with phylogenetic trees that can do something similar to what I describe below (and a whole lot more). Check it out at http://itol.embl.de/.
******
It seemed like a simple enough task at the time. I have several sets of classified 16S 454 reads from which I’ve tallied the number of members for some genera of interest. Id like a heatmap showing the abundance of these genera, with a phylogram showing the relationship between genera (rows). Since the heatmap and heatmap.2 (package gplots) commands in R both support row ordering by dendrograms I thought this would be easy.
The first step was to find a single representative sequence for each of 134 genera that I want in the heatmap. I used sequences from the RDP for this. I aligned and trimmed these sequences using standard methods in mothur (my favorite 16S analysis program) That’s the easy bit.
Attempt 1 – A dendrogram should never be confused with a phylogenetic tree. Despite knowing this I thought I could approximate a phylogenetic tree by building a distance matrix outside of R (in clustalo) and performing the clustering inside R (using hclust). The result was a nice looking dendrogram that had clear flaws in illustrating evolutionary relationships. This basic method may actually work, but for reasons that will become clear it didn’t in this case.
Attempt 2 – I decided to build a tree in fasttree (superb tree building program) and then, somehow, get the tree into R and trick R into thinking that it was a dendrogram. Using package ape this is pretty easy. Ape even has a method for converting the tree to a dendrogram. The problem is that the tree must be rooted, bifurcating, and ultrametric (each tip equidistant from the root). This took a substantial bit of tweaking but I was able to get it:
##### create tree as dendrogram for row order #####
library(ape)
rep_tree <- read.tree(‘../known_degrader_rep.tre’)
#For the root I selected the node between the two Archaea and Planctomyces
rep_tree_r <- root(rep_tree,
resolve.root = T,
interactive = T
)
#can’t have any branch lengths of zero or downstream commands will collapse those nodes…
rep_tree_r$edge.length[which(rep_tree_r$edge.length == 0)] <- 0.00001
rep_tree_um <- chronopl(rep_tree_r,
lambda = 0.1,
tol = 0)
rep_tree_d <- as.dendrogram(as.hclust.phylo(rep_tree_um))
plot(rep_tree_d)
Great! When I created the heatmap however, I found that there is a major hitch in this method – and any other method that performs clustering separate from heatmap creation. The heatmap and heatmap.2 commands don’t match up the rows and dendrogram tips by name (in my case by genera), but by the index of the the data as it was first imported into R. Since I imported my tree and the abundance table separately each genus was assigned a different index. Using the root() command prior to converting my tree to a dendrogram made it difficult to reconcile the two, because the index values of the dendrogram were no longer in order. Reordering my abundance matrix so that it matched the index values, but not their current order, took a bit of thought. Here’s the code I came up with, I’m sure there is a cleaner way…
#force row order so that it matches the order of leafs in rep_tree_d
clade_order <- order.dendrogram(rep_tree_d)
clade_name <- labels(rep_tree_d)
clade_position <- data.frame(clade_name,
clade_order
)
clade_position <- clade_position[order(clade_position$clade_order),]
new_order <- match(clade_position$clade_name,
row.names(combined_matrix)
)
combined_ordered_matrix <- combined_matrix[new_order,]
And that seems to work nicely – combined_ordered_matrix is in the same order as the dendrogram INDEX (not the dendrogram leaves). When the heatmap is constructed the accompanying phylogeny looks fine. Note that in the following code I’ve switched all zeros to NAs to differentiate between an abundance of 0 and a low abundance:
##### Generate heatmap #####
library(RColorBrewer)
library(colorRamps)
color <- colorRampPalette(c(‘white’,’blue’,’red’))(100)
row_col_table <- read.table(‘../genera_row_colors.txt’,
as.is = T)
row_col <- row_col_table$V1
combined_ordered_matrix[which(combined_ordered_matrix == 0)] <- NA
library(gplots)
heatmap.2(combined_ordered_matrix,
margins=c(7,10),
sepcolor=”white”,
sepwidth=c(0.01,0.01),
Rowv = rep_tree_d,
Colv = F,
dendrogram=’row’,
colsep=seq(1,10,1),
rowsep=seq(1,134,1),
key = TRUE,
trace=’none’,
col=color,
lwid = c(2,5,5,10),
lhei = c(1,50),
cexCol = 0.7,
cexRow = 0.7,
RowSideColors = row_col,
density.info=c(‘none’),
scale = ‘none’,
na.color = ‘yellow’
)
If you run this code you’ll notice that the margins don’t allow the key to show. Still have to fix that. Here’s the resulting heatmap and phylogeny with 0 values as yellow and the remaining relative abundances on a white-blue-red scale.