RAxML is one of the most popular programs around for phylogenetic inference via maximum likelihood. Similarly, hmmalign within HMMER 3 is a popular way to align amino acid sequences against HMMs from Pfam or created de novo. Combine the two and you have an excellent method for constructing phylogenetic trees. But gluing the two together isn’t exactly seamless and novice users might be deterred by a couple of unexpected hurdles. Recently, I helped a student develop a workflow which I’m posting here.
First, define some variables just to make the bash commands a bit cleaner. REF refers to the name of the Pfam hmm that we’re aligning against (Bac_rhodopsin.hmm in this case), while QUERY is the sequence file to be aligned (hop and bop gene products, plus a dinoflagellate rhodopsin as outgroup).
REF=Bac_rhodopsin
QUERY=uniprot_hop_bop_reviewed
Now, align and convert the alignment to fasta format (required by RAxML-ng).
hmmalign --amino -o $QUERY.sto $REF.hmm $QUERY.fasta
seqmagick convert $QUERY.sto $QUERY.align.fasta
Test which model is best for these data. Here we get LG+G4+F
.
modeltest-ng -i $QUERY.align.fasta -d aa -p 8
Check your alignment!
raxml-ng --check --msa $QUERY.align.fasta --model LG+G4+F --prefix $QUERY
Oooh… I bet it failed. Exciting! In this case (using sequences from Uniprot) the long sequence descriptions are incompatible with RAxML-ng. Let’s do a little Python to clean that up.
from Bio import SeqIO
with open('uniprot_hop_bop_reviewed.align.clean.fasta', 'w') as clean_fasta:
for record in SeqIO.parse('uniprot_hop_bop_reviewed.align.fasta', 'fasta'):
record.description = ''
SeqIO.write(record, clean_fasta, 'fasta')
Check again…
raxml-ng --check --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY
If everything is kosher go ahead and fire up your phylogenetic inference. Here I’ve limited bootstrapping to 100 trees. If you have the time/resources do more.
raxml-ng --all --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY --bs-trees 100
Superimpose the bootstrap support values on the best ML tree.
raxml-ng --support --tree $QUERY.raxml.bestTree --bs-trees $QUERY.raxml.bootstraps
And here’s our creation as rendered by Archaeopteryx. Some day I’ll create a tree that is visually appealing, but today is not that day. But you get the point.