Metagenomic Assembly

In an earlier post I mentioned some odd microbiological observations that our group made during field work in Barrow, AK in 2010.  I also talked about how I’m hoping to repeat that observation this year, using the microscopy technique FISH (fluorescence in situ hybridization).  In addition to collecting new data, however, there is plenty of work left to be done on the 2010 samples.  I’m spending a lot of time right now working with two metagenomes from the 2010 sample set, one derived from young sea ice and one from a frost flower sample.  A metagenome, as the name suggests, is a compilation of many different genomes.  Consider a liter of seawater.  It might contain around one billion bacteria, and therefore one billion bacterial genomes.  Although bacterial genomes can vary quite a bit even within a given bacterial “species”, for the sake of argument let’s say that those one billion bacteria comprise 1000 bacterial species, representing 1000 different genomes.

We’d like to know something about what the bacterial assemblage in our liter of water is doing.  Are they photosynthesizing?  Consuming high molecular weight organic compounds?  Living free in the water or attached to particles?  Clues to all these questions, indeed to the entire natural history of each species, can be found within their genomes.  It isn’t at all practical, however, to sequence all 1000 of these genomes (most belong to species that you couldn’t even bring into pure culture in the lab without many years of work).  The solution?  Sequence all the DNA contained within the water, never mind which species it originally belonged to!

The resulting mess of sequence data is the metagenome, and it offers the least biased way of assessing the metabolic potential of a microbial assemblage.  To do this a researcher sifts through all the little bits of DNA (the sequence reads, usually in the range of 50-100 bp) and assigns a putative function to the gene of origin based on similarity to a known gene in a database.  That’s great, but it would still be nice to know something about the metabolic potential of individual species in the assemblage.  This requires assembling the metagenome, something that was not possible until just a few years ago.  Given enough computer power, enough sequence data, and a low diversity assemblage (just a few species), researchers have been able to reconstruct entire microbial genomes from metagenomes.  One such research group is right here at the UW School of Oceanography, and recently published their metagenome-derived genome (from an uncultured member of the Euryarchaeota) in the journal Science.

With a lot of guidance from them I’ve established a pipeline for assembling my Barrow metagenomes.  I won’t get complete genomes (there isn’t nearly enough sequence data), but I might be able to create large enough contiguous sequence segments (contigs) to link some metabolic functions with specific bacteria in that environment.  You can check out my workflow here.  In brief, I use a series of de Bruijn graph assemblies and read-to-contig alignments to gradually build bigger contigs.  For data reduction I use a combination of standard trimming tools and digital normalization.  The workflow page includes links to all of the tools I’m currently using.  The figure below shows the results from my first round assembly.  The x-axis is contig length in kmers (to convert to bp add 22).  The y-axis is coverage, or (roughly) the number of times that particular contig is seen in the reads.  In the first round I’ve got contigs out to 10,000 bases, long enough to code for several genes.  Not bad!
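For readers new to de Bruijn graph assembly, here is a toy Python sketch of the idea. It is not the code behind the pipeline above (that work is done by the assembler), and the reads, the function names, and the tiny k of 4 are all made up for illustration. It also shows where the "add 22" rule comes from: a contig that spans n kmers covers n + k - 1 bases, so adding 22 implies k = 23, which is an inference from the conversion rather than a setting stated in the post.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer links its (k-1)-base prefix to its (k-1)-base suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


def kmer_length_to_bp(length_in_kmers, k):
    """Convert a contig length counted in k-mers to base pairs."""
    return length_in_kmers + k - 1


if __name__ == "__main__":
    # Hypothetical overlapping reads from one short made-up sequence.
    reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
    graph = build_graph(reads, k=4)
    contig = walk_unambiguous(graph, "ATG")
    print(contig, len(contig))
    # Assuming k = 23, a contig of 9978 kmers is 10,000 bp long.
    print(kmer_length_to_bp(9978, k=23))
```

Real assemblers add a great deal on top of this (reverse complements, error correction, repeat resolution, paired-end information), which is why the sketch stops as soon as a node has more than one possible successor.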

Ocov vs. contig length (in kmer, bp = kmer+22) following the first round assembly of PE Illumina reads using Velvet. The reads were trimmed and reduced by digital normalization prior to assembly.
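Digital normalization may be unfamiliar, so here is a minimal Python sketch of the general idea: stream through the reads, estimate each read's coverage as the median count of its kmers among the reads kept so far, and discard the read if that estimate has already reached a cutoff. This is an illustration only, not the tool used for the figure above. It counts kmers exactly in a dictionary rather than with a memory-efficient probabilistic counter, it ignores reverse complements, and the function names, k, and cutoff values are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return the list of k-mers in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=20, cutoff=20):
    """Keep a read only if its median k-mer count so far is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to estimate from
        # Median k-mer abundance is the read's estimated coverage.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


if __name__ == "__main__":
    # Hypothetical data: one heavily sampled read and one rare read.
    reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
    kept = normalize(reads, k=4, cutoff=3)
    print(len(reads), "reads in,", len(kept), "reads kept")
```

The point of the design is that abundant sequences stop contributing reads once they are well covered, while rare sequences are retained, which shrinks the data set the assembler has to handle.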

 
