DIAMOND – A game changer?

DIAMOND search speed over BLASTX

Figure from Buchfink et al. 2014. The figure shows the speed up of DIAMOND over BLASTX and an alternate new-generation aligner known as RAPSearch2. Metagenomic datasets from the Human Microbiome Project were aligned against the NCBI nr database.

A very exciting paper was published in Nature Methods a couple of weeks ago by Benjamin Buchfink and colleagues: Fast and Sensitive Protein Alignment Using DIAMOND.  The paper debuts the DIAMOND software, touted as a much-needed replacement for BLASTX.  BLASTX has been a bioinformatics workhorse for many years and is (was) the best method to match a DNA sequence against a protein database.  BLASTX worked well in the era of Sanger sequencing.  With routine sequence runs today producing many gigabytes of data (hundreds of millions of reads), however, BLASTX is woefully inadequate.  I once attempted to annotate a relatively small metagenome using BLASTX against the NCBI nr database on a sizable high performance cluster and couldn’t do it.  It was theoretically possible, but the resources required were out of alignment with what I stood to gain from a complete annotation.  I pulled the plug on the experiment at around 25 % complete to avoid getting blacklisted.

There are faster alignment (i.e. search) algorithms available, for example the excellent BWA DNA-DNA aligner, but they don’t quite do what BLAST does.  BLAST is valuable for the same reason that it is slow; it’s sensitive.  Another way of thinking about this is that BLAST has good methods for aligning dissimilar sequences and thus returning a more dissimilar (but still desired) match.  Many sequences queried against, for example, the NCBI nr database don’t have a close match in that database.  The more dissimilar the query is from a candidate homologue the more calculations need to be performed for the aligner to propose a biologically plausible alignment.  This is bad enough when aligning two DNA sequences.  Because the amino acid alphabet is larger than the DNA alphabet however, comparing protein sequences is even more computationally intensive.

The developers of DIAMOND used improved algorithms and additional heuristics to build a better aligner.  I’m not going to attempt a detailed description of the algorithms – which I would certainly botch – but the paper refers to three modifications made to the BLAST concept that result in a huge speedup for DIAMOND.  First, DIAMOND uses an optimized subset (seed) of the query and reference sequences to find matches.  The subset is described by the seed weight and shape.  Second, the aligner employs something called double indexing, an improved method for storing information regarding seed position within each sequence.  Finally, the aligner relies on a reduced amino acid alphabet consisting of only 11 amino acids.
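To make the seed and reduced-alphabet ideas a little more concrete, here is a rough Python sketch.  It is not DIAMOND's code.  The 11-group amino acid reduction and the seed shape below are assumptions chosen for illustration, all function names are hypothetical, and the sketch indexes only the reference, so it does not implement double indexing.

```python
from collections import defaultdict

# ASSUMPTION: an illustrative reduction of the 20 amino acids to 11 groups.
GROUPS = ["KREDQN", "C", "G", "H", "ILV", "M", "F", "Y", "W", "P", "STA"]
# Map every amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

# ASSUMPTION: a hypothetical spaced-seed shape. '1' positions must match,
# '0' positions are ignored. Its weight (number of 1s) is 5.
SHAPE = "1101011"


def reduce_sequence(protein):
    """Rewrite a protein (20 standard letters only) in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in protein)


def spaced_seeds(protein, shape=SHAPE):
    """Yield (position, seed) pairs for every window the shape fits into."""
    reduced = reduce_sequence(protein)
    keep = [i for i, flag in enumerate(shape) if flag == "1"]
    for start in range(len(reduced) - len(shape) + 1):
        yield start, "".join(reduced[start + i] for i in keep)


def shared_seeds(query, reference, shape=SHAPE):
    """Return (query_pos, reference_pos) pairs where the seeds agree."""
    index = defaultdict(list)
    for pos, seed in spaced_seeds(reference, shape):
        index[seed].append(pos)
    return [(qpos, rpos)
            for qpos, seed in spaced_seeds(query, shape)
            for rpos in index.get(seed, [])]


# Two peptides that differ at five of seven residues still share a seed,
# because every substitution stays within a group.
print(shared_seeds("MKVLASD", "MRILTSE"))
```

A real aligner would only treat these seed hits as candidates and then extend them into full alignments.  The point here is just that collapsing similar residues lets dissimilar sequences collide on the same seed.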

So how fast is DIAMOND?  Really, really fast.  The paper describes some basic benchmarks.  I tried it out on a 12 core Linux box.  I did not assess accuracy, but for a basic search it was everything it promised to be and easy to use (what is bioinformatics coming to?).  The pre-compiled Unix executable worked straight out of the box and the DIAMOND developers have kindly copied the BLASTX command structure.  To try it out I aligned the metagenome described above against the Uniref-90 database as such:

diamond makedb --in uniref90.fasta -d uniref90 &&
date > start_stop.txt
diamond blastx -d uniref90 -q 44_trim.mates.fasta -o diamond_test.txt
date >> start_stop.txt

The metagenome contains just over 12 million Illumina sequence reads and this (slightly old) version of Uniref-90 contains just over 15 million sequences.  That’s a lot smaller than nr, but still pretty big.  The DIAMOND default is to use the maximum number of cores available – on this hyperthreaded system it recognized 24.  The whole alignment took only 17 minutes and never more than 16 GB of memory.  This is such a large speedup over BLASTX that I’m having a hard time wrapping my head around it.  There’s no way that I’m aware of to estimate how long it would take to execute a similar BLASTX search but I think it would be weeks.  It’s hard to convey how exciting this is.  DIAMOND may have just eliminated one of the biggest analytical bottlenecks for environmental sequence analysis (to be replaced by a larger one, I’m sure).
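For a sense of scale, the numbers above can be turned into a rough throughput figure.  This is back-of-the-envelope arithmetic only: the read count is rounded to 12 million, and the function name is made up for the example.

```python
def reads_per_second(n_reads, minutes):
    """Average alignment throughput over the whole run."""
    return n_reads / (minutes * 60)


# Rounded figures from the run described above.
rate = reads_per_second(12_000_000, 17)
per_thread = rate / 24  # the 24 hardware threads DIAMOND detected

print(f"{rate:.0f} reads/s overall, {per_thread:.0f} reads/s per thread")
```

That works out to somewhere around twelve thousand reads per second across the machine.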


Basic computers for bioinformatics, revisited

I recently started a postdoc in a lab without any real computer infrastructure so I was in need of a system with a little more juice than my laptop.  While a graduate student at the UW I did everything except for the lightest-weight bioinformatics on either a MacPro workstation (dating from 2008 but running strong) or the Hyak high-performance cluster.  For the next two years I don’t anticipate needing anything like Hyak (though it was always very reassuring to know it was there, if I needed it, which I did far more than I ever anticipated).  My immediate concern was to find a replacement for the MacPro.

In a previous post I suggested that bioinformaticians, as a group, might be a little Mac-happy.  Even if I wanted one however, I couldn’t afford a high end MacPro on my budget.  My two year fellowship came with $5,000 in research funds so I needed to find the most computer that I could for < 5K (lab supplies will have to take care of themselves.  With luck my starter population of pipette tips will breed).  After a bit of shopping I ended up purchasing a custom built workstation from Puget Systems, a small company based outside of Seattle.  For just over $4,300 I got a 12 core machine with 32 GB of RAM and 8.5 TB of storage, including a solid state boot drive.  That won’t be enough RAM or storage if all my pending projects bear fruit, but the system can handle 128 GB of RAM and 32 TB of storage if I need to expand.  For the operating system I settled on Ubuntu for ease of use.  The box is purring away now but there was enough weirdness in the beginning that I’m not sure I’d go with Ubuntu again.  Red Hat might have been a better solution as I need the workstation configured like a server, not a home computer.

Just for kicks I did a quick cost-comparison against a similar MacPro.  Apple’s gone a little crazy with the current generation of MacPros; I believe you can only get a single processor (up to 12 cores) and, consistent with Apple’s design philosophy, the storage has been moved outside the central box.  I’ve been told that the new configuration vastly improves I/O speeds but I’m not tech savvy enough to know how or why.  At any rate the “equivalent” MacPro comes in at $9,198.00 and, while sleek and sexy (minus the weird external storage module), lacks the expandability of the hulking box in my office.

In case it is of interest for other junior researchers on a budget here are the parts that went into my box from Puget Systems.  It took them only 3-4 days to build the system and get it out the door.  If you’ve got the money this basic system could be configured into a pretty decent computer – I believe the motherboard can support 2 x 18 Intel Xeon cores (oh I wish!) and 128 GB of RAM.

Parts for a ~$4,300 workstation for mid-weight bioinformatics.


Future of ice jobs at the UW

There are two great polar ecology jobs posted at the University of Washington right now.  It’s rare to find academic jobs specific to the polar regions so this will be a great opportunity for someone out there!  The new positions are part of the campus-wide Future of Ice Initiative and, I believe, are the result of a previously failed search.  Initially it was envisioned that the School of Oceanography and the School of Aquatic and Fishery Sciences would create a shared position for a polar ecologist.  That turned out to be tricky; the two schools have very different research agendas and methods.  After failing to find a candidate who (they felt) could straddle the very large gap between the schools, administration took a second look under the couch cushions and came up with enough spare change to generate two positions.  Here are the job postings:

TWO FACULTY POSITIONS: ONE IN OCEANOGRAPHY AND ONE IN AQUATIC AND FISHERY SCIENCES

The College of the Environment at the University of Washington (http://coenv.uw.edu) invites applications for two new tenure-track assistant professor positions, as part of its continuing commitment to research and education on Earth’s polar regions through the Future of Ice Initiative (http://ice.uw.edu).  This campus-wide initiative focuses on developing partnerships with diverse stakeholders in the polar regions, where the triple challenges of climate change, new economic pressures, and rapid social and political disruption intersect.  Descriptions of the new positions, one in the School of Aquatic and Fishery Sciences and one in the School of Oceanography, are given below.  University of Washington faculty engage in teaching, research and service.

The successful candidates are expected to enhance the University of Washington’s multidisciplinary research in polar science, develop an externally funded research program, mentor the next generation of scientists, and contribute to rigorous education serving an increasingly diverse student population at the graduate and undergraduate levels.  The University of Washington promotes diversity and inclusivity among our students, faculty, staff, and public; for each of these faculty positions, we seek applicants who are committed to these principles.


ASSISTANT PROFESSOR, TENURE-TRACK POSITION IN THE SCHOOL OF AQUATIC AND FISHERY SCIENCES (SAFS)

We seek to hire an integrative scientist who will advance our understanding of ecological processes and ongoing changes in high-latitude (polar or subpolar) marine or freshwater ecosystems.  We seek an ecologist whose research focuses on basic and/or applied questions and may include, but is not limited to, high latitude fisheries or broader ecosystem studies across multiple trophic levels from zooplankton to seabirds and marine mammals.  Applicants should describe how their research and teaching will enhance collaborative linkages within the School of Aquatic and Fishery Sciences and among other partners in the Future of Ice Initiative.  Questions pertaining to this search can be addressed to Dr. Gordon Holtgrieve, Search Committee member (gholt@uw.edu) until 19 November and afterwards to Dr. George Hunt, Search Committee Chair, (geohunt2@uw.edu).  More information on SAFS can be found at http://fish.washington.edu/


ASSISTANT PROFESSOR, TENURE-TRACK POSITION IN THE SCHOOL OF OCEANOGRAPHY (SO)

We seek to hire an integrative scientist who will contribute to an understanding of biological processes and ongoing changes in high-latitude (polar or subpolar) marine ecosystems.  We are interested particularly, though not exclusively, in candidates whose research focuses on the physiology, ecology or biogeography of lower trophic levels.  Research approaches may include field observations, remote sensing, laboratory experimentation, genomics and bioinformatics, or modeling.  Applicants should describe how their research and teaching will enhance collaborative linkages between disciplines within the School of Oceanography and among other partners in the Future of Ice Initiative.  Questions pertaining to this search can be addressed to Dr. Jody Deming, Search Committee Chair, (jdeming@uw.edu).  More information on the School of Oceanography can be found at http://ocean.washington.edu.


ADDITIONAL INFORMATION

To apply, send curriculum vitae with publication list, statements of research and teaching interests with reference to diversity/inclusivity, and the names and contact information of four references.  Applications should clearly indicate the position sought; i.e., in SAFS or in SO.  Electronic materials are preferred; send to FoI@uw.edu.  Hard copies can be sent to Future of Ice Initiative – Quaternary Research Center, University of Washington, Box 351310, Seattle, WA 98195-1310. Applications should be received prior to December 15th, 2014, to ensure full consideration.

University of Washington is an affirmative action and equal opportunity employer. All qualified applicants will receive consideration for employment without regard to, among other things, race, religion, color, national origin, sex, age, status as protected veterans, or status as qualified individuals with disabilities. The University of Washington is recognized for supporting the work-life balance of its faculty. A PhD is required at the time of appointment.

 


In defense of observations

This week I’m at the biennial meeting of the International Society for Microbial Ecology (ISME) in Seoul.  During lunch yesterday there was a special “bird’s eye view” talk given by Dr. James Prosser, a preeminent microbiologist from the University of Aberdeen.  The subject of the talk was the perceived over-indulgence of the community in observational, rather than hypothesis-testing, studies.  The organizers asked him to be provocative and he certainly was.  The talk was stimulating and well thought-out, but begged for a counter-point to balance his strong views on the subject.  In defense of observational microbiology here’s my attempt at a counter-point to the talk.

Dr. Prosser’s thesis was that microbial ecology has lost its way a bit, with too many researchers relying on “observational” studies that simply report the state or composition of a microbial community without really testing any hypotheses.  The classic example here is a 16S gene phylotyping or metagenomic study that just explores an environment for the sake of exploration.  The community often refers to this as stamp collecting, and while I agree that it’s not the right way to “do science”, I disagree strongly with Dr. Prosser about why this is and what we should do about it.  A second, but related, complaint was that the community is spending too much time and effort developing new tools and methods.

Point 1:  Why we need new observations

Observation is the first, and most essential, step in the scientific method that we all learned in elementary school.  Before anyone can formulate a question and develop a hypothesis they must observe something that they cannot explain.  I think that Dr. Prosser might be missing two essential points here: it is difficult to observe microbial communities with sufficient detail to develop interesting questions, and, microbially, the world is a really, really big place that requires a lot of observation.  I would not be surprised if Antoni van Leeuwenhoek, the amateur scientist who made the first biological observations with a microscope in the 17th century, got a lot of flak from contemporary philosophers for tinkering with instruments and wasting time using them to look at water drops instead of debating the “real” questions about life.

The “omics” tools that Dr. Prosser is quick to dismiss are today’s microscope; vastly more complex but capable of producing orders of magnitude more information.  Time and money spent learning these tools and improving their application is time and money well spent.  So is time and money spent applying these tools in what might appear to be idle contemplation of the world.  I’m not saying we shouldn’t be efficient and sensible with scientific studies, but sometimes the most obvious questions aren’t the most interesting.  It can take a lot of observations to observe something worthy of a testable hypothesis.

Point 2:  Other fields are way better than us, and they invest heavily in observation

Biology in general and microbial ecology in particular is often viewed as a “softer” science by those more aligned with the fields of math, physics, and chemistry.  I had a bit of a laugh when Dr. Prosser suggested (and my sincere apologies if I’ve misinterpreted his statement – here or elsewhere) that this was in part because of our obsession with methods development.  I think this couldn’t be farther from the truth.  The methods that have become prevalent since the onset of the genomics era have taken microbial ecology a long way toward becoming a quantitative, process-driven field of study.  Big data, statistics, and modeling are all skills that the more traditionally quantitative sciences mastered a long time ago.  It’s time for microbial ecology to join the 21st century.

The value that these other fields place on observation can be seen in the way those communities invest in observational programs.  I doubt that physical oceanographers are lectured at their meetings about the need to reduce the highly successful Argo program, a global network of observational floats that doesn’t seek to address any particular hypothesis (though individual floats or sets of floats may be deployed for this purpose).  They understand that carefully and systematically collected and curated data is enormously valuable because it is the feedstock of future hypotheses and the means to test them.  Microbial ecology is decades behind the power curve here.  Astronomy (e.g. Kepler) and physics (e.g. IceCube) are additional examples.  The latter field is particularly interesting, as it has evolved to the point where it recognizes a need for two distinct subcultures: observationalists and theorists.

Point 3: Observations are difficult, so cut those that do it a break

The reason (I’m guessing) that physics makes a distinction between observationalists and theorists is that both of these subfields are difficult to master and require different skill sets.  I don’t think we’re at that point with microbial ecology, but it might be time for some recognition of how difficult it is to make meaningful observations.  During his talk Dr. Prosser lamented how eager new graduate students are to begin their programs of study mastering new skills instead of developing interesting hypotheses.  I think that’s a deeply flawed line of thinking.  Yes, one should always be developing hypotheses as they go about their work, but if they never take the time to master technical skills they’ll never be in a position to do anything about it!  Molecular work at the lab bench, coding skills, and advanced statistics are essential tools in the field.  One should learn them as early as possible so that they know how to frame testable hypotheses and design realistic experiments!

Naturally, having spent the first 2 or 3 years of graduate school learning some lab techniques and developing proficiency with a programming language a graduate student would like to have something to show for their efforts.  I think that many of the “observational” papers that Dr. Prosser referred to reflect the work of junior scientists in this position or more senior scientists bold enough to try and master new methods (a particularly rare breed!).  It’s doing no harm to have these papers out there.  One will not get a postdoc, faculty position, or tenure on them alone, so what’s the issue?  At worst they pad the CV of a scientist who will go on to develop hypotheses and do more interesting work, at best they add to a critical institutional knowledge base from which the most interesting hypotheses will eventually come.

Point 4: Not all hypotheses are worth pursuing – and some observations are worth more than some hypotheses

As part of his (extensive) preparations for the talk Dr. Prosser went to the effort to catalog recent papers as purely observational or hypothesis testing, finding 62 % to be purely descriptive and 70 % producing, but not testing, a new hypothesis.  I’m not sure how big a problem this really is.  As mentioned before it takes time to make meaningful observations in microbial ecology.  At the end of it you might not have anything worth pursuing, or it may not be logistically feasible to pursue (a common problem in marine and polar systems, for example), but you’d still like a paper for your efforts.  I think the fundable, testable, interesting hypotheses are getting tested.  The less interesting and currently not testable ones are not, and that’s okay.  In the meantime however, and lacking any systematic sampling program (Argo, Kepler…), all these observations are slowly building a knowledge base.  Dr. Prosser has perhaps never used the genomes in GenBank to test a hypothesis.  I have.  I’m glad for all the purely observational studies that have produced the data I can use.

I had some other things to say, but I think I better end it here and go find some Korean BBQ instead.  As a parting thought, I think the field – to some degree split along generational lines – is afraid of data and observations.  People don’t know what to do with it and they don’t know how to handle it.  That’s beginning to change.  We need more observations now, not fewer.  But they need to be better observations, collected carefully and systematically, and curated and shared.  I’m deeply concerned that the continued grumbling about too many observations and too much methods development is ruining our ability to do this.

Time for BBQ.


The history of sea ice microbial ecology

Doctoral dissertations typically include an introduction, a Chapter 1 that summarizes the work and the motivation for undertaking it.  Last week I submitted my dissertation to the UW and the introduction, which includes citations from some long-ago work on psychrophiles and sea ice microbial ecology, might be of interest to some.  I’ve posted the introduction here in its entirety.  If you would like a pdf of any of the hard-to-find early works cited, please let me know and I’d be happy to provide.

Chapter 1: An introduction to sea ice microbial ecology

Our universe is a cold place, with a background temperature of only 2.73 degrees above absolute zero (de Bernardis et al., 2000). Yet scattered throughout this universe are pinpoints of warmth, the result of nuclear fusion, as in the case of our sun, and gravitational interactions, as in the case of the moon Europa. It is much too hot for life to exist near the energetic centers of these pinpoints, but around each is a thin veneer of conditions that enable liquid water, chemical interactions, and reaction rates similar to those that permit life on Earth. If we consider each of these pinpoints as a sphere, we find that the habitable space around each increases exponentially with a linear decrease in the temperature minima of life (Fig. 1). Thus these spheres succinctly demonstrate that the lower the temperature that life can tolerate, the more space there is for habitation. With this in mind, the function of life at low temperature becomes a central question to the ecology of life in our universe and on Earth.

Fig. 1. The theoretical distribution of habitable space. Given a point source of heat, the amount of energy (here temperature is used as a proxy for energy) that reaches a point, distance d, from the source can be estimated from the inverse square law. The amount of space contained in a sphere with radius d is given by the volume of the sphere (v). If the temperature calculated for the distance d represents the lowest temperature permissible for life, the fraction of habitable space is v divided by the volume of a sphere defined by the maximum size of the system (e.g. the size of a solar system).
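The calculation described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Fig. 1: temperature is taken to fall off with the inverse square of distance from a reference point, temperatures are absolute (kelvin), and the function and parameter names (habitable_fraction, t_ref, d_ref, d_max) are hypothetical.

```python
import math


def habitable_fraction(t_min, t_ref, d_ref, d_max):
    """Fraction of a system of radius d_max lying inside the distance at
    which temperature has dropped to t_min.

    ASSUMPTION: T(d) = t_ref * (d_ref / d) ** 2, i.e. temperature is used
    as a proxy for energy and follows the inverse square law.
    """
    # Distance at which the temperature equals the minimum for life.
    d_limit = d_ref * math.sqrt(t_ref / t_min)
    # The habitable sphere cannot be larger than the system itself.
    d_limit = min(d_limit, d_max)
    v = (4.0 / 3.0) * math.pi * d_limit ** 3
    v_max = (4.0 / 3.0) * math.pi * d_max ** 3
    return v / v_max


# Arbitrary units: 1000 K at distance 1, system radius 10.
for t_min in (300.0, 250.0, 200.0):
    print(t_min, habitable_fraction(t_min, 1000.0, 1.0, 10.0))
```

Under these assumptions the enclosed volume scales as the minimum temperature raised to the power -3/2, so each step down in the temperature limit of life adds more habitable volume than the step before it.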

In seven chapters this dissertation explores the adaptation of microbial life on Earth to low temperature with a particular emphasis on sea ice environments. Chapter 1 reviews the current state of our understanding of sea ice microbial ecology. Chapter 2 compares the microbial community structure of Arctic multiyear sea ice to that of surface seawater, two proximate but ecologically disparate low temperature environments. Chapter 3 explores the microbial community in frost flowers, highly saline structures on the surface of newly formed sea ice, and the underlying newly formed or young sea ice. Chapter 4 uses metagenomics to probe the functional diversity within this same environment. Chapter 5 evaluates the role that horizontal gene transfer might play in microbial adaptation to low temperature environments through quantification of genomic plasticity in psychrophile genomes and taxonomically related mesophile genomes. This chapter also explores the rate of horizontal gene transfer among psychrophiles and mesophiles throughout the Phanerozoic Eon. Chapter 6 explores the adaptation of putatively cold active enzymes to low temperature and describes a simple model, the Protein Evolution Parameter Calculator (PEPC), to evaluate the difficulty of producing enzymes optimized to multiple environmental challenges. Chapter 7 explores the distribution of genes coding for alkane hydroxylase enzymes in psychrophile genomes and the degree to which these enzymes might be optimized to low temperatures. Three appendices accompany this dissertation as well. Appendix 1 describes the Cryosphere Frost Flower Reactor for Organic Geochemistry (CRYO-FFROG), an experimental apparatus for exploring photochemical reactions in the surface ice environment. Appendix 2 reports a strong correlation between bacterial abundance and salt in frost flowers and other components of the young sea ice environment, and it discusses the potential implications for microbial transport and chemical reactions at the sea ice surface. Appendix 3 explores the potential microbial exchange between two geographically separate ice environments linked by aerial transport: the supraglacial environment and the young sea ice environment.

1.1 A brief history of research on sea ice microbial communities

Cold-active microbes first appeared in the scientific literature as early as 1887, when Forster isolated bioluminescent bacteria capable of growth at 0 °C from cold-stored flounder (Forster, 1887). Likewise, the scientific exploration of sea ice microbial communities dates back to at least that same year, when Nansen described diatoms that adhered to the bottom of Arctic sea ice (see reference in Nansen, 1906). McLean extended these observations during the Australasian Antarctic Expedition between 1911 and 1914, describing “practically the whole of low life which exists” in Antarctic sea ice, including protists, rotifers, and bacteria, and was in fact able to culture bacteria from preparations of sea ice algae (McLean, 1918). Despite these tantalizing early observations, the study of life in low temperature environments proceeded slowly until after World War II, when increased exploration of the Arctic and Antarctic provided new opportunities to study cold microbial habitats. Prior to 1960, this work was motivated primarily by the role bacteria played in food spoilage and focused on a relatively narrow range of isolates (e.g. Pseudomonas spp.). Despite these limitations, important advances in our overall understanding of psychrophiles were made during this period; these included adoption of the reaction rate terminologies Q10 and µ to describe the temperature dependence of enzymes (Ingraham & Bailey, 1959) and an overall appreciation that psychrophilic microbes are ecologically distinct from mesophiles (Ingraham, 1958).
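As a brief illustration of the Q10 terminology mentioned above, Q10 is conventionally the factor by which a rate changes for a 10 °C increase in temperature. The sketch below uses that standard definition; the function names are hypothetical, the example rates are invented, and µ is not covered here.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Q10 from two rates measured at two temperatures (degrees C)."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


def rate_at(rate_ref, temp_ref, temp, q10_value):
    """Extrapolate a rate to another temperature, assuming a constant Q10."""
    return rate_ref * q10_value ** ((temp - temp_ref) / 10.0)


# Invented example: a rate that quadruples between 0 and 20 degrees C
# doubles for every 10 degree step.
print(q10(1.0, 4.0, 0.0, 20.0))
print(rate_at(1.0, 0.0, -10.0, 2.0))
```

A constant Q10 is itself a simplification, since measured values generally vary with the temperature range considered.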

The International Geophysical Year (IGY) of 1957-1958 heralded an era of increased scientific activity in the Arctic and Antarctic. While we cannot link any microbiological studies directly to IGY, it serves as a useful boundary between studies that preceded it, which were primarily limited to laboratory work, and those that followed it, which would be augmented by environmental observations. The first focused study of sea ice bacteria proceeded from the fifth Japanese Research Expedition of 1961. Researchers obtained several isolates from surface seawater and from melted sea ice samples stored at -5 °C for several months (Iizuka, Tanabe, & Meguro, 1966). Isolation of these samples, however, took place at 25 °C, a temperature that is deleterious to most sea ice bacteria, thus the dominant members of the community were missed. The authors did, however, note a co-occurrence of bacteria and ice algae in the sampled “plankton ice” and speculated on possible ecological interactions, a recurring theme in sea ice microbial ecology (e.g. Sullivan & Palmisano, 1984; Feng et al., 2014). While the isolation and characterization of marine psychrophiles from seawater continued throughout the 1960s (e.g. Colwell et al., 1964), studies of sea ice microbial ecology focused almost exclusively on ice algae. The most visible component of the sea ice microbial ecosystem, ice algae can reach densities in excess of 670 µg chlorophyll a L⁻¹ in spring and summer (Meguro, 1962). Descriptive work on this element of the ecosystem (Bunt, 1963b; Iizuka et al., 1966) rapidly gave way to more quantitative analyses (Apollonio, 1965; Bunt, 1963a; Burkholder & Mandelli, 1965; Horner & Alexander, 1972; Meguro, Ito, & Fukushima, 1967) regarding chlorophyll concentrations, primary production, and the specific challenges of life in ice. These studies clarified that ice algae are wide-spread, important to the polar carbon cycle, and uniquely adapted to the sea ice environment.

During the 1970s and early 1980s, the works of Pomeroy (1974), Azam et al. (1983), and others brought forward the role that heterotrophic bacteria play in the marine carbon cycle. The “microbial loop,” wherein dissolved organic carbon (DOC) is recycled via bacterial assimilation and predation by protozoa, became recognized as an important component of the marine food web. Sullivan and a host of co-authors transferred this concept to the sea ice ecosystem in a pivotal series of papers in the 1980s (Grossi, Kottmeier, & Sullivan, 1984; Kottmeier, Grossi, & Sullivan, 1987; Kottmeier & Sullivan, 1988; Sullivan & Palmisano, 1984). Their work confirmed that sea ice bacteria are not only abundant and active within sea ice but also closely coupled to the occurrence of ice algae. These observations, all on mature, land-fast ice within McMurdo Sound, Antarctica, were extended to more variable ice types by Helmke and Weyland (1994) and Grossmann and Dieckmann (1994). Working on newly formed pelagic sea ice, Grossmann and Dieckmann (1994) observed significant bacterial growth rates even in relatively oligotrophic sea ice as well as bacterial production rates far in excess of those observed in seawater. Helmke and Weyland (1994) extended these observations to pelagic winter sea ice via the observation of high rates of activity and bacterial biomass relative to the underlying water column; at times, the ATP concentration, a measure of metabolic activity, in a single meter of sea ice exceeded the 100 m depth-integrated value for the underlying water column. These high levels of activity, however, were limited to the bottom-most, warmest zone of sea ice.

By the mid-1990s, it was clear that sea ice bacterial communities were composed of physiologically unique, psychrophilic bacteria capable of survival under conditions of severe environmental stress and of a fast response to new inputs of carbon. The taxonomic and functional diversity of this community, however, was almost entirely unknown, with the exception of the phenotype and morphology-based classifications of a few isolates for the former (Iizuka et al., 1966) and the limited observations of extracellular enzyme activity for the latter (Helmke & Weyland, 1994). Concurrent with the growing appreciation of sea ice bacteria as a unique and potentially important component of the polar marine ecosystem came major advances in understanding taxonomic diversity within microbial communities. In a groundbreaking 1977 paper, Woese and Fox used 16S and 18S rRNA gene sequences to classify life into three broad domains (Woese & Fox, 1977). Improvements in sequencing technology nearly a decade later (Smith et al., 1986) opened the door for more wide-spread sequencing of 16S rRNA genes from environmental samples and a rapid shift in the existing paradigms of microbial diversity (e.g. Giovannoni et al., 1990; Ward et al., 1990). These methods were further utilized to identify isolates from sea ice (Bowman et al., 1998; Bowman, McCammon, Brown, Nichols, & McMeekin, 1997) and, in a novel application of the technique, to identify sequences from an environmental clone library (Brown & Bowman, 2001). These and later studies established that while most genera observed in sea ice have members common to other environments, there are specific strains associated with sea ice.

Studies during this time also introduced astrobiology as a new motivation for studying sea ice microbial communities (Deming & Huston, 2000) (Fig. 2).  Broadly defined as the study of life in a universal context, astrobiology could be considered the purest form of ecology.  A common approach is to study Earth environments that are analogous to potential extraterrestrial habitats.  Since, as discussed previously, the universe is a cold place, the study of sea ice, glacial ice, and permafrost is central to, but certainly not the exclusive focus of, this approach.

Reports of the first extra-solar planet orbiting a main sequence star in 1995 (Mayor & Queloz, 1995) and the thousands of candidate and confirmed extrasolar planets since have catalyzed the research on Earth analogues, as have the report of the now hotly disputed putative microbial fossils in the Martian meteorite ALH84001 in 1996 (McKay et al., 1996) and the evidence for the now widely accepted liquid-water ocean beneath Europa’s icy exterior in 1998 (Carr et al., 1998).

Fig. 2. Occurrence in the peer-reviewed literature of the word “psychrophile” and “psychrophile+astrobiology.” Data was taken from Google Scholar searches for 10 year intervals starting with 1880. The final bin represents the period 2010 to 2014. Patents and citations were excluded from the search.

Although the studies of the 1990s began to elucidate the composition of the sea ice microbial community, they did not address its functional role.  In some cases, function could be inferred from specific experiments; Gerdes et al. (Gerdes, Brinkmeyer, Dieckmann, & Helmke, 2005) and Brakstad et al. (Brakstad, Nonstad, Faksness, & Brandvik, 2008) used diesel and crude oil perturbation experiments to explore the ability of the sea ice microbial community to respond to these carbon sources.  Analyses of specific functional genes within sea ice, however, have been surprisingly limited even though they are the most high-throughput measure of community metabolic capability.  Koh et al. identified proteorhodopsin genes (Koh et al., 2010) and genes for anoxygenic photosynthesis (Koh, Phua, & Ryan, 2011) within Antarctic sea ice, which suggests that bacterial energy acquisition in sea ice is not limited to the oxidation of ice algal photosynthates, and Møller et al. (2011) found mercury resistance genes in Arctic sea ice brines.  These insights into sea ice microbial community function have been extended by a small but growing number of sequenced genomes from sea ice isolates.  Although inferences of function from genomes are restricted to the geography and ecology of the original isolate, commonalities between isolates can provide a broader picture of adaptation to the sea ice environment.

Colwellia psychrerythraea 34H was the first sequenced sea ice bacterium (Methe et al., 2005). Although this strain was isolated from sediment, 16S rRNA gene sequences associated with the genus Colwellia have also been observed in sea ice (John P Bowman et al., 1998; J. P. Bowman et al., 1997; K. Junge, Imhoff, Staley, & Deming, 2002), although it is not ubiquitous in the sea ice environment. Other sequenced genomes from sea ice bacteria include Glaciecola psychrophila 170 (Yin et al., 2013), Octadecabacter arcticus 238 and Octadecabacter antarcticus 307 (Vollmers et al., 2013), and Psychroflexus torquis ATCC 700755 (Feng, Powell, Wilson, & Bowman, 2014). Considering that in July of 2014 there were nearly 12,000 completed and draft genomes in Genbank, it is clear that the genetic diversity of sea ice has been undersampled compared to some other environments. The tiny glimpse into the function of the sea ice microbial community afforded by these genome sequences suggests that the community may be not only exceptionally genetically and physiologically plastic, but also highly adapted to the environmental constraints imposed by sea ice.

1.2 The present-day understanding of sea ice microbial ecology

Like many microbial habitats, ice is a porous medium with a solid phase composed of ice crystals and a liquid phase composed of water and solutes excluded from the crystals during growth. Aside from temperature alone, water ice is distinct from other porous media due to its dynamic nature over a temperature range that is relevant to microbial life. While tap water begins to freeze at 0 °C, seawater, with a salinity of roughly 35 ppt, begins to freeze at -1.8 °C and does not complete the process until -36 °C (G. Marion, Farren, & Komrowski, 1999). The higher the salinity of the starting solution, the lower the temperature required to initiate freezing. Once freezing is initiated for a solution with the brine composition of seawater, however, the salinity of the interstitial brines is almost entirely a function of temperature, although organic matter content and other factors do have some effect. At -5 °C the brine salinity of sea ice is approximately 87 ppt; at -20 °C it will have reached 209 ppt.

Fig. 3. Factor-fold concentration of solutes in water ice relative to starting concentration. The relative solute concentration is the inverse of the brine volume fraction, calculated along a gradient of temperature and salinity from the equations of Cox and Weeks (1983).

The concentration of solutes by freezing is most evident with salt but applies to other components of seawater as well. The lower the salinity and the colder the temperature of the starting solution, the more concentrated the solutes are in the brine phase relative to their starting concentrations. This extends even to bacteria and virus particles as demonstrated by Junge et al. (Karen Junge, Krembs, Deming, Stierle, & Eicken, 2001) and Wells and Deming (Wells & Deming, 2006). For any solute, the degree to which it is concentrated in ice relative to its concentration in the source material is a function of temperature and the bulk salinity of the ice, the total quantity of salt contained in a volume of melted ice (Cox & Weeks, 1983). Thus the degree of concentration varies widely between ice types (Fig. 3).
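The relationship described above can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not the Cox and Weeks (1983) formulation used for Fig. 3; it assumes a simple mass balance in which all of the salt sits in the brine, so the brine mass fraction is the bulk salinity divided by the brine salinity, and it ignores density differences between ice and brine. The brine salinities are the two values quoted in the text; the bulk salinities and function names are hypothetical, chosen only for illustration.

```python
def brine_mass_fraction(bulk_salinity_ppt, brine_salinity_ppt):
    """Fraction of the ice mass that is liquid brine.

    Assumes all salt is held in the brine (simple mass balance),
    which is a simplification of the Cox and Weeks (1983) equations.
    """
    if not 0 < bulk_salinity_ppt <= brine_salinity_ppt:
        raise ValueError("need 0 < bulk salinity <= brine salinity")
    return bulk_salinity_ppt / brine_salinity_ppt


def concentration_factor(bulk_salinity_ppt, brine_salinity_ppt):
    """Factor by which a solute is concentrated in brine relative to melted bulk ice."""
    return 1.0 / brine_mass_fraction(bulk_salinity_ppt, brine_salinity_ppt)


# Brine salinities (ppt) at two temperatures (deg C), as quoted in the text.
BRINE_SALINITY_PPT = {-5: 87.0, -20: 209.0}

# Hypothetical bulk salinities (ppt) spanning fresher to saltier ice.
for bulk in (2.0, 5.0, 10.0):
    for temp_c, brine in BRINE_SALINITY_PPT.items():
        factor = concentration_factor(bulk, brine)
        print(f"bulk {bulk:4.1f} ppt, {temp_c:4d} C: ~{factor:6.1f}-fold")
```

Even under these simplifying assumptions the trend matches the text: fresher and colder ice yields the largest concentration factors.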

Because primary production is most concentrated at the sea ice-seawater interface and because DOC, as with other solutes, is concentrated in sea ice brines (Giannelli et al., 2001), bacterial sea ice specialists are optimized for high concentrations of organic carbon. This may aid growth at the coldest temperatures and allow enzymes to maintain uptake rates sufficient to support growth at temperatures well below their optimum if substrate concentrations are high (Nedwell, 1999; Pomeroy & Wiebe, 2001). Likewise, this discovery has aided the laboratory study of sea ice bacteria, as sea ice specialists tend to also be specialists for common organic-rich culture conditions (K. Junge et al., 2002).

Observations of sea ice microbial community structure during the winter suggest that, while metabolic activity is present (Karen Junge, Eicken, & Deming, 2004), bacterial growth is extremely limited. Observing Arctic sea ice throughout the winter, Collins et al. (2010) reported little change to microbial community structure, while that of the underlying seawater changed considerably in the same time period. The authors hypothesized that exopolymers (EPS) produced by phytoplankton and bacteria, which are known to act as cryoprotectants (Krembs, Eicken, Junge, & Deming, 2002), may enable the survival of even non-ice associated genera within sea ice. The over-wintering community observed by Collins et al. is similar to a typical seawater community and distinct from the community observed in late-spring and summer sea ice. To date, the transition from winter to summer has not been observed with molecular methods, a surprising deficiency given the relevance of this transition to the polar carbon cycle. By tracking bacterial abundance and chlorophyll a concentrations, the sea ice microbial community can be seen rapidly responding to the initiation of the spring algal bloom, though the response may be slower than that observed for seawater (Fig. 4). This response is presumed to reflect the rapid growth of the psychrophilic sea ice microbial community as soon as DOC concentrations are sufficient to overcome the temperature inhibition of enzymes.

Fig. 4. Bacterial abundance and chlorophyll a from land-fast first-year sea ice during the Austral spring of 2011, McMurdo Sound, Antarctica. Grey area indicates seawater below the advancing ice front. Actual values for seawater are given in the box below each primary frame.

1.3 The future of sea ice microbial ecology

The over-arching research objectives in microbial ecology follow the classic set of questions: who, what, where, when, why, and how. When the answers to these questions are known in sufficient detail, it is possible to make predictions about the ecosystem in question. A goal of sea ice microbial ecology, for example, is to predict how the microbial ecosystem will respond to environmental perturbations, including slow perturbations, like changing climate, or fast perturbations, such as a release of crude oil. The latter issue has particular significance in the Arctic, where exploration and extraction continue on significant marine petroleum reserves (Gautier et al., 2009). The Macondo Well disaster in the Gulf of Mexico demonstrated that indigenous deep sea bacteria can degrade a considerable quantity of crude oil despite high pressure and low temperatures (Redmond & Valentine, 2012). Interestingly, one of the bacteria observed responding to this input of crude oil was Colwellia sp., a genus known from sea ice as discussed above. Colwellia psychrerythraea 34H is one of the few sequenced sea ice associated bacteria and is commonly used in laboratory studies of cold adaptation. Neither C. psychrerythraea 34H nor a close relative sequenced from water in the vicinity of the Macondo wellhead (Mason, Han, Woyke, & Jansson, 2014), however, has recognizable genes for catabolizing the low molecular weight alkanes that the latter is implicated in degrading. One explanation is that this bacterium is using a gene without close homology to known alkane degradation genes, an exciting and realistic possibility; another is that it is responding to secondary metabolites produced by Oceanospirillales, the primary responder.

Either of these scenarios has important implications for the bioremediation of oil released in the proximity of sea ice. Because sea ice bacteria are optimized for high carbon concentrations, they may be resistant to crude oil toxicity and capable of more rapid bioremediation. This idea is supported by the work of Gerdes et al. (Gerdes et al., 2005) and Brakstad et al. (Brakstad et al., 2008), who observed that the indigenous sea ice microbial community is capable of crude oil degradation. The potential rate of crude oil catabolism under the environmental conditions imposed by sea ice, however, remains an unknown. At low temperatures, many crude oil components have reduced bioavailability (Colwell, Walker, & Cooney, 1977), and in sea ice, the rate of bacterial production is generally below the rate of primary production (Pomeroy & Wiebe, 2001). This indicates the limit to which increased substrate concentration can make up for reduced substrate affinity. While sea ice bacteria produce enzymes optimized for low temperatures (Adrienne L. Huston, Krieger-Brockett, & Deming, 2000; Adrienne L Huston, Methe, & Deming, 2004; Methe et al., 2005), this optimization may be insufficient to keep pace with either carbon fixation or a rapid input of crude oil. Predicting the fate of crude oil, or the effect of any other perturbation to the sea ice ecosystem, will require a much more complete understanding of microbial functional diversity and physiology and their impact on biogeochemistry.

One pathway to develop a predictive biogeochemical model is the metabolic flux model, an idea that was explored for crude oil degradation in a review by Röling and van Bodegom (Röling & van Bodegom, 2014). This model makes use of community metabolic potential and gene expression data, information derived from environmental and isolate sequencing experiments, to predict the flow of energy and material between the biotic and abiotic components of the biosphere as well as between members of the microbial community. Coupled to a traditional biogeochemical model, the metabolic model becomes predictive at the ecosystem level; as the flow of energy, carbon, or nutrients into the system changes, or as members of the community change in presence or abundance, the impact on biogeochemistry can be quantified. This idea is conceptualized in the Taxonomy-Metabolic potential-Metabolism-Biogeochemistry (TMMB) model framework (Fig. 5). As in any model, however, the predictive value of the TMMB framework depends on the level of detail built into the model itself.

Fig. 5. The TMMB model framework. The pyramid represents a conceptual framework linking community composition (taxonomy), easily monitored in the environment and a direct function of environmental conditions, with metabolic potential, metabolism, and biogeochemistry. The dynamic nature of the system is commonly referred to as “plasticity;” four different types of plasticity are shown in red. The specific analytical techniques relevant to the development of a model are shown in blue. Given adequate knowledge of the two M layers (metabolic potential and metabolism) from experiments and environmental observations, it should be possible to predict B from T via an analysis of predicted metabolisms, and T from B via a predicted microbial community response.

Despite decades of research on sea ice microbial ecology, our grasp of the details of sea ice ecosystem function trails far behind our grasp for the marine microbial ecosystem in general. As outlined in the section: A brief history of research on sea ice microbial communities, our understanding of sea ice microbial ecology has generally lagged the broader field of marine microbial ecology by a decade or more. The first measurement of primary production by the 14C method, for example, came in 1952 for seawater (Steeman-Nielsen, 1952) and 1965 for sea ice (Burkholder & Mandelli, 1965). The first study of in situ gene expression in seawater came in 1990, while gene transcripts were not studied in sea ice until 2010 (Koh et al., 2010). Reasons for this delay may include the novelty of the environment, which reduces the need for new or sophisticated techniques to warrant funding or publication, technical challenges regarding the application of these techniques to sea ice, the limited number of researchers in the field, and the significant logistical challenge of accessing the sea ice environment and conducting research there. While these are valid reasons, an alternate paradigm for the future may be to view the challenges of sea ice microbial ecology as a strong motivator for greater innovation in research. The field of sea ice microbial ecology is well placed to lead the development of a new framework for understanding microbial ecology.

This preferential placement is because, compared to many other microbial ecosystems, the sea ice microbial ecosystem is relatively simple. A large number of its dominant members can be cultured (K. Junge et al., 2002) and thus sequenced and subjected to detailed physiological evaluation. Due to the static nature of sea ice, the flux of nutrients and materials into the system is easier to constrain than for many other marine environments. A strong seasonality not only defines the sea ice environment, but also provides a predictable annual cycle of perturbation and community succession that makes it ideal for testing hypotheses regarding the biogeochemical impact of community structure. Sea ice represents an optimal environment to develop and test integrated models connecting community structure, metabolic potential, biological activity, biogeochemistry, and the resulting feedback loop on community structure. Such an undertaking, however, will require a coordinated and long-term research effort involving both modelers and observationalists, with both groups including specialists in physiology, genetics, and biogeochemistry.

While this dissertation does not solve the problem of implementing a TMMB model for the sea ice microbial ecosystem, it seeks to clarify further details of sea ice microbial ecology through a better understanding of microbial community structure within several under-explored ice types, the genomic plasticity and metabolic function of psychrophiles, and the evolution of cold-adapted communities. In time, a deeper appreciation of these aspects of the microbial community will become part of the foundation for a more complete understanding of the sea ice ecosystem as a whole.

References

Apollonio, S. (1965). Chlorophyll in arctic sea ice. Arctic, 118-122.

Azam, F., Fenchel, T., Field, J. G., Gray, J. S., Meyer-Reil, L. A., & Thingstad, F. (1983). The ecological role of water-column microbes in the sea. Marine Ecology Progress Series, 10(3), 257-263.

Bowman, J. P., Gosnik, J. K., McCammon, S. A., Lewis, T. E., Nichols, D. S., Nichols, P. D., . . . Staley, J. T. (1998). Colwellia demingiae sp. nov., Colwellia hornerae sp. nov., Colwellia rossensis sp. nov. and Colwellia psychrotropica sp. nov.: psychrophilic Antarctic species with the ability to synthesize docosahexaenoic acid (22:6ω3). International Journal of Systematic Bacteriology, 48(4), 1171-1180.

Bowman, J. P., McCammon, S. A., Brown, M. V., Nichols, D. S., & McMeekin, T. A. (1997). Diversity and association of psychrophilic bacteria in Antarctic sea ice. Appl. Environ. Microbiol., 63, 3068-3078.

Brakstad, O., Nonstad, I., Faksness, L.-G., & Brandvik, P. (2008). Responses of microbial communities in Arctic Sea Ice after contamination by crude petroleum oil. Microb. Ecol., 55(3), 540-552. doi: 10.1007/s00248-007-9299-x

Brown, M. V., & Bowman, J. P. (2001). A molecular phylogenetic survey of sea-ice microbial communities. FEMS Microbiol. Ecol., 35, 267-275.

Bunt, J. (1963a). Microbiology of Antarctic Sea-ice: Diatoms of Antarctic Sea-ice as Agents of Primary Production.

Bunt, J. (1963b). Microbiology of Antarctic Sea-ice: Microalgae and Antarctic Sea-ice. Nature, 199, 1254-1255.

Burkholder, P. R., & Mandelli, E. F. (1965). Productivity of microalgae in Antarctic sea ice. Science, 149(3686), 872-874.

Carr, M. H., Belton, M. J. S., Chapman, C. R., Davies, M. E., Geissler, P., Greenberg, R., . . . Veverka, J. (1998). Evidence for a subsurface ocean on Europa. Nature, 391(6665), 363-365.

Collins, R. E., Rocap, G., & Deming, J. W. (2010). Persistence of bacterial and archaeal communities in sea ice through an Arctic winter. Environ. Microbiol., 12(7), 1828-1841.

Colwell, R. R., & Morita, R. Y. (1964). Reisolation and emendation of description of Vibrio marinus (Russell) Ford. Journal of Bacteriology, 88(4), 831-837.

Colwell, R. R., Walker, J. D., & Cooney, J. J. (1977). Ecological aspects of microbial degradation of petroleum in the marine environment. Critical Reviews in Microbiology, 5(4), 423-445.

Cox, G. F. N., & Weeks, W. F. (1983). Equations for determining the gas and brine volumes in sea-ice samples. J. Glaciol., 29(102), 306-316.

de Bernardis, P., Ade, P., Bock, J., Bond, J., Borrill, J., Boscaleri, A., . . . Farese, P. (2000). A flat Universe from high-resolution maps of the cosmic microwave background radiation. Nature, 404(6781), 955-959.

Deming, J., & Huston, A. (2000). An oceanographic perspective on microbial life at low temperatures with implications for polar ecology, biotechnology and astrobiology. In Cellular Origins and Life in Extreme Habitats (pp. 149-160).

Feng, S., Powell, S. M., Wilson, R., & Bowman, J. P. (2014). Extensive gene acquisition in the extremely psychrophilic bacterial species Psychroflexus torquis and the link to sea-ice ecosystem specialism. Genome Biol. Evol., 6(1), 133-148.

Forster, J. (1887). Über einige eigenschaften leuchtender bakterien. Centr. Bakteriol. Parasitenk, 2, 337-340.

Gautier, D. L., Bird, K. J., Charpentier, R. R., Grantz, A., Houseknecht, D. W., Klett, T. R., . . . Wandrey, C. J. (2009). Assessment of undiscovered oil and gas in the Arctic. Science, 324(5931), 1175-1179. doi: 10.1126/science.1169467

Gerdes, B., Brinkmeyer, R., Dieckmann, G., & Helmke, E. (2005). Influence of crude oil on changes of bacterial communities in Arctic sea-ice. FEMS Microb. Ecol., 53(1), 129-139. doi: 10.1016/j.femsec.2004.11.010

Giannelli, V., Thomas, D. N., Haas, C., Kattner, G., Kennedy, H., & Dieckmann, G. S. (2001). Behaviour of dissolved organic matter and inorganic nutrients during experimental sea-ice formation. Annals of Glaciology, 33(1), 317-321.

Giovannoni, S. J., Britschgi, T. B., Moyer, C. L., & Field, K. G. (1990). Genetic diversity in Sargasso Sea bacterioplankton. Nature, 345, 60-63.

Grossi, S. M., Kottmeier, S. T., & Sullivan, C. (1984). Sea ice microbial communities. III. Seasonal abundance of microalgae and associated bacteria, McMurdo Sound, Antarctica. Microbial Ecology, 10(3), 231-242.

Grossmann, S., & Dieckmann, G. S. (1994). Bacterial standing stock, activity, and carbon production during formation and growth of sea ice in the Weddell Sea, Antarctica. Appl. Environ. Microbiol., 60(8), 2746-2753.

Helmke, E., & Weyland, H. (1994). Bacteria in sea ice and underlying water of the eastern Weddell Sea. Mar. Ecol. Prog. Ser., 117, 269-287.

Horner, R., & Alexander, V. (1972). Algal populations in arctic sea ice: An investigation of heterotrophy. Limnol. Oceanogr, 17(3), 454-458.

Huston, A. L., Krieger-Brockett, B. B., & Deming, J. W. (2000). Remarkably low temperature optima for extracellular enzyme activity from Arctic bacteria and sea ice. Environ. Microbiol., 2(4), 383-388. doi: 10.1046/j.1462-2920.2000.00118.x

Huston, A. L., Methe, B., & Deming, J. W. (2004). Purification, characterization, and sequencing of an extracellular cold-active aminopeptidase produced by marine psychrophile Colwellia psychrerythraea strain 34H. Applied and Environmental Microbiology, 70(6), 3321-3328.

Iizuka, H., Tanabe, I., & Meguro, H. (1966). Microorganisms in plankton-ice of the Antarctic Ocean. The Journal of General and Applied Microbiology, 12(1), 101-102. doi: 10.2323/jgam.12.101

Ingraham, J. (1958). Growth of psychrophilic bacteria. Journal of Bacteriology, 76(1), 75.

Ingraham, J., & Bailey, G. (1959). Comparative study of effect of temperature on metabolism of psychrophilic and mesophilic bacteria. Journal of Bacteriology, 77(5), 609.

Junge, K., Eicken, H., & Deming, J. W. (2004). Bacterial activity at –2 to –20°C in Arctic wintertime sea ice. Appl. Environ. Microbiol., 70(1), 550−557. doi: 10.1128/aem.70.1.550-557.2004

Junge, K., Imhoff, J., Staley, J., & Deming, J. (2002). Phylogenetic diversity of numerically important Arctic sea-ice bacteria cultured at subzero temperature. Microb. Ecol., 43(3), 315-328.

Junge, K., Krembs, C., Deming, J., Stierle, A., & Eicken, H. (2001). A microscopic approach to investigate bacteria under in situ conditions in sea-ice samples. Ann. of Glaciol., 33, 304−310.

Koh, E. Y., Atamna-Ismaeel, N., Martin, A., Cowie, R. O. M., Beja, O., Davy, S. K., . . . Ryan, K. G. (2010). Proteorhodopsin-bearing bacteria in Antarctic sea ice. Appl. Environ. Microbiol., 76(17), 5918-5925. doi: 10.1128/aem.00562-10

Koh, E. Y., Phua, W., & Ryan, K. G. (2011). Aerobic anoxygenic phototrophic bacteria in Antarctic sea ice and seawater. Environmental Microbiology Reports, 3(6), 710-716. doi: 10.1111/j.1758-2229.2011.00286.x

Kottmeier, S. T., Grossi, S., & Sullivan, C. W. (1987). Sea ice microbial communities. VIII. Bacterial production in annual sea ice of McMurdo Sound, Antarctica. Mar Ecol Prog Ser, 35, 175-186.

Kottmeier, S. T., & Sullivan, C. W. (1988). Sea ice microbial communities (SIMCO). Polar Biology, 8(4), 293-304.

Krembs, C., Eicken, H., Junge, K., & Deming, J. W. (2002). High concentrations of exopolymeric substances in Arctic winter sea ice: implications for the polar ocean carbon cycle and cryoprotection of diatoms. Deep Sea Res. Part I, 49(12), 2163−2181.

Marion, G., Farren, R., & Komrowski, A. (1999). Alternative pathways for seawater freezing. Cold Regions Science and Technology, 29(3), 259-266.

Marion, G. M., & Farren, R. E. (1999). Mineral solubilities in the Na-K-Mg-Ca-Cl-SO4-H2O system: a re-evaluation of the sulfate chemistry in the Spencer-Møller-Weare model. Geochimica et Cosmochimica Acta, 63(9), 1305-1318.

Mason, O., Han, J., Woyke, T., & Jansson, J. (2014). Single-cell genomics reveals features of a Colwellia species that was dominant during the Deepwater Horizon oil spill. Frontiers in Microbiology, 5, 332.

Mayor, M., & Queloz, D. (1995). A Jupiter-mass companion to a solar-type star. Nature, 378(6555), 355-359. doi: 10.1038/378355a0

McKay, D. S., Gibson, E. K., Thomas-Keprta, K. L., Vali, H., Romanek, C. S., Clemett, S. J., . . . Zare, R. N. (1996). Search for Past Life on Mars: Possible Relic Biogenic Activity in Martian Meteorite ALH84001. Science, 273(5277), 924-930. doi: 10.1126/science.273.5277.924

McLean, A. L. (1918). Bacteria of ice and snow in Antarctica. Nature, 102(2550), 35-39.

Meguro, H. (1962). Plankton ice in the Antarctic Ocean. Antarct Rec, 14, 1192-1199.

Meguro, H., Ito, K., & Fukushima, H. (1967). Ice flora (bottom type): a mechanism of primary production in polar seas and the growth of diatoms in sea ice. Arctic, 114-133.

Methe, B. A., Nelson, K. E., Deming, J. W., Momen, B., Melamud, E., Zhang, X., . . . Fraser, C. M. (2005). The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. PNAS, 102(31), 10913-10918. doi: 10.1073/pnas.0504766102

Møller, A., Barkay, T., Hansen, M., Norman, A., Hansen, L., Sørensen, S., . . . Kroer, N. (2011). Mercuric Reductase Genes (merA) and Mercury Resistance Plasmids in High Arctic Snow, Freshwater and Sea-ice Brine. FEMS Microb. Ecol.

Nansen, F. (1906). Protozoa on the ice floes of the North Polar Sea. Scient. Results Norw. N. Polar Exped., 5(16), 1-22.

Nedwell, D. B. (1999). Effect of low temperature on microbial growth: lowered affinity for substrates limits growth at low temperature. FEMS Microbiol Ecol, 30(2), 101-111. doi: 10.1111/j.1574-6941.1999.tb00639.x

Pomeroy, L. R. (1974). The ocean’s food web, a changing paradigm. BioScience, 499-504.

Pomeroy, L. R., & Wiebe, W. J. (2001). Temperature and substrates as interactive limiting factors for marine heterotrophic bacteria. Aquatic Microbial Ecology, 23(2), 187-204.

Redmond, M. C., & Valentine, D. L. (2012). Natural gas and temperature structured a microbial community response to the Deepwater Horizon oil spill. PNAS, 109(50), 20292-20297.

Röling, W. F., & van Bodegom, P. M. (2014). Toward quantitative understanding on microbial community structure and functioning: a modeling-centered approach using degradation of marine oil spills as example. Frontiers in Microbiology, 5.

Smith, L. M., Sanders, J. Z., Kaiser, R. J., Hughes, P., Dodd, C., Connell, C. R., . . . Hood, L. E. (1986). Fluorescence detection in automated DNA sequence analysis. Nature, 321, 674-679.

Steeman-Nielsen, E. (1952). The use of radioactive carbon (C14) for measuring organic production in the sea. J. Conseil, 18(2), 117-140.

Sullivan, C. W., & Palmisano, A. C. (1984). Sea Ice Microbial Communities: Distribution, Abundance, and Diversity of Ice Bacteria in McMurdo Sound, Antarctica, in 1980. Appl. Environ. Microbiol., 47(4), 788-795.

Vollmers, J., Voget, S., Dietrich, S., Gollnow, K., Smits, M., Meyer, K., . . . Daniel, R. (2013). Poles apart: arctic and antarctic Octadecabacter strains share high genome plasticity and a New type of xanthorhodopsin. PLoS ONE, 8(5), e63422.

Ward, D. M., Weller, R., & Bateson, M. M. (1990). 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature, 345, 63-65.

Wells, L. E., & Deming, J. W. (2006). Modelled and measured dynamics of viruses in Arctic winter sea-ice brines. Environ. Microbiol., 8(6), 1115−1121.

Woese, C. R., & Fox, G. E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences, 74(11), 5088-5090.

Yin, J., Chen, J., Liu, G., Yu, Y., Song, L., Wang, X., & Qu, X. (2013). Complete genome sequence of Glaciecola psychrophila strain 170T. Genome announcements, 1(3), e00199-00113.

 

 


Clustering metagenomic sequence reads

Another interesting paper caught my eye last week: Nielsen et al. in Nature Biotechnology, Identification and assembly of genomes and genetic elements in complex metagenomic samples without using a reference genome. First, a complaint: 53 authors, really?  There are more offensive papers out there in this regard, but come on guys, not everyone in the research group needs to be listed.  I could be totally wrong on this, but it seems unlikely that everyone on the list touched the data or contributed in a way that justifies authorship.  If I’m wrong, my sincere apologies, and more power to them!  If I’m not wrong then it would be nice of the top tier journals, where author bloat is most prevalent, to start policing this a little more aggressively.

Moving beyond that I’ve got warm feelings toward this particular (if very large) list of authors, as Nik Blom and others got me started in biological sequence analysis with a lab rotation at DTU back in 2010 (and are co-authors on one of the papers that came from my dissertation work).  The Nielsen et al. paper tackles one of the more vexing and important questions regarding biological sequence analysis: Given a metagenome, how can the reads be clustered by genome?  Solving this problem allows the metagenome to be more than a tool for evaluating community metabolic potential, as large genomic fragments can be used to probe deeper questions of microbial evolution, diversity, and function.

The problem addressed by the paper isn’t novel; numerous previous papers have reported genomes assembled from metagenomes.  All of these used some read-clustering method to group like-reads or like-contigs prior to assembly.  A nice example of this is given by Iverson et al., 2012.  They used coverage statistics, GC content, and other metrics to bin contigs, and ultimately tetranucleotide information (4-base kmer abundance) to group like-scaffolds into a (nearly) complete genome of an uncultured group II Euryarchaeota.  There are now a number of papers out that use a philosophically similar approach, often relying on an emergent self-organizing map (ESOM) to perform the actual clustering.  This approach seems to work reasonably well, but it is understood that reads are binned at very low resolution with each cluster containing bits from vaguely similar genomes.  There is no ability to distinguish between ecotypes or related strains.
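To make the tetranucleotide idea concrete, here is a minimal sketch of how a 4-base kmer abundance profile can be computed for a contig and compared between contigs.  This is an illustration only, not the pipeline used by Iverson et al.; the function names and the toy sequences are made up, and real binning tools typically also merge reverse-complement kmers and feed the profiles to something like an ESOM rather than a plain distance.

```python
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the relative abundance of each 4-mer in a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean_distance(a, b):
    """Distance between two frequency profiles; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy contigs (hypothetical): two AT-rich and one GC-rich.
contig_a = "ATATTTAAATATTAATTTAAATAT" * 10
contig_b = "TTAAATATATTTAATATTAAATTT" * 10
contig_c = "GCGGCCGCGCCGGGCGCCGCGGCC" * 10

profile_a = tetranucleotide_frequencies(contig_a)
profile_b = tetranucleotide_frequencies(contig_b)
profile_c = tetranucleotide_frequencies(contig_c)

print("a vs b:", round(euclidean_distance(profile_a, profile_b), 3))
print("a vs c:", round(euclidean_distance(profile_a, profile_c), 3))
```

Contigs with similar composition end up close together in this 256-dimensional space, which is also why the approach struggles to separate closely related strains.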

Nielsen et al. improved upon these methods by clustering genes based on their co-abundance across samples; they call the resulting clusters co-abundance gene groups (CAGs).  Applying this method to 396 human gut metagenomes, they assembled 238 unique genomes.  If the method survives the scrutiny of the community it represents a big step forward in throughput.  Here’s what they did in a little more detail.

The authors started by conducting a de novo assembly of the metagenomes into open reading frames (ORFs).  Because ORFs are short this is a relatively simple task.  Picking an ORF at random, they then searched the dataset for ORFs with a similar abundance across all samples (having a large number of samples is essential for this approach).  ORFs with a similar abundance profile were considered to be from the same genetic element and were called canopies.  Canopies of very rare or poorly distributed ORFs were rejected; the remaining canopies were considered to represent genomes and plasmids, as illustrated by the bimodal size distribution in their Fig. 2a:
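The seed-and-collect step described above can be sketched in a few lines.  This is my own toy version, not the authors' code: I'm assuming Pearson correlation as the measure of "similar abundance," the 0.9 threshold and the function names are hypothetical, and the filtering of rare or poorly distributed canopies is left out.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / (var_x * var_y) ** 0.5


def build_canopies(profiles, min_corr=0.9, seed=0):
    """Group ORFs whose abundance across samples tracks a randomly chosen seed ORF."""
    rng = random.Random(seed)
    unassigned = sorted(profiles)
    canopies = []
    while unassigned:
        seed_orf = rng.choice(unassigned)
        members = {seed_orf}  # the seed always belongs to its own canopy
        for orf in unassigned:
            if pearson(profiles[seed_orf], profiles[orf]) >= min_corr:
                members.add(orf)
        canopies.append(sorted(members))
        unassigned = [orf for orf in unassigned if orf not in members]
    return canopies


# Toy abundance table (hypothetical): ORF -> abundance in each of four samples.
toy_profiles = {
    "orf1": [10, 0, 5, 20],
    "orf2": [20, 1, 11, 41],
    "orf3": [0, 8, 9, 1],
    "orf4": [1, 16, 19, 2],
}

for canopy in sorted(build_canopies(toy_profiles)):
    print(canopy)
```

With only four samples the correlations are obviously not very meaningful, which is the point made above about needing a large number of samples.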

[Figure: Fig. 2a from Nielsen et al.]

 

 

Taking these canopies as probable genetic entities, the authors then mapped the original sequence reads to each canopy and assembled each pool of reads.  Overall a pretty slick method, assuming one has 396 somewhat-similar samples to analyze!  I’m curious to know whether a single deeply sequenced sample could be randomly partitioned into virtual samples for a similar analysis.  This would make the method available to us mere mortals without 396 metagenomes at our disposal…

 

 

 

Posted in Research | Leave a comment

Great modeling paper published

A very nice paper in the ISME Journal came across my Google Scholar alerts this week – Satellite remote sensing data can be used to model marine microbial metabolite turnover, by Larsen et al.  The author list includes some heavy hitters in the field of microbial ecology, including Rob Knight and Jack Gilbert.  The paper is significant for being, as far as I’m aware, the first study to quantitatively link microbial taxonomy, metabolic potential, and environmental parameters in a predictive manner.  This is something of a holy grail for microbial ecologists because, while microbial taxonomy and metabolic potential are difficult to measure, and can only be measured at discrete times and places in a study region, some environmental parameters (chlorophyll, sea surface temperature, etc.) are easily and near-continuously measured by satellite across a broad study region.  Robustly correlating taxonomy and metabolic potential with these easily observed parameters allows the prediction of microbial community composition and the resulting metabolome (pool of material originating from the microbial community) from some future set of environmental parameters.  Here’s a quick summary of what they did.

Focusing on the Western English Channel, an ecologically very well characterized site, the authors developed spatially contiguous data for dissolved oxygen, phosphate, nitrate, ammonium, silicate, chlorophyll A, photosynthetically active radiation, particulate organic carbon in small, medium, large, and semi-labile classes, and bacterial abundance.  Most of this data is not observable via satellite, but all of it is much easier to collect than it is to determine microbial community structure or metabolic potential directly.  It’s my understanding that the authors first correlated the parameters that could not be measured by satellite to those that could, and then used these models to construct the contiguous datasets.  They then use a microbial assemblage prediction (MAP) model (see Larsen et al., 2012) to, well, predict the microbial assemblage.  If I understand correctly this also works off correlations, via a neural network algorithm that identifies linkages between different taxa and environmental parameters (previously measured at the same time and place in the environment).  The next step is to predict the metabolic potential (genetic contents) of the microbial community.  The authors do this at the order level to sidestep some of the issues with genetic plasticity at finer taxonomic scales.  By taking all the available genomes for each predicted order, they can estimate its genetic contents.  This is pretty hand-wavy, as many functions are not shared between all members of an order, but it’s a good place to start.
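
To make the order-level step a bit more concrete, here is a minimal sketch in R of how one might estimate gene content for an order from its available genomes and then weight it by predicted abundance.  This is my own toy illustration, not the authors’ code.  The order names, gene names, presence/absence values, and predicted abundances are all invented.

```r
## Hypothetical presence (1) or absence (0) of three genes in five genomes
genomes <- data.frame(
  order = c('OrderA', 'OrderA', 'OrderA', 'OrderB', 'OrderB'),
  gene1 = c(1, 1, 1, 0, 0),
  gene2 = c(1, 0, 0, 1, 1),
  gene3 = c(0, 0, 1, 1, 0)
)

## Fraction of the genomes in each order that carry each gene
order_content <- aggregate(cbind(gene1, gene2, gene3) ~ order,
                           data = genomes,
                           FUN = mean)

content_mat <- as.matrix(order_content[, -1])
rownames(content_mat) <- as.character(order_content$order)

## Hypothetical predicted relative abundance of each order
predicted <- c(OrderA = 0.75, OrderB = 0.25)

## Community metabolic potential as the abundance-weighted gene content
community_potential <- predicted[rownames(content_mat)] %*% content_mat
print(community_potential)
```

The fractional values make the hand-waving visible: a gene present in only a third of an order’s genomes still contributes to the community estimate, whether or not the strains actually present carry it.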

The next step is where the rubber meets the road.  The authors attempt to connect metabolic potential to the community metabolome.  They do this by using the KEGG database to identify reaction products associated with the enzymes comprising the community metabolic potential.  To validate this approach the authors focus on the one metabolite for which there is abundant data: CO2.

I’m pretty excited about what they’ve done; it’s a great start to elucidating how microbial community structure interacts with the environment and vice versa.  There are, however, a number of caveats to this analysis.  First, as touched on earlier, there is a huge issue with genomic plasticity in both prokaryotic and eukaryotic marine microbes.  Genomes from very closely related taxa, even from the same species, can differ by 40 % or more.  That is a lot of metabolic function that is present in one cell but not the other.  Another issue is phenotypic plasticity, which is the ability of a cell to run different pieces of “software” on its genetic “hardware”.  By expressing different combinations of genes, for example, a cell can achieve a much higher level of phenotypic diversity than one might think by looking just at the contents of its genome.  As we are all well aware, software is easily broken or lost.  Thus even if genomic plasticity is at a minimum for a certain taxon, there is no guarantee that all its members will respond in a like manner to the same set of environmental conditions.  Even clonal cells are often observed to drift apart phenotypically long before their genomes diverge.  It will take a lot more phenotype-aware data collections (e.g. transcriptomics, proteomics, and metabolomics) to sort out the true impact of community structure on the environment.

Posted in Research | Leave a comment

New website address!

Pending the (hopeful) defense of my dissertation on August 8th I’ll be starting a postdoc at the Lamont-Doherty Earth Observatory at Columbia University.  Since there’s generally a little downtime between turning in a thesis and defending it I’ve migrated this site to a server at Columbia University.  The site at students.washington.edu/bowmanjs will no longer be maintained, but all posts and comments can be found at the new site.  I hope to see you there!

Please note that the Facebook page associated with this site (The Deming Ecosystem in Antarctica) will no longer be maintained.

Jeff

http://ldeo.columbia.edu/~bowmanjs

Posted in Uncategorized | 1 Comment

Some thoughts on modeling

I’m not a modeler, but I played one once in grad school. Or at least that’s how I’m feeling at the moment. I’m currently working on the last chapter of my dissertation and it became necessary to explore the mechanisms underlying some empirical observations. This encouraged (forced?) me to undertake my first foray into modeling, developing a simple model of how a protein might evolve given a set of conditions and using it to predict what conditions might be responsible for the observed state. Because of this deviation from my normal observational and experimental work I’ve been thinking a lot about what models actually are and what they’re useful for. Considering the degree to which everyone is exposed to models on a daily basis it’s troubling how little people, including most scientists, think about them.

Currently scientific modeling is so compartmentalized that “modeler” is considered an adequate description of someone’s research area, as in “oh, she’s a modeler”. That label is at the same time complimentary and derogatory. On the one hand it implies some mastery of computer programming, mathematics, statistics, and probably a healthy publication list. On the other hand it can imply a reductionist scientific philosophy (wherein interesting phenomena are reduced to simple, predictable “parameters”) and a tendency to see a better model as the overarching scientific goal. I once went to a talk where a climate modeler lectured the audience for 45 minutes on how to “be a good observational scientist” for the sake of model improvement. It was too easy to walk away from the talk with the sense that that modeler’s primary goal was not an improved understanding of the system but the development of a model that best reproduces the observed state (and thus might reasonably predict a future state). However that isn’t necessarily a bad thing, so long as the limitations are understood.

I see models as falling into two categories, which I’ll call mechanistic and predictive. A predictive model, such as a global climate model or a protein structural prediction model, isn’t about understanding the system. It’s about accurately predicting the endstate. Interesting dynamics occur in any natural process that don’t have an impact on the outcome. Representing all of them in a model is time consuming and computationally expensive. Consider a model designed to predict how well an automobile functions, where “well” is defined as driving in an efficient and safe fashion. Lots of phenomena contribute to this: the fuel delivery system, the various engine components, the exhaust system, the brakes, etc. It might be very interesting that the driver chooses to listen to the radio or run the AC while driving. Even though this has an impact on the other subsystems in the car, it doesn’t have a direct impact on the car running well. If we wanted to represent the operation of the vehicle as a model we would probably decline to include these phenomena as variables. To do so would require developing equations that predict their operation in relation to one another, and solving those equations repeatedly when the model is run, with no improvement in our ability to predict the car’s performance.

A mechanistic model on the other hand can be thought of as an inventory of all the tiny pieces of a system that contribute to the system’s operation. Yes, we can represent the performance of the car in the previous model perfectly well without including the radio or air conditioner, but if we want to explore how these systems interact with other subsystems they need to be included. For example, suppose a driver likes the interior of the car really, really cold and hopes to install an industrial grade air conditioner that exceeds the wattage of the alternator. This would certainly impact the performance of the vehicle but, since we opted not to include it in the predictive model of automobile performance, we’d never know it from running that model.

The hard reality is that no one will ever raise enough money to hire enough researchers and purchase enough computer time to run mechanistic models of the Earth system, or even a local ecosystem. As valuable as that would be it’s just too difficult to do (without lots of time, people, and money). Thus developing useful predictive models depends on very careful selection of the parameters that will be included.

In predictive oceanographic models an oft-cited example is the representation of phytoplankton. These single-celled primary producers are responsible for a huge chunk of global carbon fixation and carbon sequestration. Often a single “average” of nutrient requirements and carbon uptake and export rates is used for phytoplankton in climate and ecosystem models, despite the fact that there are thousands of different phytoplankton species, often with radically different nutrient requirements and uptake/export rates. Representing phytoplankton as a single variable is efficient, but can produce an erroneous result in a predictive model. Furthermore such a predictive model is useless for experiments that explore ecosystem impacts of changing phytoplankton populations. However it isn’t practical to represent thousands of phytoplankton species as separate variables. Not only would it be computationally inefficient, we don’t know enough about most of those species to estimate their nutrient requirements or uptake/export rates. So where do we draw the line? What about prokaryotes, which are even more diverse than phytoplankton? Viruses? Heterotrophic protists?
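
A toy calculation shows why a single “average” phytoplankton can give the wrong answer. This is only a sketch under my own assumptions: I use Michaelis-Menten kinetics as a stand-in for nutrient uptake, and the two phytoplankton types, their parameter values, and the nutrient concentration are all made up (arbitrary units).

```r
## Michaelis-Menten uptake: rate saturates as the nutrient increases
uptake <- function(nutrient, vmax, ks){
  vmax * nutrient / (ks + nutrient)
}

## Two hypothetical phytoplankton types with very different kinetics
vmax <- c(small = 1, large = 3)      # maximum uptake rate
ks <- c(small = 0.1, large = 2.0)    # half-saturation constant

nutrient <- 0.5                      # ambient nutrient concentration

## Mean uptake when each type is represented separately
per_type <- uptake(nutrient, vmax, ks)
true_mean <- mean(per_type)

## Uptake of a single "average" phytoplankton with averaged parameters
lumped <- uptake(nutrient, mean(vmax), mean(ks))

print(c(true_mean = true_mean, lumped = lumped))
```

Because uptake is a nonlinear function of its parameters, the uptake of the average cell is not the average uptake of the cells. In this made-up case the lumped value comes out lower, and the size and sign of the error change with the nutrient concentration.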

Right now I’m on my way to a workshop at the Bigelow Laboratory for Ocean Sciences at Boothbay, Maine, for three days of exploration of this issue. A major question for the attendees is how much observing is required to get some of these parameters approximately right. How many phytoplankton does one need to sequence, for example, before we have a reasonable understanding of the genetic diversity (and thus metabolic potential) of this group? The workshop, sponsored by the Ocean Carbon and Biogeochemistry program of the US Carbon Cycle Science Program, was organized by the Woods Hole Oceanographic Institution and the organizers did an excellent job of including students and postdocs in addition to senior researchers. Thanks for the opportunity to join the conversation and I’m looking forward to digging into these issues!

Posted in Uncategorized | Leave a comment

Making maps in R

I don’t have a lot of experience with geospatial data analysis but following a recent cruise I had the need to plot some geospatial data.  The go-to program for this seems to be Matlab (or even ArcGIS, for those who are serious about their map making), but I do almost all of my plotting in R.  This sent me on a bit of an exploration of the various map making options for R.  Although the maturity of the various R geospatial libraries is far below that of commercial products, it’s possible to make some pretty decent plots.  Here’s a synopsis of the method I ended up using.

To plot data on a map you first need a map to work from.  A map comes in the form of a shape file, which is nothing more than a collection of lines or shapes describing the outline of various map features.  Shape files come in various flavors, including lines and polygons.  For reasons that will become clear, working with polygons is much preferable to working with lines, but I found it difficult (impossible) to consistently get the right polygon format into R.  R does have some built-in polygons in the maps and mapdata packages.  Using these it is possible to access a global polygon map, however the resolution of the map is quite low.  The documentation doesn’t provide a maximum resolution, but it is low enough to be of little value on scales below hundreds of miles.  Here’s a quick plot of our study area (the Antarctic Peninsula) with no further information plotted, just to get a sense of the resolution.

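library(maps)
library(mapdata)

## draw the Antarctic Peninsula from the built-in world polygons
map(database = 'world',
    regions = "antarctica",
    xlim = c(-77, -55),   # longitude range
    ylim = c(-70, -60),   # latitude range
    fill = T,             # fill the polygons with col
    col = 'grey',
    resolution = 0,       # 0 = use the full resolution of the database
    bg = 'white',
    mar = c(1,1,2,1)
)

## draw a frame around the plot region
box()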

basic_plot

 Not too bad, but the y dimension is over 1000 km.  If our map covered 10 km or even 100 km we’d have at best a large block to define any landmass.  To produce a more detailed map we need to use a shape file with a much higher resolution.  This is where things get a little tricky.  There are high resolution shape files available from a number of sources, primarily government agencies, but finding them can be difficult.  After a lot of searching I discovered the GSHHG High-resolution Geography Database available from NOAA.  The database is quite large, but it can be subset with the GEODAS Coastline Extractor which also makes use of the WDBII database (included with the GSHHG link) containing rivers and political boundaries.  It takes a little fussing around, but once you get the hang of the coastline extractor it isn’t too bad to work with.  On first use you have to point the extractor to the databases.  Then select “plot” from the drop down menu, and enter the desired lat and long range.  This returns a nice graphic of the selected region which you can export in the desired shape file format using “Save as”.

The export step is where I’m a bit stuck.  Ideally you would export the coastline as a polygon so that R could color it, if desired (as in the previous example).  I can only get R to recognize shape files containing lines however, exported in the ESRI format.  Here’s the region plotted earlier but now from the GSHHG database:

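library(maps)
library(mapdata)
## note: maptools has since been archived on CRAN; sf::st_read() can read
## the same shape files
library(maptools)

## same longitude and latitude limits as the first plot
xlim <- c(-77, -55)
ylim <- c(-70, -60)

## read the coastline exported from the extractor as ESRI shape file lines
plter_shape <- readShapeLines('plter_region_shore_shore',
                              repair = T)

## set up the map region without drawing the low resolution coastline
map("world",
    "antarctica",
    xlim = xlim,
    ylim = ylim,
    fill = T,
    col = 'grey',
    resolution = 0,
    bg = 'white',
    mar = c(6,3,2,1),
    type = 'n'
)

## add the high resolution coastline to the existing plot
plot(plter_shape, add = T)
box()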

Which produces this plot:

Rplot02

That’s quite a bit more detail.  If we zoomed in the improvement would be even more pronounced.  You’ll notice that I plotted the ESRI shape file on a plot produced from maps, even though I had no intention of using maps to reproduce the coastline.  Using maps allows R to plot the data as a projection, which is particularly important at high latitudes.  The default maps projection isn’t great for 60 degrees south (see the maps documentation for a list of available projections), but it gets the point across.  A square projection would be much worse.  What’s particularly nice about all this is that R will use the projected coordinates for all future calls to this plot.  Points or lines added to the plot will thus show up in the right place.
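
To put a rough number on why this matters at high latitudes, here’s a quick back-of-the-envelope calculation.  It’s only a sketch that assumes a spherical Earth with a radius of 6371 km, and the variable and function names are my own.

```r
## Length of one degree of latitude on a spherical Earth
earth_radius_km <- 6371
km_per_deg_lat <- 2 * pi * earth_radius_km / 360

## One degree of longitude shrinks with the cosine of the latitude
km_per_deg_lon <- function(lat){
  km_per_deg_lat * cos(lat * pi / 180)
}

## Approximate ground dimensions of the map region plotted above
map_height_km <- diff(c(-70, -60)) * km_per_deg_lat
map_width_km <- diff(c(-77, -55)) * km_per_deg_lon(-65)

print(c(height = map_height_km, width = map_width_km))
```

On this sphere the 22 degrees of longitude span roughly 1,030 km at 65 degrees south, a bit less than the roughly 1,110 km covered by the 10 degrees of latitude.  Plotting degrees on a square grid would therefore stretch the peninsula east-west by more than a factor of two.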

Now that we have our basic map we can add some data.  One of our science tasks was to evaluate bacterial production along the Palmer Long Term Ecological Research Station grid.  Plotting depth-integrated values on our map lets us see where bacterial production was high.  I’m not going to bother showing how I arrived at depth-integrated values because the code is too specific to my data format, but suffice it to say that I used the interp1 function (not part of base R; it is available in, for example, the pracma package) to interpolate between depth points at each station, then I summed the interpolated values.  This gives me a data frame (depth_int) of three variables: lon, lat, and bp (bacterial production).
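
For anyone who wants the general idea without my data format, here is a minimal sketch of depth integration for a single station.  It is not the code I used: the depths and rates are invented, and base R’s approx function stands in for interp1.

```r
## Hypothetical profile for one station (values are made up)
depth <- c(0, 10, 25, 50, 100)       # sample depths in m
bp <- c(2.0, 1.5, 1.0, 0.4, 0.1)     # volumetric rate at each depth

## Linearly interpolate onto a 1 m grid between the shallowest and
## deepest samples (approx is used here in place of interp1)
grid <- approx(x = depth,
               y = bp,
               xout = seq(min(depth), max(depth), by = 1))

## Sum the interpolated values, treating each one as a 1 m thick layer
bp_sum <- sum(grid$y)

## Trapezoidal integration of the original points, for comparison
bp_trap <- sum(diff(depth) * (head(bp, -1) + tail(bp, -1)) / 2)

print(c(summed = bp_sum, trapezoid = bp_trap))
```

Summing at 1 m spacing counts a full layer at both the surface and the bottom, so it runs slightly higher than the trapezoid rule.  For making a map of relative differences between stations the distinction hardly matters.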

First, let’s generate some colors to use for plotting:

z <- depth_int$bp
zcol <- colorRampPalette(c('white', 'blue', 'red'))(100)[as.numeric(cut(z, breaks = 100))]
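
If that second line looks opaque, this small sketch shows what it is doing, using made-up values in place of depth_int$bp.  The names z_demo, pal, and bin are my own and exist only for this illustration.

```r
## Made-up values standing in for depth_int$bp
z_demo <- c(0, 2.53, 7.07, 10)

## A ramp of 100 colors running from white through blue to red
pal <- colorRampPalette(c('white', 'blue', 'red'))(100)

## cut() splits the range of the data into 100 equal-width bins and
## returns a factor; as.numeric() turns that into the bin number
bin <- as.numeric(cut(z_demo, breaks = 100))

## The bin number is then used to pick a color from the ramp
print(data.frame(z = z_demo, bin = bin, color = pal[bin]))
```

The smallest value always lands in the first bin (white) and the largest in the last (red), so the colors are scaled to the range of whatever data is being plotted.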

Then we add the points to the plot.  This is no different than building any other R plot:

points(depth_int$lon,
       depth_int$lat,
       bg = zcol,
       pch = 21,
       cex = 2,
       col = 'grey'
)

Snazz it up a little by adding grid lines and axis labels:

axis(1, labels = T)
axis(2, labels = T)
grid()

Titles…

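## build the title as an expression so the exponents are typeset, and
## avoid reusing the name of the title function for the variable
plot_title <- expression(paste('Bacterial production gC m'^{-2}, ' d'^{-1}))
title(xlab = 'Longitude',
      ylab = 'Latitude',
      main = plot_title)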

And the coup de grâce, a scale bar.  I can’t find a good scale bar function, but it’s easy enough to build one from scratch:

## set the location and the colorbar gradation
xleft <- -74
xright <- -73
ybot <- -65
yint <- (65 - 62.5) / 100

## create the bar by stacking a bunch of colored rectangles, starting at
## ybot so that the bar spans the same latitudes as the labels
bar_col <- colorRampPalette(c('white', 'blue', 'red'))(100)
for(i in seq_along(bar_col)){
  rect(xleft,
       ybot + (i - 1) * yint,
       xright,
       ybot + i * yint,
       border = NA,
       col = bar_col[i])
}

## generate labels
labels <- round(seq(min(z), max(z), length.out = 5),2)

## add the labels to the plot
text(c(xright + 0.2),
     seq(-65, -62.5, length.out = 5),
     labels = as.character(labels),
     cex = 0.8,
     pos = 4)

Done correctly this should approximate the plot published in my earlier post.  Plenty of tweaks one could make (I’d start by shifting the location of the scale bar and drawing an outlined white box around it), but it’s not a bad start!

Rplot03

Posted in Research | 3 Comments