Rough Start but Smooth Ice

We’re off to a rough start this season!  Two of our instruments are down, including our flow cytometer – annoying, but we can deal with it – and Colleen’s instrument for measuring superoxide.  That’s a real problem.  Colleen is only with us for five more days.  When she leaves the instrument stays, but we will no longer have a skilled operator!  Measuring superoxide is not trivial and I was supposed to spend a good chunk of this week learning how to do it.  That’s going to be tricky with no instrument.  Fortunately the instrument tech at Palmer this season is handy with a soldering iron and seems to have some ideas.  We’ll see how that plays out tomorrow.

The one piece of good news this week is that the big storm last Sunday didn’t do much damage to the land-fast sea ice near Palmer Station.  At least for now we can do a little science on the ice.  This afternoon Jamie Collins, Nicole Couto, and I went out with the SAR team to establish a sea ice sample site near the station.  Hopefully we can get a couple weeks of sampling at this site before the sea ice deteriorates.

Jamie measures ice thickness. Right about 70 cm in this case; nice thick ice that will hopefully stick around for a while.

Being able to do some science on the sea ice at Palmer Station is actually a pretty big deal and an unexpected bonus for this season.  In some ways this is a very logical place to study ice.  Palmer Station is the United States’ premier polar marine research station, and you can find dozens of papers describing the ecological importance of sea ice in this region.  It’s been years, however, since anyone was able to routinely access sea ice from the station.  Considering the amount of ecological research that takes place here this actually seems a little silly; the single most important feature is virtually ignored for practical reasons.  Working on ephemeral, dynamic sea ice requires a set of skills, equipment, and intrepidness that simply doesn’t exist in this day and age within the US Antarctic Program.

The bottom piece of an ice core collected today. It's early in the season and there isn't much happening yet. If you squint though, you can see the faintest green in the ice, a hint of the algal bloom to come.

Our very small adventure today (on relatively thick, static ice) is reason to hope that that might eventually change.  There isn’t a lot of institutional knowledge about sea ice at Palmer Station, but Station staff and management are open-minded and seem eager to learn.  As a further indication, the Cold Regions Research and Engineering Lab recently provided new recommendations for sea ice operations at McMurdo Station, a major step toward a rational, data-based policy for traveling and working on ice (which I’ll link if I can find it; too tired to search now… must fix flow cytometer…).

Hopefully we can get some good science done on the sea ice this season.  In the Arctic, large under-ice phytoplankton blooms are a major source of new carbon to the ecosystem.  In the Antarctic, blooms of algae at the ice-water interface are an essential food source for juvenile krill – adult krill being the major food source for virtually everything else down here.  Getting some indication of when, where, and how often these events occur along the West Antarctic Peninsula will tell us a lot about how these ecosystems function, and what will happen to them as the ice season and range continue to decline.

In case you ever have to track a penguin, this is what penguin tracks look like.

Posted in Palmer 2015 field season

Punta Arenas to Palmer Station

We arrived at Palmer Station last Thursday morning after a particularly long trip down from Punta Arenas. Depending on the weather, the trip across the Drake Passage and down the Peninsula to Anvers Island typically takes about four days. This time, however, the Laurence M. Gould had science to do and a NOAA field camp to put in at Cape Shirreff on Livingston Island. This was a particularly welcome event as it gave us an opportunity to get off the boat and get a little exercise unloading 5 months of supplies for the NOAA science team.

Since arriving at Palmer Station the activity has been nonstop. In addition to lab orientations and water safety training there is the seemingly never-ending job of setting up our lab and getting instruments up and running. Yesterday evening following the weekly station meeting we did manage to go for a short ski on the glacier out behind the station. I’m glad we did because today the weather took a real turn for the worse; winds are gusting to 55 knots and strengthening. This is a real concern for us because wind strength and direction are the primary determinants of the presence and condition of sea ice in this area. As I wrote in my previous post, we are hoping for the sea ice to be either very solid, so we can sample from it, or to clear out completely, so we can get the zodiacs in the water. We’ll have to wait until the storm passes to see what conditions are like, but very likely it will be neither!

A derelict steel-hulled sailing vessel beached outside of Punta Arenas (taken in 2013 on my last trip to Antarctica). Before the Panama Canal opened, Punta Arenas, located on the Strait of Magellan, was an important stopping point for ships sailing between the Atlantic and Pacific. Today the city is best known as the jumping off point for cruise ships (and research vessels) heading to Antarctica, and as an access point to Chile’s Torres del Paine.

A moderate swell breaking on the side of the Gould as we leave Tierra del Fuego behind. Overall it was an extremely mild crossing of the Drake Passage, which didn’t prevent me from getting sick (as per SOP).

As we crossed the Antarctic Polar Front the weather got noticeably colder. Here, sea spray freezes on one of the Gould’s spotlights.

A welcome diversion was the NOAA field camp put-in at Cape Shirreff. This included such antics as raising (sans crane) a four-wheeler from a bobbing zodiac onto a six-foot high snow berm. Don’t ask me how it was done; I was there and I’m still not sure. In this photo you can see the Laurence M. Gould in the background and a zodiac bringing in another load of supplies.

The Gould picks its way towards the pack ice. I’ve only been on two ships in sea ice and the experiences couldn’t have been more different. Back in 2009 I sailed in the Arctic onboard Oden, a powerful Swedish icebreaker. We smashed ice a meter thick and more day and “night” (it was summer) for six weeks straight. The Gould is a different sort of animal. It isn’t a true icebreaker and, if winds and currents conspired against it, could become trapped in rafting ice. Moving the Gould into even thin ice is a delicate process.

Science in action! There were two science parties conducting research on the way down. One is studying the distribution of krill in the Drake Passage. The other is studying the response of deep water corals to ocean warming. Here graduate student Caitlin Cleaver from the University of Maine washes corals freshly collected from 700 meters deep. The corals were transported live to Palmer Station for further experiments.

Yesterday morning the Laurence M. Gould raised its gangplank and departed Palmer Station.

Jamie and Colleen take a break from lab setup for a hike on the glacier behind Palmer Station (seen in the background).

Today started calm with grey skies, but conditions deteriorated with astonishing speed after lunch. Within just a few minutes winds went from a steady 20 knots to gusting to 55 knots (63 mph).

Posted in Palmer 2015 field season

En route to Palmer Station

I’m currently sitting in the Dallas airport waiting for a flight to Santiago, Chile, en route to Palmer Station for the 2015 spring season. Since there is no airfield at Palmer we’ll go in and out by boat (the ARSV Laurence M. Gould). Hopefully we’ll be at the station by October 28 and able to start doing some science not too long after that. There are a couple of reasons why I’m excited about the upcoming season. First, as I discuss in this post, conditions are highly unusual this year, with the extent of sea ice reaching a level not seen at Palmer Station for many years. The reason for this seems to be the persistent warm El Niño conditions in the tropical Pacific Ocean, now complemented by a near zero to negative Southern Annular Mode (negative SAM values are correlated with high sea ice conditions). This increase in sea ice is a counterintuitive but very real effect of global climate change; increased heat in one area of the globe alters global wind patterns and decreases the flow of heat to other areas of the globe. It hasn’t actually been very cold at Palmer Station (the high today was a balmy 24 °F at the time of writing) and how long the sea ice lasts will depend very much on what happens to winds in the region.

Coming in an era defined by decreasing sea ice along the West Antarctic Peninsula the presence of heavy ice cover could have some interesting ecological impacts. There is a strong likelihood that it will be good for the Adélie penguins, but my primary interest is a little lower down in the food web. I’ll be studying interactions between phytoplankton, the basal food source for the WAP ecosystem, and bacteria at the onset of the spring bloom, hoping to identify cooperative interactions through patterns in bacterial gene expression. Toxic compounds produced by phytoplankton, for example, may be cleaned up by bacterial partners, allowing photosynthesis to proceed more efficiently (ultimately meaning more food for the whole food web). Observing the expression of genes coding for the bacterial enzymes that carry out these processes would be strong evidence for this kind of synergy, which leads me to the second reason I’m excited about the upcoming season.

Electron configuration of superoxide. The extra electron is one more than oxygen can handle, and makes the molecule highly reactive. Image from https://commons.wikimedia.org/wiki/File:Superoxide.png.

This year I’m joined by Colleen Hansel and Jamie Collins from the Woods Hole Oceanographic Institution. Colleen and Jamie are chemical oceanographers and experts in identifying specific compounds produced by phytoplankton. Colleen has pioneered a technique to measure superoxide, a damaging free radical, directly in the water column. This is not a trivial undertaking as the half-life of superoxide is only seconds, making traditional oceanographic sampling techniques (such as a Niskin bottle) impossible to employ. Instead we will focus on sampling water in the first few meters of the water column, just above the maximum zone of primary production. Superoxide is produced during photosynthesis, when energetic electrons glob onto free oxygen. The extra electron makes oxygen highly reactive (hence superoxide; it’s a superoxidant) and physiologically damaging. Bacteria have some interesting molecular tools to deal with superoxide, however, so perhaps they’ve evolved the ability to perform this service for phytoplankton in exchange for fixed carbon. Coupling observations of gene expression with measures of superoxide and other reactive chemical species is much more powerful, and will tell a much more complete story, than either does alone.

It’s impossible to anticipate how the ice will impact our science plan until we’re at the station and get a feel for how logistics will work this season. Typically sampling at Palmer Station is done by zodiac, which requires reasonably ice-free conditions. The zodiacs can push around a small amount of brash ice but lack the mass (and shrouded propeller) to deal with large quantities. The ice is solid enough this year that we may be allowed to use it as a sampling platform – something I’ve got plenty of experience with from previous trips to the Arctic and Antarctic. This is a little out of the norm for Palmer Station, however, so we’ll have to see how negotiations proceed.

In our worst-case scenario the ice conditions deteriorate to the point that we can’t sample from it, but not so much that we can push a zodiac through it. The normal sampling procedure in this case is to use a plumbed seawater intake to sample from below the ice (with the added benefit that you can sample from the comfort of the lab); however, this won’t work given the short half-life of superoxide. In this eventuality I think we can salvage the project by focusing on ice algae in place of phytoplankton. Ice algae are essentially phytoplankton which have given up their free-living lifestyle and formed colonies on the underside of the sea ice. These dense mats are a very important food source for juvenile krill, but are understudied in the region given the inconsistent nature of sea ice along the WAP. If we can access some decent ice floes from shore I think we can make a good study of the superoxide gradient, and bacterial response, toward the ice algal colonies. Previous work has shown that ice algae can be under significant oxidative stress so they may have good reason to solicit a little help from bacteria.

Posted in Palmer 2015 field season

paprica v0.20

A couple of months ago I published paprica v0.11, a set of scripts for conducting a metabolic inference from a collection of 16S rRNA gene reads.  This approach allows you to estimate the functional capabilities of a microbial community if you don’t have access to a metagenome or metatranscriptome.  Paprica started as a method for a paper I was writing but eventually became complex enough to warrant its own publication.  Paprica v0.11 reflected this origin – it produced nice results but was kludgy and cumbersome.

Over the last couple of weeks I’ve given paprica a complete overhaul and am happy to introduce v0.20.  There are a number of major differences between v0.11 and v0.20, but the most significant is a clearer division between construction of the database, for those who want full control (and access to the PGDBs), and sample analysis, which can proceed with only the provided, light-weight database (although you will not have access to the PGDBs).  Executing paprica v0.20 is as easy as (from your home directory, for the provided file test.fasta):

git clone https://github.com/bowmanjeffs/genome_finder.git
cd genome_finder
chmod a+x paprica_run.sh
./paprica_run.sh test

One really important distinction between this version and v0.11 is that metabolic pathways are NOT predicted directly on internal nodes.  This was done for reasons of organization and efficiency, but I’m not sure that it made much sense to do this anyway.  Instead the pathways likely to be found for an internal node are inferred from their appearance in terminal daughter nodes (that is, the completed genomes that belong to the clade defined by the internal node).  If a given pathway is present in some specified fraction (0.90 by default) of the terminal daughters it is included in the internal node.  You can change this value by modifying the appropriate variable in pathway_profile.txt.  Some (including myself) might like to have a PGDB for an internal node for purposes of visualization or modeling.  In the near future I’ll release a utility to create a PGDB for an internal node on demand.
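As a rough illustration of the rule described above (this is not the actual paprica code), the sketch below includes a pathway in an internal node when it appears in at least a given fraction of that node's terminal daughters. The function name, the pathway labels, and the data layout are all hypothetical; only the 0.90 default comes from the description above.

```python
from collections import Counter

def infer_internal_pathways(daughter_pathways, cutoff=0.90):
    """Return pathways present in at least `cutoff` of the terminal daughters.

    daughter_pathways: list of sets, one set of pathway names per completed
    genome in the clade (a hypothetical layout, for illustration only).
    """
    if not daughter_pathways:
        return set()
    counts = Counter()
    for pathways in daughter_pathways:
        counts.update(set(pathways))
    n = float(len(daughter_pathways))
    return {pwy for pwy, k in counts.items() if k / n >= cutoff}

# Toy clade of ten genomes: 'glycolysis' is in all ten, 'denitrification'
# in nine, and 'nitrogen_fixation' in only five.
clade = [{'glycolysis', 'denitrification'} for _ in range(9)]
clade.append({'glycolysis'})
for genome in clade[:5]:
    genome.add('nitrogen_fixation')

internal = infer_internal_pathways(clade)
print(sorted(internal))
```

With the default cutoff the toy internal node keeps the two widespread pathways and drops the one found in only half of the daughters. Lowering the cutoff to 0.5 would keep all three.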

Some other major improvements…

  • Fewer dependencies.  For the scripts called in paprica_run.sh you need pplacer, seqmagick, infernal, and some Python modules that you should probably have anyway.
  • Improved reference tree.  I’m still working on this, but the current method uses RAxML for phylogenetic inference and Infernal for alignment, which seems to work much better than the previous (albeit much faster) combo of Fasttree and Mothur.  Thanks to Eric Matsen for helpful suggestions in this regard.
  • More genome parameters.  I have a particular interest in how genome parameters (e.g. length, coding density, etc.) are distributed in the environment.  Paprica gives you a whole list of interesting metrics for the terminal and internal nodes.

Paprica is still in heavy development and I have a lot of improvements planned for future versions.  If you try v0.20 I’d love to know what you think – good, bad, or otherwise!  You can create an issue on Github or email me.

Posted in paprica

SCAR session on microbial ecology

Along with colleagues from New Zealand, Argentina, and Malaysia I’m convening a session on microbial ecology and evolution at the upcoming biennial SCAR meeting in Kuala Lumpur (because there’s no better place to talk about ice than the tropics).  If this sounds like your sort of thing check it out!

S23. Microbes, diversity, and ecological roles

Walter MacCormack, Argentina; Charles Lee, New Zealand; Chun Wie Chong, Malaysia; Jeff Bowman, USA

The ecology of Antarctica is largely shaped by microbes, with microbial life, including prokaryotes and unicellular eukaryotes, serving as the main drivers of ecosystem function.  Given this, it is perhaps surprising that our current understanding of Antarctic biota has been derived primarily from studies of metazoans. Despite major advances in the field of Antarctic microbiology in recent years there remains a knowledge gap in our understanding of the distribution, functions, and adaptations of Antarctic microbes. There is a general consensus that Antarctic microorganisms are highly diverse, and in many cases encompass endemic gene pools with unique physiological and genetic adaptations to the extreme conditions of their environment. Relatively recently, the advent of ‘omics platforms has allowed researchers to observe these processes in great detail. This session welcomes submissions on all aspects of microbial ecology and evolution in Antarctica and the Southern Ocean. This includes ‘omics-based approaches to understanding prokaryotic and unicellular eukaryotic diversity, function, and adaptation, as well as laboratory and field-based studies of microbial and ecological physiology. Special consideration will be given for abstracts addressing the following issues: (1) Microbial biogeography, functional redundancy, and ecosystem services; (2) Trophic connectivity between prokaryotes and eukaryotes; (3) Cold adaptation strategy and evolution; and (4) Multiple ‘omics integration addressing systems biology of Antarctic ecosystems.

Posted in Uncategorized

Sea ice bacteria review published

I’m really excited (and relieved) to report that my review on the taxonomy and function of sea ice microbial communities was recently published in the journal Elementa.  The review is part of a series on biological exchange processes at the sea ice interface, by the SCOR working group of the same name (BEPSII).  I’m deeply appreciative of Nadja Steiner, Lisa Miller*, Jacqueline Stefels, and the other senior members of BEPSII for letting (very) junior scientists take such an active role in the working group.  I conceived the review in a foggy haze last year while writing my dissertation, when I assumed that there would be “plenty of time” for that kind of project before starting my postdoc.  Considering that I didn’t even start aggregating the necessary data until I got to Lamont, I’m also deeply appreciative of my postdoctoral advisor for supporting this effort…

The review is really half review, half meta-analysis of existing sea ice data.  The first bit, which draws heavily on the introduction to my dissertation, describes some of the history of sea ice microbial ecology (which goes back to at least 1918 for prokaryotes).  From there the review moves into an analysis of the taxonomic composition of the sea ice microbial community, based on existing 16S rRNA gene sequence data, takes a look at patterns of bacterial and primary production in sea ice, and then uses PAPRICA to infer metabolic function for the observed microbial taxa (after 97 years we still don’t have any metagenomes for sea ice – let alone metatranscriptomes – and precious few isolates).

There is a lot of info in this paper but I hope a few big points make it across.  First, we have a massive geographical bias in our sea ice samples.  This is to be expected, but I don’t think we should just accept it as what has to be.  More disconcerting, there has been very little effort to integrate physiological measures in sea ice (such as bacterial production) with analyses of microbial community structure.  A major exception is the work of the Kaartokallio group at the Finnish Environment Institute, but their work has primarily taken place in the Baltic Sea (an excellent system, but very different from the high Arctic and coastal Antarctic).  This all translates into work that needs to be done, however, which is a good thing… we are just barely at the point where we can make reasonable hypotheses regarding the functions of these communities.

Taken from Bowman, 2015. Sampling locations for sea ice studies that have collected community structure data (blue), ecological physiology data (red), and both (green). Note the strong sampling bias, particularly in the Antarctic. The black arrows point to the locations of the two community structure studies (at the time of writing) that were sufficiently deep to actually describe community structure.

*This image of Lisa pops up a lot. If you can identify what, exactly, is going on in this picture I’ll buy you a beer.

Posted in Research

Microbial ecology of the cryosphere

A quick post on an excellent review published last week by Antje Boetius and co-authors (including Jody Deming, my PhD advisor) in Nature Reviews Microbiology, titled Microbial ecology of the cryosphere: sea ice and glacial habitats.  The review, focused on viral, bacterial, and archaeal microbes, provides an excellent overview of the major habitats within the cryosphere (broadly glacial ice, sea ice, and snow), the challenges and opportunities for microbial life, and the observed distribution of taxa and genes (to the extent that we know it).  Like most Nature Reviews it is written for a broad audience and assumes no deep knowledge of microbial ecology or the cryosphere.

Taken from Boetius et al., 2015.  Top: a schematic of different elements of the cryosphere, b: warm, summertime sea ice, c: the supraglacial environment, featuring a meltriver, d: cold winter sea ice, e: the subglacial environment, featuring the Blood Falls outflow from Taylor Glacier.

Plenty of reviews have been written on microbial life at low temperature; what makes this one stand out to me is the ecological focus.  Although discussions of biogeography (i.e. what taxa are where) and metabolism are woven throughout the review, the emphasis is on habitats, including newly recognized habitats like frost flowers and saline snow.  Check it out!

Posted in paprica

And now…

…for something completely different.  My wife and I are expecting our first child in a few months, which is wonderful and all, but means that we are faced with the daunting task of coming up with a name.  Being data analysis types (she much more than me), and subscribing to the philosophy that there is no problem that Python can’t solve, we decided to write competing scripts to select a good subset of names.  This is my first crack at a script (which I’ve titled BAMBI for BAby naMe BIas); I’ve also posted the code to Github.  That will stay up to date as I refine my method (in case you too would like Python to name your child).

My general approach was to take the list of baby names used in 2014 and published by the Social Security Administration here, bias against the very rare and very common names (personal preference), then somehow use a combination of our birth dates and a random number generator to create a list of names for further consideration.   Okay, let’s give it a go…

First, define some variables. Their use will be apparent later.  Obviously replace 999999 with the real values.

get = 100 # how many names do you want returned?
wife_bday = 999999
my_bday = 999999
due_date = 999999
aatc = 999999 # address at time of conception
size = int((wife_bday + my_bday) / (due_date / aatc)) # cast to int, np.random.choice needs an integer size
start_letters = ['V','M'] # restrict names to those that start with these letters, can leave as empty list if no restriction desired
sex = 'F' # F or M

Then import the necessary modules.

import matplotlib
import numpy as np
import matplotlib.pyplot as py
import math
import scipy.stats as sps

Define a couple of variables to hold the names and abundance data, then read the file from the SSA.

p = [] # this will hold abundance
names = [] # this will hold the names

with open('yob2014.txt', 'r') as names_in:
    for line in names_in:
        # each record in the SSA file is name,sex,count
        name, name_sex, count = line.rstrip().split(',')
        if name_sex != sex:
            continue
        # an empty start_letters list means no restriction
        if len(start_letters) > 0 and name[0] not in start_letters:
            continue
        p.append(float(count))
        names.append(name)

Excellent. Now the key feature of my method is that it biases against both very rare and very common names. To take a look at the abundance distribution run:

py.hist(p, bins = 100)

Ignore the ugly X-axis.  Baby name abundance follows a logarithmic distribution; a few names are given to a large number of babies, with a long “tail” of rare baby names.  In 2014 Emma led the pack with 20,799 new Emmas welcomed into the world.  My approach – I have no idea if it’s at all valid, so use on your own baby with caution – was to fit a normal distribution to the sorted list of names.  I got the parameters for the distribution from the geometric mean and standard deviation (as the arithmetic mean and SD have no meaning for a log distribution).  The geometric mean can be calculated with the gmean function, but I could not find a ready-made function for the geometric standard deviation:

geo_mean = sps.mstats.gmean(p)
print('mean name abundance is {}'.format(geo_mean))

def calc_geo_sd(geo_mean, p):
    p2 = []

    for i in p:
        p2.append(math.log(i / geo_mean) ** 2)
    
    sum_p2 = sum(p2)
    geo_sd = math.exp(math.sqrt(sum_p2 / len(p)))
    return(geo_sd)
    
geo_sd = calc_geo_sd(geo_mean, p)
print('the standard deviation of name abundance is {}'.format(geo_sd))

## get a gaussian distribution of mean = geo_mean and sd = geo_sd
## with one draw per baby, i.e. sum(p) draws

dist_param = sps.norm(loc = geo_mean, scale = geo_sd)
dist = dist_param.rvs(size = int(sum(p))) # size must be an integer

## now get the probability of these values

print('wait for it, generating name probabilities...')
temp_hist = py.hist(dist, bins = len(p))
probs = temp_hist[0]
probs = probs / sum(probs) # potentially max(probs)

At this point we have a list of probabilities the same length as our list of names and preferencing names of middle abundance. The next and final step is to generate two pools of possible names. The first pool is derived from a biased-random selection that takes into account the probabilities, birth dates, due date, and address at time of conception. The second, truly random pool is a subset of the first with the desired size (here 100 names).

possible_names = np.random.choice(names, size = size, p = probs, replace = True)
final_names = np.random.choice(possible_names, size = get, replace = False)

And finally, print your list of names! I recommend roulette or darts to narrow this list further.

with open('pick_your_kids_name.txt', 'w') as output:
    for name in final_names:
        print(name)
        output.write(name + '\n')

Posted in Uncategorized

Introducing PAPRICA

I’m very excited to report that our latest paper – Microbial communities can be described by metabolic structure: A general framework and application to a seasonally variable, depth-stratified microbial community from the coastal West Antarctic Peninsula – was just published in the journal PLoS ONE.  The paper builds on two very distinct bodies of work: a growing literature on microbial community structure and function along the climatically sensitive West Antarctic Peninsula, and a family of new techniques to predict community metabolic function from 16S rRNA gene libraries, which we are calling metabolic inference.

The motivation for metabolic inference is the large amount of time that it takes to manually curate a likely set of functions for even a small collection of 16S rRNA genes.  In today’s world, where most analyses of microbial community structure consist of many thousands of reads representing hundreds of taxa, it is simply impossible to dig through the literature on each strain to see what metabolic role each is likely to be playing.  Ideally a researcher would use metagenomics or metatranscriptomics to get at this information directly, but it is not advisable or desirable in most cases to sequence hundreds of metagenomes or metatranscriptomes (necessary for the kind of temporal or spatial resolution many of us want these days).  Metabolic inference provides a convenient alternative.

A quick Google Scholar survey of the number of studies since 2005 that have used high throughput 16S rRNA gene sequencing.  Over the last ten years we’ve collected an astonishing amount of sequence data from a diverse array of environments; however, much of this data has been from taxonomic marker genes like the 16S rRNA gene, leaving microbial community function largely unknown.  PAPRICA and other methods that try to infer microbial functional potential from 16S rRNA gene data can help bridge this gap.

The basic concept behind all metabolic inference techniques (e.g. PICRUSt, tax4fun, PAPRICA) is hidden state prediction (HSP) (you can find a nice paper on HSP here).  In 16S rRNA gene analysis metabolic potential is a hidden state.  The metabolic inference techniques propose different ways to predict this hidden state based on the information available.

Our small contribution to this effort was to develop a method (PAPRICA – PAthway PRediction by phylogenetIC plAcement) that uses phylogenetic placement to conduct the metabolic inference instead of an OTU (operational taxonomic unit) based approach.  Our approach provides a more intuitive connection between the 16S rRNA analysis and the HSP (or at least it does in my mind) and can increase the accuracy of the inference for taxa that have a lot of sequenced genomes.

Most analyses of large 16S rRNA datasets rely on an OTU based approach.  In a typical OTU analysis an investigator aligns 16S rRNA reads, constructs a distance matrix of the alignments, and clusters the reads at some predetermined distance.  By tradition the default distance has become a dissimilarity of 0.03.  This approach has some advantages.  By clustering reads into discrete units it is easy to quantify the presence or absence of different OTUs, and it allows microbial ecologists to avoid problems with defining prokaryotic species (which defy most of the criteria used to define species in more complex organisms).  To conduct a metabolic inference on an OTU based analysis it is possible to simply reconstruct the likely metabolism for a predefined set of OTUs based on the OTU assignments of published genomes.  This works great, but it limits the resolution of the inference to the selected OTU definition (i.e. 0.03).  For some taxa, such as Escherichia coli (and plenty of more interesting environmental bugs), there are many sequenced genomes that have very similar 16S rRNA gene sequences.  PAPRICA provides a way to improve the resolution of the metabolic inference for these taxa.
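To make the resolution limit concrete, here is a toy sketch of clustering aligned reads at a 0.03 dissimilarity cutoff. It is a deliberately simple greedy scheme on made-up sequences, not the algorithm used by Mothur or any other real OTU tool, and the function names are hypothetical.

```python
def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError('reads must be aligned to the same length')
    return sum(x != y for x, y in zip(a, b)) / float(len(a))

def greedy_otus(reads, cutoff=0.03):
    """Assign each read to the first OTU whose seed read is within `cutoff`."""
    seeds = []       # one representative read per OTU
    assignment = []  # OTU index for each read, in input order
    for read in reads:
        for i, seed in enumerate(seeds):
            if dissimilarity(read, seed) <= cutoff:
                assignment.append(i)
                break
        else:
            # no existing OTU is close enough, so this read seeds a new one
            seeds.append(read)
            assignment.append(len(seeds) - 1)
    return assignment

# Three made-up 100-base "reads": read_b differs from read_a at 2 positions
# (dissimilarity 0.02), read_c differs from read_a at 8 positions (0.08).
read_a = 'ACGT' * 25
read_b = 'TT' + read_a[2:]
read_c = 'T' * 10 + read_a[10:]

otus = greedy_otus([read_a, read_b, read_c])
print(otus)
```

Here read_a and read_b land in the same OTU even though they are distinct sequences, so any inference made per OTU cannot tell them apart. That is the resolution limit the paragraph above describes.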

Our approach was to build a phylogenetic tree of the 16S rRNA genes from each completed genome.  For each internal node on the reference tree we determine a “consensus genome”, defined as all genes shared by all members of the clade originating from the node, and predict the metabolic pathways present in the consensus and complete genomes using Pathway-Tools.  To conduct the actual analysis we use pplacer to place our query reads on the reference tree and assign the metabolic pathways for each point of placement to the query reads.  One advantage to this approach is that the resolution changes depending on the genome sequence coverage of the reference tree.  For families, genera, and even species for which lots of genomes have been sequenced resolution is high.  For regions of the tree where there are not many sequenced genomes resolution is poor; however, the method will give you the best of what’s available.
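The consensus genome idea can be sketched in a few lines.  This is only an illustration under stated assumptions: the tree, genome names, gene names, and placements below are hypothetical, gene sets stand in for the pathway predictions that Pathway-Tools would supply, and the placement dictionary stands in for real pplacer output.

```python
# Hypothetical reference tree: internal node -> children.
# Anything that is not a key is a terminal node (a completed genome).
tree = {
    "root": ["n1", "genome_C"],
    "n1": ["genome_A", "genome_B"],
}

# Hypothetical gene content for each completed genome.
genomes = {
    "genome_A": {"geneW", "geneX", "geneY"},
    "genome_B": {"geneW", "geneX", "geneZ"},
    "genome_C": {"geneW", "geneQ"},
}


def leaves(node):
    """All terminal nodes in the clade originating from this node."""
    if node not in tree:
        return [node]
    return [leaf for child in tree[node] for leaf in leaves(child)]


def consensus(node):
    """Genes shared by every genome in the clade originating from this node."""
    return set.intersection(*(genomes[leaf] for leaf in leaves(node)))


# Hypothetical placements: query read -> node where it landed on the tree.
placements = {"read_1": "genome_A", "read_2": "n1", "read_3": "root"}

for read, node in placements.items():
    print(read, node, sorted(consensus(node)))
```

A read placed on a terminal node inherits that genome's full content, while a read placed deeper in the tree inherits only what the whole clade shares.  That is why resolution is high where many closely related genomes exist and falls off elsewhere.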


Figure from Bowman and Ducklow, 2015.  PAPRICA includes a confidence scoring metric that takes into account the relative plasticity of different genomes.  In this figure each vertical line is a genome (representing a numbered terminal node on our reference tree), with the height and color of the vertical line giving its relative plasticity (which we refer to as the parameter phi).  The genomes identified with Roman numerals are all known to be exceptionally modified, which is a nice validation of the phi parameter.  Many of these are obligate symbionts.  I) Nanoarchaeum equitans, II) the Mycobacteria, III) a butyrate producing bacterium within the Clostridium, IV) Candidatus Hodgkinia cicadicola, V) the Mycoplasma, VI) Sulcia muelleri, VII) Portiera aleyrodidarum, VIII) Buchnera aphidicola, IX) the Oxalobacteraceae.

PAPRICA provides some additional helpful pieces of information.  We built in a confidence scoring metric that takes into account both predicted genomic plasticity and the size of the consensus genome relative to the mean size for the clade (deeper branching clades will have a bigger difference).  PAPRICA also predicts the genome size and the number of 16S rRNA gene copies associated with each 16S rRNA gene, both of which have a strong connection to the ecological role of a bacterium.

For our initial application of PAPRICA we selected a previously published 16S rRNA gene sequence dataset from the West Antarctic Peninsula (our primary region of interest).  One thing that we were very interested in looking at was whether we could describe differences between microbial communities organized along ecological gradients (e.g. inshore vs. offshore, or surface vs. deep water) in terms of metabolic structure in place of the more traditional 16S rRNA gene (i.e. taxonomic) structure.  Using PAPRICA to convert the 16S rRNA gene sequences into collections of metabolic pathways we found that we could reconstruct the same inter-sample relationships identified by an analysis of taxonomic structure.  This means that a microbial ecologist can, if they choose, disregard the messy and sometimes uninformative taxonomic structure data and go directly to metabolic structure without losing information.  Applying common multivariate statistical approaches (PCA, MDS, etc.) to metabolic structure data yields information like which pathways are driving the variance between sites, and which are correlated with what environmental parameters.  This information is much more relevant to most research questions than the distribution of different microbial taxa.  It is worth noting that while inter-sample relationships are well preserved in metabolic structure, the absolute distance between samples is much less than for taxonomic structure.  This might have some implications for the functional resilience of microbial communities, which we get into a little bit in the paper.
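One way the smaller distances in metabolic structure can arise is that different taxa share many pathways.  The sketch below uses made-up toy counts and a hypothetical taxon-to-pathway table, and it uses Bray-Curtis dissimilarity as an assumed distance measure (not necessarily the one used in the paper), to show two samples that differ in taxonomy but look much more alike once converted to pathways.

```python
# Hypothetical pathway content for three taxa.
taxon_pathways = {
    "taxon_A": {"glycolysis", "TCA"},
    "taxon_B": {"glycolysis", "TCA", "denitrification"},
    "taxon_C": {"glycolysis", "photosynthesis"},
}

# Toy read counts per taxon for two samples (made-up numbers).
inshore_taxa = {"taxon_A": 10, "taxon_B": 0, "taxon_C": 30}
offshore_taxa = {"taxon_A": 0, "taxon_B": 10, "taxon_C": 30}


def to_pathways(taxa_counts):
    """Sum taxon counts over every pathway each taxon carries."""
    out = {}
    for taxon, n in taxa_counts.items():
        for pathway in taxon_pathways[taxon]:
            out[pathway] = out.get(pathway, 0) + n
    return out


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count dictionaries."""
    keys = set(x) | set(y)
    num = sum(abs(x.get(k, 0) - y.get(k, 0)) for k in keys)
    den = sum(x.get(k, 0) + y.get(k, 0) for k in keys)
    return num / den


taxa_distance = bray_curtis(inshore_taxa, offshore_taxa)
pathway_distance = bray_curtis(to_pathways(inshore_taxa), to_pathways(offshore_taxa))
print(f"taxonomic structure: {taxa_distance:.3f}")
print(f"metabolic structure: {pathway_distance:.3f}")
```

In this toy case swapping taxon_A for taxon_B changes the taxonomic picture substantially, but only one pathway actually differs between the samples, so the pathway-based distance is much smaller.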

PAPRICA was an outgrowth of a couple of other papers that I’m working on.  At some point the bioinformatic methods reached a point where separate publication was justified.  As a result, and reflecting the fact that I’m much more an ecologist than a computational biologist, PAPRICA is not nearly as streamlined as PICRUSt (which is even available through an online interface).  I’ve spent quite a bit of time, however, trying to make the scripts user friendly and transportable.  Anyone should be able to get them to work without too much difficulty.  If you decide to give PAPRICA a try and run into any hitches please let me know, either by posting an issue on GitHub or emailing me directly!  Suggestions for improvement are also welcome.


El Nino, SAM, and sea ice conditions at Palmer

In two and a half months, unless there’s another government shutdown, I’m heading down to Palmer Station to collect a key set of samples for one of my projects.  The idea is to time this sampling effort with the spring diatom bloom.  This bloom is a critical pulse of fixed carbon into the ecosystem after the dark (sub)polar winter, and is followed by a series of blooms by lesser phytoplankton players: cryptophytes, dinoflagellates, and Phaeocystis.  The problem with this plan is that it is pretty much impossible to guess when the spring bloom will happen.  Ecologically speaking it happens as soon as the ice retreats (or rather, as the ice is retreating), but this can vary by many weeks from one season to the next.  The problem is compounded by the sampling strategy at Palmer Station.  Scientists at the station rely on Zodiacs for sampling, those ubiquitous inflatable craft that, while surprisingly durable, are pretty useless in even very light ice conditions.

So one thing we would very much like to know is what the ice conditions will be like in mid October when we arrive on station.  A good place to start a discussion of likely ice conditions is, surprisingly, the tropical Pacific.  The tropical Pacific is a mess right now.  There’s a tremendous amount of heat in the surface ocean and cyclones have been pinging around the South Pacific for the last few weeks like a bunch of bumper cars.  This is the result of a strong El Nino taking shape, possibly the strongest we’ve seen in over a decade.

From www.weatherunderground.org. Sea surface temperature anomaly on August 4, 2015. Warm areas indicate sea surface temperatures that are warmer than normal. Notice the large amount of heat in the Pacific, particularly along the equator extending out from Ecuador. This is the result of suppressed upwelling and will probably, among other things, lead to a very poor anchovy catch in Peru this year…  Note:  When I published this post I thought I had dropped this image in as a static image; instead it posted as a link.  The (static) image shown is from a few days after this post was published.

The El Nino Southern Oscillation (ENSO) is a tropical phenomenon with global consequences.  One of these is a change in sea ice extent along the West Antarctic Peninsula (WAP).  During an El Nino the polar jet (analogous to the northern hemisphere’s jet stream) is weakened and there is less transport of heat from the subtropical Pacific to the WAP.  During La Niña the opposite happens; the jet strengthens, driving warm, wet storms south across the Southern Ocean to the WAP.  The strong winds break up the ice, and the heavy snow and rain have a pretty bad effect on Adélie penguin chicks, occasionally causing total breeding failures.

ENSO’s not the only pattern of climate variability with an impact on Antarctic sea ice, however.  The Southern Annular Mode (SAM), also called the Antarctic Oscillation (AAO), has a major impact on sea ice extent.  Unlike ENSO, which is the result of complex dynamics between the atmosphere and the ocean, SAM is primarily an atmospheric phenomenon linked to the magnitude of the north/south pressure gradient across the Southern Ocean.  This gradient controls the strength of the westerly winds that help deliver subtropical heat to the West Antarctic.  SAM has two phases, positive and negative, and can hold one phase for weeks or months, then suddenly shift.  During its negative phase SAM is correlated with reduced westerly winds and increased sea ice along the WAP (and happy penguins).

Right now SAM is in a positive phase, and has been for some time.  But we’ve also got an uber El Nino.  So what does that mean?  I asked Sharon Stammerjohn, a physical oceanographer with the Palmer LTER project, what happens in these situations.  Sharon and several colleagues wrote a paper in 2008 exploring the impact of SAM and ENSO on Antarctic sea ice extent.  It’s pretty clear what happens if a La Niña lines up with a positive SAM (low ice year), or an El Nino coincides with a negative SAM (high ice year).  But what about a positive SAM and a strong El Nino?  In that case it can, apparently, go either way.  Either the strong subtropical storm effect will overcome the weakened polar jet or it won’t.  To get a sense of which is winning so far this year we can take a look at the NSIDC’s current map of Antarctic sea ice extent.


Taken from http://nsidc.org/data/seaice_index/. Antarctic sea ice extent for July 2015. Right now it’s shaping up to be a big ice year for the West Antarctic Peninsula.

Ouch, take a look at the northern tip of the WAP.  The pink line is, as the figure indicates, the median sea ice edge.  Clearly El Nino is winning; most of Antarctica is normalish, but the WAP region has some extraordinary ice cover right now.  Sea ice has been more or less on the decline there for the last couple of decades, so it will be very interesting to see what kind of impact this has on the ice-dependent WAP ecosystem.  Of course sea ice that far north is pretty fickle; the SAM could switch modes, or the westerlies could increase, and the sea ice could break up and move out before spring.  Otherwise it looks like we might be sampling from the Palmer dock…
