Sea ice biogeochemistry postdoc position

This could be you!  The author takes a break from skiing on the slopes above the British Antarctic Survey (BAS) Rothera Station (out of frame to the right). In contrast to the USAP, the BAS encourages outdoor education and recreation among visiting scientists to facilitate safer and more efficient fieldwork.

There’s a great sea ice biogeochemistry/microbiology postdoc position open in Jaqueline Stefels’ group at the University of Groningen in the Netherlands.  Jaqueline is one of the leading experts on the marine sulfur cycle, and the project focuses on DMS/DMSP dynamics within sea ice (see this post).  The position includes fieldwork at Rothera Station, one of the most idyllic places in Antarctica (all the scenery of Palmer Station with none of the silliness of the USAP).  Position announcement embedded below.

https://www.polarmicrobes.org/wp-content/uploads/2015/05/postdoc_position_UnivGroningen_NL-2.pdf

Posted in Uncategorized | Leave a comment

A tale of two studies: Big Data and a cautionary analogy

One of the chapters of my dissertation was just rejected for publication a second time.  I haven’t decided what to do with it yet, but in the meantime it makes an interesting case study in rejection.  In brief, what we tried to do in this paper was make a large-scale statistical inference of amino acid composition in a taxonomically controlled group of psychrophiles (cold-adapted bacteria) and mesophiles (room temperature-adapted bacteria).  A protein derives its physical characteristics (structure, flexibility, substrate specificity, etc.) from the physicochemical properties of each amino acid (size, charge, solubility in water, etc.).  As proteins in psychrophiles evolve it is therefore reasonable to expect that they will favor certain amino acids associated with the structural modifications most likely to enhance protein function at low temperature.  Because protein evolution, like all evolution, is a random process, we expect differences in amino acid composition between the two groups to be small and messy, but identifiable with a large enough sample size.

A fair bit of previous work has been done in this area, including Metpally and Reddy, 2009.  There are, however, some issues with that analysis.  Metpally and Reddy used a small number of predicted proteomes from psychrophiles and mesophiles (it was 2009, after all), and did not control for multiple comparisons in the large number of Student’s t-tests that they performed between these two groups.  We hoped to improve on this with a larger, carefully controlled set of predicted proteomes and a more robust statistical approach.

The idea that differences in amino acid composition reflect adaptation to different thermal regimes is not popular in all circles.  In our first submission we received two reviews, one positive and one negative.  The negative review came from a protein chemist who does very high resolution dynamical studies of proteins as they interact with their substrates.  He noted that it takes only a single amino acid substitution to effect a major change in, say, the flexibility of a protein and thus its function at different temperatures.  This is true, but I would counter that evolution doesn’t know to target a specific high-payoff residue for mutation.  The process is random; every now and again a residue in a key dynamical position will get modified in a way that improves protein function.  Much more often, residues in less critical positions will mutate in ways that help just a little bit.  We would expect to find a consistent pattern in amino acid substitutions across psychrophile proteins as a result of these mutations.

After some minor changes we resubmitted the manuscript to a different journal.  I give the second editor much credit for going the distance and ultimately soliciting five different reviews.  The reviews were roughly 3 in favor and 2 against; usually one dissenter is enough to prevent publication.  One of the objections raised, and the subject of this post, was our use of the t-test and p-value to describe the statistical significance of differences in amino acid composition between psychrophiles and mesophiles.  I usually think of large sample size as a good thing as it allows the detection of correlations that would otherwise be masked by noise or other effects.  In this case, however, the large number of proteins in our dataset seems to have gotten us into some trouble.

We used predicted proteomes from 19 psychrophiles and 19 mesophiles, roughly 130,000 proteins total.  We made over 100 tests for differences in parameters (such as amino acid composition) between the two groups, applying the Holm-Bonferroni method to control for the multiple comparisons.  After correction, a number of parameters still exceeded our threshold for significance.  A t-test of serine content, for example, yielded a p-value of essentially 0.  The mean proportion of serine, however, was not that different between the groups: 0.0658 for the psychrophiles and 0.0636 for the mesophiles.  We considered these slight preferences consistent with our ideas of protein evolution and reported them.
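The correction step is easy to reproduce in R with p.adjust; here’s a minimal sketch using simulated compositions (the means, standard deviation, and sample sizes below are illustrative placeholders, not our actual data):

```r
## simulate many group comparisons and apply Holm-Bonferroni control
set.seed(42)
n_params <- 100   # number of parameters tested (e.g., amino acid frequencies)

## one p-value per parameter, from a t-test between two simulated groups
p_raw <- replicate(n_params, {
  psychro <- rnorm(1000, mean = 0.0658, sd = 0.02)  # illustrative values only
  meso    <- rnorm(1000, mean = 0.0636, sd = 0.02)
  t.test(psychro, meso)$p.value
})

## Holm-Bonferroni control via p.adjust
p_holm <- p.adjust(p_raw, method = 'holm')
sum(p_holm < 0.05)  # number of parameters that survive correction
```

Note that p.adjust with method = 'holm' only ever increases p-values, so any parameter that survives correction was also significant before it.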

To quote an anonymous reviewer: “Any inferential statistics test, being it parametric or non-parametric, reaches any desired statistical significance level at increasing N (sample size). This makes the issue of biological relevance drastically separated from the problem of statistical significance (any difference between two samples becomes significant at increasing N)”.  The reviewer is concerned that the very small differences that we observed, while statistically significant, are not biologically relevant.  They go on to recommend the treatment of this issue in Kraemer, 1992.  That paper, which draws on a hypothetical medical study using a t-test between a treatment and control group, makes some interesting points.  I’ll attempt to reconstruct what they do in the context of differences in serine content between the two groups, and end with what I think it means for our study and what we might do about it.

First we can read in some data and take a look at the distributions (if you want to play with the code just use rnorm to generate some normally distributed data).

serine <- read.table('serine.txt.gz', header = F)
serine_treat <- serine[which(serine[,1] == 'cold'),2]
serine_control <- serine[which(serine[,1] == 'control'),2]

## look at the distributions

treat_hist <- hist(serine_treat, breaks = 100)
control_hist <- hist(serine_control, breaks = 100)

plot(control_hist$counts ~ control_hist$mids,
     col = 'red',
     type = 'l',
     ylab = 'Count',
     xlab = 'Serine composition')

points(treat_hist$counts ~ treat_hist$mids,
       type = 'l')

lines(c(mean(serine_treat), mean(serine_treat)),
       c(-1000,8000),
      lty = 2)

lines(c(mean(serine_control), mean(serine_control)),
      c(-1000,8000),
      col = 'red',
      lty = 2)

legend('topright',
       legend = c('Treatment', 'Control', 'Treatment mean', 'Control mean'),
       col = c('black', 'red', 'black', 'red'),
       lwd = 1,
       lty = c(1,1,2,2))

That produces the following plot.  Notice how close the two means are.  The difference is highly statistically significant by the t-test, but is it biologically meaningful?  The Kraemer paper uses the term effect size to describe biological impact.

[Figure: overlaid histograms of serine composition for the treatment and control groups, with dashed vertical lines at the group means]

Central to the Kraemer ideas of effect size is a plot of failure rate (in our analysis, the fraction of a group falling above a given serine value) as a function of the control values.  The more different the failure rates for the treatment and control, the greater the effect size (the more biologically meaningful the result, or so the thinking goes).

## failure rate analysis
lc <- length(serine_control)
lt <- length(serine_treat)

sr <- matrix(ncol = 3, nrow = lt)

i <- 0
for(st in sort(serine_treat)){
  i <- i + 1
  ## fraction of each group falling below this serine value
  fr_sc <- length(which(serine_control < st)) / lc
  fr_st <- length(which(serine_treat < st)) / lt

  sr[i,] <- c(st, fr_sc, fr_st)
}

plot(sr[,3] ~ sr[,1],
     ylab = 'Failure rate',
     xlab = 'Response value',
     type = 'l')

points(sr[,2] ~ sr[,1],
       col = 'red',
       type = 'l')

legend('bottomright',
       legend = c('Treatment', 'Control'),
       col = c('black', 'red'),
       lwd = 1)

And that, in turn, produces the following plot of failure rate.

[Figure: failure rate as a function of response value for the treatment and control groups]

Note that treatment value is synonymous with serine composition.  I’m trying to keep the nomenclature somewhat consistent with Kraemer, 1992.  From this plot we can see that the failure rate is pretty similar for a given treatment value for the two groups, but not exactly the same.  This can be explored further as a plot of failure rate difference as a function of treatment value.

plot((sr[,2] - sr[,3]) ~ sr[,1],
     type = 'l',
     xlab = 'Treatment value',
     ylab = 'Delta failure rate')

[Figure: difference in failure rate between control and treatment as a function of treatment value]

I interpret this plot as showing that, for values near the mean treatment value, the difference in serine composition is greatest.  Biologically that makes sense to me.  We don’t know what is happening at extremely high and low serine concentrations, but those are presumably special cases.  Typical proteins act as we expect, although the difference is small.  What does that mean for the effect size?

One measure of the effect size is the raw mean difference, which we know to be ~0.002.  The raw mean difference can also be read off the plot of failure rate as a function of treatment value, as the horizontal distance between the two lines at 0.5 on the y axis.  The problem with using this value as a measure of effect size is that it is blind to the variance within each sample set.  To take this into account we can instead use the standardized mean difference, calculated as follows:

## calculate standardized mean difference
ser_smd <- (mean(serine_treat) - mean(serine_control)) / sd(serine_control)

In this analysis the standardized mean difference is 0.100.  The Kraemer paper gets a little arm-wavy at this point with the admission that the selection of an appropriate threshold is arbitrary.  Other work is cited that suggests values of 0.2, 0.5, and 0.8 as “small”, “medium”, and “large” differences.  It’s trivial to work out what difference in mean serine composition would yield a standardized mean difference of 0.5:

## calculate the difference in means that would give
## a standardized mean difference of 0.5
delta_m <- 0.5 * sd(serine_control)

That works out to 0.011, roughly a five-fold increase over our observed mean difference of 0.002.  The conceptual problem that I’m having, however, is how appropriate it is to apply these stringent criteria to the problem at hand.  These methods were optimized for clinical studies, and a stated goal of Kraemer, 1992 is to help determine when it is good policy to recommend a treatment.  It is desirable to know that the proposed treatment produces a large improvement in the majority of patients before it is adopted as general practice.  That is very different from what we are trying to do, and I don’t think the concept of effect size can be treated the same in each case.  I propose the following analogy to our study:

Suppose you own a house with a front lawn and a back lawn.  You suspect that one grows faster than the other and would like to test this hypothesis.  You mow the lawns on the same day, and a week later you carefully measure the length of 10,000 randomly selected blades from each lawn.  You find that the lengths are normally distributed, with a mean of 10.00 cm in the front lawn and 10.35 cm in the back, with a standard deviation of 3.5 cm.  These values are proportional to our serine values and would yield the same p-value.  Would it be appropriate to say that the back lawn is growing faster?  I believe that the answer is yes; the p-value shows that the probability that this happened by chance is vanishingly small.  If my goal was to grow grass faster, and I had spent some considerable effort or money fertilizing my back lawn, I would probably not use these numbers to justify continued effort (i.e. the effect size is small).  If, however, my goal was to understand the ecology of my yard, such as the impact of sun, water, dogs, kids, etc. on the amount of biomass in my yard, I would definitely take note of these numbers.
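To check that the analogy really is proportional to the serine result, we can run the lawn numbers through the same machinery (the data here are simulated from the stated means and standard deviation):

```r
## simulate 10,000 blade lengths per lawn from the stated distributions
set.seed(1)
front <- rnorm(10000, mean = 10.00, sd = 3.5)  # front lawn blade lengths (cm)
back  <- rnorm(10000, mean = 10.35, sd = 3.5)  # back lawn blade lengths (cm)

## the t-test p-value is vanishingly small...
t.test(front, back)$p.value

## ...but the standardized mean difference is only ~0.1, below even the
## conventional 0.2 threshold for a "small" effect
(mean(back) - mean(front)) / sd(front)
```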

We would like to know the effect of evolved adaptation to cold environments on the amino acid composition of various proteins.  We know the effect will be small, but we must keep in mind that this is an ecological question that requires an ecological* frame of mind.

So what to do next?  Assuming that I can carve out time to redo this analysis I think I can make a more convincing case by counting the number of psychrophile-mesophile homologous pairs of proteins for which the psychrophile has a greater serine content.  We could then apply a paired t-test and hopefully bolster the case that the observed signal, while small, is not there by chance.  The issue of effect size will have to be dealt with in an improved discussion – far more terrifying than the thought of further analysis.
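A rough sketch of what that paired analysis might look like, on hypothetical homolog pairs (every number below is made up for illustration):

```r
set.seed(7)
n_pairs <- 5000   # hypothetical number of psychrophile-mesophile homolog pairs

## hypothetical serine content for each mesophile protein, and for its
## psychrophile homolog with a small mean shift
ser_meso    <- rnorm(n_pairs, mean = 0.0636, sd = 0.02)
ser_psychro <- ser_meso + rnorm(n_pairs, mean = 0.0022, sd = 0.01)

## fraction of pairs in which the psychrophile homolog has more serine
mean(ser_psychro > ser_meso)

## paired t-test: pairing removes the protein-to-protein variance,
## giving far more power for the same small mean difference
t.test(ser_psychro, ser_meso, paired = TRUE)$p.value
```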

Got a thought about this problem or an issue with my interpretation of Kraemer?  Leave a comment!

*I might be taking too much liberty with ecological here.  Consider a gene expression study (for example), very much an ecological exercise.  In the case of determining interesting differences in the rate of gene expression I think the Kraemer methodology would be very appropriate.

Posted in Research | 1 Comment

Antarctic frost flower paper submitted

Moments ago I finally submitted the final frost flower paper from my PhD, a paper which first found life as Appendix 2 in my dissertation.  Once upon a time I thought this particular project was going to be the central pillar of my doctoral work.  It didn’t quite work out that way, but I’m really glad to see this manuscript go off into the world.  I wish it well.

Most of the frost flower and young sea ice work that I conducted with my adviser, Jody Deming at the University of Washington, took place in the Arctic.  Starting all the way back in 2009 (and thanks to collaborators in the OASIS project – a collaboration that, in retrospect, we should have worked harder to maintain) we sampled frost flowers and young sea ice during the winter and spring near Barrow, Alaska.  Near Barrow an offshore polynya, part of the circumpolar flaw-lead system, provides access to a region of ice formation all through the winter.  The fickle nature of the polynya and issues with access and timing, however, led us to consider other possibilities.  At about the same time I got interested in a series of papers suggesting that frost flowers could be the source of sulfate-depleted sea salts in coastal Antarctic glaciers.  Repeatedly skunked on Barrow sampling trips and with the pressure mounting to develop a thesis topic, I hatched a plan to develop a biological side to this sea salt story.  If frost flowers were responsible for transporting salt to the Antarctic interior, and our work had demonstrated that frost flowers were enriched in bacteria, it stood to reason that bacteria would be transported as well.  This could link the marine and glacial microbial environments in a previously unexplored fashion, with implications for gene flow to and microbial habitation of the expansive glacial environment.

We proposed all this to NSF and received partial support to conduct a pilot study (see NSF award 1043265).  Our efforts the following (2011) field season are well documented in the earliest entries on this blog, starting here.  After major bureaucratic delays (the McMurdo Station personnel were very reluctant to allow sampling in the young sea ice zone), blizzards, injuries, mishaps involving Weddell seals, and various other trials and humiliations that I’ve probably purged from my memory we emerged from Antarctica with samples of ice from the surface of Taylor and Wilson-Piedmont Glaciers, snow, frost flowers, newly formed sea ice, and seawater.  Our plan was to sequence the 16S rRNA gene from DNA extracted from these samples (a standard way of describing community composition and structure in microbial ecology) and look for overlaps.  Abundant frost flower microbes that also appeared in glacial ice, for example, would indicate that our proposed transport mechanism was active.

The author collecting frost flowers and young sea ice from Lewis Bay on the north side of Ross Island, October of 2011.

Our first sequencing effort was a complete disaster.  Weeks of work went down the drain with our primary samples.  It still isn’t clear where things went wrong; in our lab, at the (nameless if not blameless) sequencing center, or both.  My money’s on both.  Thankfully we had a limited set of backup samples and just enough funding left to try one more time.  This time we switched from the 454 to the Illumina sequencing platform, and placed our samples in the capable hands of the sequencing center at the Argonne National Lab.  A few weeks later we had data.  By this time, however, I’d moved on to other things and was cramming to finish the existing chapters of my dissertation.  Such is the way things go.  I had just enough time to do an initial workup and staple it to the back of my dissertation as an appendix so that it would at least exist somewhere if I never got back around to it.

Fortunately I have a couple of frightening manuscript deadlines coming up, and there is no better way to deal with an impending deadline than to ignore it completely and find something else to do.  In this case reforming the half-baked appendix into a proper manuscript (and then writing a blog article about it) was the perfect avoidance mechanism (tomorrow I’m on it).

With the back story complete, what did we find in the Antarctic frost flowers?  Definitely not what we initially expected.  We found very little evidence of marine bacteria being deposited and preserved in either glacial ice or snow (this doesn’t mean they aren’t transported there; they probably just don’t survive the trip or last long when they get there).  We did, however, find a significant number of terrestrial bacteria in the frost flower samples, particularly cyanobacteria of the genus Pseudanabaena (commonly found in melt pools on the surface of glaciers) and an odd assortment of non-marine sulfur oxidizers.  We think that all of these are the result of wind-driven transport from land to the ice surface, with the latter coming from the sulfur-rich environs around volcanic Mt. Erebus.  What all this means for the microbial ecology of the sea ice surface is not clear.  Armed only with 16S rRNA gene sequences we can’t do much more than speculate.  Further work will have to be done to see if these bacteria are doing anything of interest at the ice surface.

It is worth noting that we have now published community structure data from frost flowers from three different environments, and that the story behind each is very different.  In our earliest efforts at Barrow we collected frost flowers from a highly productive coastal system and found a strange assortment of putatively marine Rhizobiales.  We went to some pretty great lengths in a 2014 paper to demonstrate that these don’t look like terrestrial Rhizobiales deposited by wind, and at any rate the Barrow coastline was pretty well covered with snow when we were there.  In later efforts at Daneborg in Greenland, Jody Deming and a team of collaborators collected frost flowers from a highly oligotrophic fjord.  We found that the microbial community in these looked pretty much like the community in seawater.  Lastly, we now have these odd frost flowers from off Ross Island in Antarctica, which appear to contain some very interesting bacteria from various adjacent environments.  So the overall story seems to be that the sea ice surface – a warmer and more chemically active environment than one might think – can harbor a diverse array of bacteria.  Which bacteria, and from where, depends heavily on the dynamics of the surrounding area, including what’s happening in the water when the ice forms, the extent of snow cover, wind magnitude and direction, and probably some things we don’t know about yet!

For my postdoc I’m firmly out of young sea ice and into the water column, looking at microbial processes around the West Antarctic Peninsula.  Someday, however, it will be good to get back around to young sea ice.  If published this will be our fifth paper on the subject and I feel like we hardly have any answers!

Posted in Research | Leave a comment

70 Degrees South film tour

I only just learned about 70 Degrees South, a documentary film project funded by Rutgers University and the National Science Foundation.  The documentary features the science and scenery of the Palmer Long Term Ecological Research Project.  In support of the project, each January (the height of the Austral summer) the research vessel Laurence M. Gould reoccupies a series of stations along the central West Antarctic Peninsula.  Reoccupying grid points might not sound too exciting, but the WAP is a fantastic and rugged place to do science.  Sea ice blows in and out, storms charge in out of the Southern Ocean, wildlife is all around.  The film crew went down the year before I was on the Gould, so I missed my 15 minutes of fame, but I’ve heard that the film is worth checking out.  I had to chuckle at a few points in the description; I don’t think anyone’s called the Laurence M. Gould a “world class icebreaker” before (it doesn’t even make the grade for the Coast Guard’s list of major icebreakers), and saying the science team is equipped with “an arsenal of cutting edge technology” might be a stretch.  Antarctic science is typically conducted on a shoestring budget, and this cruise is no exception.

The film is now screening at a few points across the country.  Check it out and support Antarctic science!

Princeton, NJ    Princeton Environmental Film Festival    March 24, 2015
Minneapolis, MN    Minneapolis Film Festival    April 9 – 25, 2015
New York, NY    Quad Cinema    Opens April 17, 2015
Missoula, MT    International Wildlife Film Festival    April 18 – 25, 2015
Los Angeles, CA    Laemmle Music Hall    Opens May 15, 2015

Posted in Uncategorized | Leave a comment

Polar science on Reddit

I’m a member of the Association of Polar Early Career Scientists (APECS), an international group of young(ish) scientists that tries to raise awareness of polar issues.  In recognition of International Polar Week the US branch of APECS, US-APECS, is going to try some experimental outreach tonight using Reddit.  Via the IAmA feed we’re going to do a realtime, public Q&A on any polar science topic.  Join us on Reddit at 8:30 pm ET at #PolarWeek!  We will be online for about an hour but will still respond to questions posted after 9:30.

In addition to myself the experts will be:

Dr. Chelsea Thompson, atmospheric chemistry and air/snow interactions, CU Boulder

Dr. Ellyn Enderlin, glaciology, glacier-ocean interactions, University of Maine

Dr. Alex Thornton, marine ecology, Scripps Institution of Oceanography

Posted in Uncategorized | 1 Comment

Some climate fun with Google Earth

My postdoctoral adviser at Lamont showed me something really cool today that I suspect is pretty common knowledge (it was new to me!).  Google Earth has a feature that allows you to view historic aerial/satellite photos for any point on Earth, or at least for any place that was interesting enough to someone in the past that they took a photo of it.

Fortunately one of those places is Palmer Station, Antarctica, home to a small US research base and a really large glacier.  There are photos of the site available from 1963, 1975, 2004, 2008, and 2013.  What is as remarkable as it is expected is the dramatic retreat of the glacier.  Using nothing other than the ruler tool in Google Earth, anyone can work up a neat little dataset of glacial retreat.  Here are the photos from each year, in order:

[Figures: Google Earth imagery of the glacier behind Palmer Station in 1963, 1975, 2004, 2008, and 2013]

That’s a pretty dramatic change.  The yellow line is my ruler; I tried to measure along the same aspect to a well defined point along the glacier’s tongue in each year.  Obviously it’s a little bit subjective.  We can take this a step further and plot out the values:

[Figure: measured distance from Palmer Station to the glacier front, by year]

Never mind the Excel plot. It’s too late for R.  The big data gap between 1975 and 2004 is unfortunate; there’s a photo for 1999 but it’s not usable.  The data show that the glacier has retreated nearly half a kilometer from Palmer Station in 50 years.  That’s not a particularly dramatic glacial retreat relative to some, but still impressive to see.  More importantly, the rate of retreat is increasing, up to around 15 m per year from 5 m per year at the start of the record.
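For anyone wanting to repeat the exercise, the rate calculation is a one-liner once you have the measurements (the distances below are hypothetical placeholders; substitute your own ruler-tool values):

```r
## years with usable imagery, and hypothetical distances (m) from the
## station to a fixed point on the glacier tongue
years    <- c(1963, 1975, 2004, 2008, 2013)
distance <- c(150, 210, 450, 510, 590)

## retreat rate (m per year) between successive images
rates <- diff(distance) / diff(years)
round(rates, 1)
```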

Of course all of this has implications for the ecosystem around Palmer Station.  The Palmer Long Term Ecological Research project has documented a big increase in the proportion of glacial runoff in coastal seawater along the West Antarctic Peninsula.  This increase is linked to changes in phytoplankton community structure, the base of a short and tightly coupled foodweb.  You can read more about that here.

*** UPDATE ***

Hugh Ducklow, my adviser here at Lamont, gave me permission to post the following images and further explanation.  He’d found the images after plotting, in Google Earth, the locations of some temperature data recorders placed in the soil in a line between Palmer Station and the glacier in the Austral summer of 2014.  The location of the temperature probes can be seen in the following images.  It is, of course, important to remember that the glacier retreats in fits and starts (something that can be seen in the data above).  Years of stability can be followed by catastrophic melt, and the other way around.  Here are the years 1975, 2004, and 2013, with the location of the loggers superimposed.

glacier1 glacier3
glacier2


Posted in Research | Leave a comment

Dimension reduction with PCA

I recently had an interesting problem that I think I successfully solved using principal component analysis (PCA).  PCA is one of a family of multivariate statistical methods known as ordinations.  Frankly it, and related techniques, scare me a little.  They’re just a little too easy to implement in your favorite statistical workspace without any idea how to interpret the output.  Ideally one would spend some time under the hood, getting comfortable with the inner workings of the method before applying it.  I’m not the best at symbolic reasoning, so in these situations I usually rely on some application examples to help me make the jump from the stats book to my analysis.  Often, however, I’m trying to do something a little bizarre for which I can’t find a great guiding example.  That was how my week started, but I think it ended up in a pretty good place.

My problem breaks down like this.  I’ve got 2,700 observations (genomes) described by 3.2 million variables (never mind what the variables are at the moment – could be anything).  I want to calculate the Bray-Curtis distance between these genomes.  With 32 GB of RAM, performing distance matrix calculations on a 9-billion-cell matrix isn’t going to happen (I tried anyway, in R and in Python using pdist from scipy.spatial; both failed).  But what if I don’t need all 3.2 million variables?  It’s reasonable to assume that many, probably the vast majority, of these variables aren’t contributing much to the variance between genomes.  If I can get rid of these superfluous variables I’ll have a matrix small enough to perform the distance calculations.

Enter PCA.  PCA essentially generates artificial variables (axes, or principal components) to describe the variance between samples.  The original variables can be described in terms of their magnitude, or contribution, to each principal component.  The principal components are ordered by how much variance they account for.  In ecology you frequently see plots of the first two principal components (the two that account for the most variance), with arrows depicting vectors that describe the magnitude of the true variables along both principal components.

In PCA you end up with as many PCs as you had samples to start with.  Some of those principal components aren’t describing much in the way of variance.  You can evaluate this with a scree plot.  Here’s a scree plot for the 200 PCs in my analysis:

[Figure: scree plot of variance explained by each of the 200 principal components]

There are two inflection points on this plot.  Normally you would be able to ignore all but the first few principal components (i.e., those before the first inflection point), but in this case they don’t account for much variance – maybe 30 %.  I don’t think there’s any harm done by retaining additional PCs, other than a larger matrix size, so I chose to use all PCs to the left of the second inflection point, marked with the vertical line.  That accounts for about 90 % of the variance.  Note that this hasn’t done anything to reduce the number of variables in the original analysis, however.

For that I needed to evaluate a second scree plot: the sum of the magnitudes for each variable, across the PCs that I elected to keep.  In my case the matrix columns are PCs and the rows are variables, so I’m simply summing across the rows.  Before summing I normalize each magnitude by multiplying it by the proportion of variance accounted for by the relevant principal component (magnitude here meaning the absolute value of the variable’s loading).  Here’s the second scree plot:
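The whole reduction can be sketched with prcomp on a small toy matrix (the dimensions, the 90 % variance cutoff, and the number of variables kept below are all arbitrary stand-ins for my real values):

```r
set.seed(3)
x <- matrix(rexp(50 * 500), nrow = 50)  # toy data: 50 samples x 500 variables

p <- prcomp(x)
var_explained <- p$sdev^2 / sum(p$sdev^2)

## retain enough PCs to account for ~90 % of the variance
n_pc <- which(cumsum(var_explained) >= 0.9)[1]

## score each variable: sum of absolute loadings across the retained PCs,
## each weighted by the proportion of variance that PC explains
score <- rowSums(abs(p$rotation[, 1:n_pc]) %*%
                   diag(var_explained[1:n_pc], n_pc))

## keep only the highest-scoring variables (top 100 of 500 here)
keep <- order(score, decreasing = TRUE)[1:100]
x_reduced <- x[, keep]
dim(x_reduced)  # 50 x 100
```

The distance matrix can then be calculated on x_reduced instead of x, which is where the memory savings come from.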

[Figure: scree plot of summed, variance-weighted magnitudes for each variable]

Yes, this is a messy plot, it’s getting very late on a Friday evening…


Again, there’s an obvious inflection point, marked by the vertical line.  It suggests that roughly 100,000 variables are most influential on the PCs that account for most of the variability.  In short, I can use these 100,000 variables and ditch the other 3.1 million.  Are distance matrices calculated from the full and reduced sets of variables comparable?  As the plot below shows, they aren’t quite linearly correlated, but there is very good agreement between the two calculations (R = 0.97).

[Figure: distances calculated from the reduced variable set plotted against distances from the full set]

One final note on this. I ran the PCA and distance matrix calculations in both R and Python.  In Python I used scipy.spatial.pdist for the distance matrix and matplotlib.mlab.PCA for PCA.  In R I used prcomp and vegdist.  Anecdotally the memory utilization seemed to be quite a bit lower with Python, though I wasn’t watching it that closely.

Posted in Research | Leave a comment

Got the reviews blues

I’ve got this cool idea
Think about it everyday
Put it in a proposal
Then NSF took my idea awaaaay
I’ve got the review blues…

We got some surprising reviews back yesterday on a proposal to NSF Polar Programs.  We weren’t funded.  No surprise there, the proposal funding rate across NSF is about 25 %, though I’m not sure of the exact number for the program we submitted to*.  The project that we proposed was pretty ambitious; the surprise was in the positive nature of most of the reviews.  As befitting the overseer of most of the nation’s basic research dollars, and as described in their merit review criteria, NSF relies on a highly scientific, quantitative scoring system consisting of “very poor”, “poor”, “fair”, “good”, and “excellent”.  “Fair” means your proposal was really bad.  “Good” means a reviewer really didn’t like something.  Many reviewers seem reluctant to be overly complimentary with “excellent”, or too disparaging with “good”, and settle for the ambiguous “very good”.  Here’s how we stacked up:

Reviewer 1: Very good, good – (i.e. 25 % of the way between good and excellent?)

Reviewer 2: Very good

Reviewer 3: Very good

Reviewer 4: Good 🙁

Reviewer 5: Very good

Reviewer 6: Very good

Reviewer 7: Excellent (woohoo!)

Reviewer 8: Very good

We got some really good feedback from the reviewers (thanks to all of you!), and thankfully I’m on fellowship – so no need to break out the emergency ramen noodle ration.  Proposals for the next Polar Programs call are due in April, time to get busy with a rewrite…

 

*Using hyper-advanced NSF math, an abstraction similar to congressional budget math, NSF claims that, on average, each investigator receives an award for every 2.3 proposals that they submit.  I’m probably missing something, but it isn’t clear to me how to reconcile that number with the low proposal funding rate.
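Purely as illustrative arithmetic (using only the two figures cited above; the reconciliation is speculative, not NSF’s explanation), one way the numbers could coexist is if each award is credited to several investigators:

```python
# Illustrative arithmetic only; the reconciliation below is speculative.
per_investigator_rate = 1 / 2.3   # NSF's figure: one award per 2.3 proposals submitted
per_proposal_rate = 0.25          # the ~25 % proposal funding rate cited above

print(round(per_investigator_rate, 2))  # 0.43

# If a funded proposal counts as an "award" once for every listed
# investigator, an average of k investigators per award would
# reconcile the per-investigator and per-proposal rates.
k = per_investigator_rate / per_proposal_rate
print(round(k, 2))  # 1.74
```

So fewer than two co-investigators per award would close the gap on paper, though how NSF actually counts submissions is anyone’s guess.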

Posted in Uncategorized | Leave a comment

Sea ice biogeochemistry methods review

I was excited to learn that our long-awaited sea ice biogeochemistry review was published today in the journal Elementa.  Many of the contributors to the project are members of the SCOR working group Biogeochemical Exchange Processes at Sea Ice Interfaces (BEPSII).  The review anchors a special edition of Elementa, titled after the working group and slated to appear later this year.

The topics covered by the review reflect the author list; that is, it’s a pretty mixed bag.  There’s something in there for just about everyone; topics range from eddy covariance measurements above sea ice, to standard T/S measurements in sea ice, to nucleic acid extraction from sea ice.  The latter bit was my contribution and meshes well with a review I’m authoring for the Elementa special edition on the link between the taxonomy of the sea ice microbial community and likely biogeochemistry-influencing metabolisms in sea ice.  When putting the section together I was actually surprised at how few studies have undertaken molecular analyses of the sea ice microbial community.  I’ve been using molecular methods on sea ice since 2009, and have cited these various studies extensively, but I never actually sat down and listed them out before.  On one hand it’s nice to know that so much remains unknown; on the other hand it’s a little alarming that, given the advanced state of the field in other environments, molecular biology has been so slow to gain traction on sea ice (no pun intended – sea ice isn’t really that slick).

At any rate, thanks to lead author Lisa Miller for pushing this behemoth through, and thanks to the BEPSII working group for letting junior scientists take a role!

Posted in Research | Leave a comment

DHPS: Taking a dip in the DOC pool

An interesting paper was recently published by Durham et al. in PNAS: Cryptic carbon and sulfur cycling between surface ocean plankton.  The all-star author list includes members of the Armbrust Lab at the University of Washington, the Moran Lab at the University of Georgia, Athens, and the Van Mooy Lab at WHOI.

For a long time the compound dimethylsulfoniopropionate (DMSP) has been recognized as a major component of both the reduced sulfur and dissolved organic carbon (DOC) pools in the ocean.  As described here, DMSP is produced by phytoplankton in response to various environmental stresses, and is readily consumed by many marine bacteria (some bacteria produce DMS, a climatically active volatile, as a byproduct of this consumption).  In this way it is an excellent example of the microbial loop; dissolved DMSP cannot be directly consumed by zooplankton, but it can reach higher trophic levels after repackaging in microbial biomass.  Presumably, however, DMSP is only one of many important components of the DOC pool.  Determining the other major components turns out to be a rather difficult task.  There are two ways a researcher might go about this.


Figure from Durham et al., 2014, showing the pathway for the intracellular catabolism of DHPS by R. pomeroyi, ending with the excretion of bisulfite.

First, a chemist might try to analyze the different compounds in seawater and determine, structurally, which are likely to be consumed by marine bacteria.  The second bit is relatively easy, but isolating any compound from the salty, organically complex milieu that is seawater is not.  Only through the application of expensive and time-consuming methods can a targeted compound be isolated, and it is necessary to know in advance what the target compound is.

Second, a biologist might try to analyze the genetic composition or gene expression pattern of marine bacteria to determine what enzymes the bacteria are using, or have the capacity to use.  An abundance of transporters specific to a certain compound, or class of compounds, for example, suggests that this compound is, for those bacteria, an important substrate.  Unfortunately, determining which enzymes are specific to which compounds is almost as difficult as the bulk analysis of carbon compounds in seawater.  Most of what we know about microbial enzyme specificity comes from studies using E. coli, which may or may not have much to do with what bacteria are doing in the marine environment.  To be absolutely certain about the role of any enzyme it is necessary to perform costly and tedious experiments with a laboratory isolate, a requirement that eliminates the vast, unculturable majority of marine bacteria from any potential analysis.

Durham et al. waded into this particular quagmire with both approaches.  Working with a model system composed of one diatom, Thalassiosira pseudonana, and one bacterium, Ruegeria pomeroyi, they compared the gene expression profiles of bacteria grown in the presence of the diatom with profiles from pure cultures.  Surprisingly, genes strongly upregulated in the presence of the diatom included transporters for the reduced organosulfur compound 2,3-dihydroxypropane-1-sulfonate (DHPS), a C3 sulfonate, and genes coding for dehydrogenase enzymes involved in DHPS catabolism, including the gene hpsN.  Taking this hint, the researchers were able to grow R. pomeroyi on DHPS as the sole carbon source, observing a gene expression profile similar to that of R. pomeroyi grown in the presence of the diatom.  Because DHPS was not a component of the medium used in the initial experiment, it must have been produced by the diatom, and clearly it is an acceptable growth substrate for R. pomeroyi.  But how important is it in the natural environment?

With a target compound and a target gene in hand, Durham et al. were able to quantify DHPS in seawater during a cruise in the North Pacific, while simultaneously collecting metatranscriptomes and metagenomes.  They observed an abundance of hpsN transcripts coincident with diatom cell counts and high DHPS concentrations.  At coastal stations the concentration of DHPS actually exceeded that of DMSP.  This is a great paper, but it is just a first look at a story that is sure to have geographic and taxonomic complexity.  I’m sure we’ll be hearing a lot more about DHPS (and other, yet-to-be-revealed components of the DOC pool) in the coming years.

Posted in Research | Leave a comment