New paper on microbial community dynamics in up-flow bioreactors

Congrats to Avishek Dutta for his first publication in the Bowman Lab: Understanding Microbial Community Dynamics in Up-Flow Bioreactors to Improve Mitigation Strategies for Oil Souring. Avishek did a remarkable job of resurrecting a stagnant dataset and turning it into a compelling story.

What is oil souring anyway? When oil is extracted in a production field, the pressure of the field drops over time. To keep the pressure up, the oil company (or more accurately, the subsidiary of the subsidiary tasked with such things) pumps in water, which is frequently seawater. When the water comes back out through the wells there are two options. The most economical is to release it back into the environment, which has obvious negative consequences for environmental health. Alternatively, the same water can be reused by pumping it back into the ground. The downside is that recycled well water typically induces the production of hydrogen sulfide by sulfate-reducing bacteria. The hydrogen sulfide reacts with the oil (“souring” it) and creates its own set of environmental and occupational hazards.

The oil industry has spent quite a bit of effort trying to figure out how to mitigate oil souring. A leading method is to introduce nitrate salts into the system to boost the growth of nitrate-reducing bacteria. The nitrate reducers outcompete the sulfate reducers for reduced carbon (such as volatile fatty acids) and induce other processes that further impede sulfate reduction.
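
To see why this works, consider the energetics of the two competing metabolisms, using acetate (a typical volatile fatty acid) as the electron donor. These are textbook stoichiometries and approximate free energy yields, not values from the paper:

CH₃COO⁻ + SO₄²⁻ → 2HCO₃⁻ + HS⁻ (sulfate reduction, ΔG°′ ≈ −48 kJ per mol acetate)

CH₃COO⁻ + 1.6NO₃⁻ + 0.6H⁺ → 2HCO₃⁻ + 0.8N₂ + 0.8H₂O (denitrification, ΔG°′ ≈ −800 kJ per mol acetate)

With roughly an order of magnitude more energy available per mole of acetate, the nitrate reducers can draw the shared carbon pool down faster than the sulfate reducers can use it.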

Although the basic concept is pretty simple, the details of this competition between nitrate and sulfate reducers in oil field aquifers are not well understood. In this study Avishek leveraged samples from up-flow bioreactors, analogs of the oil field aquifer system, for 16S rRNA gene analysis of the bacterial and archaeal communities. The bioreactors are vessels filled with sand, seawater, and sources of bioavailable carbon (the oil itself is a source of carbon, but requires a specialized microbial community to degrade it). Some of the bioreactors also contain oil. Water flows continually through the system and nitrate salts can be added at appropriate time-points. For this experiment the nitrate amendment (the mitigation or M phase) was halted (the rebound sulfidogenesis or RS phase) and then restarted (the rebound control or RC phase).

From Dutta et al., 2020. Relative abundance of top taxa during the different phases: mitigation (M), rebound control (RC), rebound sulfidogenesis (RS).

Lots of interesting things emerged from this relatively small-scale experiment. For one thing, the oil and seawater samples are not that different from one another during mitigation. However, when the nitrate addition is stopped the two treatments start to diverge, with different sulfate-reducing taxa present in each. This divergence (though not necessarily the microbial community itself) persists after the treatment ends. But not all microbial taxa responded to the rather extreme perturbation caused by the nitrate addition. Desulfobacula toluolica, for example, which should have been outcompeted during mitigation, remained a significant member of the community.

We’re currently analyzing the results of a much larger bioreactor study that we expect will shed some new light on these processes, so stay tuned!


New paper linking the SCCOOS and AGAGE datasets

Postdoctoral researcher Jesse Wilson has a new paper titled Using empirical dynamic modeling to assess relationships between atmospheric trace gases and eukaryotic phytoplankton populations in coastal Southern California in press in the journal Marine Chemistry. This paper is the culmination of a nearly two-year effort to bring together two long-term datasets collected at the Ellen Browning Scripps Memorial Pier: the Southern California Coastal Ocean Observing System (SCCOOS) phytoplankton count and the Advanced Global Atmospheric Gases Experiment (AGAGE). Both of these programs encompass many more sites than the Scripps Pier, but that’s the happy point of overlap. The SCCOOS phytoplankton count (augmented by the McGowan chlorophyll time-series) is part of an effort to track potential HAB-forming phytoplankton in Southern California. Twice-weekly microscope counts are made of key phytoplankton taxa, and weekly measurements are made of chlorophyll a and nutrients. AGAGE is, as the name suggests, a global effort to monitor changes in atmospheric trace gases. They do this using high-frequency measurements of key gases with GC-MS and a cryo-concentration system known as Medusa.

Our study was motivated by the need to better understand the contribution of different phytoplankton taxa to atmospheric trace gases. Many phytoplankton (and macroalgae) produce volatile organic compounds (e.g., DMS and isoprene) and other trace gases (e.g., carbonyl sulfide). Some of these gases have interesting functions in the atmosphere, such as the formation of secondary aerosols. Although there are many laboratory studies looking at trace gas production by phytoplankton in culture, environmental studies on this topic are usually limited in space and time by the duration of a single cruise or field campaign.

From Wilson et al., 2020. Temporal patterns for various trace gases. Note the strong seasonality for carbonyl sulfide, iodomethane, and chloromethane. Bromoform and dibromomethane are also seasonal, but exhibited a provocative spike in late 2014.

The temporal and spatial limitation of field campaigns is what makes long-term time-series efforts so valuable. For this study we had 9 years of overlapping data between SCCOOS and AGAGE. To analyze these data Jesse designed an approach based on Empirical Dynamic Modeling (EDM) and Convergent Cross Mapping (CCM). For good measure he also aggregated the available meteorological data using a self-organizing map (SOM). EDM and CCM are emerging techniques that can identify causal relationships between variables. The basic idea behind EDM is that a time-series can be described by its own time-lagged components. Given two time-series (say a trace gas and a phytoplankton taxon), if the time-lagged components of one describe the other, this is evidence of a causal relationship between the two. For a more in-depth treatment of EDM and CCM see this excellent tutorial on Hao Ye’s website.
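
To make the mechanics of cross mapping concrete, here’s a from-scratch toy example in R. To be clear, this is not the analysis pipeline from the paper (which relied on established EDM implementations like those in the tutorial linked above); the coupled system, embedding parameters, and library sizes below are invented purely for illustration.

## Toy CCM demonstration: x drives y, so the time-lagged structure of y
## should let us "cross map" (reconstruct) x with increasing skill as the
## library of y observations grows.

set.seed(1)

## generate coupled logistic maps in which x influences y but not vice versa

n <- 500
x <- y <- numeric(n)
x[1] <- 0.4
y[1] <- 0.2

for(t in 1:(n - 1)){
  x[t + 1] <- x[t] * (3.8 - 3.8 * x[t])
  y[t + 1] <- y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
}

## time-delay embedding of y with E = 2 dimensions and a lag of 1

E <- 2
y.embed <- cbind(y[2:n], y[1:(n - 1)])
x.target <- x[2:n]

## cross map x from the time-lagged embedding of y for a given library size,
## predicting each point outside the library from its E + 1 nearest
## neighbors within the library (exponential simplex-style weighting)

crossMap <- function(lib.size){
  lib <- 1:lib.size
  pred <- rep(NA, nrow(y.embed))
  
  for(i in (lib.size + 1):nrow(y.embed)){
    d <- sqrt(rowSums(sweep(y.embed[lib,], 2, y.embed[i,]) ^ 2))
    nn <- order(d)[1:(E + 1)]
    w <- exp(-d[nn] / max(d[nn][1], 1e-6))
    pred[i] <- sum(w * x.target[nn]) / sum(w)
  }
  cor(pred, x.target, use = 'complete.obs')
}

## cross-map skill (rho) should converge upward with library size, the
## signature of a causal influence of x on y

sapply(c(25, 50, 100, 250, 400), crossMap)

Run on real, noisy, unevenly sampled data the details get much fussier, which is why we used dedicated tooling for the actual analysis.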

Not surprisingly, our all-vs-all approach to these datasets was a bit messy. A lot of this is due to the complexity of the natural environment and the spatial and temporal disconnect between the measurements. The phytoplankton counts are hyper-local and reflect the very patchy nature of the marine environment, while the trace gas measurements are regional at best, as the atmosphere moves and mixes over great distances in only a few hours. Nonetheless, we made the assumption that ecological observations at the pier are some reflection of conditions across a wider area, and that trace gas measurements do reflect some local influence. So there should be observable links between the two, even if those links are muted.

From Wilson et al., 2020. Depth of color gives the value for rho, a measure of the cross-map ability when a parameter (row) affects a trace gas (column). Only significant values are shown; * indicates that there is evidence of causal interaction in both directions, which typically indicates interaction with a third, unmeasured variable.

I’m particularly excited about what we can do with these data in the future, when we have several years of molecular data. As part of the Scripps Ecological Observatory initiative we’re sequencing 16S and 18S rRNA genes from the same samples as the SCCOOS microscopy counts. We have around 2.5 years of data so far. Just a few more years until we have a molecular dataset as extensive as the count data analyzed here! The key difference will be in the breadth of that data, which will allow us to identify an order of magnitude more phytoplankton taxa than are counted.


Finding those lost data files

It’s been a long time since I’ve had the bandwidth to write up a code snippet here. This morning I had not quite enough time between Zoom meetings to tackle something more involved, so here goes!

In this case I needed to find ~200 sequence (fasta) files for a student in my lab. They were split across several sequencing runs, and for various logistical reasons it was getting a bit tedious to find the location of each sequence file. To solve the problem I wrote a short Python script to wrap the Linux locate command and copy all the files to a new directory where they could be exported.

First, I created a text file “files2find.txt” with text uniquely matching each file that I needed to find. One of the great things about locate is that it doesn’t need to match the full file name. Note that locate searches a prebuilt database rather than the live filesystem, so very recently created files won’t turn up until updatedb has run.

head files2find.txt
151117_PAL_Sterivex_1
151126_PAL_Sterivex_2
151202_PAL_Sterivex_3
151213_PAL_Sterivex_4
151225_PAL_Sterivex_5
151230_PAL_Sterivex_6
160106_PAL_Sterivex_7
160118_PAL_Sterivex_9
160120_PAL_Sterivex_10
160128_PAL_Sterivex_11

Then the wrapper:

import subprocess
import shutil

## Destination for the copied files.

path = '/path/to/where/you/want/files/'

## Copied files are added to the set "found" to avoid being copied more than
## once, if they exist in multiple locations on your computer.

found = set()

with open('files2find.txt') as file_in:
    for line in file_in:
        line = line.rstrip()

        ## Here we use the subprocess module to run the locate command, capturing
        ## standard out.

        temp = subprocess.Popen('locate ' + line,
                                shell = True,
                                executable = '/bin/bash',
                                stdout = subprocess.PIPE)

        ## The communicate method for object temp returns a tuple.  First object
        ## in the tuple is standard out.

        locations = temp.communicate()[0]
        locations = locations.decode().split('\n')

        ## Thank you internet for this one-liner, Python one-liners always throw
        ## me for a loop (no pun intended). Here we search all items in the locations
        ## list for a specific suffix that identifies files that we actually want.
        ## In this case our final analysis files contain "exp.fasta".  Of course if
        ## you're certain of the full file name you could just use locate on that and
        ## omit this step.

        fastas = [i for i in locations if 'exp.fasta' in i]

        ## Use the shutil library to copy found files to the new directory "path".

        for fasta in fastas:
            file_name = fasta.split('/')[-1]
            if file_name not in found:
                shutil.copyfile(fasta, path + file_name)
                found.add(file_name)

        ## In the event that no files are found report that here.

        if len(fastas) == 0:
            print(line, 'not found')

A dispatch from MOSAiC Leg 4 #2

PhD student Emelia Chamberlain sends the following dispatch from Polarstern.

The MOSAiC floe, just prior to break-up.

After 64 days at sea, the RV Polarstern and its icy surroundings have officially started to feel like home. I can’t believe how quickly the time has passed, but here we are at the end of MOSAiC Leg 4! It was a truly special cruise: we witnessed the re-building and complete break-down of an ice camp and the peak and end of the spring under-ice bloom, and we were the last to sample from the original MOSAiC floe before that singular, well-characterized piece of ice (chosen all the way back in October!) finally reached the marginal ice zone and the end of its life as a contiguous floe. It’s really quite incredible to have had the opportunity to contribute to this astounding time series. As part of this expedition, over 200 individual scientists have worked on this one piece of ice, following its drift across the Arctic. Even if we cannot determine the full scope of this project or see all the results just now, it is clear that these data will have an impact for generations to come, and, especially as a student, I feel so lucky to have contributed to this legacy.

Polarstern on a sunny day in the Arctic
Our home base, the RV Polarstern, is at the heart of this expedition. Summer weather near the sea-ice edge is extremely variable. Our evening weather reports predicted >95% cloud cover almost every day, but we still lucked out with some gorgeous sunny days. But no matter the weather (or the distance from which it finally appears out of the fog), the Polarstern after a long day on the ice is always a welcome sight.

When we first arrived at the MOSAiC floe, we were very excited to find enough of it intact to re-establish the Central Observatory ice camp. Very soon, versions 2.0 of all the ice “cities” began popping up across the floe and the science began in earnest. However, every day brought new reminders that we were decidedly in the Arctic melt season. Melt ponds became a dominant feature, and new roads and pathways often had to be forged on the fly to get to sampling locations. For example, first-year ice cores that started at around 1.5 m during our first week were only 90 cm long by our last.

Looking down from the bridge, this was the logistics area of the ice floe on June 19th, the day after our arrival. The research camp has yet to be set up and the area is still fairly dry and melt-pond free.
Here is the same view from the bridge on July 25th. The logistics area has become ponded and brownish due to sediment melting out of the ice, darkening the surface and enhancing melt. It was not uncommon to find rocks, shells, etc. on our floe – remnants of its origins off the coast of the New Siberian Islands (https://doi.org/10.5194/tc-14-2173-2020). In the foreground is the remote sensing site, followed by Balloon Town (measuring atmospheric profiles). MET City and Ocean City are behind, hidden in the fog.
And now, the same view on July 31st, after we hit the marginal ice zone and swell increased to the point of breaking the floe apart.

These melt dynamics provided not only physical challenges to working on the ice, but scientific ones as well. How do you capture this fresh-water lens and study the impacts of such surface stratification on the biomass blooming beneath the ice? This stratification was seen most clearly in the lead systems surrounding the ship. After several surveys, we were able to characterize three clear layers: surface freshwater, a green algal layer (brackish salinities), and the underlying seawater. Over time, the living layer shoaled and went from a happy photosynthetic green to a clearly dying, particulate organic matter greenish-brown. Capturing this bloom transition was quite exciting for us and I look forward to analyzing how the microbial community in these layers evolved.

Ale taking GoPro footage from a lead near ROV Oasis on July 10th. At this point, the biomass layer was around 1 m deep and only visible with a GoPro on a stick or ROV footage.
Using the classical “tubing duct-taped to a stick” technique, here I am sampling from a lead on the far side of the floe on July 22nd. At this point, the biomass layer is clearly visible near the surface, reaching a max of 30 cm deep, with lots of dead Melosira (ice algae) floating in the current.
July 22nd cont. On the other end of the tubing, Ale fills our sample bottles while the OCEAN and ICE teams continue to survey the lead both from shore (upriser turbulence profiles – Team Ocean) and sea (high-resolution CTD profiles by kayak – Team ICE).

In addition to these opportunistic events, Alessandra D’Angelo (PhD, URI) and I were happy to continue progress on the core MethOx project work, started by Jessie Creamean (PhD, CSU) on Leg 1 and continued by Jeff on Leg 3. When conditions allowed, we were lucky enough to have nearly daily CTD casts from the Polarstern rosette (thank you Team Ocean!) and were therefore never in want of water. With both of us on board we were able to maximize sampling and analysis, collecting 278 unique samples for the core project work alone! We measured weekly seawater profiles for microbial community structure in conjunction with ambient methane concentrations/isotopes, and ran experimental samples to study potential oxidation/production rates of methane using elevated methane in select incubations.

The MethOx team after a successful deployment of the BGC team’s gas-flux chamber at Remote Sensing.
Filling BOD bottles from the CTD for discrete O2/Ar ratios to run on the AWI MIMS.
Some of the most promising incubation samples, however, came from the bottom sections of sea-ice cores. In addition to our weekly water column work, I also took part in the interdisciplinary MOSAiC ice coring each Monday to support the sampling effort for these cores. This event included the collection of approximately 25 cores per site (First Year and Second Year ice) per week for a host of parameters: salinity, net primary production, gypsum content, etc. It was great to be able to start every week out on the ice and witness the changes occurring at our floe first-hand.
A heavily ponded First Year Ice Coring Site and the bridge we had to build (and re-build… and re-build) to get there.
Alison Fong (PhD, AWI) and I preparing to section cores. The tent is used to keep temperatures cool and protect samples from direct sunlight during processing. With air temperatures hovering between -1 and 1ºC for most of the leg, this was key to preventing premature melting of the samples.
Ale “fishing” for methane from a hole used for sediment trap deployments near the First Year Ice site. The syringe is used to carefully sample seawater into gas-tight bags without incorporating air bubbles – that way we measure the true seawater signal without any atmospheric interference.

BUT – it’s not over yet. Even as I take this time to reflect on Leg 4, we are quite busy with preparations for Leg 5, where we will head north and witness the fall re-freeze of the Arctic! And although it will be bittersweet to part from our Leg 4 colleagues… the Akademik Tryoshnikov has arrived and the handover must begin. I look forward to continuing on as the Bowman Lab/MethOx Project representative on MOSAiC Leg 5 and can’t wait to see where the RV Polarstern takes us next!

Another bittersweet farewell, the last weather balloon at the MOSAiC floe site on July 31, after the final break-up of the MOSAiC floe.
Happy scientists on the ice… even at 1:00 in the morning! This was the midpoint of a 24 hr sampling cycle and I think summarizes the energy brought by the Leg 4 team very well. Pictured left to right: Ale, UiT Post-doc Jessie Gardner, myself, and our fearless bear-guard Tereza Svecova.

MOSAiC Interview on The Not Old-Better Show

I was fortunate to have the chance to talk about some of our experiences on MOSAiC Leg 3 with Paul Vogelzang, host of the Smithsonian’s Not Old-Better podcast. The full interview can be found here:


A dispatch from MOSAiC Leg 4

PhD student Emelia Chamberlain sends the following dispatch from Polarstern.

Operating an international expedition in the remote central Arctic would always be a logistically taxing endeavor. Operating an international expedition in the remote central Arctic during a global pandemic is that much more challenging. But through the incredible perseverance of a delayed Leg 3 team and hard work from the dedicated logistical teams at the Alfred Wegener Institute in Germany, MOSAiC Leg 4 is underway!

Our NSF-funded project work on MOSAiC will be carried out on this leg by URI post-doc Alessandra D’Angelo and myself. In a reshuffling of planned operations, I found myself headed north almost a month earlier than expected (and she almost a month later). On May 1, 2020, approximately one hundred scientists and crew began two weeks of quarantine in a local hotel in Bremerhaven. For the first week we were in total isolation within individual hotel rooms. Meals were brought to our door by the incredible hotel staff. We took two coronavirus tests, the first on arrival and the second seven days later, and were to remain in individual quarantine until the results of the second test came back. Thankfully, everyone tested negative, and after seven long days we entered Phase 2 – group quarantine. Though still weeks and 3,000 km away from meeting the Polarstern, MOSAiC Leg 4 had finally begun. Tentatively at first, we emerged from our rooms to gather for meals, planning, and, perhaps most importantly, safety briefings. While we were able to be around each other, we still practiced strict social distancing precautions.

Here I play the hypothermia victim during an Arctic Safety training. During melt season, falling into the freezing Arctic waters poses one of the greatest dangers while working out on the ice.
On this leg, I will serve as a member of the BioGeoChemistry team – we are principally interested in studying how climate-relevant gases cycle through the central Arctic’s ocean-ice-atmosphere system. Pictured here are myself, URI post-doc and Leg 4 BGC team lead Alessandra D’Angelo, and U. Groningen post-doc Deborah Bozzato. The fourth and final member of our team, Falk Pätzold, was already onboard Polarstern, having also participated in Leg 3.

After another 10 days, and negative test results for all, we began our journey northward on the R/V Maria S. Merian and R/V Sonne to meet the MOSAiC workhorse, R/V Polarstern. While overall lovely research vessels, the Merian and Sonne are not icebreakers, meaning the Polarstern would need to travel south of the ice edge to refuel and make the personnel exchange. Therefore, all ships coordinated to meet in Adventfjorden, Svalbard. It required a lot of time, but it was a beautiful journey. Due to some unexpected ice pressure preventing her southerly travel, we ended up reaching the designated meeting point (near Longyearbyen, Norway) almost 2 weeks earlier than Polarstern! I spent most of the time planning, catching up on some coding, and working out in the many group sporting activities held on board, all in prep for the labor-intensive ice activities. (Zumba and yoga are of course perfect analogs for pulling sledges of equipment across slushy sea ice…)

Turns out, scientists have a lot of luggage – most of it heavy scientific equipment. After several iterations of moving all of this gear on and off multiple conveyances (buses, ships, etc.), we have the assembly line down pat.
“C is for container!” The container behind this massive pile of luggage was my home for most of my time on the Merian. Between the two ships there were not enough cabins for the entire Leg 4 team, so a few of us were tucked into these cozy makeshift apartments. To commemorate the experience, we took a “container crew” photo during move-out.
Polarstern arriving in Svalbard.

Once the Polarstern finally arrived in Adventfjorden, a flurry of handover activities commenced. We met with the Leg 3 team, explored the labs, reviewed protocols, and heard all about their experience on the ice. Prior to their departure from the MOSAiC ice floe, there was a dynamic shift in the ice, leading to some sampling sites splitting off from the rest. This, however, is not surprising given the floe’s fast trajectory south and the onset of the summer melt season. The ice drift (which can be tracked here in real time) has brought the MOSAiC floe to approximately 82º latitude, air temperatures mostly remain above freezing, and surface water is currently measured at -1.7ºC (warm for under-ice). With polar summer in full swing and such exciting ice dynamics, I look forward to getting back to the floe and tracking the ecological changes through this transition! But first, we must get there. I’m hoping to use this next phase of transit to explore the R/V Polarstern. Since it will be my new home for the next ~4.5 months, I suppose I should learn how to find my way from the lab to the mess hall without getting lost…

Before the ships pulled alongside one another and deployed the gangplank, people were ferried from ship to ship by small boats in order to make the most of the few days we had for knowledge transfer and handover activities.
While checking out our lab onboard Polarstern, I ran into Igor, the lab’s totem. Thus far he has done a pretty good job of keeping the instruments in check and running, but I hope that he will be pleased by my bringing him a friend.
Leg 4 waves goodbye from a departing Polarstern to those Leg 3 participants traveling home onboard the Merian. Bye Svalbard, to the North!

Tutorial: SuperSOMS and an R script for detecting regions of interest

A common exercise in environmental microbiology is counting bacterial cells with an epifluorescent microscope. During my PhD I spent many hours hunched over a microscope in a darkened room, contemplating which points of light were bacteria (and should thus be counted) and which were not. With a large cup of coffee from Seattle's U District Bean and Bagel and an episode of "This American Life" playing in the background it's not a bad way to spend the day. But it definitely felt like a procedure that needed some technological enhancement. My biggest concerns were always objectivity and reproducibility; it's really hard to determine consistently which "regions of interest" (ROIs) to count. There are of course commercial software packages for identifying and counting ROIs in a microscope image. But why pay big money for a software subscription when you can write your own script? I had some free time during our slow transit to Polarstern during MOSAiC Leg 3 and thought I'd give it a try. The following tutorial borrows heavily from the image segmentation example in Wehrens and Kruisselbrink, 2018.

We start with a png image from a camera attached to a microscope. The green features are bacteria and phytoplankton that have been stained with Sybr Green. These are the ROIs that we’d like to identify and count. The image quality is actually pretty bad here; this was made with the epifluorescent scope at Palmer Station back in 2015, and the scope and camera needed some TLC. It turns out that I don’t actually have many such images on my laptop, and I can’t simply go and make a new one because we aren’t allowed in the lab right now! Nonetheless the quality is sufficient for our purposes.

First, we need to convert the image into an RGB matrix that we can work with. I’m sure there’s a way to do this in R, but I couldn’t find an expedient method. Python made it easy.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '15170245.png'
name = name[:-len('.png')] ## remove the extension (str.rstrip would strip characters, not the suffix)

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)

Next we break out R to do the actual analysis (which, yes, could be done in Python…). The basic strategy is to use a self-organizing map (SOM) with 2 layers. One layer will be color, the other will be position. We'll use this information to identify distinct classes corresponding to diagnostic features of ROIs. Last, we'll iterate across all the pixels that appear to belong to ROIs and attempt to draw an ellipse around each group of pixels that makes up an ROI. First, we read in the data produced by the Python script:

scope.rgb <- read.csv('15170245.rgb4r.csv', header = F)
scope.xy <- read.csv('15170245.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

Then we define a function to render the image described by these matrices:

plotImage <- function(scope.xy, scope.rgb){
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)
  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)
  
  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)
  image(0:x.dim,
        0:y.dim,
        t(temp.image),
        col = image.col,
        ylab = paste0('Y:', y.dim),
        xlab = paste0('X:', x.dim))
}

## give it a test

plotImage(scope.xy, scope.rgb)

You'll note that the function flips the image. While annoying, this doesn't matter at all for identifying ROIs. If it bothers you go ahead and tweak the function :). Now we need to train our SOM. The SOM is what does the heavy lifting of identifying different types of pixels in the image.

#### train som ####

library(kohonen)

som.grid <- somgrid(10, 10, 'hexagonal')
som.model <- supersom(list('rgb' = data.matrix(scope.rgb), 'coords' = data.matrix(scope.xy)),
                      whatmap = c('rgb', 'coords'),
                      user.weights = c(1, 9),
                      grid = som.grid)

Now partition the SOM into k classes with k-means clustering. The value for k has to be determined experimentally but should be consistent for all the images in a set (i.e. a given type of microscopy image).

som.codes <- som.model$codes[[1]]
som.codes.k <- kmeans(som.codes, centers = 6)

## Get mean colors for clusters (classes)

class.col <- c()

for(k in 1:max(som.codes.k$cluster)){
  temp.codes <- som.codes[which(som.codes.k$cluster == k),]
  temp.col <- colMeans(temp.codes)
  temp.col <- rgb(temp.col[1], temp.col[2], temp.col[3], maxColorValue = 1)
  class.col <- append(class.col, temp.col)
}

## Make a plot of the som with the map units colored according to mean color
## of owning class.

plot(som.model,
     type = 'mapping',
     bg = class.col[som.codes.k$cluster],
     keepMargins = F,
     col = NA)

text(som.model$grid$pts, labels = som.codes.k$cluster)
add.cluster.boundaries(som.model, som.codes.k$cluster)

Here's where we have to be a bit subjective. We need to make an informed decision about which classes constitute ROIs. Looking at this map I'm gonna guess 3 and 6. The classes and structure of your map will of course be totally different, even if you start with the same training image. To make use of this information we first predict which classes our original pixels belong to.

## predict for RGB only

image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')

Then we identify those pixels that belong to the classes we think describe ROIs.

## select units that correspond to ROIs

target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

Now comes the tricky bit. Knowing which pixels belong to ROIs isn't actually that useful, as each ROI is composed of many different pixels. So we need to aggregate the pixels into ROIs. Again, this requires a little experimentation, but once you figure it out for a given sample type it should work consistently. The key parameter here is "resolution" which we define as how far apart two pixels of the same class need to be to constitute different ROIs. The value 20 seems to work reasonably well for this image.

## loop through all pixels.  if there's a pixel within n distance of it, check to
## see if that pixel belongs to an ROI.  If so, add the new pixel to that area.  If not,
## create a new ROI.  Currently a pixel can be "grabbed" by an ROI produced later.

findROI <- function(resolution = 20){
  roi <- 1
  roi.pixels <- as.data.frame(matrix(nrow = dim(target.pixels)[1], ncol = 3))
  colnames(roi.pixels) <- c('X', 'Y', 'roi')
  
  for(i in 1:dim(target.pixels)[1]){
    
    if(is.na(roi.pixels[i, 'roi']) == T){
      pixel.x <- target.pixels[i, 'X']
      pixel.y <- target.pixels[i, 'Y']
      nns <- which(abs(target.pixels[, 'X'] - pixel.x) < resolution & abs(target.pixels[, 'Y'] - pixel.y) < resolution)
      
      roi.pixels[nns, c('X', 'Y')] <- target.pixels[nns, c('X', 'Y')]
      roi.pixels[nns, 'roi'] <- roi
      roi <- roi + 1
    }
  }
  return(roi.pixels)
}
  
roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)

To evaluate our discovery of ROIs we plot an ellipse around each ROI in the original image.

## approximate each roi as an ellipse.  need x, y, a, b

plotROI <- function(roi.pixels){
  require(plotrix)
  
  for(roi in unique(roi.pixels$roi)){
    temp.pixels <- roi.pixels[which(roi.pixels$roi == roi),]
    temp.a <- max(temp.pixels$X) - min(temp.pixels$X)
    temp.b <- max(temp.pixels$Y) - min(temp.pixels$Y)
    temp.x <- mean(temp.pixels$X)
    temp.y <- mean(temp.pixels$Y)
    
    draw.ellipse(temp.x, temp.y, temp.a, temp.b, border = 'red')
  }
}

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

It certainly isn't perfect; the two chained diatoms in particular throw off our count. We did, however, do a reasonable job of finding all the small ROIs that represent the smaller, harder-to-count cells. So how does the model perform for ROI identification on a new image? Here's a new image acquired with the same exposure settings on the same scope. We use the same Python code to convert it to RGB and XY matrices.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '14000740.png'
name = name[:-len('.png')]

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)

Then we predict and plot.

scope.rgb <- read.csv('14000740.rgb4r.csv', header = F)
scope.xy <- read.csv('14000740.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

plotImage(scope.xy, scope.rgb)
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb') # predict for rgb only

target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

Not bad! Again, it isn't perfect: some ROIs are grouped together and some are missed (largely a function of the variable focal plane). These can be fixed by experimenting with the model parameters and resolution. We did accomplish the goal of improving objectivity and reproducibility; our approach isn't always right, but at least it's wrong in a consistent way! An additional advantage is scale: if you had hundreds of such images, perhaps representing multiple randomly selected fields from many samples, you could process in a few minutes what would take many hours to count by hand.
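
For example, if too many adjacent pixel groups are being split into separate ROIs, a coarser resolution will merge them (the value here is untuned, just to illustrate the knob):

## re-detect ROIs with a coarser grouping distance, then re-plot

roi.pixels <- findROI(resolution = 30)

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)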

At this point I can feel the collective judgement of every environmental microbiologist since van Leeuwenhoek for promoting a method that might reduce the serendipitous discovery that comes with spending hours staring through a microscope. So here's a reminder to spend time getting familiar with your samples under the microscope, regardless of how you identify ROIs!


Tutorial: Self Organizing Maps in R

Self-organizing maps (SOMs) are a form of neural network and a wonderful way to partition complex data.  In our lab they’re a routine part of our flow cytometry and sequence analysis workflows, but we use them for all kinds of environmental data (like this).  All of the mainstream data analysis languages (R, Python, Matlab) have packages for training and working with SOMs.  My favorite is the R package kohonen, which is simple to use but can support some fairly complex analysis through SOMs with multiple data layers and supervised learning (superSOMs).  The kohonen package has a nice, very accessible paper that describes its key features, and some excellent examples.  This tutorial applies our basic workflow for a single-layer SOM to RGB color data.  RGB color space segmentation is a popular way to evaluate machine learning algorithms, as it is intrinsically multivariate and inherently meaningful.  Get like colors grouping together and you know that you’ve set things up correctly!

This application of SOMs has two steps.  Each of these steps can be thought of as an independent data reduction step.  It’s important to remember that you’re not reducing dimensions per se, as you would in a PCA; you’re aggregating like data so that you can describe them as convenient units (instead of n individual observations).  The final outcome, however, represents a reduction in dimensionality to a single parameter for all observations (e.g., the color blue instead of (0, 0, 255) in RGB colorspace).  The first step – training the SOM – assigns your observations to map units.  The second step – clustering the map units into classes – finds the structure underlying the values associated with the map units after training.  At the end of this procedure each observation belongs to a map unit, and each map unit belongs to a class.  Thus each observation inherits the class of its associated map unit.  If that’s not clear don’t sweat it.  It will become clear as you go through the procedure.
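
In code the inheritance boils down to two lookups.  Here’s a minimal sketch using the kohonen objects we’ll build below (som.model from training, som.cluster from clustering the map units):

## winning map unit for each training observation, provided by kohonen

obs.unit <- som.model$unit.classif

## each observation inherits the class of its winning map unit

obs.class <- som.cluster[obs.unit]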

First, let’s generate some random RGB data.  This takes the form of a three column matrix where each row is a pixel (i.e. an observation).

#### generate some RGB data ####

## select the number of random RGB vectors for training data

sample.size <- 10000

## generate dataframe of random RGB vectors

sample.rgb <- as.data.frame(matrix(nrow = sample.size, ncol = 3))
colnames(sample.rgb) <- c('R', 'G', 'B')

sample.rgb$R <- sample(0:255, sample.size, replace = T)
sample.rgb$G <- sample(0:255, sample.size, replace = T)
sample.rgb$B <- sample(0:255, sample.size, replace = T)

Next, we define a map space for the SOM and train the model.  Picking the right grid size for the map space is non-trivial; you want about 5 elements from the training data per map unit, though you’ll likely find that they’re not uniformly distributed.  It’s best to use a symmetrical map (unless you have a very small training dataset), hexagonal map units, and a toroidal shape.  The latter is important to avoid edge effects (a toroid has no edges).

One important caveat for the RGB data is that we’re not going to bother with any scaling or normalizing.  The parameters are all on the same scale and evenly distributed between 0 and the max value of 255.  Likely your data are not so nicely formed! 

#### train the SOM ####

## define a grid for the SOM and train

library(kohonen)

grid.size <- ceiling(sample.size ^ (1/2.5))
som.grid <- somgrid(xdim = grid.size, ydim = grid.size, topo = 'hexagonal', toroidal = T)
som.model <- som(data.matrix(sample.rgb), grid = som.grid)

Once you’ve trained the SOM it’s a good idea to explore the output of the `som` function to get a feel for the different items in there.  The output takes the form of nested lists.  Here we extract a couple of items that we’ll need later, and also create a distance matrix of the map units.  We can do this because the fundamental purpose of map units is to have a codebook vector that mimics the structure of the training data.  During training each codebook vector is iteratively updated along with its neighbors to match the training data.  After sufficient iterations the codebook vectors reflect the underlying structure of the data.

## extract some data to make it easier to use

som.events <- som.model$codes[[1]]
som.events.colors <- rgb(som.events[,1], som.events[,2], som.events[,3], maxColorValue = 255)
som.dist <- as.matrix(dist(som.events))

Now that we have a trained SOM let’s generate a descriptive plot.  Since the data are RGB colors, if we color the plot accordingly it should be sensible.  For comparison, we first create a plot with randomized codebook vectors.  This represents the SOM at the start of training.

## generate a plot of the untrained data.  this isn't really the configuration at first iteration, but
## serves as an example

plot(som.model,
     type = 'mapping',
     bg = som.events.colors[sample.int(length(som.events.colors), size = length(som.events.colors))],
     keepMargins = F,
     col = NA,
     main = '')

And now the trained SOM:

## generate a plot after training.

plot(som.model,
     type = 'mapping',
     bg = som.events.colors,
     keepMargins = F,
     col = NA,
     main = '')

So pretty! The next step is to cluster the map units into classes.  As with all clustering analyses, a key question is how many clusters (k) we should define.  One way to inform our decision is to evaluate the distance between all items assigned to each cluster for many different values of k.  Ideally, creating a scree plot of mean within-cluster distance vs. k will yield an inflection point that suggests a meaningful value of k.  In practice this inflection point is extremely sensitive to the size of the underlying data (in this case, the number of map units); however, it can be a useful starting place.  Consider that the RGB training data were sampled uniformly at random, meaning that there is no underlying structure!  Nonetheless we still get an inflection point.

#### look for a reasonable number of clusters ####

## Evaluate within cluster distances for different values of k.  This is
## more dependent on the number of map units in the SOM than the structure
## of the underlying data, but until we have a better way...

## Define a function to calculate mean distance within each cluster.  This
## is roughly analogous to the within clusters ss approach

clusterMeanDist <- function(clusters){
  cluster.means = c()
  
  for(c in unique(clusters)){
    temp.members <- which(clusters == c)
    
    if(length(temp.members) > 1){
      temp.dist <- som.dist[temp.members,]
      temp.dist <- temp.dist[,temp.members]
      cluster.means <- append(cluster.means, mean(temp.dist))
    }else{
      cluster.means <- append(cluster.means, 0) ## singleton clusters contribute zero distance
    }
  }
  
  return(mean(cluster.means))
  
}

## vegdist (from the vegan package) provides the dissimilarity matrix for
## the hierarchical clustering below

library(vegan)

try.k <- 2:100
cluster.dist.eval <- as.data.frame(matrix(ncol = 3, nrow = (length(try.k))))
colnames(cluster.dist.eval) <- c('k', 'kmeans', 'hclust')

for(i in 1:length(try.k)) {
  cluster.dist.eval[i, 'k'] <- try.k[i]
  cluster.dist.eval[i, 'kmeans'] <- clusterMeanDist(kmeans(som.events, centers = try.k[i], iter.max = 20)$cluster)
  cluster.dist.eval[i, 'hclust'] <- clusterMeanDist(cutree(hclust(vegdist(som.events)), k = try.k[i]))
}

plot(cluster.dist.eval[, 'kmeans'] ~ try.k,
     type = 'l')

lines(cluster.dist.eval[, 'hclust'] ~ try.k,
      col = 'red')

legend('topright',
       legend = c('k-means', 'hierarchical'),
       col = c('black', 'red'),
       lty = c(1, 1))

Having picked a reasonable value for k (let’s say k = 20) we can evaluate different clustering algorithms.  For our data k-means almost always performs best, but you should choose what works best for your data.  Here we’ll evaluate k-means, hierarchical clustering, and model-based clustering.  What we’re looking for in the plots is a clustering method that produces contiguous classes.  If classes are spread all across the map, then the clustering algorithm isn’t capturing the structure of the SOM well.

#### evaluate clustering algorithms ####

## Having selected a reasonable value for k, evaluate different clustering algorithms.

library(pmclust)

## Define a function to make a simple plot of clustering output.
## This is the same as the previous plotting, but we define the function
## here as we wanted to play with the color earlier.

plotSOM <- function(clusters){
  plot(som.model,
       type = 'mapping',
       bg = som.events.colors,
       keepMargins = F,
       col = NA)
  
  add.cluster.boundaries(som.model, clusters)
}

## Try several different clustering algorithms, and, if desired, different values for k

cluster.tries <- list()

for(k in c(20)){
  
  ## model based clustering using pmclust
  
  som.cluster.pm.em <- pmclust(som.events, K = k, algorithm = 'em')$class # model based
  som.cluster.pm.aecm <- pmclust(som.events, K = k, algorithm = 'aecm')$class # model based
  som.cluster.pm.apecm <- pmclust(som.events, K = k, algorithm = 'apecm')$class # model based
  som.cluster.pm.apecma <- pmclust(som.events, K = k, algorithm = 'apecma')$class # model based
  som.cluster.pm.kmeans <- pmclust(som.events, K = k, algorithm = 'kmeans')$class # model based
  
  ## k-means clustering
  
  som.cluster.k <- kmeans(som.events, centers = k, iter.max = 100, nstart = 10)$cluster # k-means
  
  ## hierarchical clustering
  
  som.dist <- dist(som.events) # hierarchical, step 1
  som.cluster.h <- cutree(hclust(som.dist), k = k) # hierarchical, step 2
  
  ## capture outputs
  
  cluster.tries[[paste0('som.cluster.pm.em.', k)]] <- som.cluster.pm.em
  cluster.tries[[paste0('som.cluster.pm.aecm.', k)]] <- som.cluster.pm.aecm
  cluster.tries[[paste0('som.cluster.pm.apecm.', k)]] <- som.cluster.pm.apecm
  cluster.tries[[paste0('som.cluster.pm.apecma.', k)]] <- som.cluster.pm.apecma
  cluster.tries[[paste0('som.cluster.pm.kmeans.', k)]] <- som.cluster.pm.kmeans
  cluster.tries[[paste0('som.cluster.k.', k)]] <- som.cluster.k
  cluster.tries[[paste0('som.cluster.h.', k)]] <- som.cluster.h
}

## Take a look at the various clusters.  You're looking for the algorithm that produces the
## least fragmented clusters.

plotSOM(cluster.tries$som.cluster.pm.em.20)
plotSOM(cluster.tries$som.cluster.pm.aecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecma.20)
plotSOM(cluster.tries$som.cluster.k.20)
plotSOM(cluster.tries$som.cluster.h.20)

For brevity I’m not showing the plots produced for all the different clustering algorithms. For these data the k-means and hierarchical clustering algorithms both look pretty good; I have a slight preference for the k-means version:

The SOM and final map unit clustering represent a classification model that can be saved for use with later data.  One huge advantage to using SOMs over other analysis methods (e.g., ordination techniques) is their usefulness for organizing newly collected data.  New data, if necessary scaled and normalized in the same way as the training data, can be classified by finding the map unit with the minimum distance to the new observation.  To demonstrate this, we’ll generate and classify a small new RGB dataset (in reality classifying in this way is very efficient, and could accommodate a huge number of new observations).  First, we save the SOM and final clustering.

## The SOM and map unit clustering represent a classification model.  These can be saved for
## later use.

som.cluster <- som.cluster.k
som.notes <- c('Clustering based on k-means')

save(file = 'som_model_demo.Rdata', list = c('som.cluster', 'som.notes', 'som.model', 'som.events.colors'))

Then we generate new RGB data, classify it, and make a plot to compare the original data, the color of the winning map unit, and the color of the cluster that map unit belongs to.

#### classification ####

## make a new dataset to classify ##

new.data.size <- 20
new.data <- as.data.frame(matrix(nrow = new.data.size, ncol = 3))
colnames(new.data) <- c('R', 'G', 'B')

new.data$R <- sample(0:255, new.data.size, replace = T)
new.data$G <- sample(0:255, new.data.size, replace = T)
new.data$B <- sample(0:255, new.data.size, replace = T)

## get the closest map unit to each point

new.data.units <- map(som.model, newdata = data.matrix(new.data))

## get the classification for closest map units

new.data.classes <- som.cluster[new.data.units$unit.classif]

## compare colors of the new data, unit, and class, first define a function
## to calculate the mean colors for each cluster

clusterMeanColor <- function(clusters){
  cluster.means = c()
  som.codes <- som.model$codes[[1]]
  
  for(c in sort(unique(clusters))){
    temp.members <- which(clusters == c)
    
    if(length(temp.members) > 1){
      temp.codes <- som.codes[temp.members,]
      temp.means <- colMeans(temp.codes)
      temp.col <- rgb(temp.means[1], temp.means[2], temp.means[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }else({
      temp.codes <- som.codes[temp.members,]
      temp.col <- rgb(temp.codes[1], temp.codes[2], temp.codes[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
      })
  }
  
  return(cluster.means)
  
}

class.colors <- clusterMeanColor(som.cluster)

plot(1:length(new.data$R), rep(1, length(new.data$R)),
     col = rgb(new.data$R, new.data$G, new.data$B, maxColorValue = 255),
     ylim = c(0,4),
     pch = 19,
     cex = 3,
     xlab = 'New data',
     yaxt = 'n',
     ylab = 'Level')

axis(2, at = c(1, 2, 3), labels = c('New data', 'Unit', 'Class'))

points(1:length(new.data.units$unit.classif), rep(2, length(new.data.units$unit.classif)),
     col = som.events.colors[new.data.units$unit.classif],
     pch = 19,
     cex = 3)

points(1:length(new.data.classes), rep(3, length(new.data.classes)),
       col = class.colors[new.data.classes],
       pch = 19,
       cex = 3)

Looks pretty good! Despite defining only 20 classes, the classes seem to be a reasonable representation of the original data. Only slight differences in color can be observed between the data, winning map unit, and class.


MOSAiC Leg 3

Where to begin… I started writing this post several days ago on a nearly empty plane flying from Charlotte, N.C. to San Diego. That flight marked the end of MOSAiC Leg 3 for me, though Leg 3 will continue for several more weeks. We were supposed to be done on April 4; however, the COVID-19 pandemic and dynamic sea ice conditions made a hash of that plan. I took advantage of a single opportunity to leave the Polarstern early (meaning only a couple of weeks late) to help with the childcare situation at home. The remaining brave and dedicated crew and scientists are expected to return by ship to Europe in late May.

Tromsø, Norway. High on my list of nice places.

Our journey began in Tromsø, Norway in the innocent days of January when COVID-19 seemed like a regional rather than global problem. We had several Chinese expedition members on Leg 3, but they all made it out just fine and – though they were concerned about the situation back home – we figured things would improve in due time. After a week of safety training we were transported to the Russian icebreaker Kapitan Dranitsyn for what we thought would be a 2-3 week voyage to the Polarstern.

Leg 3 aboard the Kapitan Dranitsyn in Tromsø on January 28.

A lot of thought and effort went into determining how the MOSAiC scientists and crew would reach Polarstern throughout the drift. During the winter a conventional icebreaker of Dranitsyn’s class is not the best way to reach a location in the central Arctic; however, it was the best among the affordable and available options. And in the end it did the job. I’m not sure of the history, but I’m fairly certain that its feat of reaching Polarstern in the dead of winter is unprecedented.

The Dranitsyn plows through “mild” winter weather in the Barents Sea on February 5.

We spent a week in a pleasant fjord just outside of Tromsø waiting for a weather break to cross the Barents Sea to the pack ice. The Barents Sea is notoriously stormy, and we needed wave heights below 4 m to attempt the crossing. Icebreakers don’t ride well in heavy seas, and there were refrigerated shipping containers carrying (our) food for Polarstern stored on deck in the bow. We couldn’t take big waves on the bow, nor could we tolerate much ice formation on the chilling units. Both happened anyway. But at least the crossing was relatively short, and within 72 hours we were in the pack ice.

Dranitsyn on February 6, shortly after entering the pack ice.

The Dranitsyn ended up being a bigger part of Leg 3 than any of us imagined at the time. Had our departure from Polarstern gone as expected, we would have been in transit on Dranitsyn as long as we were doing science on Polarstern. It was an interesting experience. Scientists are pretty good at keeping themselves busy; most of us have a long backlog of analyses to complete and papers to write, in addition to fielding emails from students and colleagues, and handling administrative matters. Absent the internet or email, however, a lot of these responsibilities disappeared. I worked on a proposal and finished a (3 year overdue) paper. Much of the rest of the 5 weeks we were on Dranitsyn I spent in discussion with my Leg 3 colleagues. It was something like an extended MOSAiC workshop. We covered everything from synthesizing across different science themes, to how time and on-ice resources would be shared between groups in a typical week.

The Dranitsyn on February 23, deep in the Arctic ice.

The roughly 2 week delay in reaching the Polarstern was due in large part to ice conditions. Despite previous expeditions onboard icebreakers I hadn’t really thought about it this way before, but the ice compresses and relaxes in response to tides, winds, and other external forces. When the ice is compressed it’s incredibly difficult for an icebreaker to break; there’s simply no place for the ice to be displaced to. In a relaxed state more leads are open, and there’s more space within the pack ice to accommodate the displaced ice as the ship moves through. Temperature also plays a role. Icebreakers are more typically used during the spring, summer, and fall, when maritime shipping in the Arctic is active and more research activities are taking place. Spring, summer, and fall sea ice is naturally much warmer than winter sea ice. Warmer sea ice is softer and breaks more easily, and the surface of the ice has less friction. This means an icebreaker can more easily ride up and over the ice to break it. With temperatures low and the ice in a compressed state it was a tough grind, as this video from our embedded videographers from the UFA production company shows:

Despite the conditions and some weeks of uncertainty we did eventually make it to Polarstern. Seeing that little point of light appear on the horizon after all those weeks of travel made me appreciate how alone we were up there at that time of year.

Dranitsyn approaching Polarstern on February 28. It was another week before we finally said good-bye to the Dranitsyn and settled into our new home in the Arctic.

After we reached Polarstern it took nearly a week to transfer cargo, exchange the scientists and crew, and become familiar enough with our tasks and the surroundings to take over. Many of the MOSAiC observations take place in what’s called the central observatory. This is an aggregation of relatively stable floes around Polarstern that house a number of on-ice installations. These include the colloquially named Met City, Balloon Town, Ocean City, and Droneville sites, among others. Beyond the central observatory are various “pristine” sites for snow and ice sampling, and beyond those lie the nodes of the less frequently visited distributed observatory. The distributed observatory is critical because it provides some spatial context to the intensive observations of the central observatory.

Preparing for a CTD cast on March 6, shortly after the departure of Dranitsyn. In between CTD casts the hole is covered by the Weatherhaven, which has to be picked up and moved out of the way for each deployment. Note the location of the gangway relative to the CTD hole (it’s going to change!). Immediately behind the CTD hole is the logistics zone for staging equipment. In the background, just behind the green flag, you can make out the Ocean City and Balloon Town sites.

We had about a week of “normal” operations before the central Arctic started throwing plot twists at us. The ice was surprisingly dynamic. The reasons behind this will, I think, be an important science outcome of MOSAiC. Thinner, rougher sea ice? More wind stress? Whatever the cause, the ice cracked a lot. By chance we were located in what ice dynamicists call a “shear zone”, an area of enhanced kinetic energy within the pack. Here’s the first emergence of a crack, on March 11; you can see the logistics team scrambling to move the snow machines to a safer location. Over the next few weeks this crack grew into a major and ever-evolving lead.

Leg 3’s first encounter with a major crack in the central observatory, on March 11.

By April everyone was pretty used to cracks, leads, and ridges in the central observatory. Overall I was impressed with the resilience of the various teams as different installations were threatened, and in some cases destroyed. For the ATMOS (atmospheric sciences) team in particular, Leg 3 should have been relatively easy – low temps notwithstanding. Everyone expected consolidated ice and stable, well-established instruments and protocols. Instead, for a period of time there was a near-daily scramble to maintain power and infrastructure at the Met City site. The adjacent Remote Sensing site was enveloped by a large ridge system and had to be relocated to near the logistics area. Setting up these sites took specialists on Leg 1 many days; on Leg 3 systems had to be dismantled and re-established on the fly. Because spring is a particularly interesting time for atmospheric chemistry in the Arctic, the clock was ticking every time a site or instrument went down. The dedication and ingenuity of the scientists at the Met City and Remote Sensing sites was great to observe. The rest of us helped where we could, but we had our hands pretty full with other problems.

Polarstern crew wrangle power cables that have become trapped between the ship and the floe on March 23. Maintaining the power supply from the ship to the various on-ice installations was a huge challenge given the dynamic ice conditions.

At the top of the problems list was the loss of the hole for the main CTD/rosette system. The CTD is an instrument package that measures conductivity, temperature, and pressure (depth), along with other parameters. It is the fundamental tool of ship-based oceanography. The CTD is attached to a long conductive wire and embedded within a rosette of sample bottles. The sample bottles are fired at specific depths to collect water for a number of analyses and experiments. Lots of parameters and projects depended on this sampling system, and a huge amount of effort had been expended to construct and maintain a hole in the ice for deploying it. On March 15, however, the ice shifted, pushing Polarstern forward. This caused superficial damage to the Weatherhaven covering the hole, but more significantly placed the hole out of reach of the crane that operates the CTD/rosette. Just like that we lost all of our capacity to sample below 1000 m (the central Arctic Ocean is deeper than 4000 m in most places) and to collect large volumes of water from any depth. All sampling had to shift to a much smaller system at the Ocean City site.

On March 15 shifting ice moves the Polarstern forward, rendering the main CTD hole useless.

Why couldn’t we simply make a new hole? It’s worth remembering that the ice in March is near its maximum thickness; it was roughly 160 cm thick when access to the main CTD hole was lost. That figure discounts any rafting of multiple ice floes, which was probable and could easily double or triple the thickness. Assuming only a single layer of ice, the way to make a hole big enough for the CTD is with a chainsaw. The thickness of ice you can cut is limited by the length of the chainsaw bar. Maybe commercial logging operations have a 2 m bar; we certainly did not! You can cut the ice out in layers – I’ve had to do this in the Antarctic before – but the problem is that you create a bathtub as soon as you start cutting into the final layer. To finish the job you’d need a snorkel for yourself and the chainsaw!

Sampling at Ocean City on March 10.

All our water column sampling had to shift to Ocean City, and focus on the upper 1000 m of the water column. Ocean City is the main site for the physical oceanography team. It was designed to accommodate a small team taking ultra-high resolution measurements of the surface ocean. The physical oceanographers went above and beyond in sharing their space and resources, and I ended up thoroughly enjoying the time that I spent out at Ocean City. The video below was made on April 20 by lowering a GoPro through the CTD hole at Ocean City while one of the physical oceanographers conducted high-resolution profiling of temperature and salinity. You can see the microstructure profiler used for this near the end of the sequence.

In addition to the water column sampling we carried out sea ice sampling, when conditions allowed. To minimize the impact of light pollution from the vessel on the growth of sea ice algae, our preferred ice coring sites were located some distance from the ship. Through the spring and summer, most of the photosynthesis taking place in the central Arctic occurs in the ice itself, rather than the water column. The ice algae have more consistent access to light than their planktonic counterparts, and are famously sensitive to even the lowest levels of light. Ambient light from the ship is more than enough to induce growth in the vicinity during the long polar night. Distance from the ship, combined with the dynamic ice conditions, created some access challenges.

Delicate maneuvering with a full load of ice cores on April 6. Photo: Eric Brossier.

Despite the access challenges we got some great ice core samples. We fielded two ice coring teams, one for first year ice and one for second year ice. I had the pleasure of working with the second year ice coring team. It was a great US-German-Russian collaborative effort, and we had some good times out there!

Laura Wischnewski and I section sea ice cores. Photo: Eric Brossier.
The combined Leg 3 first year and second year ice coring teams.

The original plan for exchanging Leg 3 with Leg 4 involved flying us all out on ski-equipped Antonov An-74 aircraft. This would have been a slick and expedient way to carry out the exchange. It also requires a pretty long runway and permissive global travel. By mid-March it was clear that both of these things were going to be an issue. I’ll be honest, there were some tense weeks where it wasn’t clear when and how Leg 3 would end, and what the future of MOSAiC would be. Kudos to cruise and expedition leadership for navigating us through the ups and downs. In particular, AWI logistics had the difficult task of designing and scoping the possible solutions. They did an amazing job of iteratively working through a huge range of options to come up with the one that maximized science and minimized impacts on individual lives. But of course it was a compromise.

The current plan involves the Polarstern leaving the central observatory in mid-May for a rendezvous with one or more ships near Svalbard. The Leg 4 personnel (including Bowman Lab member Emelia Chamberlain) are already under strict quarantine in Bremerhaven, Germany. They’ll remain under quarantine until they depart for Svalbard at roughly the same time Polarstern leaves the central observatory. Once the crew has been exchanged, Leg 3 will sail for Germany and Leg 4 will begin the difficult task of re-establishing observations at the central observatory. An advantage of this plan is that it doesn’t require a complete breakdown of the central observatory. It will require, however, that many of the installations be partially disassembled for safety while Polarstern is away from the floe.

This is how you get a Twin Otter to the central Arctic.

There was one opportunity to leave Polarstern before the official Leg 3-4 exchange. After agonizing over it for a couple of days I decided I needed to take advantage of the opportunity. After an epic few weeks our project was in decent shape, and with two young kids and no school or daycare, attention needed to shift to the home front. On April 22 I stepped onto a Twin Otter operated by Kenn Borek Air Ltd. to begin the long flight home with six other expedition members. We flew to Station Nord in Greenland, then across the Canadian Arctic via Eureka-Resolute-Arctic Bay-Churchill-Toronto, and finally to the US.

Immigration: “You’re coming from where?”

Me: “Resolute”

Immigration: “What were you doing in Resolute?”

Me: “Just passing through, we were only there for a few minutes”

Immigration: “So where were you before that?”

Me: “Greenland, but again not very long. See there’s this ship…”

Immigration: “Uh, nevermind. Here’s your passport.”

Spectacular view of the northern Greenland coastline on the approach to Station Nord. Note the obvious interface between the landfast sea ice and the drifting pack ice. This feature is part of the circumpolar flaw-lead system and extended as far as we could see in either direction.
Posted in MOSAiC | 1 Comment

Photos from MOSAiC Leg 1

Many thanks to atmospheric chemist extraordinaire Jessie Creamean for participating in our NSF project on MOSAiC Leg 1. Jessie’s participation allowed us to have a physical presence during the critical setup phase and freeze-in. It was a double win: she has her own DOE-funded project with MOSAiC that didn’t include ship time, while our project needed a capable hand for Leg 1. I’ll be picking up where she left off when Leg 3 starts in just a couple of weeks. Jessie shared a few photos from Leg 1.

Jessie Creamean with Akademik Fedorov in the background.
Checking out a crack in the ice. The ice has been much more dynamic than expected, creating some problems for installing the various observational instruments.
A beautiful view of Polarstern at the onset of the polar night.
Frost flowers! Still a special place in my heart after all these years…
The Russian icebreaker Kapitan Dranitsyn, which will be ferrying the Leg 2 and Leg 3 personnel to Polarstern.
Posted in MOSAiC | Leave a comment