PhD student Emelia Chamberlain sends the following dispatch from Polarstern.
After 64 days at sea, the RV Polarstern and its icy surroundings have officially started to feel like home. I can’t believe how quickly the time has passed, but here we are at the end of MOSAiC Leg 4! It was a truly special cruise: we witnessed the re-building and complete break-down of an ice camp, the peak and end of the spring under-ice bloom, and were the last to sample from the original MOSAiC floe before that singular, well-characterized piece of ice (chosen all the way back in October!) finally reached the marginal ice zone and the end of its life as a contiguous floe. It’s really quite incredible to have had the opportunity to contribute to this astounding time series. As part of this expedition, over 200 individual scientists have worked on this one piece of ice, following its drift across the Arctic. Even if we cannot determine the full scope of this project or see all the results just now, it is clear that these data will have an impact for generations to come and, especially as a student, I feel so lucky to have contributed to this legacy.
When we first arrived at the MOSAiC floe, we were very excited to find enough of it intact to re-establish the Central Observatory ice camp. Very soon, versions 2.0 of all the ice “cities” began popping up across the floe and the science began in earnest. However, every day brought new reminders that we were decidedly in the Arctic melt season. Melt ponds became a dominant feature, and new roads and pathways often had to be forged on the fly to reach sampling locations. The ice itself was thinning too: our first-year ice cores started at around 1.5 m during our first week, but by our last they were only 90 cm long.
These melt dynamics provided not only physical challenges to working on the ice, but scientific ones as well. How do you capture this freshwater lens and study the impacts of such surface stratification on the biomass blooming beneath the ice? The stratification was seen most clearly in the lead systems surrounding the ship. After several surveys, we were able to characterize three clear layers: surface freshwater, a green algal layer at brackish salinities, and the underlying seawater. Over time, the living layer shoaled and went from a happy photosynthetic green to a clearly dying, particulate-organic-matter greenish-brown. Capturing this bloom transition was quite exciting, and I look forward to analyzing how the microbial community in these layers evolved.
In addition to these opportunistic events, Alessandra D’Angelo (PhD, URI) and I were happy to continue progress on the core MethOx project work, started by Jessie Creamean (PhD, CSU) on Leg 1 and continued by Jeff on Leg 3. When conditions allowed, we were lucky enough to have nearly daily CTD casts from the Polarstern rosette (thank you Team Ocean!) and were therefore never in want of water. With both of us on board we were able to maximize sampling and analysis, collecting nearly 280 unique samples for the core project work alone! We measured weekly seawater profiles for microbial community structure in conjunction with ambient methane concentrations and isotopes, and ran experimental samples to study potential methane oxidation and production rates using elevated methane in select incubations.
BUT – it’s not over yet. Even as I take this time to reflect on Leg 4, we are quite busy with preparations for Leg 5, when we will head north and witness the Arctic fall re-freeze! And although it will be bittersweet to part from our Leg 4 colleagues… the Akademik Tryoshnikov has arrived and the handover must begin. I look forward to continuing on as the Bowman Lab/MethOx Project representative on MOSAiC Leg 5 and can’t wait to see where the RV Polarstern takes us next!
I was fortunate to have the chance to talk about some of our experiences on MOSAiC Leg 3 with Paul Vogelzang, host of the Smithsonian’s Not Old-Better podcast. The full interview can be found here:
PhD student Emelia Chamberlain sends the following dispatch from Polarstern.
Operating an international expedition in the remote central Arctic would always be a logistically taxing endeavor. Operating an international expedition in the remote central Arctic during a global pandemic is that much more challenging. But through the incredible perseverance of a delayed Leg 3 team and hard work from the dedicated logistical teams at the Alfred Wegener Institute in Germany, MOSAiC Leg 4 is underway!
Our NSF-funded project work on MOSAiC will be carried out on this leg by URI post-doc Alessandra D’Angelo and myself. In a reshuffling of planned operations, I found myself headed north almost a month earlier than expected (and she almost a month later). On May 1, 2020, approximately one hundred scientists and crew began two weeks of quarantine in a local hotel in Bremerhaven. For the first week we were in total isolation within individual hotel rooms, with meals brought to our doors by the incredible hotel staff. We took two coronavirus tests, the first on arrival and the second seven days later, and were to remain in individual quarantine until the results of the second test came back. Thankfully, everyone tested negative, and after seven long days we entered Phase 2: group quarantine. Though still weeks and 3,000 km away from meeting the Polarstern, MOSAiC Leg 4 had finally begun. Tentatively at first, we emerged from our rooms to gather for meals, planning, and, perhaps most importantly, safety briefings. While we were able to be around each other, we still practiced strict social distancing.
After another 10 days, and negative test results for all, we began our journey north on the R/V Maria S. Merian and R/V Sonne to meet the MOSAiC workhorse, R/V Polarstern. While lovely research vessels overall, the MS Merian and Sonne are not icebreakers, meaning the Polarstern would need to travel south of the ice edge to refuel and make the personnel exchange. All three ships therefore coordinated to meet in Adventfjorden, Svalbard. It was a long but beautiful journey. Due to unexpected ice pressure delaying Polarstern’s southerly travel, we ended up reaching the designated meeting point (near Longyearbyen, Norway) almost two weeks before she did! I spent most of the time planning, catching up on some coding, and joining the many group sporting activities held on board, all in preparation for the labor-intensive ice work. (Zumba and yoga are of course perfect analogs for pulling sledges of equipment across slushy sea ice…)
Once the Polarstern finally arrived in Adventfjorden, a flurry of handover activities commenced. We met with the Leg 3 team, explored the labs, reviewed protocols, and heard all about their experience on the ice. Shortly before they departed the MOSAiC ice floe, a dynamic shift in the ice split some sampling sites off from the rest. This, however, is not surprising given the floe’s fast trajectory south and the onset of the summer melt season. The ice drift (which can be tracked here in real time) has brought the MOSAiC floe to approximately 82°N, air temperatures mostly remain above freezing, and surface water is currently measured at -1.7°C (warm for under-ice). With polar summer in full swing and such exciting ice dynamics, I look forward to getting back to the floe and tracking the ecological changes through this transition! But first, we must get there. I’m hoping to use this next phase of transit to explore the R/V Polarstern. Since it will be my new home for the next ~4.5 months, I suppose I should learn how to find my way from the lab to the mess hall without getting lost…
Many thanks to Michelle Babcock, communications specialist for the OAST project, for producing this great video of our 2019 field effort at the South Bay Salt Works in San Diego. This field effort was supposed to be a low-key opportunity for the OAST team to practice working together in the field, but the ideal nature of the site and some of our preliminary findings have elevated it to a primary field site. It’s a good thing too… with COVID-19 making a hash of international travel plans it’s the one place we know we can reach!
A common exercise in environmental microbiology is counting bacterial cells with an epifluorescence microscope. During my PhD I spent many hours hunched over a microscope in a darkened room, contemplating which points of light were bacteria (and should thus be counted) and which were not. With a large cup of coffee from Seattle’s U District Bean and Bagel and an episode of “This American Life” playing in the background it’s not a bad way to spend the day. But it definitely felt like a procedure that needed some technological enhancement. My biggest concerns were always objectivity and reproducibility; it’s really hard to determine consistently which “regions of interest” (ROIs) to count. There are of course commercial software packages for identifying and counting ROIs in a microscope image. But why pay big money for a software subscription when you can write your own script? I had some free time during our slow transit to Polarstern during MOSAiC Leg 3 and thought I’d give it a try. The following tutorial borrows heavily from the image segmentation example in Wehrens and Kruisselbrink, 2018.
We start with a png image from a camera attached to a microscope. The green features are bacteria and phytoplankton that have been stained with Sybr Green. These are the ROIs that we’d like to identify and count. The image quality is actually pretty bad here; this was made with the epifluorescent scope at Palmer Station back in 2015, and the scope and camera needed some TLC. It turns out that I don’t actually have many such images on my laptop, and I can’t simply go and make a new one because we aren’t allowed in the lab right now! Nonetheless the quality is sufficient for our purposes.
First, we need to convert the image into an RGB matrix that we can work with. I’m sure there’s a way to do this in R, but I couldn’t find an expedient method. Python made it easy.
## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix
import matplotlib.image as mpimg

name = '15170245.png'
name = name[:-len('.png')] ## note: rstrip('.png') would strip matching characters, not the suffix
img = mpimg.imread(name + '.png')

## write one row per pixel, column-major, so the xy file indexes the rgb file
with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
Next we break out R to do the actual analysis (which, yes, could be done in Python…). The basic strategy is to use a self-organizing map (SOM) with two layers: one layer for color, the other for position. We’ll use this information to identify distinct classes corresponding to diagnostic features of ROIs. Last, we’ll iterate across all the pixels that appear to belong to ROIs and attempt to draw an ellipse around each group of pixels that makes up an ROI. First, we read in the data produced by the Python script:
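Something like the following works (a minimal sketch: the file names follow the Python step above, and the column names are chosen here to match the functions that come next):
## read the headerless matrices produced by the Python script
scope.rgb <- read.csv('15170245.rgb4r.csv', header = F, col.names = c('red', 'green', 'blue'))
scope.xy <- read.csv('15170245.xy.csv', header = F, col.names = c('X', 'Y'))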
Then we define a function to render the image described by these matrices:
plotImage <- function(scope.xy, scope.rgb){
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)
  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)
  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)
  image(0:x.dim,
        0:y.dim,
        t(temp.image),
        col = image.col,
        ylab = paste0('Y:', y.dim),
        xlab = paste0('X:', x.dim))
}
## give it a test
plotImage(scope.xy, scope.rgb)
You'll note that the function flips the image. While annoying, this doesn't matter at all for identifying ROIs; if it bothers you, go ahead and tweak the function :). Now we need to train our SOM. The SOM is what does the heavy lifting of identifying different types of pixels in the image.
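The training call looks something like this sketch, assuming the two-layer design described earlier; the layer names match the predict() call used below, while the grid dimensions are guesses that will need tuning for your images.
## train a two-layer SOM on pixel color and position
library(kohonen)
som.grid <- somgrid(xdim = 20, ydim = 20, topo = 'hexagonal', toroidal = T)
som.model <- supersom(list('rgb' = data.matrix(scope.rgb),
                           'xy' = data.matrix(scope.xy)),
                      grid = som.grid)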
Now partition the SOM into k classes with k-means clustering. The value for k has to be determined experimentally but should be consistent for all the images in a set (i.e. a given type of microscopy image).
som.codes <- som.model$codes[[1]]
som.codes.k <- kmeans(som.codes, centers = 6)

## Get mean colors for clusters (classes)
class.col <- c()

for(k in 1:max(som.codes.k$cluster)){
  temp.codes <- som.codes[which(som.codes.k$cluster == k),]
  temp.col <- colMeans(temp.codes)
  temp.col <- rgb(temp.col[1], temp.col[2], temp.col[3], maxColorValue = 1)
  class.col <- append(class.col, temp.col)
}
## Make a plot of the SOM with the map units colored according to the mean color
## of the owning class.
plot(som.model,
     type = 'mapping',
     bg = class.col[som.codes.k$cluster],
     keepMargins = F,
     col = NA)

text(som.model$grid$pts, labels = som.codes.k$cluster)
add.cluster.boundaries(som.model, som.codes.k$cluster)
Here's where we have to be a bit subjective. We need to make an informed decision about which classes constitute ROIs. Looking at this map I'm gonna guess 3 and 6. The classes and structure of your map will of course be totally different, even if you start with the same training image. To make use of this information we first predict which classes our original pixels belong to.
## predict for RGB only
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')
Then we identify those pixels that belong to the classes we think describe ROIs.
## select units that correspond to ROIs
target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]
Now comes the tricky bit. Knowing which pixels belong to ROIs isn't actually that useful, as each ROI is composed of many different pixels. So we need to aggregate the pixels into ROIs. Again, this requires a little experimentation, but once you figure it out for a given sample type it should work consistently. The key parameter here is "resolution" which we define as how far apart two pixels of the same class need to be to constitute different ROIs. The value 20 seems to work reasonably well for this image.
## Loop through all pixels. If a pixel is within n distance of another, check to
## see if that pixel belongs to an ROI. If so, add the new pixel to that ROI. If not,
## create a new ROI. Currently a pixel can be "grabbed" by an ROI produced later.
findROI <- function(resolution = 20){
  roi <- 1
  roi.pixels <- as.data.frame(matrix(nrow = dim(target.pixels)[1], ncol = 3))
  colnames(roi.pixels) <- c('X', 'Y', 'roi')
  for(i in 1:dim(target.pixels)[1]){
    if(is.na(roi.pixels[i, 'roi']) == T){
      pixel.x <- target.pixels[i, 'X']
      pixel.y <- target.pixels[i, 'Y']
      nns <- which(abs(target.pixels[, 'X'] - pixel.x) < resolution & abs(target.pixels[, 'Y'] - pixel.y) < resolution)
      roi.pixels[nns, c('X', 'Y')] <- target.pixels[nns, c('X', 'Y')]
      roi.pixels[nns, 'roi'] <- roi
      roi <- roi + 1
    }
  }
  return(roi.pixels)
}
roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)
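From this table the cell count is just the number of ROIs, and a minimum-size filter (the 5 pixel cutoff here is an arbitrary choice) is a quick way to drop single-pixel noise:
## number of ROIs found, and the number remaining after a size filter
length(roi.table)
sum(roi.table >= 5)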
To evaluate our discovery of ROIs we plot an ellipse around each ROI in the original image.
## approximate each ROI as an ellipse; need x, y, a, b
plotROI <- function(roi.pixels){
  require(plotrix)
  for(roi in unique(roi.pixels$roi)){
    temp.pixels <- roi.pixels[which(roi.pixels$roi == roi),]
    temp.a <- max(temp.pixels$X) - min(temp.pixels$X)
    temp.b <- max(temp.pixels$Y) - min(temp.pixels$Y)
    temp.x <- mean(temp.pixels$X)
    temp.y <- mean(temp.pixels$Y)
    draw.ellipse(temp.x, temp.y, temp.a, temp.b, border = 'red')
  }
}
plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)
It certainly isn't perfect; the two chained diatoms in particular throw off our count. We did, however, do a reasonable job of finding all the small ROIs that represent the smaller, harder-to-count cells. So how does the model perform for ROI identification on a new image? Here's a new image acquired with the same exposure settings on the same scope. We use the same Python code to convert it to RGB and XY matrices.
## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix
import matplotlib.image as mpimg

name = '14000740.png'
name = name[:-len('.png')] ## strip the extension, as before
img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
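Back in R, we classify the new pixels with the already-trained SOM and repeat the ROI steps; a sketch reusing the objects and functions defined above:
## read in and classify the new image with the existing SOM
scope.rgb <- read.csv('14000740.rgb4r.csv', header = F, col.names = c('red', 'green', 'blue'))
scope.xy <- read.csv('14000740.xy.csv', header = F, col.names = c('X', 'Y'))
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

## aggregate pixels into ROIs and plot
roi.pixels <- findROI()
plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)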
Not bad! Again, it isn't perfect: some ROIs are grouped together and some are missed (largely a function of the variable focal plane). These can be fixed by experimenting with the model parameters and the resolution. We did accomplish the goal of improving objectivity and reproducibility; our approach isn't always right, but at least it's wrong in a consistent way! An additional advantage is speed: if you had hundreds of such images, perhaps representing multiple randomly selected fields from many samples, you could process in a few minutes what would take many hours to count by eye.
At this point I can feel the collective judgement of every environmental microbiologist since van Leeuwenhoek for promoting a method that might reduce the serendipitous discovery that comes with spending hours staring through a microscope. So here's a reminder to spend time getting familiar with your samples under the microscope, regardless of how you identify ROIs!
Self-organizing maps (SOMs) are a form of neural network and a wonderful way to partition complex data. In our lab they’re a routine part of our flow cytometry and sequence analysis workflows, but we use them for all kinds of environmental data (like this). All of the mainstream data analysis languages (R, Python, Matlab) have packages for training and working with SOMs. My favorite is the R package kohonen, which is simple to use but supports some fairly complex analysis through SOMs with multiple data layers and supervised learning (superSOMs). The kohonen package has a nice, very accessible paper that describes its key features, with some excellent examples. This tutorial applies our basic workflow for a single-layer SOM to RGB color data. RGB color space segmentation is a popular way to evaluate machine learning algorithms, as it is intrinsically multivariate and inherently meaningful. Get like colors grouping together and you know that you’ve set things up correctly!
This application of SOMs has two steps. Each of these steps can be thought of as an independent data reduction step. It’s important to remember that you’re not reducing dimensions per se, as you would in a PCA, you’re aggregating like data so that you can describe them as convenient units (instead of n individual observations). The final outcome, however, represents a reduction in dimensionality to a single parameter for all observations (e.g., the color blue instead of (0, 0, 255) in RGB colorspace). The first step – training the SOM – assigns your observations to map units. The second step – clustering the map units into classes – finds the structure underlying the values associated with the map units after training. At the end of this procedure each observation belongs to a map unit, and each map unit belongs to a class. Thus each observation inherits the class of its associated map unit. If that’s not clear don’t sweat it. It will become clear as you go through the procedure.
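In code terms, once both steps are done, the class inheritance is a single indexing operation; a sketch using object names that are defined further down in this post:
## each observation's class = the class of its winning map unit
## som.model$unit.classif: winning unit per observation; som.cluster: class per unit
obs.classes <- som.cluster[som.model$unit.classif]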
First, let’s generate some random RGB data. This takes the form of a three column matrix where each row is a pixel (i.e. an observation).
#### generate some RGB data ####
## select the number of random RGB vectors for training data
sample.size <- 10000
## generate dataframe of random RGB vectors
sample.rgb <- as.data.frame(matrix(nrow = sample.size, ncol = 3))
colnames(sample.rgb) <- c('R', 'G', 'B')
sample.rgb$R <- sample(0:255, sample.size, replace = T)
sample.rgb$G <- sample(0:255, sample.size, replace = T)
sample.rgb$B <- sample(0:255, sample.size, replace = T)
Next, we define a map space for the SOM and train the model. Picking the right grid size for the map space is non-trivial; you want about 5 elements from the training data per map unit, though you’ll likely find that they’re not uniformly distributed. It’s best to use a symmetrical map (unless you have a very small training dataset), hexagonal map units, and a toroidal shape. The latter is important to avoid edge effects (a toroid has no edges).
One important caveat for the RGB data is that we’re not going to bother with any scaling or normalizing. The parameters are all on the same scale and evenly distributed between 0 and the max value of 255. Likely your data are not so nicely formed!
#### train the SOM ####
## define a grid for the SOM and train
library(kohonen)
grid.size <- ceiling(sample.size ^ (1/2.5))
som.grid <- somgrid(xdim = grid.size, ydim = grid.size, topo = 'hexagonal', toroidal = T)
som.model <- som(data.matrix(sample.rgb), grid = som.grid)
Once you’ve trained the SOM it’s a good idea to explore the output of the `som` function to get a feel for the different items in there. The output takes the form of nested lists. Here we extract a couple of items that we’ll need later, and also create a distance matrix of the map units’ codebook vectors. We can do this because the fundamental purpose of a map unit is to hold a codebook vector that mimics the structure of the training data. During training each codebook vector is iteratively updated along with its neighbors to match the training data. After sufficient iterations the codebook vectors reflect the underlying structure of the data.
## extract some data to make it easier to use
som.events <- som.model$codes[[1]]
som.events.colors <- rgb(som.events[,1], som.events[,2], som.events[,3], maxColorValue = 255)
som.dist <- as.matrix(dist(som.events))
Now that we have a trained SOM let’s generate a descriptive plot. Since the data are RGB colors, if we color the plot accordingly it should be sensible. For comparison, we first create a plot with randomized codebook vectors. This represents the SOM at the start of training.
## generate a plot of the untrained map. This isn't really the configuration at the first
## iteration, but serves as an example.
plot(som.model,
     type = 'mapping',
     bg = som.events.colors[sample.int(length(som.events.colors), size = length(som.events.colors))],
     keepMargins = F,
     col = NA,
     main = '')
And now the trained SOM:
## generate a plot after training.
plot(som.model,
     type = 'mapping',
     bg = som.events.colors,
     keepMargins = F,
     col = NA,
     main = '')
So pretty! The next step is to cluster the map units into classes. As with all clustering analysis, a key question is how many clusters (k) should we define? One way to inform our decision is to evaluate the distance between all items assigned to each cluster for many different values of k. Ideally, creating a scree plot of mean within-cluster distance vs. k will yield an inflection point that suggests a meaningful value of k. In practice this inflection point is extremely sensitive to the size of the underlying data (in this case, the number of map units), however, it can be a useful starting place. Consider that the RGB data were defined continuously, meaning that there is no underlying structure! Nonetheless we still get an inflection point.
#### look for a reasonable number of clusters ####
## Evaluate within-cluster distances for different values of k. This is
## more dependent on the number of map units in the SOM than the structure
## of the underlying data, but until we have a better way...

## Define a function to calculate the mean distance within each cluster. This
## is roughly analogous to the within-clusters sum-of-squares approach.
clusterMeanDist <- function(clusters){
  cluster.means = c()
  for(c in unique(clusters)){
    temp.members <- which(clusters == c)
    if(length(temp.members) > 1){
      temp.dist <- som.dist[temp.members,]
      temp.dist <- temp.dist[,temp.members]
      cluster.means <- append(cluster.means, mean(temp.dist))
    }else{
      cluster.means <- append(cluster.means, 0) ## a singleton cluster has zero within-cluster distance
    }
  }
  return(mean(cluster.means))
}
try.k <- 2:100
cluster.dist.eval <- as.data.frame(matrix(ncol = 3, nrow = (length(try.k))))
colnames(cluster.dist.eval) <- c('k', 'kmeans', 'hclust')

library(vegan) ## for vegdist

for(i in 1:length(try.k)) {
  cluster.dist.eval[i, 'k'] <- try.k[i]
  cluster.dist.eval[i, 'kmeans'] <- clusterMeanDist(kmeans(som.events, centers = try.k[i], iter.max = 20)$cluster)
  cluster.dist.eval[i, 'hclust'] <- clusterMeanDist(cutree(hclust(vegdist(som.events)), k = try.k[i]))
}
plot(cluster.dist.eval[, 'kmeans'] ~ try.k,
     type = 'l')

lines(cluster.dist.eval[, 'hclust'] ~ try.k,
      col = 'red')

legend('topright',
       legend = c('k-means', 'hierarchical'),
       col = c('black', 'red'),
       lty = c(1, 1))
Having picked a reasonable value for k (let’s say k = 20) we can evaluate different clustering algorithms. For our data k-means almost always performs best, but you should choose what works best for your data. Here we’ll evaluate k-means, hierarchical clustering, and model-based clustering. What we’re looking for in the plots is a clustering method that produces contiguous classes. If classes are spread all across the map, then the clustering algorithm isn’t capturing the structure of the SOM well.
#### evaluate clustering algorithms ####
## Having selected a reasonable value for k, evaluate different clustering algorithms.
library(pmclust)

## Define a function to make a simple plot of clustering output.
## This is the same plotting as before, but we define the function
## here since we wanted to play with the colors earlier.
plotSOM <- function(clusters){
  plot(som.model,
       type = 'mapping',
       bg = som.events.colors,
       keepMargins = F,
       col = NA)
  add.cluster.boundaries(som.model, clusters)
}
## Try several different clustering algorithms and, if desired, different values for k.
cluster.tries <- list()

for(k in c(20)){
  ## model based clustering using pmclust
  som.cluster.pm.em <- pmclust(som.events, K = k, algorithm = 'em')$class # model based
  som.cluster.pm.aecm <- pmclust(som.events, K = k, algorithm = 'aecm')$class # model based
  som.cluster.pm.apecm <- pmclust(som.events, K = k, algorithm = 'apecm')$class # model based
  som.cluster.pm.apecma <- pmclust(som.events, K = k, algorithm = 'apecma')$class # model based
  som.cluster.pm.kmeans <- pmclust(som.events, K = k, algorithm = 'kmeans')$class # model based

  ## k-means clustering
  som.cluster.k <- kmeans(som.events, centers = k, iter.max = 100, nstart = 10)$cluster # k-means

  ## hierarchical clustering
  som.dist <- dist(som.events) # hierarchical, step 1
  som.cluster.h <- cutree(hclust(som.dist), k = k) # hierarchical, step 2

  ## capture outputs
  cluster.tries[[paste0('som.cluster.pm.em.', k)]] <- som.cluster.pm.em
  cluster.tries[[paste0('som.cluster.pm.aecm.', k)]] <- som.cluster.pm.aecm
  cluster.tries[[paste0('som.cluster.pm.apecm.', k)]] <- som.cluster.pm.apecm
  cluster.tries[[paste0('som.cluster.pm.apecma.', k)]] <- som.cluster.pm.apecma
  cluster.tries[[paste0('som.cluster.pm.kmeans.', k)]] <- som.cluster.pm.kmeans
  cluster.tries[[paste0('som.cluster.k.', k)]] <- som.cluster.k
  cluster.tries[[paste0('som.cluster.h.', k)]] <- som.cluster.h
}
## Take a look at the various clusters. You're looking for the algorithm that produces the
## least fragmented clusters.
plotSOM(cluster.tries$som.cluster.pm.em.20)
plotSOM(cluster.tries$som.cluster.pm.aecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecma.20)
plotSOM(cluster.tries$som.cluster.k.20)
plotSOM(cluster.tries$som.cluster.h.20)
For brevity I’m not showing the plots produced by all the different clustering algorithms. For these data the k-means and hierarchical clustering algorithms both look pretty good; I have a slight preference for the k-means version:
The SOM and final map unit clustering represent a classification model that can be saved for use with later data. One huge advantage of SOMs over other analysis methods (e.g., ordination techniques) is their usefulness for organizing newly collected data. New data, scaled and normalized in the same way as the training data if necessary, can be classified by finding the map unit with the minimum distance to each new observation. To demonstrate this, we’ll generate and classify a small new RGB dataset (in reality classifying in this way is very efficient, and could accommodate a huge number of new observations). First, we save the SOM and final clustering.
## The SOM and map unit clustering represent a classification model. These can be saved for
## later use.
som.cluster <- som.cluster.k
som.notes <- c('Clustering based on k-means')
save(file = 'som_model_demo.Rdata', list = c('som.cluster', 'som.notes', 'som.model', 'som.events.colors'))
Then we generate new RGB data, classify it, and make a plot to compare the original data, the color of the winning map unit, and the color of the cluster that map unit belongs to.
#### classification ####
## make a new dataset to classify ##
new.data.size <- 20
new.data <- as.data.frame(matrix(nrow = new.data.size, ncol = 3))
colnames(new.data) <- c('R', 'G', 'B')
new.data$R <- sample(0:255, new.data.size, replace = T)
new.data$G <- sample(0:255, new.data.size, replace = T)
new.data$B <- sample(0:255, new.data.size, replace = T)
## get the closest map unit to each point
new.data.units <- map(som.model, newdata = data.matrix(new.data))
## get the classification for closest map units
new.data.classes <- som.cluster[new.data.units$unit.classif]
## compare colors of the new data, unit, and class. First define a function
## to calculate the mean color for each cluster.
clusterMeanColor <- function(clusters){
  cluster.means = c()
  som.codes <- som.model$codes[[1]]
  for(c in sort(unique(clusters))){
    temp.members <- which(clusters == c)
    if(length(temp.members) > 1){
      temp.codes <- som.codes[temp.members,]
      temp.means <- colMeans(temp.codes)
      temp.col <- rgb(temp.means[1], temp.means[2], temp.means[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }else{
      ## a single-unit cluster: use that unit's codebook vector directly
      temp.codes <- som.codes[temp.members,]
      temp.col <- rgb(temp.codes[1], temp.codes[2], temp.codes[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }
  }
  return(cluster.means)
}
class.colors <- clusterMeanColor(som.cluster)
plot(1:length(new.data$R), rep(1, length(new.data$R)),
     col = rgb(new.data$R, new.data$G, new.data$B, maxColorValue = 255),
     ylim = c(0, 4),
     pch = 19,
     cex = 3,
     xlab = 'New data',
     yaxt = 'n',
     ylab = 'Level')

axis(2, at = c(1, 2, 3), labels = c('New data', 'Unit', 'Class'))

points(1:length(new.data.units$unit.classif), rep(2, length(new.data.units$unit.classif)),
       col = som.events.colors[new.data.units$unit.classif],
       pch = 19,
       cex = 3)

points(1:length(new.data.classes), rep(3, length(new.data.classes)),
       col = class.colors[new.data.classes],
       pch = 19,
       cex = 3)
Looks pretty good! Despite defining only 20 classes, the class color seems to be a reasonable representation of the original data. Only slight differences in color can be observed between the data, the winning map unit, and the class.
Where to begin… I started writing this post several days ago on a nearly empty plane flying from Charlotte, N.C. to San Diego. That flight marked the end of MOSAiC Leg 3 for me, though Leg 3 will continue for several more weeks. We were supposed to be done on April 4, however, the COVID-19 pandemic and dynamic sea ice conditions pretty well hashed that plan. I took advantage of a single opportunity to leave the Polarstern early (meaning only a couple of weeks late) to help with the childcare situation at home. The remaining brave and dedicated crew and scientists are expected to return by ship to Europe in late May.
Our journey began in Tromsø, Norway in the innocent days of January when COVID-19 seemed like a regional rather than global problem. We had several Chinese expedition members on Leg 3, but they all made it out just fine and – though they were concerned about the situation back home – we figured things would improve in due time. After a week of safety training we were transported to the Russian icebreaker Kapitan Dranitsyn for what we thought would be a 2-3 week voyage to the Polarstern.
A lot of thought and effort went into determining how the MOSAiC scientists and crew would reach Polarstern throughout the drift. During the winter a conventional icebreaker of Dranitsyn‘s class is not the best way to reach a location in the central Arctic, however, it was the best among the affordable and available options. And in the end it did the job. I’m not sure of the history, but I’m fairly certain that its feat of reaching Polarstern in the dead of winter is unprecedented.
We spent a week in a pleasant fjord just outside of Tromsø waiting for a weather break to cross the Barents Sea to the pack ice. The Barents Sea is notoriously stormy, and we needed wave heights below 4 m to attempt the crossing. Ice breakers don’t ride well in heavy seas, and there were refrigerated shipping containers carrying (our) food for Polarstern stored on deck in the bow. We couldn’t take big waves on the bow, nor could we tolerate much ice formation on the chilling units. Both happened anyway. But at least the crossing was relatively short, and within 72 hours we were in the pack ice.
The Dranitsyn ended up being a bigger part of Leg 3 than any of us imagined at the time. Had our departure from Polarstern gone as expected, we would have been in transit on Dranitsyn as long as we were doing science on Polarstern. It was an interesting experience. Scientists are pretty good at keeping themselves busy; most of us have a long backlog of analyses to complete and papers to write, in addition to fielding emails from students and colleagues, and handling administrative matters. Absent the internet or email, however, a lot of these responsibilities disappeared. I worked on a proposal and finished a (3 year overdue) paper. Much of the rest of the 5 weeks we were on Dranitsyn I spent in discussion with my Leg 3 colleagues. It was something like an extended MOSAiC workshop. We covered everything from synthesizing across different science themes, to how time and on-ice resources would be shared between groups in a typical week.
The roughly two-week delay in reaching the Polarstern was due in large part to ice conditions. Despite previous expeditions onboard icebreakers I hadn’t really thought about it this way before, but the ice compresses and relaxes in response to tides, winds, and other external forces. When the ice is compressed it’s incredibly difficult for an icebreaker to break; there’s simply no place for the ice to be displaced to. In a relaxed state more leads are open, and there’s more space within the pack ice to accommodate the displaced ice as the ship moves through. Temperature also plays a role. Icebreakers are more typically used during the spring, summer, and fall, when maritime shipping in the Arctic is active and more research activities are taking place. Spring, summer, and fall sea ice is naturally much warmer than winter sea ice. Warmer sea ice is softer and breaks more easily, and the surface of the ice has less friction, meaning an icebreaker can more easily ride up and over the ice to break it. With temperatures low and the ice in a “compressed” state it was a tough grind, as this video from our embedded videographers from the UFA production company shows:
Despite the conditions and some weeks of uncertainty we did eventually make it to Polarstern. Seeing that little point of light appear on the horizon after all those weeks of travel made me appreciate how alone we were up there at that time of year.
After we reached Polarstern it took nearly a week to transfer cargo, exchange the scientists and crew, and become familiar enough with our tasks and the surroundings to take over. Many of the MOSAiC observations take place in what’s called the central observatory. This is an aggregation of relatively stable floes around Polarstern that house a number of on-ice installations. These include the colloquially named Met City, Balloon Town, Ocean City, and Droneville sites, among others. Beyond the central observatory are various “pristine” sites for snow and ice sampling, and beyond those lie the nodes of the less frequently visited distributed observatory. The distributed observatory is critical because it provides some spatial context to the intensive observations of the central observatory.
We had about a week of “normal” operations before the central Arctic started throwing plot twists at us. The ice was surprisingly dynamic. The reasons behind this will, I think, be an important science outcome of MOSAiC. Thinner, rougher sea ice? More wind stress? Whatever the cause the ice cracked a lot. By chance we were located in what ice dynamicists call a “shear zone”, an area of enhanced kinetic energy within the pack. Here’s the first emergence of a crack, on March 11. You can see the logistics team scrambling to move the snow machines to a safer location. Over the next few weeks this crack grew into a major and ever-evolving lead.
By April everyone was pretty used to cracks, leads, and ridges in the central observatory. Overall I was impressed with the resilience of the various teams as different installations were threatened, and in some cases destroyed. For the ATMOS (atmospheric sciences) team in particular, Leg 3 should have been relatively easy, low temps notwithstanding. Everyone expected consolidated ice and stable, well-established instruments and protocols. Instead, for a period of time there was a near-daily scramble to maintain power and infrastructure at the Met City site. The adjacent Remote Sensing site was enveloped by a large ridge system and had to be relocated near the logistics area. Setting up these sites took specialists on Leg 1 many days; on Leg 3, systems had to be dismantled and re-established on the fly. Because spring is a particularly interesting time for atmospheric chemistry in the Arctic, the clock was ticking every time a site or instrument went down. The dedication and ingenuity of the scientists at the Met City and Remote Sensing sites was great to observe. The rest of us helped where we could, but we had our hands pretty full with other problems.
At the top of the problems list was the loss of the hole for the main CTD/rosette system. The CTD is an instrument package that measures conductivity, temperature, and pressure (depth), along with other parameters. It is the fundamental tool of ship-based oceanography. The CTD is attached to a long conductive wire and embedded within a rosette of sample bottles, which are fired at specific depths to collect water for a number of analyses and experiments. Lots of parameters and projects were dependent on this sampling system, and a huge amount of effort had been expended to construct and maintain a hole in the ice for deploying it. On March 15, however, the ice shifted, pushing Polarstern forward. This caused superficial damage to the Weatherhaven covering the hole, but more significantly placed the hole out of reach of the crane that operates the CTD/rosette. Just like that we lost all of our capacity to sample below 1000 m (the central Arctic Ocean is deeper than 4000 m in most places) and to collect large volumes of water from any depth. All sampling had to shift to a much smaller system at the Ocean City site.
Why couldn’t we simply make a new hole? It’s worth remembering that the ice in March is near its maximum thickness. It was roughly 160 cm thick when access to the main CTD hole was lost, and that discounts any rafting of multiple ice floes, which was probable and could easily double or triple the thickness. Assuming only a single layer of ice, the way to make a hole big enough for the CTD is with a chainsaw. The thickness of ice that you can cut is limited by the length of the chainsaw bar. Maybe commercial logging operations have a 2 m bar; we certainly did not! You can cut the ice out in layers – I’ve had to do this in the Antarctic before – but the problem is that you create a bathtub as soon as you start cutting the final layer. To finish the job you’d need a snorkel for yourself and the chainsaw!
All our water column sampling had to shift to Ocean City, and focus on the upper 1000 m of the water column. Ocean City is the main site for the physical oceanography team. It was designed to accommodate a small team taking ultra-high resolution measurements of the surface ocean. The physical oceanographers went above and beyond sharing their space and resources, and I ended up thoroughly enjoying the time that I spent out at Ocean City. The below video was made on April 20 by lowering a GoPro through the CTD hole at Ocean City while one of the physical oceanographers is conducting high resolution profiling of temperature and salinity. You can see the microstructure profiler used for this near the end of the sequence.
In addition to the water column sampling we carried out sea ice sampling, when conditions allowed. To minimize the impact of light pollution from the vessel on the growth of sea ice algae our preferred ice coring sites were located some distance from the ship. Through the spring and summer, most of the photosynthesis taking place in the central Arctic occurs in the ice itself, rather than the water column. The ice algae have more consistent access to light than their planktonic counterparts, and are famously sensitive to even the lowest levels of light. Ambient light from the ship is more than enough to induce growth in the vicinity during the long polar night. Distance from the ship combined with the dynamic ice conditions created some access challenges.
Despite the access challenges we got some great ice core samples. We fielded two ice coring teams, one for first year ice and one for second year ice. I had the pleasure of working with the second year ice coring team. It was a great US-German-Russian collaborative effort, and we had some good times out there!
The original plan for exchanging Leg 3 with Leg 4 involved flying us all out on ski equipped Antonov An-74 aircraft. This would have been a slick and expedient way to carry out the exchange. It also requires a pretty long runway and permissive global travel. By mid-March it was clear that both of these things were going to be an issue. I’ll be honest, there were some tense weeks where it wasn’t clear when and how Leg 3 would end, and what the future of MOSAiC would be. Kudos to cruise and expedition leadership for navigating us through the ups and downs. In particular AWI logistics had the difficult task of designing and scoping the possible solutions. They did an amazing job of iteratively working through a huge range of options to come up with the one that maximized science and minimized impacts on individual lives. But of course it was a compromise.
The current plan involves the Polarstern leaving the central observatory in mid-May for a rendezvous with one or more ships near Svalbard. The Leg 4 personnel (including Bowman Lab member Emelia Chamberlain) are already under strict quarantine in Bremerhaven, Germany. They’ll remain under quarantine until they depart for Svalbard at roughly the same time Polarstern leaves the central observatory. Once the crew has been exchanged, Leg 3 will sail for Germany and Leg 4 will begin the difficult task of re-establishing observations at the central observatory. An advantage of this plan is that it doesn’t require a complete breakdown of the central observatory. It will require, however, that many of the installations be partially disassembled for safety while Polarstern is away from the floe.
There was one opportunity to leave Polarstern before the official Leg 3-4 exchange. After agonizing over it for a couple of days I decided I needed to take advantage of the opportunity. After an epic few weeks our project was in decent shape, and with two young kids and no school or daycare, attention needed to shift to the home front. On April 22 I stepped onto a Twin Otter operated by Kenn Borek Air Ltd. to begin the long flight home with six other expedition members. We flew to Station Nord in Greenland, then across the Canadian Arctic via Eureka-Resolute-Arctic Bay-Churchill-Toronto, and finally to the US.
Immigration: “You’re coming from where?”
Me: “Resolute”
Immigration: “What were you doing in Resolute?”
Me: “Just passing through, we were only there for a few minutes”
Immigration: “So where were you before that?”
Me: “Greenland, but again not very long. See there’s this ship…”
Immigration: “Uh, nevermind. Here’s your passport.”
In a sign of the times here’s Ana Barral’s virtual presentation for the 2020 American Society for Biochemistry and Molecular Biology meeting (held remotely). It’s a nice summary of the initial results of our course-based undergraduate research experience (CURE) on microbes on ocean plastics.
Check out the lectures posted by our team as part of the MOSAiC-affiliated Massive Open Online Course. This course, available via both Coursera and YouTube, was produced by the University of Colorado Boulder in partnership with the Alfred Wegener Institute and with funding from the National Science Foundation. It explores current research surrounding the Arctic ocean-ice-atmosphere system, as well as the questions driving the MOSAiC International Arctic Drift Expedition.
Module 5 Lectures 3-4: Learn from Jeff about the microbial communities living in sea ice.
Module 5 Lecture 7: Learn from Dr. Jessie Creamean (Leg 1 MOSAiC participant) about atmospheric aerosols in the Arctic.
Module 5 Lecture 8: Learn from Dr. Brice Loose (Co-PI of our NSF project) about why it is important to study the Arctic in the first place.
In the weeks leading up to starting at Scripps Institution of Oceanography, where I am now a first year PhD student, I often found myself – for better or worse – turning to Google for advice on navigating the next five years. I would google any anxiety-fueled question that popped into my brain, from “What is the hardest year of a PhD?” to “How competitive is a PhD cohort?” I found the best answers online were from older students, who used the space to reflect on their recent experiences and offer up what they had learned along the ride. In that spirit, I hope some recently accepted graduate student stumbles onto this as they furiously google, and that it offers comfort and (hopefully) wisdom for a fun and tumultuous transition.
Below is a list of five things I really learned – and relearned – as a first quarter grad student. I am only one quarter done with a potentially six-year degree (which, if we do some quick GRE math, means I am one quarter of one sixth done, which is a little over 4 percent), so this is by no means an exhaustive list. I’m actually really interested to see which of the five becomes more important as the years go on, and please comment if, as a graduate student, you think I missed something important.
Five lessons from the first quarter of graduate school:
Find help. This was, I think, the most often repeated piece of advice I saw going into my first year, and it really holds up. Finding people I could be myself around – people I could ask stupid questions of and lament failed experiments with – was essential to my first quarter survival. My best memories from a packed schedule of lab and class are from study groups and group lab coding sessions. I also got involved on campus once I felt settled, which was a great way to meet and work alongside older students and mentors.
Planning ahead saves money. I worked in a lab as an undergrad and as a tech once I graduated, but I was never in charge of purchasing lab equipment or supplies. This quarter marked the transition for me between being an ignorant consumer of lab supplies to a conscientious one, now that I know exactly how expensive it all is and how much money you can waste by doing a poorly designed experiment (extra hint: include controls!). More generally, the application process for PhDs includes applying for grants, which is just the beginning in learning to apply for and manage money as a researcher. I’ve realized I have a lot to learn about budgeting and management in my journey to become a successful scientist.
Grades aren’t everything anymore. It was a hard habit to break, but all of your time shouldn’t be spent studying for your general first year classes. I learned to diversify how I obtained knowledge. Reading scientific papers and attending seminars from visiting professors were places where I learned the most this quarter. An afternoon spent reading a paper closely related to your research or an hour attending an interesting seminar often meant more to me than studying for a midterm in my more general oceanography classes.
Say yes. I am writing this from a ship off of the coast of Antarctica, where I am conducting field work, all because my advisor asked if I wanted to go and I said yes. Saying yes to collecting samples in the field is one example, but even to something simpler but still scary – like a surfing class or going to a social event where you don’t know anyone – just say yes.
You will make mistakes. I think the biggest lesson I learned from the first few months of grad school was how often you make mistakes. It is a daily (sometimes hourly) part of life, in both lab and in class. I am still working on how to learn from and move past mistakes, both large and small.
I think since I am still in my first year I have yet to really experience burnout or writer’s block, which I know happen often to older PhD students. I feel so fortunate to be able to study and do science as a Bowman lab member at Scripps, and I hope my insights from a great first quarter help put any prospective students reading this at ease.