A common exercise in environmental microbiology is counting bacterial cells with an epifluorescent microscope. During my PhD I spent many hours hunched over a microscope in a darkened room, contemplating which points of light were bacteria (and should thus be counted) and which were not. With a large cup of coffee from Seattle's U District Bean and Bagel and an episode of "This American Life" playing in the background, it's not a bad way to spend the day. But it definitely felt like a procedure that needed some technological enhancement. My biggest concerns were always objectivity and reproducibility; it's really hard to determine consistently which "regions of interest" (ROIs) to count. There are of course commercial software packages for identifying and counting ROIs in a microscope image. But why pay big money for a software subscription when you can write your own script? I had some free time during our slow transit to Polarstern during MOSAiC Leg 3 and thought I'd give it a try. The following tutorial borrows heavily from the image segmentation example in Wehrens and Kruisselbrink, 2018.
We start with a PNG image from a camera attached to a microscope. The green features are bacteria and phytoplankton that have been stained with SYBR Green. These are the ROIs that we'd like to identify and count. The image quality is actually pretty bad here; this was made with the epifluorescent scope at Palmer Station back in 2015, and the scope and camera needed some TLC. It turns out that I don't actually have many such images on my laptop, and I can't simply go and make a new one because we aren't allowed in the lab right now! Nonetheless the quality is sufficient for our purposes.
First, we need to convert the image into matrices that we can work with: one for the RGB values and one for the pixel coordinates. I'm sure there's a way to do this in R, but I couldn't find an expedient method. Python made it easy.
## convert image to two matrices: a 3 column RGB matrix and
## a 2 column xy matrix

import matplotlib.image as mpimg

name = '15170245.png'
name = name[:-len('.png')]  # drop the extension; rstrip('.png') would strip any trailing '.', 'p', 'n', 'g' characters

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
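If you'd rather skip the Python step entirely, the png package's readPNG should be able to build the same matrices directly in R. A minimal sketch, assuming the column-major pixel order matches the Python output (it should, since as.vector unrolls a matrix column by column):

## a minimal R alternative using the png package; as.vector() unrolls the
## image column by column, matching the order the Python script writes
library(png)

img <- readPNG('15170245.png')

scope.rgb <- data.frame(red = as.vector(img[,,1]),
                        green = as.vector(img[,,2]),
                        blue = as.vector(img[,,3]))

scope.xy <- expand.grid(Y = 1:dim(img)[1], X = 1:dim(img)[2])[,c('X', 'Y')]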
Next we break out R to do the actual analysis (which yes, could be done in Python…). The basic strategy is to use a self-organizing map (SOM), built with the kohonen package, with two layers. One layer will be color, the other will be position. We'll use this information to identify distinct classes corresponding to diagnostic features of ROIs. Finally, we'll iterate across all the pixels that appear to belong to ROIs and attempt to draw an ellipse around each group of pixels that makes up an ROI. First, we read in the data produced by the Python script:
scope.rgb <- read.csv('15170245.rgb4r.csv', header = F)
scope.xy <- read.csv('15170245.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')
Then we define a function to render the image described by these matrices:
plotImage <- function(scope.xy, scope.rgb){

  ## one color per pixel
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)

  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)

  ## build an index matrix so that pixel i takes color image.col[i]
  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)

  image(0:x.dim, 0:y.dim, t(temp.image), col = image.col,
        ylab = paste0('Y:', y.dim), xlab = paste0('X:', x.dim))
}

## give it a test
plotImage(scope.xy, scope.rgb)
You'll note that the function flips the image. While annoying, this doesn't matter at all for identifying ROIs. If it bothers you, go ahead and tweak the function :). Now we need to train our SOM. The SOM is what does the heavy lifting of identifying different types of pixels in the image.
#### train som ####
library(kohonen)

som.grid <- somgrid(10, 10, 'hexagonal')

som.model <- supersom(list('rgb' = data.matrix(scope.rgb), 'coords' = data.matrix(scope.xy)),
                      whatmap = c('rgb', 'coords'),
                      user.weights = c(1, 9),
                      grid = som.grid)
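SOM training is stochastic, so before going further it's worth checking that training actually converged. kohonen's built-in 'changes' plot shows the mean distance to the closest codebook vector at each training iteration:

## check convergence; the curves should level off by the final iteration
plot(som.model, type = 'changes')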
Now partition the SOM into k classes with k-means clustering. The value of k has to be determined experimentally, but should be consistent for all the images in a set (i.e., a given type of microscopy image).
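One way to choose k experimentally is an elbow plot: run k-means over a range of candidate values and look for the point where the total within-cluster sum of squares stops dropping quickly. A minimal sketch:

## scan candidate values of k against the codebook vectors of the rgb layer;
## the "elbow" in this curve is a reasonable choice for k
wss <- sapply(2:12, function(k){
  kmeans(som.model$codes[[1]], centers = k, nstart = 10)$tot.withinss
})
plot(2:12, wss, type = 'b', xlab = 'k', ylab = 'total within-cluster SS')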
som.codes <- som.model$codes[[1]]

## note that kmeans is stochastic; the cluster numbering will differ between runs
som.codes.k <- kmeans(som.codes, centers = 6)

## Get mean color for each cluster (class)
class.col <- c()

for(k in 1:max(som.codes.k$cluster)){
  temp.codes <- som.codes[which(som.codes.k$cluster == k), , drop = FALSE]
  temp.col <- colMeans(temp.codes)
  temp.col <- rgb(temp.col[1], temp.col[2], temp.col[3], maxColorValue = 1)
  class.col <- append(class.col, temp.col)
}

## Make a plot of the som with the map units colored according to the mean
## color of the owning class.
plot(som.model, type = 'mapping', bg = class.col[som.codes.k$cluster],
     keepMargins = F, col = NA)
text(som.model$grid$pts, labels = som.codes.k$cluster)
add.cluster.boundaries(som.model, som.codes.k$cluster)
Here's where we have to be a bit subjective. We need to make an informed decision about which classes constitute ROIs. Looking at this map, I'm gonna guess 3 and 6. The classes and structure of your map will of course be totally different, even if you start with the same training image. To make use of this information, we first predict which map units our original pixels belong to.
## predict for RGB only
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')
Then we identify those pixels that belong to the classes we think describe ROIs.
## select units that correspond to ROIs
target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]
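Before aggregating pixels into ROIs it's worth a quick visual check that the chosen classes really do capture the features we care about. Overlaying the selected pixels on the rendered image makes this easy:

## overlay candidate ROI pixels on the image; the overlay inherits the same
## flip as plotImage, so the two remain directly comparable
plotImage(scope.xy, scope.rgb)
points(target.pixels$X, target.pixels$Y, col = 'red', pch = '.')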
Now comes the tricky bit. Knowing which pixels belong to ROIs isn't actually that useful on its own, as each ROI is composed of many different pixels. So we need to aggregate the pixels into ROIs. Again, this requires a little experimentation, but once you figure it out for a given sample type it should work consistently. The key parameter here is "resolution", which we define as how far apart two pixels of the same class need to be to constitute different ROIs. The value 20 seems to work reasonably well for this image.
## Loop through all pixels. If a pixel doesn't yet belong to an ROI, start a
## new ROI and assign every pixel within "resolution" of it to that ROI.
## Currently a pixel can be "grabbed" by an ROI produced later.

findROI <- function(target.pixels, resolution = 20){

  roi <- 1
  roi.pixels <- as.data.frame(matrix(nrow = dim(target.pixels)[1], ncol = 3))
  colnames(roi.pixels) <- c('X', 'Y', 'roi')

  for(i in 1:dim(target.pixels)[1]){
    if(is.na(roi.pixels[i, 'roi'])){

      pixel.x <- target.pixels[i, 'X']
      pixel.y <- target.pixels[i, 'Y']

      ## all target pixels within resolution of this one, including itself
      nns <- which(abs(target.pixels[, 'X'] - pixel.x) < resolution &
                   abs(target.pixels[, 'Y'] - pixel.y) < resolution)

      roi.pixels[nns, c('X', 'Y')] <- target.pixels[nns, c('X', 'Y')]
      roi.pixels[nns, 'roi'] <- roi

      roi <- roi + 1
    }
  }
  return(roi.pixels)
}

roi.pixels <- findROI(target.pixels)
roi.table <- table(roi.pixels$roi)
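Since resolution has to be tuned by eye, a quick scan across a few values gives a feel for how sensitive the ROI count is to this parameter:

## how does the ROI count respond to resolution?
for(res in c(10, 20, 40)){
  temp.roi <- findROI(target.pixels, resolution = res)
  print(paste('resolution:', res, 'ROIs:', length(unique(temp.roi$roi))))
}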
To evaluate our discovery of ROIs we plot an ellipse around each ROI in the original image.
## approximate each roi as an ellipse. need x, y, a, b

plotROI <- function(roi.pixels){
  require(plotrix)

  for(roi in unique(roi.pixels$roi)){
    temp.pixels <- roi.pixels[which(roi.pixels$roi == roi),]

    ## use the extent of the ROI in x and y as the ellipse axes, and the
    ## centroid as the center
    temp.a <- max(temp.pixels$X) - min(temp.pixels$X)
    temp.b <- max(temp.pixels$Y) - min(temp.pixels$Y)
    temp.x <- mean(temp.pixels$X)
    temp.y <- mean(temp.pixels$Y)

    draw.ellipse(temp.x, temp.y, temp.a, temp.b, border = 'red')
  }
}

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)
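The count itself is just the number of distinct ROIs. One refinement worth considering (a suggestion, not something done above) is to drop very small ROIs that are more likely noise than cells:

## total ROI count, and the count after dropping ROIs smaller than 5 pixels
## (the 5 pixel cutoff is arbitrary; tune it for your sample type)
roi.sizes <- table(roi.pixels$roi)
length(roi.sizes)
length(roi.sizes[roi.sizes >= 5])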
It certainly isn't perfect; the two chained diatoms in particular throw off our count. We did, however, do a reasonable job of finding all the small ROIs that represent the smaller, harder-to-count cells. So how does the model perform for ROI identification on a new image? Here's a new image acquired with the same exposure settings on the same scope. We use the same Python code to convert it to RGB and XY matrices.
## convert image to two matrices: a 3 column RGB matrix and
## a 2 column xy matrix

import matplotlib.image as mpimg

name = '14000740.png'
name = name[:-len('.png')]  # drop the extension, as before

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
Then we predict and plot.
scope.rgb <- read.csv('14000740.rgb4r.csv', header = F)
scope.xy <- read.csv('14000740.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

plotImage(scope.xy, scope.rgb)

## predict for RGB only
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')

target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

roi.pixels <- findROI(target.pixels)
roi.table <- table(roi.pixels$roi)

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)
Not bad! Again, it isn't perfect; some ROIs are grouped together and some are missed (largely a function of the variable focal plane). These can be fixed by experimenting with the model parameters and the resolution. We did accomplish the goal of improving objectivity and reproducibility; our approach isn't always right, but at least it's wrong in a consistent way! An additional advantage, of course, is speed: if you had hundreds of such images, perhaps representing multiple randomly selected fields from many samples, you could process in a few minutes what would take many hours to count by hand.
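To make that last point concrete, here's a minimal sketch of how batch counting might look. The hypothetical countROIs function assumes each image has already been run through the Python conversion script, and reuses som.model, target.units, and findROI from above:

## hypothetical batch counter: returns the number of ROIs in one pre-converted image
countROIs <- function(name){
  temp.rgb <- read.csv(paste0(name, '.rgb4r.csv'), header = F)
  temp.xy <- read.csv(paste0(name, '.xy.csv'), header = F)
  colnames(temp.rgb) <- c('red', 'green', 'blue')
  colnames(temp.xy) <- c('X', 'Y')
  temp.predict <- predict(som.model, newdata = list('rgb' = data.matrix(temp.rgb)), whatmap = 'rgb')
  temp.pixels <- temp.xy[which(temp.predict$unit.classif %in% target.units), c('X', 'Y')]
  length(unique(findROI(temp.pixels)$roi))
}

sapply(c('15170245', '14000740'), countROIs)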
At this point I can feel the collective judgement of every environmental microbiologist since van Leeuwenhoek for promoting a method that might reduce the serendipitous discovery that comes with spending hours staring through a microscope. So here's a reminder to spend time getting familiar with your samples under the microscope, regardless of how you identify ROIs!