New paper on detecting successful mitigation of sulfide production

Congrats to Avishek Dutta for his new paper “Detection of sulfate-reducing bacteria as an indicator for successful mitigation of sulfide production” currently available as an early view in Applied and Environmental Microbiology. This was intended to be the second of two papers on a complex experiment that we participated in with BP Biosciences, but the trials and tribulations of peer review led this to be the first. We’re pretty excited about it.

Here’s the quick background. When microbes run out of oxygen, the community turns to alternate electron acceptors through anaerobic respiration. One of these is sulfate, which is reduced to hydrogen sulfide. In addition to smelling bad, hydrogen sulfide is pretty reactive and forms a corrosive acid when dissolved in water. For industrial processes this is a problem. Sulfide can destroy products, inhibit desired reactions, and corrode pipes and equipment. To make matters worse, sulfate-reducing bacteria (SRBs: those microbes that are capable of using sulfate as an alternate electron acceptor) can form tough biofilms that are hard to dislodge.

One way of dealing with undesired SRBs is to fight biology with biology and add a more energetically favorable electron acceptor. Oxygen would of course work really well, but it typically isn’t feasible to implement oxygen injection on a really large scale. However, nitrate also works well. If nitrate is abundant, nitrate-reducing bacteria (NRBs) will outcompete SRBs for resources (e.g., labile carbon). Great! Now here’s the challenge… adding massive quantities of nitrate salts is expensive and likely has its own ecological and environmental consequences. So we’d like to do this judiciously, adding just enough nitrate to the system to offset sulfate reduction. But how do you know when you’ve added enough? In a really big system (like an oil field) the sulfide production can be happening very far from any possible sampling site, so simply measuring the concentration of hydrogen sulfide doesn’t help much. But we can learn some useful things by monitoring the microbial community in the effluent.

Schematic of biofilm dispersal, leading to a recognizable signal in the effluent. From Dutta et al., 2021.

The figure above is a schematic of the formation and decay of the biofilm before, during, and after mitigation. In our study the biofilm was presumed to be sulfidogenic and the mitigation strategy was addition of nitrate salts, but the concept applies equally well to any biofilm and any mitigation strategy. The trick – and this is one of those things that seems painfully obvious after the fact but not before – is that you’re looking for the thing you’re mitigating to appear in the effluent. Although this might seem to suggest increased abundance in the system, it actually represents decay of the biofilm and loss from the system. To take this a step further, we used paprica to predict genes in the effluent and then identified anomalies in the abundance of genes involved in sulfate reduction. These anomalies provide specific markers of successful mitigation and point toward a general strategy for monitoring its effectiveness.
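The anomaly-detection idea is simple enough to sketch. The toy example below (invented numbers and a plain z-score cutoff, not the paper's actual statistical method) flags effluent samples in which the predicted abundance of a sulfate-reduction gene spikes above its baseline:

```python
# Hypothetical sketch: flag anomalies in the predicted abundance of a
# sulfate-reduction gene (e.g., dsrA) in effluent samples by comparing each
# observation to the mean and standard deviation of a baseline window.
# A spike beyond the z-score cutoff marks putative biofilm dispersal.
from statistics import mean, stdev

def flag_anomalies(abundances, baseline_n=10, z_cutoff=3.0):
    """Return indices of samples whose abundance deviates strongly
    from the baseline (the first `baseline_n` samples)."""
    baseline = abundances[:baseline_n]
    mu, sigma = mean(baseline), stdev(baseline)
    return [i for i, x in enumerate(abundances[baseline_n:], start=baseline_n)
            if sigma > 0 and abs(x - mu) / sigma > z_cutoff]

# Ten quiet baseline samples near 0.01, then a dispersal spike after mitigation.
series = [0.010, 0.011, 0.009, 0.010, 0.012, 0.010, 0.009, 0.011, 0.010, 0.010,
          0.011, 0.045, 0.052, 0.012]
print(flag_anomalies(series))  # → [11, 12]
```

In practice the baseline, cutoff, and normalization would all need more care than this, but the logic is the same: the signal of successful mitigation is a transient excess of the mitigated organisms' genes in the effluent.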

The detection of anomalies in the predicted abundance of relevant genes provides a way to detect the successful mitigation of SRBs (or any biofilm forming microbes). From Dutta et al., 2021.
Posted in Research | Leave a comment

New paper connecting aerosol optical depth to sea ice cover and ocean color

Congratulations to Srishti Dasarathy for her first first-authored publication! Srishti’s paper “Multi-year Seasonal Trends in Sea Ice, Chlorophyll Concentration, and Marine Aerosol Optical Depth in the Bellingshausen Sea” is out in advance of print in JGR Atmospheres. This paper was a really long time in coming. For this study, Srishti made use of several different satellite products including measurements of marine aerosol optical depth (MAOD) derived from the CALIPSO satellite. We are not a remote sensing lab and Srishti doesn’t come from a remote sensing or physics background, so the learning curve was pretty steep. It took a couple of years, a lot of Matlab tutorials, and an internship with the CALIPSO team at NASA’s Langley Research Center just to crack the CALIPSO data and start testing hypotheses. Srishti’s main hypothesis was that MAOD would be positively correlated with ocean color and negatively correlated with sea ice, since phytoplankton are known to be a source of volatile organic compounds that can form aerosol particles. Confounding this is that sea spray – which like phytoplankton is associated with open water periods – is also a source of aerosols.

The CALIPSO satellite “curtain”. Figure taken from https://www.globe.gov/web/s-cool/home/satellite-comparison/how-to-read-a-calipso-satellite-match.

One challenge that we faced was that CALIPSO represents data with high spatial resolution along a 2D path or “curtain”, as shown above. The orbital geometry is such that not every point on the globe gets covered; the same curtains get sampled every 16 days. Thus, while spatial resolution is high along the curtain, it is poor orthogonal to the curtain, and temporal resolution is limited to 16 days. This makes it a bit challenging to capture signals associated with relatively ephemeral events (such as phytoplankton blooms).

Basin-scale averages of MAOD, chlorophyll a, ice cover, and wind speed. From Dasarathy et al. 2021.

To work around these limitations Srishti took a basin-scale view of the CALIPSO data and looked for large-scale trends that would link MAOD with chlorophyll a or ice cover. This approach isn’t ideal and glosses over a lot of interesting details, but it is nonetheless sufficient to reveal some interesting relationships. Most notably, MAOD and chlorophyll a are weakly but significantly correlated in a time-lagged fashion, with a delay of approximately 1 month yielding the strongest correlation. This makes sense, as the volatile organic compounds that link phytoplankton (and ice algal) communities to MAOD are thought to be maximally produced near the end of the phytoplankton bloom as the biomass starts to decay. In the near future new satellite missions like PACE and improved land/sea observing campaigns will allow us to get into the details a bit more, including direct observations of specific blooms and the time- and space-lagged MAOD response!
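The time-lagged correlation analysis is easy to illustrate. The sketch below uses synthetic monthly values (not the actual satellite data, and plain Pearson correlation rather than the paper's full statistical treatment) to show how shifting one series against the other reveals the lag with the strongest relationship:

```python
# Toy lagged-correlation sketch: correlate an aerosol series against a
# chlorophyll series shifted by 0..max_lag months and report the best lag.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def best_lag(driver, response, max_lag=3):
    """Correlate `response` against `driver` shifted by 0..max_lag steps;
    return (lag, r) for the strongest positive correlation."""
    results = [(lag, pearson(driver[:len(driver) - lag] if lag else driver,
                             response[lag:]))
               for lag in range(max_lag + 1)]
    return max(results, key=lambda lr: lr[1])

# Made-up monthly values: the aerosol series tracks chlorophyll one month late.
chl = [0.2, 0.8, 2.5, 1.1, 0.4, 0.3, 0.2, 0.9, 2.2, 1.0, 0.5, 0.3]
maod = [0.050, 0.058, 0.082, 0.150, 0.094, 0.066,
        0.062, 0.058, 0.086, 0.138, 0.090, 0.070]
lag, r = best_lag(chl, maod)
print(lag)  # → 1 (a one-month delay maximizes the correlation)
```

In this contrived series the one-month lag is exact; in the real data the signal is much weaker and has to be teased out of noisy basin-scale averages.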

The strength and sign of the correlation between MAOD and sea ice cover, wind speed, and chlorophyll a change as a function of the time-lag. For chlorophyll a, the strongest correlation with MAOD is observed with a 1-month lag. We hypothesize that this corresponds to the decay of a phytoplankton bloom when we expect the emissions of volatile organic carbon compounds to be maximal.
Posted in Research | 1 Comment

Sampling mangroves in Florida’s Indian River Lagoon

Last week PhD student Natalia Erazo and I were fortunate to get back into the field after a long pandemic hiatus.  Our mission was to collect mangrove propagules (essentially a detachable bud from which the mangrove seedling sprouts) from the Indian River Lagoon in Florida for an upcoming experiment on mangrove-microbe symbiosis.  Neither of us had worked in Florida before so we teamed up with Candy Feller, an emeritus scientist with the Smithsonian Marine Station in Ft. Pierce, FL.  Candy has been working on mangroves in Florida and around the world for decades and is extremely knowledgeable about the ecology of these systems.  She and husband Ray Feller allowed us to tag along as they checked on a few long-term experiments and study sites up and down the coast.

Natalia, Candy, and I standing in a mixed salt marsh-mangrove habitat near the northern limit of the mangrove range. Photo: Ray Feller.

For those not familiar with Florida’s Atlantic coast, the Indian River Lagoon is a network of estuaries and barrier islands that stretches from north of Cape Canaveral to south of Port St. Lucie.  The barrier islands form a protected waterway that provides habitat for mangroves, manatees, and a variety of other species.  The Indian River Lagoon is home to quite a few people as well, and there are some issues associated with water quality.  Nutrients from septic and sewage systems are cited as a cause of high phytoplankton loads and increasingly murky water, leading to a reduction in aquatic vegetation and increased manatee mortality.  Key landscape features in the Lagoon are also the result of human habitation.  For example, much of the mangrove habitat in the Ft. Pierce region exists within engineered mosquito abatement areas.  To reduce the number of mosquitos (today mostly a nuisance, though historically some carried disease) berms were created around vast tracts of mangrove habitat.  These areas were then flooded, reducing the breeding success of mosquitos because they lay eggs on wet but not flooded soil.

Natalia samples propagules from mangroves of the genus Avicennia in a former mosquito abatement area.

Unfortunately, mosquito abatement also killed the mangrove trees which, while salt tolerant and adapted to life in saturated soils, require tidal action to oxygenate the water.  Modern mosquito abatement efforts (while still energy and labor intensive) take this into account and mangroves are thriving in areas that were formerly stagnant abatement ponds.  This is a Good Thing for anyone who likes fish, crabs, shoreline stabilization, and any of the other services that mangroves are well known for providing.

A particularly interesting feature of the Indian River Lagoon is that it is oriented north to south at nearly the northernmost known extent of mangroves on the US Atlantic Coast.  This provides an excellent opportunity to study how mangroves are responding to changing climate.  It’s known that mangroves are extending their range to the north, but climate change is anything but linear, and the rise in atmospheric and sea surface temperatures is accompanied by instabilities and severe perturbations.  The most notable may be freezing events caused by deep intrusions of the now infamous polar vortex.  Such perturbations can have a bigger impact on landscape ecology than the background climate.  Mangroves are very much a tropical species but somewhat resistant to transient freeze events (at least more so than your average Florida orange tree).  How they respond physiologically to these and other stressors that they encounter in their northward progression remains to be seen.

Mangrove trees near the southern end of the Indian River Lagoon. There are no salt marsh habitats in the region; mangrove forests dominate the estuaries.
Salt marsh (with pulp mill in the background) at Fernandina Beach, well north of the current known mangrove range in Florida. Eventually this salt marsh will convert to mangrove forest similar to the previous picture, but the timeline on which this will occur is anyone’s guess.
Posted in Research | Leave a comment

New paper on microbial community structure in coastal Southern California

Congrats to postdoctoral researcher Jesse Wilson for his new paper in Environmental Microbiology, “Recurrent microbial community types driven by nearshore and seasonal processes in coastal Southern California”. Although considerable microbiology work has taken place at the Ellen Browning Scripps Pier, this is (surprisingly) the first study to comprehensively look at how bacterial and archaeal community structure change over time. This is also the first of what we hope will be many publications that are a product of the Scripps Ecological Observatory.

Jesse Wilson (left), Avishek Dutta (right), and I prep an in situ sampling pump for the Scripps Ecological Observatory.

As part of the Scripps Ecological Observatory effort we team up with the Southern California Coastal Ocean Observing System (SCCOOS) team for twice-weekly sampling of surface water for microbial community structure via 16S and 18S rRNA gene sequencing and microbial abundance via flow cytometry. As you can see from the SCCOOS and flow cytometry data below it’s a pretty dynamic system! This is why the site is so advantageous for ecological studies; more dynamic means more opportunities to identify co-varying signals in the environment that point to possible interactions.

From Wilson et al., 2021. Key ecological parameters and flow cytometry data for the Ellen Browning Scripps Pier for an ~18 month period.

At the core of Jesse’s paper is the 16S rRNA gene sequence dataset. What these data provide is a high-resolution view of the taxonomy of the bacterial and archaeal community at each sample point. These data are so high resolution – after proper denoising and quality control they represent hundreds to thousands of unique taxa – that it’s often difficult to make inferences from them. Dimensionality-reduction techniques are therefore applied to reduce the complexity of the data and make it easier to see patterns.

From Wilson et al., 2021. Two different techniques were applied to the 16S rRNA gene dataset to reduce the complexity of the microbial community and allow patterns to emerge. The panel at the top shows the occurrence of taxonomic “modes” (our term for SOM-derived classes). The panel at the bottom shows the occurrence of subnetworks in a WGCNA analysis.

Jesse approached the problem from perspectives of both the observations (sampling days) and variables (microbial taxa). For microbial time-series data it is much more common to aggregate variables. A widely used approach involves a technique known as weighted gene correlation network analysis (WGCNA), originally developed for gene expression studies. WGCNA uses network analysis to combine taxa into subnetworks or modules that have similar co-occurrence patterns. One advantage of this approach is that the subnetworks are easily correlated to external variables that either drive the pattern (e.g., physical processes) or are influenced by it (e.g., ecophysiology). A disadvantage is that these correlations aren’t predictive. You can’t readily classify new data into the existing subnetworks, and the co-occurrence patterns of the subnetworks themselves contain additional information that isn’t readily captured by this approach.
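A toy sketch may help make the aggregation idea concrete. Real WGCNA builds a soft-thresholded adjacency matrix and clusters taxa on topological overlap; the drastically simplified Python below (invented taxa and abundances) just merges taxa greedily whenever their profiles correlate strongly, which captures only the spirit of grouping by co-occurrence:

```python
# Simplified co-occurrence grouping (NOT actual WGCNA): taxa whose abundance
# profiles across sampling days correlate above a cutoff land in one module.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def cooccurrence_modules(profiles, r_cutoff=0.9):
    """Greedily merge taxa into modules of strongly co-occurring profiles."""
    modules = []
    for taxon, series in profiles.items():
        for module in modules:
            if pearson(series, profiles[module[0]]) > r_cutoff:
                module.append(taxon)
                break
        else:
            modules.append([taxon])
    return modules

profiles = {  # abundance across six sampling days (made-up values)
    "Taxon_A": [1, 5, 9, 2, 6, 1],
    "Taxon_B": [2, 6, 10, 3, 7, 2],  # tracks Taxon_A
    "Taxon_C": [9, 5, 1, 8, 4, 9],   # anti-correlated with A
    "Taxon_D": [8, 4, 0, 7, 3, 8],   # tracks Taxon_C
}
print(cooccurrence_modules(profiles))
# → [['Taxon_A', 'Taxon_B'], ['Taxon_C', 'Taxon_D']]
```

Once taxa are collapsed into a handful of modules like this, correlating each module's summary profile against temperature, chlorophyll, or other external variables becomes tractable.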

In a 2017 paper we demonstrated how self-organizing maps (SOMs) can be used to more explicitly link environmental parameters with microbial community structure. SOMs are a form of neural network and collapse complex, multi-dimensional data into a 2D representation that retains the major relationships present in the original data. The end result of the SOM training process is a 2D model of the data that can be further subdivided into distinct classes. Applied to community structure data (i.e. in microbial community segmentation) the SOM flips the aggregation problem, aggregating samples instead of taxa. That means that each unique sample point can be described by the model as a single discrete variable that nonetheless captures much of the key information present. A major advantage to this approach is that the model is reusable: new data can be very efficiently assigned to existing classes, which is a key advantage for an ongoing ecological monitoring effort.
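The efficiency of classifying new data is easy to see in a toy example. The node weights and class labels below are invented (a trained SOM would have many more nodes and taxa), but the assignment step really is just a nearest-node lookup:

```python
# Hypothetical sketch of SOM-based classification: a new community profile is
# assigned to the class of its best matching unit, i.e., the trained node
# whose weight vector is closest in Euclidean distance.
import math

# Pretend these are four trained SOM nodes (relative-abundance vectors over
# three taxa) and the class each node received during segmentation.
nodes = {
    (0.7, 0.2, 0.1): "mode_1",
    (0.1, 0.8, 0.1): "mode_2",
    (0.3, 0.3, 0.4): "mode_3",
    (0.1, 0.1, 0.8): "mode_4",
}

def classify(sample):
    """Assign a new sample to the class of its best matching unit."""
    bmu = min(nodes, key=lambda w: math.dist(w, sample))
    return nodes[bmu]

print(classify((0.65, 0.25, 0.10)))  # → mode_1
```

Because classification is just this lookup, each new twice-weekly sample can be slotted into the existing modes the moment its sequence data are processed, with no retraining.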

Results of a “microbial community segmentation” using SOMs. A graphical representation of the model is shown in A. B-E show the association of the microbial modes with different ecological parameters.

This paper is an exciting but very early effort to track microbial processes at the Scripps Ecological Observatory. The time-series presented here ends in June of 2019 – the date our original (and terrible) flow cytometer terminally failed – but twice-weekly data collection has continued. We now have three years of 16S and 18S rRNA gene sequence and flow cytometry data and this collection will continue as long as we’re able to support it! Students and potential postdocs interested in microbial time-series analysis should take note…

Many thanks to the Simons Foundation Early Career Investigator in Marine Microbial Ecology and Evolution program for supporting this work, and to all the SCCOOS technicians and Bowman Lab personnel for bringing us water and processing samples!

Posted in Research, Scripps Ecological Observatory | Leave a comment

New paper on microbial life in hypersaline environments

Congrats to Benjamin Klempay for his first first-authored publication in the lab! (wow, didn’t I just write that??) Benjamin is part of the Oceans Across Space and Time (OAST) project and his paper, Microbial diversity and activity in Southern California salterns and bitterns: analogues for ancient ocean worlds, appears in a special issue of the journal Environmental Microbiology. In the paper Benjamin does a deep dive into the microbial diversity of the network of lakes that make up the South Bay Salt Works, a little-known industrial site/wildlife refuge on San Diego Bay that also happens to be the oldest continually operating solar salt harvesting facility in the US.

OAST team members Maggie Weng, Benjamin Klempay, and Peter Doran at the SBSW in 2020.

Our interest in hypersaline lakes – aside from the fact that they’re just really weird and fun environments to explore – is their value as analogues for evaporative environments on Mars and other ancient ocean worlds. Once upon a time Mars was wet, and may not have been so dissimilar to many environments on Earth today. As that water was lost, the oceans, lakes, and wetlands were reduced by evaporation to saline lakes and ultimately salt pans. These end-state evaporative environments are key targets for Martian exploration today. Extremely salty lakes like those found at the Salt Works are a reasonable representation of the last potentially inhabited environments on the surface of Mars before it became too desiccated to support life. Thus the signatures of ancient Martian life might bear some similarities to contemporary life in these lakes.

From https://www.nasa.gov/press/2015/march/nasa-research-suggests-mars-once-had-more-water-than-earth-s-arctic-ocean. Mars was once a wet world. As it dried the remnant lakes and oceans would have become increasingly saline, eventually representing hypersaline environments like the lakes of the South Bay Salt Works.

The microbial diversity of hypersaline lakes has been studied in depth – as I mentioned before they’re weird and fun places to study – but Benjamin’s work looks at a couple of unexplored elements. First, he didn’t restrict his analysis to sodium chloride lakes at the Salt Works (salterns) but also included magnesium chloride lakes (bitterns) that are thought to be too toxic for life (see a nice discussion of this in a recent OAST paper here). He found an interesting pattern of microbial diversity across these lakes, with diversity decreasing as salinity increases, then suddenly increasing in the magnesium chloride lakes. The reason for this is the absence of microbial growth in those lakes. Rather than hosting a specialized microbial community, they collect microbes from dust, sea spray, and other sources (infall), and preserve this DNA by inactivating the enzymes that would normally degrade it.

Microbial diversity in salterns and bitterns. Diversity increases below the known water activity limit for bacteria and archaea due to external inputs of new genetic material. From Klempay et al. 2021.

Co-authors Anne Dekas and Nestor Arandia-Gorostidi at Stanford also applied nano-SIMS to evaluate single-cell activity levels across the salinity (water activity) gradient. Biomass can be very high in these lakes – 100-fold or more higher than in seawater – so we assumed that activity would be high too. The nice thing about nano-SIMS is that it evaluates activity on a per-cell basis. Looked at in this way, most bacteria and archaea had surprisingly low levels of activity. We’re still trying to understand exactly what this means, and Anne and Nestor undertook an impressive array of experiments as part of our 2020 field effort to try to get to the bottom of it. We think that the extraordinarily low levels of predation are partially responsible; the eukaryotic protists that typically prey on bacteria and archaea can’t grow at the salinity of the saltiest lakes at South Bay Salt Works. Viruses, the other major source of mortality for bacteria and archaea, don’t generally propagate through low-activity populations. So the haloarchaea that dominate in these lakes may have hit upon a winning evolutionary strategy of slow growth under the protection of a particularly extreme environment.

Single-cell activities as measured by nano-SIMS. From Klempay et al. 2021.
Posted in OAST, Research | Tagged , , | Leave a comment

New paper on shrimp aquaculture in mangrove forests

Congrats to Natalia Erazo for her first first-authored publication in the lab! Her paper, Sensitivity of the mangrove-estuarine microbial community to aquaculture effluent, appears in a special issue of the journal iScience. The publication is the culmination of our 2017 field effort in the Cayapas-Mataje and Muisne regions of Ecuador.

Study sites in Cayapas-Mataje and Muisne, Ecuador. From Erazo and Bowman, 2021.

Ecuador is ground zero for mangrove deforestation for shrimp aquaculture. Most of Ecuador’s coastline is in fact completely stripped of mangroves. The biogeochemical consequences of this aren’t hard to imagine. Mangrove forests contain a significant amount of carbon in living biomass and in the sediment. Aquaculture ponds, by contrast, contain a large amount of nitrogen as a result of copious additions of nitrogen-rich shrimp feed. The balance of C to N is one of the fundamental stoichiometric relationships in aquatic chemistry. When it shifts all kinds of interesting things start to happen.

Shrimp aquaculture ponds in Muisne, Ecuador. Once there were mangroves…

The one place in Ecuador where you can find large areas of mangroves is the Cayapas-Mataje Ecological Reserve. CMER is in fact the largest contiguous mangrove forest on the Pacific coast of Latin America. Its status comes from an interesting combination of social and economic factors that left this part of Ecuador relatively undeveloped until recently. There is shrimp aquaculture in the reserve, but it’s nowhere near as expansive as in Muisne and other ex-mangrove sites in Ecuador.

Natalia leveraged the different levels of disturbance present in Cayapas-Mataje, and between Cayapas-Mataje and Muisne, to explore the impact of all this aquaculture activity on microbial community structure. After all, it’s really the microbial community that responds to and drives the biogeochemistry, so understanding the sensitivity of these communities to changing conditions gives us insight into how the system is changing as a whole.

Patterns in biogeochemistry and genomic features across the disturbance gradient in this study. Erazo and Bowman, 2021.

By using our paprica pipeline Natalia was able to evaluate changes in microbial community structure, predicted genomic content, and key genome features across the disturbance gradient. A nitrogen excess (relative to phosphorus) was associated with bacteria with larger genomes and more 16S rRNA gene copies, indicative of a more copiotrophic or fast-growing population. This has implications for how carbon is turned over or retained at the higher levels of disturbance.

Distribution of predicted metabolic pathways related to nitrogen cycling across different levels of disturbance. Erazo and Bowman, 2021.

Different microbial metabolisms are also associated with the level of disturbance. The figure above shows the distribution of predicted metabolic pathways associated with nitrogen metabolism. Nitrogen fixation, a feature of microbial symbionts of many plants, is less abundant at high levels of disturbance, while pathways associated with denitrification are more abundant. The interesting thing about this is that these samples are restricted to the mangroves themselves – the high disturbance samples don’t reflect the actual aquaculture ponds – so these changes reflect altered processes in the remaining stands of mangroves. The loss of beneficial, symbiotic bacteria and the elevated abundance of putative shellfish pathogens suggest that the impacts of aquaculture are not limited to the physical removal of mangrove trees and the associated release of carbon.

Posted in Research | Tagged , | Leave a comment

A short tutorial on GNU Parallel

This post comes from Luke Piszkin, an undergraduate researcher in the Bowman Lab. GNU Parallel is a must-have utility for anyone who spends a lot of time in Linux Land, and Luke recently had to gain some GNU Parallel fluency for his project. Enjoy!

*******

GNU parallel is a Linux shell tool for executing jobs in parallel using multiple CPU cores. This is a quick tutorial for speeding up your workflow and getting the most out of your machine with parallel. You can find the current distribution here: https://www.gnu.org/software/parallel/. Please try some basic commands to make sure it is working.

You will need some basic understanding of “piping” in the command line. I will describe command pipes briefly just for our purposes, but for a more detailed look please see https://www.howtogeek.com/438882/how-to-use-pipes-on-linux/.

Piping data in the command line involves taking the output of one command and using it as the input for another. A basic example looks like this:

command_1 | command_2 | command_3 | … 

Here the output of command_1 will be used as an input by command_2, the output of command_2 will be used by command_3, and so on. For now, we will only need to use one pipe with parallel. Now let’s look at a basic command run in parallel.

Input: find -type f -name "*.txt" | parallel cat
Output: 
The house stood on a slight rise just on the edge of the village.
It stood on its own and looked over a broad spread of West Country farmland.
Not a remarkable house by any means - it was about thirty years old, squattish, squarish, made of brick, and had four windows set in the front of a size and proportion which more or less exactly failed to please the eye.
The only person for whom the house was in any way special was Arthur Dent, and that was only because it happened to be the one he lived in.
He had lived in it for about three years, ever since he had moved out of London because it made him nervous and irritable.

This command makes use of find to list all the .txt files in my directory, then runs cat on them in parallel, which shows the contents of each file on a new line. We can already see how this is much easier than running each command separately, i.e.:

In: cat file1.txt
The house stood on a slight rise just on the edge of the village.
In: cat file2.txt
It stood on its own and looked over a broad spread of West Country farmland.

Also, notice how we do not need any placeholder for the files in the second command, because of the pipes. Now let’s take a more complicated example:

find -type f -name "*beta_gal_vibrio_vulnificus_1_100000_0__H_flex=up_*.txt" ! -name "*tally*" | parallel -j 4 python3 PEPCplots.py {} flex log
0.001759374417007663, 0.00033497120199255527, 0.9969940359705531
0.0019773468515624356, 0.00022978867370935437, 0.9969940359705531
0.001332602651915014, 0.0005953339816183529, 0.9969940359705531
0.0015118302435556904, 0.0005040931537659636, 0.9969940359705531
0.001320879258211107, 0.0006907926578169569, 0.9969940359705531
0.0016753759966792244, 0.00041583739269117386, 0.9969940359705302
0.0017187095827331082, 0.00036931151058880094, 0.9969940359705531
0.0017045099726521733, 0.00031386214441070197, 0.9969940359705531
0.001399703145023273, 0.0005196629341168314, 0.9969940359705531
0.001436129272321403, 0.0004806654291442482, 0.9969940359705531

This is an example from my research: it takes in a .txt data file and spits out some parameters that I want to put in a spreadsheet. Like before, we use find to get a list of all the files we want the second command to process. We use ! -name "*tally*" to exclude any files that have “tally” anywhere in the name, because we don’t want to process those.

In the second command, we have the option -j 4. This tells parallel to use 4 CPU cores, so it can run 4 commands at a time. You can check your computer specs to see how many cores you have available. If your machine has hyper-threading, then it can create virtual cores to run jobs on too. For instance, my dinky laptop only has 2 cores, but with hyper-threading I can use 4. This is another way to improve your efficiency.

In the second command you also see a {} placeholder. This spot is filled by whatever the first command outputs. In this case, we need that placeholder because our input files go between other arguments. You can also use parallel to run a number of identical commands at the same time. This is helpful if you have a program to run on the same file multiple times. For example:

seq 10 | parallel -N0 cat file1.txt
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.

Here we use seq as a counting mechanism for how many times to run the second command. You can adjust the number of jobs by changing the seq argument. We include the -N0 flag, which tells parallel to ignore any piped inputs, because we aren’t using the first command for inputs this time. Often, I like to include both the time shell tool and the --progress parallel option to see current job status and time to completion:

seq 10 | time parallel --progress -N0 cat file1.txt
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:4/0/100%/0.0s The house stood on a slight rise just on the edge of the village.
local:4/1/100%/1.0s The house stood on a slight rise just on the edge of the village.
local:4/2/100%/0.5s The house stood on a slight rise just on the edge of the village.
local:4/3/100%/0.3s The house stood on a slight rise just on the edge of the village.
local:4/4/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/5/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/6/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:3/7/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:2/8/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:1/9/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:0/10/100%/0.1s
0.21user 0.46system 0:00.63elapsed 108%CPU (0avgtext+0avgdata 15636maxresident)k
0inputs+0outputs (0major+12089minor)pagefaults 0swaps

And with that, you are well on your way to significantly increasing your computing throughput and using the full potential of your machine. You should now have a sufficient understanding of parallel to construct a command for your own projects, and to explore more complicated applications of parallelization. (Bonus points to whoever knows the book that I used for the text files.)

Posted in Computer tutorials | 1 Comment

New paper on microbial community dynamics in up-flow bioreactors

Congrats to Avishek Dutta for his first publication in the Bowman Lab: Understanding Microbial Community Dynamics in Up-Flow Bioreactors to Improve Mitigation Strategies for Oil Souring. Avishek did a remarkable job of resurrecting a stagnant dataset and turning it into a compelling story.

What is oil souring anyway? When oil is extracted in a production field, the pressure of the field drops over time. To keep the pressure up, the oil company (or, more accurately, the subsidiary of the subsidiary tasked with such things) pumps in water, which is frequently seawater. When the water comes back out through the wells there are two options. The most economical is to release it back into the environment, which has obvious negative consequences for environmental health. Alternatively, the same water can be reused by pumping it back into the ground. The downside is that recycled well water typically induces the production of hydrogen sulfide by sulfate-reducing bacteria. The hydrogen sulfide reacts with the oil (“souring” it) and creates its own set of environmental and occupational hazards.

The oil industry has spent quite a bit of effort trying to figure out how to mitigate oil souring. A leading method is to introduce nitrate salts into the system to boost the growth of nitrate-reducing bacteria. The nitrate reducers outcompete the sulfate reducers for reduced carbon (such as volatile fatty acids) and induce other processes that further impede sulfate reduction.

Although the basic concept is pretty simple, the details of this competition between nitrate and sulfate reducers in oil field aquifers are not well understood. In this study Avishek leveraged samples from up-flow bioreactors, analogs of the oil field aquifer system, for 16S rRNA gene analysis of the bacterial and archaeal communities. The bioreactors are vessels filled with sand, seawater, and sources of bioavailable carbon (the oil itself is a source of carbon, but requires a specialized microbial community to degrade). Some of the bioreactors also contain oil. Water flows continually through the system and nitrate salts can be added at appropriate time-points. For this experiment the nitrate amendment (the mitigation or M phase) was halted (the rebound sulfidogenesis or RS phase) and then restarted (the rebound control or RC phase).

From Dutta et al., 2020. Relative abundance of top taxa during different phases: mitigation (M), rebound control (RC), rebound sulfidogenesis (RS).

Lots of interesting things emerged from this relatively small-scale experiment. For one thing, the oil and seawater samples are not that different from one another during mitigation. However, when the nitrate addition is stopped these two treatments start to diverge, with different sulfate-reducing taxa present in each. This divergence (though not necessarily the same microbial community) persists after the treatment ends. But not all microbial taxa responded to the rather extreme perturbation caused by the nitrate addition. Desulfobacula toluolica, for example, which should have been outcompeted during mitigation, remained a significant member of the community.

We’re currently analyzing the results of a much larger bioreactor study that we expect will shed some new light on these processes, so stay tuned!


New paper linking the SCCOOS and AGAGE datasets

Postdoctoral researcher Jesse Wilson has a new paper titled “Using empirical dynamic modeling to assess relationships between atmospheric trace gases and eukaryotic phytoplankton populations in coastal Southern California” in press in the journal Marine Chemistry. This paper is the culmination of a nearly two-year effort to bring together two long-term datasets collected at the Ellen Browning Scripps Memorial Pier: the Southern California Coastal Ocean Observing System (SCCOOS) phytoplankton count and the Advanced Global Atmospheric Gases Experiment (AGAGE). Both of these programs encompass many more sites than the Scripps Pier, but that’s their happy point of overlap. The SCCOOS phytoplankton count (augmented by the McGowan chlorophyll time-series) is part of an effort to track potential HAB-forming phytoplankton in Southern California. Twice-weekly microscope counts are made of key phytoplankton taxa, and weekly measurements are made of chlorophyll a and nutrients. AGAGE is, as the name suggests, a global effort to monitor changes in atmospheric trace gases. They do this using high-frequency measurements of key gases with GC-MS and a cryo-concentration system known as Medusa.

Our study was motivated by the need to better understand the contribution of different phytoplankton taxa to atmospheric trace gases. Many phytoplankton (and macroalgae) produce volatile organic compounds (e.g., DMS and isoprene) and other trace gases (e.g., carbonyl sulfide). Some of these gases have interesting functions in the atmosphere, such as the formation of secondary aerosols. Although there are many laboratory studies looking at trace gas production by phytoplankton in culture, environmental studies on this topic are usually limited in space and time by the duration of a single cruise or field campaign.

From Wilson et al., 2020. Temporal patterns for various trace gases. Note the strong seasonality for carbonyl sulfide, iodomethane, and chloromethane. Bromoform and dibromomethane are also seasonal, but exhibited a provocative spike in late 2014.

The temporal and spatial limitation of field campaigns is what makes long-term time-series efforts so valuable. For this study we had 9 years of overlapping data between SCCOOS and AGAGE. To analyze these data Jesse designed an approach based on Empirical Dynamic Modeling (EDM) and Convergent Cross Mapping (CCM). For good measure he also aggregated the available meteorological data using a self-organizing map (SOM). EDM and CCM are emerging techniques that can identify causal relationships between variables. The basic idea behind EDM is that a time-series can be described by its own time-lagged components. Given two time-series (say a trace gas and a phytoplankton taxon), if the time-lagged components of one describe the other, this is evidence of a causal relationship between the two. For a more in-depth treatment of EDM and CCM see this excellent tutorial on Hao Ye’s website.
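To make the lag-embedding idea concrete, here’s a minimal numpy sketch of cross mapping. This is my own toy version, not the pipeline used in the paper (for real analyses use a dedicated package such as pyEDM); the embedding dimension E, the coupled logistic maps in the demo, and the function names are all illustrative assumptions:

```python
import numpy as np

def lag_embed(x, E, tau = 1):
    ## Time-delay embedding: row t is (x[t], x[t-tau], ..., x[t-(E-1)*tau]).
    offset = (E - 1) * tau
    return np.column_stack([x[offset - j * tau : len(x) - j * tau]
                            for j in range(E)])

def ccm_rho(x, y, E = 3, tau = 1):
    ## Cross-map skill: reconstruct y's attractor from its own lags, then try
    ## to predict x from it. If x influences y, information about x is encoded
    ## in y's dynamics, so the prediction (and the correlation rho) is good.
    My = lag_embed(np.asarray(y, dtype = float), E, tau)
    x_aligned = np.asarray(x, dtype = float)[(E - 1) * tau:]
    preds = np.empty(len(My))
    for i in range(len(My)):
        d = np.linalg.norm(My - My[i], axis = 1)
        d[i] = np.inf                      ## exclude the point itself
        nn = np.argsort(d)[:E + 1]         ## E+1 nearest neighbors (simplex)
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))
        preds[i] = np.dot(w / w.sum(), x_aligned[nn])
    return np.corrcoef(preds, x_aligned)[0, 1]

## Demo with coupled logistic maps in which x drives y but not vice versa.
n = 400
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

## Cross mapping x from y's attractor scores well above chance, as expected
## for a true x -> y influence (we discard the first 100 points as transient).
print(ccm_rho(x[100:], y[100:]))
```

Real CCM additionally checks convergence: the cross-map skill should increase with the length of the time-series used, which is part of what separates causal influence from mere correlation.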

Not surprisingly, our all-vs-all approach to these datasets was a bit messy. A lot of this is due to the complexity of the natural environment and the spatial and temporal disconnect between the measurements. The phytoplankton counts are hyper-local, and reflect the very patchy nature of the marine environment, while the trace gas measurements are regional at best, as the atmosphere moves and mixes over great distances in only a few hours. Nonetheless, we made the assumption that ecological observations at the pier are some reflection of conditions across a wider area, and that trace gas measurements do reflect some local influence. So there should be observable links between the two, even if those links are muted.

From Wilson et al., 2020. Depth of color gives the value for rho, a measure of the cross-map ability when a parameter (row) affects a trace gas (column). Only significant values are shown; * indicates that there is evidence of a causal interaction in both directions, which typically indicates interaction with a third, unmeasured variable.

I’m particularly excited about what we can do with these data in the future, when we have several years of molecular data. As part of the Scripps Ecological Observatory initiative we’re sequencing 16S and 18S rRNA genes from the same samples as the SCCOOS microscopy counts. We have around 2.5 years of data so far. Just a few more years until we have a molecular dataset as extensive as the count data analyzed here! The key difference will be in the breadth of that data, which will allow us to identify an order of magnitude more phytoplankton taxa than are counted.


Finding those lost data files

It’s been a long time since I’ve had the bandwidth to write up a code snippet here. This morning I had not quite enough time between Zoom meetings to tackle something more involved, so here goes!

In this case I needed to find ~200 sequence (fasta) files for a student in my lab. They were split across several sequencing runs, and for various logistical reasons it was getting a bit tedious to find the location of each sequence file. To solve the problem I wrote a short Python script to wrap the Linux locate command and copy all the files to a new directory where they could be exported.

First, I created a text file “files2find.txt” with text uniquely matching each file that I needed to find. One of the great things about locate is that it doesn’t need to match the full file name.

head files2find.txt
151117_PAL_Sterivex_1
151126_PAL_Sterivex_2
151202_PAL_Sterivex_3
151213_PAL_Sterivex_4
151225_PAL_Sterivex_5
151230_PAL_Sterivex_6
160106_PAL_Sterivex_7
160118_PAL_Sterivex_9
160120_PAL_Sterivex_10
160128_PAL_Sterivex_11

Then the wrapper:

import subprocess
import shutil

with open('files2find.txt') as file_in:
    for line in file_in:
        line = line.rstrip()

        ## Here we use the subprocess module to run the locate command, capturing
        ## standard out.

        ## Passing the command as a list (rather than a string with
        ## shell = True) avoids any shell quoting issues with the search terms.

        temp = subprocess.Popen(['locate', line],
                                stdout = subprocess.PIPE)

        ## The communicate method for object temp returns a tuple.  First object
        ## in the tuple is standard out.       
        
        locations = temp.communicate()[0]
        locations = locations.decode().split('\n')

        ## Thank you internet for this one-liner, Python one-liners always throw
        ## me for a loop (no pun intended). Here we search all items in the locations
        ## list for a specific suffix that identifies files that we actually want.
        ## In this case our final analysis files contain "exp.fasta".  Of course if
        ## you're certain of the full file name you could just use locate on that and
        ## omit this step.

        fastas = [i for i in locations if 'exp.fasta' in i] 
        
        path = '/path/to/where/you/want/files/'
        
        found = set()

        ## Use the shutil library to copy found files to a new directory "path".
        ## Copied files are added to the set "found" to avoid being copied more than
        ## once, if they exist in multiple locations on your computer.
        
        for fasta in fastas:
            file_name = fasta.split('/')[-1]
            if file_name not in found:
                shutil.copyfile(fasta, path + file_name) 
                found.add(file_name)

        ## In the event that no files are found report that here.
                
        if len(fastas) == 0:
            print(line, 'not found')
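If locate isn’t available on your system (or its database is out of date), you can do the same thing, more slowly, with a filesystem walk. Here’s a rough sketch assuming your files all live under one top-level directory; the function name, the paths, and the default suffix are hypothetical:

```python
from pathlib import Path
import shutil

def gather_files(queries, search_root, dest, suffix = 'exp.fasta'):
    ## Walk search_root once, copying every file whose name ends with suffix
    ## and contains one of the query strings into dest. The returned set of
    ## copied names also guards against copying a file found in more than one
    ## location twice, as in the locate-based version above.
    search_root, dest = Path(search_root), Path(dest)
    dest.mkdir(parents = True, exist_ok = True)
    found = set()
    for f in search_root.rglob('*' + suffix):
        if f.name not in found and any(q in f.name for q in queries):
            shutil.copyfile(f, dest / f.name)
            found.add(f.name)

    ## As before, report any queries that didn't match a file.
    for q in queries:
        if not any(q in name for name in found):
            print(q, 'not found')
    return found
```

You could call this as gather_files([l.strip() for l in open('files2find.txt') if l.strip()], '/some/top/dir', '/path/to/where/you/want/files/'). The trade-off is one slow walk of the whole tree instead of many fast lookups against locate’s prebuilt index.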