The Bowman Lab has a new position open for a postdoc in viral pathogen ecology. The postdoc will join a dynamic team of experimentalists, ecological modelers, and physical oceanographers working to understand the distribution of and exposure risk to norovirus and other pathogens in coastal California. The work is motivated by the urgent need to better forecast risk associated with cross-border sewage transport in southern San Diego County. The postdoc will have primary responsibility for conducting experiments to determine the decay of norovirus under realistic environmental conditions and for analyzing a growing dataset of norovirus abundance obtained with ddPCR. There will be opportunities to be involved in both the field and modeling components of the project, depending on interest and professional development goals. Applicants should have a strong publication record and excellent writing skills, knowledge of experimental design, qPCR experience, and theoretical knowledge of pathogen ecology. Specific knowledge of norovirus and mammalian cell culture techniques is a plus. The position will remain open until filled. Please send an expression of interest and CV to Jeff at jsbowman at ucsd.edu.
Seeking postdoc in phytoplankton ecology
The Bowman Lab seeks a postdoctoral researcher for a two-year project to investigate patterns in phytoplankton community composition across molecular time series collected at the Ellen Browning Scripps Memorial Pier at Scripps Institution of Oceanography (SIO) and the California Polytechnic State University (Cal Poly, San Luis Obispo) Pier at Avila Beach. The postdoctoral researcher should have a background in biological oceanography or microbial ecology, a solid understanding of ecological statistics and the analysis of amplicon sequence data, familiarity with phytoplankton sampling techniques, and an understanding of basic molecular lab procedures. Specific knowledge of eukaryotic harmful algal bloom-forming taxa is a plus. The position will be located at SIO but jointly mentored by Dr. Alexis Pasulka (Cal Poly) and Dr. Jeff Bowman (SIO). There will be specific opportunities to work with undergraduate students at Cal Poly and SIO. The position will be open until filled and applicants should reach out jointly to Dr. Bowman (jsbowman at ucsd.edu) and Dr. Pasulka (apasulka at calpoly.edu) with a copy of their CV and a brief expression of interest.
Recent blog post by PhD student Beth Connors
Check out this recent blog post written by PhD student Beth Connors for International Women in Science Day!
New postdoctoral research opportunity!
Andrew Barton (https://adbarton.scrippsprofiles.ucsd.edu/) and Jeff Bowman (https://jsbowman.scrippsprofiles.ucsd.edu/) at Scripps Institution of Oceanography at the University of California San Diego are recruiting a postdoc to study interactions among marine microbes, inferred from regular genomic measurements and cell images taken at Scripps Pier (https://ecoobs.ucsd.edu/). Possible research areas include but are not limited to: quantifying the strength and direction of microbial interactions, identifying “keystone” microbial taxa, and assessing how microbial interactions shape ecosystem function. The ideal candidate will have a PhD in ecology, marine biology, or related disciplines, and proficiency in data science techniques, machine learning, novel statistical methods, and/or numerical modeling approaches for studying natural populations and communities. Please direct qualified candidates to contact Andrew Barton (adbarton@ucsd.edu) for more information. We anticipate filling the open position by Fall 2023.
Alignment and phylogenetic inference with hmmalign and RAxML-ng
RAxML is one of the most popular programs around for phylogenetic inference via maximum likelihood. Similarly, hmmalign within HMMER 3 is a popular way to align amino acid sequences against HMMs from Pfam or created de novo. Combine the two and you have an excellent method for constructing phylogenetic trees. But gluing the two together isn’t exactly seamless and novice users might be deterred by a couple of unexpected hurdles. Recently, I helped a student develop a workflow which I’m posting here.
First, define some variables just to make the bash commands a bit cleaner. REF refers to the name of the Pfam hmm that we’re aligning against (Bac_rhodopsin.hmm in this case), while QUERY is the sequence file to be aligned (hop and bop gene products, plus a dinoflagellate rhodopsin as outgroup).
REF=Bac_rhodopsin
QUERY=uniprot_hop_bop_reviewed
Now, align and convert the alignment to fasta format (required by RAxML-ng).
hmmalign --amino -o $QUERY.sto $REF.hmm $QUERY.fasta
seqmagick convert $QUERY.sto $QUERY.align.fasta
Test which model is best for these data. Here we get LG+G4+F.
modeltest-ng -i $QUERY.align.fasta -d aa -p 8
Check your alignment!
raxml-ng --check --msa $QUERY.align.fasta --model LG+G4+F --prefix $QUERY
Oooh… I bet it failed. Exciting! In this case (using sequences from Uniprot) the long sequence descriptions are incompatible with RAxML-ng. Let’s do a little Python to clean that up.
from Bio import SeqIO

# Blank out the long UniProt description so only the sequence ID is written to each header.
with open('uniprot_hop_bop_reviewed.align.clean.fasta', 'w') as clean_fasta:
    for record in SeqIO.parse('uniprot_hop_bop_reviewed.align.fasta', 'fasta'):
        record.description = ''
        SeqIO.write(record, clean_fasta, 'fasta')
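If Biopython isn't available, the same cleanup can be sketched with the standard library alone. This is only an illustration: the function name clean_fasta_headers and the two example records are hypothetical, and the assumption is that keeping the first whitespace-delimited token of each header is enough to satisfy RAxML-ng. The sketch also stops if trimming produces duplicate IDs, since that would cause a different failure downstream.

```python
def clean_fasta_headers(lines):
    """Keep only the first whitespace-delimited token of each FASTA header.

    Raises ValueError if a header is empty or if trimming yields duplicate IDs.
    """
    cleaned = []
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            seq_id = fields[0]
            if seq_id in seen:
                raise ValueError(f"duplicate sequence ID after trimming: {seq_id}")
            seen.add(seq_id)
            cleaned.append(">" + seq_id)
        else:
            # Sequence lines (including alignment gaps) pass through untouched.
            cleaned.append(line)
    return cleaned


# Hypothetical records, made up for illustration only.
example = [
    ">sp|X00001|HYPO1_EXAMPLE Hypothetical rhodopsin OS=Example organism A",
    "MLE--LPTAVEGVSQ",
    ">sp|X00002|HYPO2_EXAMPLE Hypothetical rhodopsin OS=Example organism B",
    "MSITSVP--GVVDAG",
]
cleaned_example = clean_fasta_headers(example)
```

To use it on a real file, read the lines with open(...).readlines() and write the returned list back out joined by newlines.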
Check again…
raxml-ng --check --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY
If everything is kosher go ahead and fire up your phylogenetic inference. Here I’ve limited bootstrapping to 100 trees. If you have the time/resources do more.
raxml-ng --all --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY --bs-trees 100
Superimpose the bootstrap support values on the best ML tree.
raxml-ng --support --tree $QUERY.raxml.bestTree --bs-trees $QUERY.raxml.bootstraps
And here’s our creation as rendered by Archaeopteryx. Some day I’ll create a tree that is visually appealing, but today is not that day. But you get the point.
New paper on using machine learning to predict biogeochemistry from microbial community structure
Congratulations to Avishek Dutta for his paper Machine Learning Predicts Biogeochemistry from Microbial Community Structure in a Complex Model System that was recently published in the journal Microbiology Spectrum. I’m really excited about this paper; the study it is based on inspired this perspective that I wrote for an mSystems early career special issue last year.
The figure above summarizes the experimental design and analysis. The experiment was designed to address the question of whether the microbial community contains sufficient information to predict a biogeochemical state in a dynamic system. The structure of a microbial community is highly sensitive to environmental change. Small changes in the chemical or physical environment will result in a shift in abundance of one or more taxa as mortality and growth rates respond. These shifts in structure are easily observed by amplicon sequencing of taxonomic marker genes. These relative abundance data can be combined with flow cytometry analysis of microbial abundance to yield absolute abundance data.
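The last step, combining relative abundance with a cell count, is simple arithmetic and can be sketched as below. This is not the code from the paper: the function name, the taxon labels, the read counts, and the cell concentration are all made up for illustration, and the sketch assumes every taxon carries the same number of marker gene copies, which is generally not true in practice.

```python
def absolute_abundance(read_counts, cells_per_ml):
    """Scale per-taxon read counts to cells per mL using a total cell count.

    read_counts: dict mapping taxon name to amplicon read count for one sample.
    cells_per_ml: total microbial abundance for the same sample (e.g. from flow cytometry).
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # relative abundance (reads / total reads) multiplied by total cells per mL
    return {taxon: cells_per_ml * n / total_reads for taxon, n in read_counts.items()}


# Hypothetical sample: 1000 reads across three taxa, 2 million cells per mL.
reads = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
abundance = absolute_abundance(reads, 2.0e6)
```

The absolute values sum back to the total cell count, which is a handy sanity check on real data.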
The trick of course is relating an observed shift in community structure to a specific biogeochemical state. Machine learning provides a number of ways to do this, but all require large training datasets. Fortunately gene sequencing is pretty cheap these days and DNA extractions are much more high-throughput than they were just a few years ago. Because of this it’s possible to generate community structure data for hundreds of samples in relatively short order. In this study Avishek used over 700 samples from sediment bioreactors and the random forest algorithm to predict the concentration of hydrogen sulfide with a reasonably high degree of accuracy.
Like any statistical model, developing machine learning models takes careful attention to detail. Careful segregation of the data into training and validation sets and engineering of the features used for prediction yield the most honest models that can be best applied for future predictions. Avishek’s paper is an excellent template for developing a predictive machine learning model from microbial community structure data.
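As a small illustration of the data segregation step, here is a minimal sketch of a reproducible training/validation split using only the Python standard library. It is not the procedure from the paper: the function name, the 20% validation fraction, the seed, and the sample IDs are all assumptions chosen for the example.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle samples reproducibly and hold out a fraction for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical sample IDs standing in for a set of 700 samples.
sample_ids = [f"sample_{i:03d}" for i in range(700)]
train_ids, validation_ids = train_validation_split(sample_ids)
```

The key property is that no sample appears in both sets, so the validation score reflects data the model has never seen. With repeated measurements from the same source, splitting by source rather than by individual sample may be the more honest choice.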
Lab manager position open!
We’re on the hunt for a lab manager/senior lab technician to take on a variety of key tasks in the Bowman Lab. The position is being advertised at the Staff Research Associate II level and the ideal applicant will have an MS in a relevant field, or a BS and equivalent experience. We are looking for someone with complementary skills to the rest of the lab; the ideal applicant would have a background in environmental or analytical chemistry to complement our core expertise in microbiology. However, a background in the life sciences also works fine. The formal job posting is pasted below (note that it deviates slightly from what’s described here due to limitations of the UC San Diego HR system).
DESCRIPTION
Under supervision, independently perform a variety of standard laboratory and data analysis procedures (and some non-standard procedures) related to the function of coastal ocean environments. Coordinates and conducts instrument calibrations and data collection for long-term time-series of microbial community structure, microbial abundance, and dissolved gases. Responsible for the operation and maintenance of a membrane inlet mass spectrometer, flow cytometer, and in situ imaging flow cytometer (IFCB), DNA extraction, data entry, and light programming in Python and R. Travel to field stations as needed, which may involve driving University vehicles and operating small boats for diving and coastal field work. Scuba dive to clean and service underwater instrumentation. Coordinate and communicate with lab members about supplies, data and sampling techniques. Oversee and work-direct undergraduate research assistants. Process, analyze, and interpret results from data sets, evaluate quality of data, generate and update design and method documentation, and update web pages. Perform general office duties including but not limited to filing, photocopying, faxing and library searches for research articles. Manage laboratory space, computers, and equipment.
- Must be able to lift 50 lbs.
QUALIFICATIONS
- B.S. in Chemistry, Marine Science, Oceanography, or equivalent combination of education and experience with a strong background in data analysis and computer operations.
- Demonstrated experience with diving and ability to acquire or maintain AAUS and SIO scientific diving certification.
- Demonstrated knowledge of mathematics, scientific, and programming principles.
- Demonstrated experience with R, Matlab, or Python programming languages for data analysis and visualization.
- Demonstrated laboratory experience. Demonstrated knowledge and experience with laboratory techniques and instrumentation, specifically flow cytometry and DNA extractions. Demonstrated experience with laboratory safety procedures and calibration techniques.
- Proven ability to work effectively on multiple tasks in parallel, with each requiring a different focus and level of detail and attention. Proven ability to prioritize tasks and solve problems.
- Demonstrated data entry and data analysis experience. Demonstrated experience with spreadsheets and/or databases for data entry, archival and basic data analysis using standard software (e.g., MS Excel, MS Access, Matlab, or other statistical software packages).
- Experience communicating and interacting with a variety of people from the public to governmental agencies, students and volunteers. Ability to effectively communicate instructions and interact using tact and diplomacy with diverse personalities including academic, staff, student and volunteer employees and institutions/organizations.
- Proven ability and experience using PCs, email, internet, general office tools and software.
- Tolerance of repetitive tasks such as data entry and checking, or extended periods in laboratory filtering samples or analyzing seawater samples via flow cytometry.
- Demonstrated ability to find and follow written and oral procedures from standard laboratory resources.
- Must be organized and a self-motivator with the ability to work efficiently while unsupervised.
- Proven ability to document significant results of data analysis in technical notes. Good writing skills. Ability to integrate data products and methodologies from laboratory and field instrumentation into research results for publication purposes.
- Proven ability to communicate with technical and scientific personnel. Ability to instruct and aid research associates and students on the use of software packages and data procedures/protocols.
- Ability to travel for days to weeks for field work and work extended hours as needed.
- Ability to drive University vehicles to field stations. Valid driver’s license.
- Proven ability to work with others under demanding conditions, sometimes for extended periods of time.
SPECIAL CONDITIONS
- Ability to work at sea. Must have demonstrated experience with SCUBA diving and ability to acquire and maintain AAUS and SIO scientific diving certification.
- Must have valid driver’s license and ability to drive University vehicles to field stations.
- Ability to travel for days to weeks for field work and work extended hours as needed.
- This position is subject to a DMV check for driving record. Fluency in Spanish is preferred.
New paper on protein adaptations to high salinity and low temperature
Congratulations to Luke Piszkin (now a PhD student in the Biophysics Department at the University of Notre Dame) for the first paper in the lab to be first-authored by an undergraduate! Luke’s paper is titled Extremophile enzyme optimization for low temperature and high salinity are fundamentally incompatible and appears in the journal Extremophiles. In the paper Luke explores the molecular basis underlying the intriguing observation that there appear to be very few (no?) extreme halophiles that are also extreme psychrophiles, despite the fact that there are many environments on Earth that are both cold and salty.
One of these environments is Deep Lake, Antarctica, which supports a microbial community dominated by the mesophilic archaeon Halorubrum lacusprofundi (optimal growth temperature of 36 °C). That’s rather surprising given that your typical true psychrophile conks out at about 18 °C. Like all haloarchaea, what H. lacusprofundi can do is tolerate high levels of salt, up to 4.5 M NaCl or 262 g L⁻¹. That level of salt tolerance is not seen among the documented true psychrophiles. Why not?
In the manuscript we posit that it comes down to the different amino acid substitutions needed to adapt a protein to high salt or low temperature conditions. High salt proteins typically have low isoelectric points, derived from more acidic amino acids. The practical implication of this is that they have a more negatively charged surface that requires a high concentration of salt for stability. This is a requirement for the “salt-in” strategists that dominate the most saline environments (such as salt crystallizer ponds). These microbes are primarily archaea but include a few bacteria, and deal with the high salinity of their environment by accumulating high intracellular concentrations of the salt KCl. This maintains their osmotic balance while excluding more harmful salts, but requires proteins that are compatible with high concentrations of KCl. By contrast most halotolerant bacteria (including psychrophiles that inhabit moderate salinity environments) are “salt-out” strategists that accumulate organic solutes to maintain osmotic balance. These solutes impose no particular requirements on intracellular proteins.
The trick is that amino acid substitutions that lead to a lower isoelectric point also decrease the flexibility of the protein. Increased flexibility is the key protein adaptation to low temperature. Thus the fundamental incompatibility between optimization to low temperature and high salinity. To test this idea Luke dusted off a model, the Protein Evolution Parameter Calculator (PEPC), that I developed many years ago in the waning days of my PhD. After updating the code from Python 2 to Python 3 and making some other improvements, Luke devised an experiment to “evolve” core haloarchaea orthologous group (tucHOG) proteins from H. lacusprofundi and the related mesophile Halorubrum salinarum. By telling the model to select for increased flexibility or decreased isoelectric point he could identify how improvements in one parameter impacted the other. As expected, likely amino acid substitutions (based on position in the protein and the BLOSUM80 substitution matrix) that increased flexibility also strongly favored an increased isoelectric point.
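To make the isoelectric point side of this concrete, here is a rough sketch that tallies acidic and basic residues in a protein sequence. It is not PEPC and does not compute an isoelectric point or flexibility; it only counts aspartate and glutamate (acidic) against lysine and arginine (basic) as a crude proxy, on the assumption that a more negative balance goes with a lower isoelectric point. Histidine is ignored for simplicity, and both example sequences are invented.

```python
def charge_summary(sequence):
    """Summarize acidic (D, E) and basic (K, R) residue content of a protein."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    acidic = sum(seq.count(residue) for residue in "DE")
    basic = sum(seq.count(residue) for residue in "KR")
    return {
        "acidic_fraction": acidic / len(seq),
        "basic_fraction": basic / len(seq),
        # Negative values mean more acidic than basic residues.
        "net_charge_proxy": basic - acidic,
    }


# Invented sequences: one rich in acidic residues, one rich in basic residues.
acidic_example = charge_summary("MDEEDLEADDVE")
basic_example = charge_summary("MKRLKAGKVLRK")
```

Comparing this proxy across orthologs from a halophile and a non-halophile is one quick way to see the acidic shift described above before reaching for a full model.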
Tutorial: altering an existing NPZ model
This summer I had the pleasure of teaching high school students as part of a Sally Ride Science Junior Academy. My class was called Polar Microbes, and we discussed adaptations to environments unique to the poles and the importance of microbes to the food webs of the Arctic and Antarctic. One of the things I most wanted to show students was how a simple ecological model could be changed to better fit the polar environment and explicitly include micro-organisms. I was so impressed by how quickly my students were able to understand and change the code underlying the model we used. I wanted to write a quick tutorial to expand that learning to anyone who is intimidated by ecological modeling and wants an easy place to start.
It is valuable to start out with a basic definition: a model is a simple representation of a complex phenomenon. Models are useful because they explicitly describe important mechanisms, which then can be tested against observations. This testing will ultimately demonstrate whether your concept of a natural phenomenon was valid or needs to be refined. With very little modeling experience myself, I started with an existing model from the excellent textbook “A Practical Guide to Ecological Modelling” by Karline Soetaert and Peter Herman from Springer. If you use R as a coding language, it is a great book for getting started with modeling, as it pairs many conceptual explanations with highly understandable code. All the examples from the book are in the R package ecolMod:
install.packages("ecolMod")
library(ecolMod)
demo("chap2")
Once you have the package loaded, you can click through the examples to see how to build a simple ecological model, where a forcing function causes flow between state variables. It is easier to understand with the below visual (Fig 2.1 of Soetaert and Herman).
In oceanography, a common real-world application of this conceptual type of model is the NPZD, which stands for Nutrient, Phytoplankton, Zooplankton and Detritus. It is important for us to understand the flow of carbon and nitrogen (among other elements!) through both the macroscopic (zooplankton) and microscopic (detritus that is re-mineralized by bacteria) food web. This is one of the simplest ways to mathematically model it.
Along with figures, the authors are kind enough to include the code for the model. In their code, each of the state variables N, P, Z and D (the boxes) is mathematically equal to the flows in minus the flows out. Based on the figure above for instance, PHYTO = f1 – f2. In turn, each flow is its own mathematical equation with parameters (constants that are experimentally determined). The equation provided for f1 for instance is:
f1 = Nuptake <- maxUptake * PAR/(PAR+ksPAR) * din/(din+ksDIN)
This is because Nuptake is dependent on solar radiation (PAR) and the amount of nutrients that are available (din), as well as the parameters maxUptake, ksPAR and ksDIN, which are set equal to 1/day, 140 µEinst/m2/s and 0.5 mmolN/m3 respectively when we define our parameters later in the model. I encourage you to download the model code and follow how each of the state variable definitions, flows and parameters are connected. Even in a model as simple as this it gets complicated!
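For readers who want to poke at the uptake term outside of R, here is a minimal Python transcription of the expression exactly as quoted above, using the three parameter values given in the text as defaults. The function and argument names are my own, and the package source may contain additional terms, so treat this as a sketch of the two saturation curves rather than a copy of the ecolMod code.

```python
def n_uptake(par, din, max_uptake=1.0, ks_par=140.0, ks_din=0.5):
    """Nitrogen uptake as the product of light and nutrient saturation terms.

    par: light in muEinst/m2/s, din: dissolved inorganic nitrogen in mmolN/m3.
    Defaults follow the values quoted in the text (1/day, 140, 0.5).
    """
    light_limitation = par / (par + ks_par)     # approaches 1 in bright light
    nutrient_limitation = din / (din + ks_din)  # approaches 1 when nutrients are plentiful
    return max_uptake * light_limitation * nutrient_limitation


# When light and nutrients both sit at their half-saturation constants,
# each term is 0.5 and uptake is a quarter of the maximum.
uptake_at_half_saturation = n_uptake(par=140.0, din=0.5)
uptake_in_darkness = n_uptake(par=0.0, din=0.5)
```

Trying a few values of par and din is a quick way to see that whichever resource is scarcer dominates the result.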
Even more exciting are the model solutions, which show a sensible story over two years. As you know from above, the forcing function for the model is PAR (solar radiation), which varies over the season (the sine wave in panel A of the following figure). As PAR increases in the spring, there is a modeled increase in Chlorophyll and Zooplankton (what oceanographers call a “spring bloom”!) and a decrease in DIN.
As I was teaching a class called Polar Microbes, I wanted to change some parts of the model to better reflect a polar environment. Since the model’s forcing function is the seasonal light cycle, I knew it was the first thing that needed to change. The tilt of our rotation axis ensures that our poles have a much more extreme seasonal light cycle, with time in both full darkness and full light.
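One way to picture the difference between the two light cycles is the Python sketch below. It is not the forcing function from ecolMod or from my class: the function names, the mean, amplitude and peak light values, and the 120 days of darkness are all hypothetical numbers picked to show the shape of the curves.

```python
import math


def temperate_par(day, mean=270.0, amplitude=220.0):
    """Sine-wave light cycle that never drops to zero (hypothetical values)."""
    return mean + amplitude * math.sin(2.0 * math.pi * day / 365.0)


def polar_par(day, peak=500.0, dark_days=120):
    """Light cycle with a steep lit season followed by a period of full darkness."""
    day_of_year = day % 365
    lit_days = 365 - dark_days
    if day_of_year >= lit_days:
        return 0.0  # polar night
    # Half a sine wave squeezed into the lit season gives the steeper slope.
    return peak * math.sin(math.pi * day_of_year / lit_days)


temperate_year = [temperate_par(day) for day in range(365)]
polar_year = [polar_par(day) for day in range(365)]
```

Plotting the two lists against day of year shows the contrast: the temperate curve stays above zero all year, while the polar curve rises steeply and then sits at zero through the dark season.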
When you change the model to reflect this planetary fact (just change the PAR function to have a steeper slope and a period of darkness), the output variables change drastically (the Polar Model is in blue below):
Our class had long discussions about this model output. Is it sensible? What can you infer about the polar regions from this? How could it be improved? In our class, we ended up even adding another state variable, Bacteria, and altering the flows from it (viral lysis) to see what happens.
I encourage you to download the ecolMod package and see for yourself! If you are a high school student, consider joining us next summer at Sally Ride Science for my summer class on Polar Microbes as well.