New paper on using machine learning to predict biogeochemistry from microbial community structure

Congratulations to Avishek Dutta for his paper "Machine Learning Predicts Biogeochemistry from Microbial Community Structure in a Complex Model System", which was recently published in the journal Microbiology Spectrum. I’m really excited about this paper; the study it is based on inspired this perspective that I wrote for an mSystems early career special issue last year.

Summary of experimental design and analysis, from Dutta et al., 2022.

The figure above summarizes the experimental design and analysis. The experiment was designed to address the question of whether the microbial community contains sufficient information to predict a biogeochemical state in a dynamic system. The structure of a microbial community is highly sensitive to environmental change. Small changes in the chemical or physical environment will result in a shift in abundance of one or more taxa as mortality and growth rates respond. These shifts in structure are easily observed by amplicon sequencing of taxonomic marker genes. These relative abundance data can be combined with flow cytometry analysis of microbial abundance to yield absolute abundance data.
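To make that last step concrete, here is a minimal sketch of the scaling in Python. The taxon names, fractions, and cell count are made up for illustration, and `absolute_abundance` is a hypothetical helper, not code from the paper. It also ignores refinements such as correcting for marker gene copy number.

```python
def absolute_abundance(relative_abundance, cells_per_ml):
    """Scale each taxon's share of the community by the total cell count."""
    total = sum(relative_abundance.values())
    if total <= 0:
        raise ValueError("relative abundances must sum to a positive value")
    # Dividing by the total means raw read counts work as well as fractions.
    return {
        taxon: (value / total) * cells_per_ml
        for taxon, value in relative_abundance.items()
    }


# Hypothetical sample: invented taxa, fractions, and flow cytometry count.
sample = {"taxon_a": 0.5, "taxon_b": 0.3, "taxon_c": 0.2}
absolute = absolute_abundance(sample, cells_per_ml=2.0e6)

for taxon, cells in absolute.items():
    print(f"{taxon}: {cells:.3g} cells per mL")
```

The point of the conversion is that a taxon's relative abundance can rise simply because other taxa declined; scaling by the cell count separates those two cases.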

The trick, of course, is relating an observed shift in community structure to a specific biogeochemical state. Machine learning provides a number of ways to do this, but all require large training datasets. Fortunately, gene sequencing is pretty cheap these days and DNA extractions are much more high-throughput than they were just a few years ago. Because of this it’s possible to generate community structure data for hundreds of samples in relatively short order. In this study Avishek used over 700 samples from sediment bioreactors and the random forest algorithm to predict the concentration of hydrogen sulfide with a reasonably high degree of accuracy.

As with any statistical model, developing a machine learning model takes careful attention to detail. Careful segregation of the data into training and validation sets, together with engineering of the features used for prediction, yields the most honest models, which are the ones that can best be applied to future predictions. Avishek’s paper is an excellent template for developing a predictive machine learning model from microbial community structure data.
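For readers who want to see the general shape of such a workflow, here is a rough sketch using scikit-learn. It is not the pipeline from the paper: the data are synthetic, the sample and taxon counts are arbitrary, and the relationship between abundance and sulfide is invented purely so the example runs. The key idea is that the model is scored only on samples it never saw during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for an abundance table: rows are samples, columns are taxa.
n_samples, n_taxa = 300, 20
abundance = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_taxa))

# Invented target: "sulfide" depends on two taxa plus noise (an assumption).
sulfide = (
    3.0 * abundance[:, 0]
    + 1.5 * abundance[:, 1]
    + rng.normal(0.0, 0.5, size=n_samples)
)

# Hold out a quarter of the samples; the model never sees them while fitting.
X_train, X_val, y_train, y_val = train_test_split(
    abundance, sulfide, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Score on the held-out samples only.
val_r2 = r2_score(y_val, model.predict(X_val))
print(f"validation R^2: {val_r2:.2f}")
```

A random split is the simplest choice. For samples collected over time from the same reactors, splitting by time point or by reactor may give a more honest estimate, since neighboring samples tend to resemble each other.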
