Some thoughts on modeling

I’m not a modeler, but I played one once in grad school. Or at least that’s how I’m feeling at the moment. I’m currently working on the last chapter of my dissertation and it became necessary to explore the mechanisms underlying some empirical observations. This encouraged (forced?) me to undertake my first foray into modeling, developing a simple model of how a protein might evolve given a set of conditions and using it to predict what conditions might be responsible for the observed state. Because of this deviation from my normal observational and experimental work I’ve been thinking a lot about what models actually are and what they’re useful for. Considering the degree to which everyone is exposed to models on a daily basis it’s troubling how little people, including most scientists, think about them.

Currently scientific modeling is so compartmentalized that “modeler” is considered an adequate description of someone’s research area, as in “oh, she’s a modeler”. That label is at the same time complimentary and derogatory. On the one hand it implies some mastery of computer programming, mathematics, statistics, and probably a healthy publication list. On the other hand it can imply a reductionist scientific philosophy (wherein interesting phenomena are reduced to simple, predictable “parameters”) and a tendency to see a better model as the overarching scientific goal. I once went to a talk where a climate modeler lectured the audience for 45 minutes on how to “be a good observational scientist” for the sake of model improvement. It was too easy to walk away from that talk with the sense that the modeler’s primary goal was not an improved understanding of the system but the development of a model that best reproduces the observed state (and thus might reasonably predict a future state). That isn’t necessarily a bad thing, however, so long as the limitations are understood.

I see models as falling into two categories, which I’ll call mechanistic and predictive. A predictive model, such as a global climate model or a protein structure prediction model, isn’t about understanding the system. It’s about accurately predicting the end state. Any natural process contains interesting dynamics that have no impact on the outcome, and representing all of them in a model is time consuming and computationally expensive. Consider a model designed to predict how well an automobile functions, where “well” is defined as driving in an efficient and safe fashion. Lots of phenomena contribute to this: the fuel delivery system, the various engine components, the exhaust system, the brakes, etc. It might be very interesting that the driver chooses to listen to the radio or run the AC while driving, but even though this has an impact on the other subsystems in the car, it doesn’t have a direct impact on the car running well. If we wanted to represent the operation of the vehicle as a model we would probably decline to include these phenomena as variables. To do so would require developing equations that predict their operation in relation to one another, and solving those equations repeatedly when the model is run, with no improvement in our ability to predict the car’s performance.

A mechanistic model, on the other hand, can be thought of as an inventory of all the tiny pieces of a system that contribute to the system’s operation. Yes, we can represent the performance of the car in the previous model perfectly well without including the radio or air conditioner, but if we want to explore how these systems interact with other subsystems they need to be included. For example, suppose a driver likes the interior of the car really, really cold and hopes to install an industrial-grade air conditioner that exceeds the wattage of the alternator. This would certainly impact the performance of the vehicle but, since we opted not to include it in the predictive model of automobile performance, we’d never know it from running that model.
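The distinction can be made concrete with a toy sketch. Everything here is hypothetical (the function names, the wattages, the penalty term are all invented for illustration): a “predictive” model that ignores accessory loads agrees with a “mechanistic” one for a stock air conditioner, but only the mechanistic version notices the oversized unit.

```python
# Toy sketch (all numbers and names hypothetical): a predictive car model
# that omits accessory loads vs. a mechanistic one that includes them.

ALTERNATOR_WATTS = 1500  # hypothetical alternator capacity


def predictive_efficiency(engine_watts):
    """Predictive model: performance depends only on engine output."""
    return min(1.0, engine_watts / 100_000)


def mechanistic_efficiency(engine_watts, accessory_watts):
    """Mechanistic model: accessory draw beyond the alternator's capacity
    pulls power from the drivetrain (penalty factor is made up)."""
    overload = max(0, accessory_watts - ALTERNATOR_WATTS)
    return predictive_efficiency(engine_watts - overload * 10)


# A stock AC changes nothing either model can see; the industrial-grade
# unit degrades performance, but only the mechanistic model captures it.
print(predictive_efficiency(80_000))            # 0.8
print(mechanistic_efficiency(80_000, 500))      # 0.8 -- below capacity
print(mechanistic_efficiency(80_000, 5_000))    # 0.45 -- overload matters
```

The point isn’t the numbers, which are arbitrary, but that the predictive model has no input through which the air conditioner could ever matter.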

The hard reality is that no one will ever raise enough money to hire enough researchers and purchase enough computer time to run mechanistic models of the Earth system, or even a local ecosystem. As valuable as that would be it’s just too difficult to do (without lots of time, people, and money). Thus developing useful predictive models depends on very careful selection of the parameters that will be included.

In predictive oceanographic models an oft-cited example is the representation of phytoplankton. These single-celled primary producers are responsible for a huge chunk of global carbon fixation and carbon sequestration. Often a single “average” of nutrient requirements and carbon uptake and export rates is used for phytoplankton in climate and ecosystem models, despite the fact that there are thousands of different phytoplankton species, often with radically different nutrient requirements and uptake/export rates. Representing phytoplankton as a single variable is efficient, but can produce an erroneous result in a predictive model. Furthermore such a predictive model is useless for experiments that explore ecosystem impacts of changing phytoplankton populations. However it isn’t practical to represent thousands of phytoplankton species as separate variables. Not only would it be computationally inefficient, we don’t know enough about most of those species to estimate their nutrient requirements or uptake/export rates. So where do we draw the line? What about prokaryotes, which are even more diverse than phytoplankton? Viruses? Heterotrophic protists?
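The aggregation problem above can also be sketched in a few lines. The species, rates, and cell counts below are entirely made up for illustration: an “average phytoplankton” variable matches a species-resolved calculation while the community composition stays near the one the average was tuned to, then diverges as soon as the composition shifts.

```python
# Toy sketch (species, rates, and counts hypothetical): one averaged
# phytoplankton variable vs. two resolved species with different
# carbon uptake rates.

uptake = {"diatom": 10.0, "cyanobacterium": 2.0}  # hypothetical C uptake per cell


def aggregated_flux(total_cells, mean_rate=6.0):
    """Single 'average phytoplankton' variable, tuned to a 50/50 community."""
    return total_cells * mean_rate


def resolved_flux(cells_by_species):
    """Species-resolved version of the same flux."""
    return sum(n * uptake[sp] for sp, n in cells_by_species.items())


even = {"diatom": 50, "cyanobacterium": 50}
shifted = {"diatom": 10, "cyanobacterium": 90}  # community composition shifts

print(aggregated_flux(100), resolved_flux(even))     # agree: 600.0 600.0
print(aggregated_flux(100), resolved_flux(shifted))  # diverge: 600.0 vs 280.0
```

The averaged model is cheap and correct for the conditions it was fit to, but it cannot, by construction, answer any question about what happens when the populations change.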

Right now I’m on my way to a workshop at the Bigelow Laboratory for Ocean Sciences in Boothbay, Maine, for three days of exploration of this issue. A major question for the attendees is how much observation is required to get some of these parameters approximately right. How many phytoplankton does one need to sequence, for example, before we have a reasonable understanding of the genetic diversity (and thus metabolic potential) of this group? The workshop, sponsored by the Ocean Carbon and Biogeochemistry program of the US Carbon Cycle Science Program, was organized by the Woods Hole Oceanographic Institution, and the organizers did an excellent job of including students and postdocs in addition to senior researchers. Thanks for the opportunity to join the conversation; I’m looking forward to digging into these issues!
