Wednesday, December 26, 2007

Why is Climate Modeling Stuck?

I believe that progress in climate modeling has been relatively limited since the first successes in linking atmosphere and ocean models without flux corrections. (That's about a decade now, long enough to start being cause for concern.) One idea is that tighter codesign of components such as atmosphere and ocean models in the first place would help, and there's something to be said for that, but I don't think that's the core issue.

I suggest that there is a deeper issue based on workflow presumptions. The relationship between the computer science community and the application science community is key. I suggest that the relationship is ill-understood and consequently the field is underproductive.

The relationship between the software development practitioners and the domain scientists is misconstrued by both sides, and both are limited by past experience. Success in such fields as weather modeling and tide prediction provide a context which inappropriately dominates thinking, planning and execution.

Operational codes are the wrong model because scientists do not modify operational codes. Commercial codes are also the wrong model because bankers, CFOs and COOs do not modify operational codes. The primary purpose of scientific codes as opposed to operational codes is to enable science, that is, free experimentation and testing of hypotheses.

Modifiability by non-expert programmers should be and sadly is not treated as a crucial design constraint. The application scientist is an expert on physics, perhaps on certain branches of mathematics such as statistics and dynamics, but is typically a journeyman programmer. In general the scientist does not find the abstractions of computer science intrinsically interesting and considers the program to be an expensive and balky laboratory tool.

Being presented with codes that are not designed for modification greatly limits scientific productivity. Some scientists have enormous energy for the task (or the assistance of relatively unambitious and unmarketable Fortran-ready assistants) and take on the task with energy and panache, but the sad fact is that they have little idea of what to do or how to do it. This is hardly their fault; they are modifying opaque and unwelcoming bodies of code. Under the daunting circumstances these modifications have the flavor of "one-offs", scripts intended to perform a single calculation, and treated as done more or less when the result "looks reasonable". The key abstractions of computer science and even its key goals are ignored, just as if you were writing a five-liner to, say, flatten a directory tree with some systematic renaming. "Hmm, looks right. OK, next issue."

This, while scientific coding has much to learn from the commercial sector, the key use case is rather atypical. The key is in providing an abstraction layer useful to the journeyman programmer, while providing all the verification, validation, replicability, version control and process management the user needs, whether the user knows it or not. As these services become discovered and understood, the value of these abstractions will be revealed, and the power of the entire enterprise will resume its forward progress.

It's my opinion that Python provides not only a platform for this strategy but also an example of it. When a novice Python programmer invokes "a = b + c", a surprisingly large number of things potentially happen. An arithmetic addition is commonly but not inevitably among the consequences and the intentions. The additional machinery is not in the way of the young novice counting apples but is available to the programmer extending the addition operator to support user defined classes.

Consider why Matlab is so widely preferred over the much more elegant and powerful Mathematica platform by application scientists. This is because the scientists are not interested in abstractions in their own right; they are interested in the systems they study. Software is seen as a tool to investigate the systems and not as a topic of intrinsic interest. Matlab is (arguably incorrectly) perceived as better than Mathematica because it exposes only abstractions that map naturally onto the application scientist's worldview.

Alas, the application scientist's worldview is largely (see Marshall McLuhan) formed by the tools with which the scientist is most familiar. The key to progress is the Pythonic way, which is to provide great abstraction power without having it get in the way. Scientists learn mathematical abstractions as necessary to understand the phenomena of interest. Computer science properly construed is a branch of mathematics (and not a branch of trade-school mechanics thankyouverymuch) and scientists will take to the more relevant of its abstractions as they become available and their utility becomes clear.

Maybe starting from a blank slate we can get moving again toward a system that can actually make useful regional climate prognoses. It is time we took on the serious task of determining the extent to which such a thing is possible. I also think the strategies I have hinted at here have broad applicability in other sciences.

I am trying to work through enough details of how to extend this Python mojo to scientific programming to make a credible proposal. I think I have enough to work with, but I'll have to treat the details as a trade secret for now. Meanwhile I would welcome comments.