Friday, February 15, 2008

Insufficient, Inaccurate and Inconsistent

The data I mean...

The job of the geophysicist, I was told yesterday at a seminar given by Mrinal Sen, is essentially impossible: given data that is insufficient, inaccurate and inconsistent, come up with a useful model.

Interestingly, it is unlikely that the skeptics' squad will make much of this admission, since the utility of the models Sen is specifically interested in is whether they correctly inform placement of oil well drilling efforts.

Nevertheless wells do get drilled. In the past it was a matter of some combination of analysis and intuition. Nowadays, statistics works in there as well.

Does climatology partake of these fundamental problems? Not as much as seismology, really; our observations are relatively accurate and consistent compared to seismic data. They are far from sufficient for long time scale processes, and the formalism of use of paleodata still leaves much to be desired.

Nevertheless, the future will happen, and we will do something about it. The question at hand for climate modelers and their customers is to what extent the models ought to affect what we do.

Computational geosciences and computational biological sciences are very different from computational engineering in flavor. In engineering (aside from civil and architectural, which partakes slightly of our problems) the system is entirely controlled and hence the tradeoffs between model expense and model accuracy are well understood. By contrast, our problem is to establish the right model based on observational data and theory.

This is called "inverse modeling" in some circles. I dislike the name and regret the loss of the appropriate term "cybernetics" to Hollywood and to some rather ill-defined corners of computer science. I propose, therefore, to call what some computational sciences do "meta-modeling", wherein the model itself (and the nature of its relationship to observation and theory) is construed as a legitimate object of study.

It is interesting how well-established this modality of thought is in mineral exploration (where there is, after all, a bottom line to take care of) and how controversial it remains in climate science. I have some thoughts as to why this might be. Some of the problems are directly a consequence of the unfairness of the unfair criticisms to which we have been subjected; this makes a fair assessment of fair criticisms fraught with peril. Others emerge from the history of the field and the social structures in which the work is performed.

It seems obvious to me that if the computational resources applied to climate go up several orders of magnitude, the correct approach to this is not, or at the very least not solely, to build a small number of immensely complex modeling platforms which use all the available computational power. Rather, it is to apply the (expensive but powerful) inverse modeling methodology to explore the model space of simpler models; to improve the expressiveness of model description languages; and to use these tools to build a much larger and more formal ensemble of climate modeling tools.

I also question whether the most policy-relevant models and the most science-relevant models are necessarily the same. I suspect they are not. Climate modelers are so defended against the accusation of heuristics that they fail to apply the heuristics that might apply in an applied science effort.

Sen presented an interesting taxonomy of inverse methods at his talk. Much more to follow on this subject, hopefully.

8 comments:

David B. Benson said...

Civil, architectural, environmental and also chemical engineers have much the same problems of having to interact with the real world as it really is.

In a strong sense, mechanical and computer engineers build their own world. Most electrical engineers too; power engineers less so.

Not clear these distinctions help much.

Michael Tobis said...

My point is that when you control your system sufficiently you already know the equations that constitute it. We don't have that in climate, at least not effectively at any practicable resolution.

David B. Benson said...

I doubt that structural engineers have a very deep grasp yet of suitable equations to use in the face of earthquakes. They have various approximations. That was what I was trying to point out.

Anyway, I still fail to grasp what 'inverse method' is supposed to mean...

Michael Tobis said...

Here's a good definition of the topic of inverse methods as Sen sees it.

In short, in this jargon, a forward problem is, given a model, predict some observations; an inverse problem is, given observations, identify a model.

In a linear system these problems are closely related. In a nonlinear system they are asymmetric, and require many runs of the "forward model", i.e., in our case, candidate models within our model optimization space.

There is an extensive and rich formalism for this class of problem which is relevant given sufficient computational power and not otherwise.
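
As a toy illustration of the forward/inverse distinction, here is a minimal sketch. Everything in it is an assumption made for illustration, not anything from Sen's talk: the forward model is a one-parameter exponential decay, the "observations" are synthetic and noise-free, and the inversion is a brute-force scan over candidate values of the parameter. The names forward, misfit and invert are hypothetical.

```python
import math

def forward(k, times):
    """Forward problem: given the model parameter k, predict the observations."""
    return [math.exp(-k * t) for t in times]

def misfit(k, times, observed):
    """Sum of squared differences between predicted and observed values."""
    predicted = forward(k, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def invert(times, observed, k_min=0.0, k_max=2.0, steps=2001):
    """Inverse problem: given observations, identify k.

    The model is nonlinear in k, so this sketch simply runs the forward
    model once per candidate value and keeps the best-fitting one.
    """
    best_k, best_misfit = None, float("inf")
    for i in range(steps):
        k = k_min + (k_max - k_min) * i / (steps - 1)
        m = misfit(k, times, observed)
        if m < best_misfit:
            best_k, best_misfit = k, m
    return best_k, best_misfit

times = [0.0, 0.5, 1.0, 1.5, 2.0]
observed = forward(0.7, times)      # synthetic data from a "true" k of 0.7
k_est, err = invert(times, observed)
print(k_est, err)
```

One forward run costs almost nothing here; the inversion costs 2001 of them for a single parameter. With a forward model as expensive as a climate model and many parameters, that multiplication is why the formalism only becomes relevant with sufficient computational power.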

David B. Benson said...

Thanks! I now understand

inverse method == parameter estimation

That is, one has a set of equations, possibly with constraints, and some unknown parameters as coefficients in the equations and even in the constraints. The problem is to obtain a decent estimate of the values of these parameters, given some evidence (data).

Have I (more or less) got it?

Michael Tobis said...

Pretty much. Your summary certainly captures the practice as it mostly exists.

The concept can be extended to cases where there are competing models; where the differences aren't just parametric but structural. That said, you have to walk before you can run, and there's plenty to do within a given model structure in many cases.

One point I have been making (mostly in my series of flushed proposals) is that the best way to compare alternative formulations is to spend equal effort tuning each formulation. Without formal tuning methodologies that isn't feasible.

That all said, optimal parameter tuning is important, mathematically mature within engineering, and not well understood within climate, so it has considerable potential for improving matters.

The nature of the resistance to this class of idea within the climate establishment is interesting. It isn't something I fully understand.

David B. Benson said...

Ok. To compare competing models H and K given evidence E, I just use the naive Bayes factor method. Good enough for my problem.
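
For readers unfamiliar with the term, a minimal sketch of a Bayes factor comparison might look like the following. It rests on assumptions the comment does not state: each model is a point hypothesis with no free parameters, the errors are independent and Gaussian with a known width sigma, and the data are made up. The function names are hypothetical.

```python
import math

def log_likelihood(predicted, observed, sigma):
    """Log-likelihood of the data under independent Gaussian errors of width sigma."""
    n = len(observed)
    sum_sq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return -0.5 * sum_sq / sigma ** 2 - n * math.log(sigma * math.sqrt(2.0 * math.pi))

def log_bayes_factor(pred_h, pred_k, observed, sigma):
    """log[P(E|H) / P(E|K)]: positive values favor H, negative values favor K."""
    return (log_likelihood(pred_h, observed, sigma)
            - log_likelihood(pred_k, observed, sigma))

x = [1.0, 2.0, 3.0, 4.0]
evidence = [1.0, 2.1, 2.9, 4.2]     # made-up observations E
pred_h = [xi for xi in x]           # model H (assumed): y = x
pred_k = [2.5 for _ in x]           # model K (assumed): y = 2.5
log_bf = log_bayes_factor(pred_h, pred_k, evidence, sigma=0.5)
print(log_bf)
```

Working in logarithms avoids underflow when the likelihoods are tiny. When the models have free parameters, the likelihoods would have to be averaged over a prior on those parameters, which this sketch does not attempt.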

To estimate parameters I use a variant of the simplex method, the multiplex method. The only difference is that rather than just an (n+1)-vertex simplex in n-dimensional parameter space, the method starts with at least n+1 vectors (of length n) but possibly as many as 2^(n-1). This gives better coverage of the hilly n-dimensional parameter space, so it does better at avoiding local minima traps. There are some obvious improvements which could be made if I had time to do so, wherein the algorithm starts with a large number of guesses and cuts these down as it goes, eventually just using a single simplex when it becomes clear that a valley has been found and the algorithm is simply polishing answers.

If you come across a reference to an algorithm which looks something like that, kindly post or e-mail me a reference to it.
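
The following is not the multiplex method described above, whose details are not given here. It is a simpler, related idea sketched under assumptions: "simplex method" is taken to mean the Nelder-Mead downhill simplex, and the wider coverage is obtained by running an ordinary simplex search from each point of a coarse grid and keeping the best result. The test function, the grid and all names are hypothetical.

```python
def nelder_mead(f, start, step=0.5, tol=1e-10, max_iter=2000):
    """Simplified Nelder-Mead search (reflection, expansion, inside contraction, shrink)."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]

def multi_start(f, starts):
    """Run one simplex search per starting point and keep the lowest result."""
    results = [nelder_mead(f, s) for s in starts]
    return min(results, key=lambda r: r[1])

def hilly(p):
    """Two valleys along x: a deeper one near x = -1, a shallower one near x = +1."""
    x, y = p
    return (x * x - 1.0) ** 2 + 0.3 * x + y * y

starts = [(x, y) for x in (-2.0, -1.0, 0.0, 1.0, 2.0) for y in (-1.0, 0.0, 1.0)]
best_point, best_value = multi_start(hilly, starts)
print(best_point, best_value)
```

A single search started on the right-hand side of this surface may settle in the shallower valley; covering the space with several starts is what guards against that. Pruning the unpromising searches early, as proposed above, would reduce the cost but is not attempted here.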

Anonymous said...

You write very well.