Friday, February 15, 2008

Insufficient, Inaccurate and Inconsistent

The data I mean...

The job of the geophysicist, I was told yesterday at a seminar given by Mrinal Sen, is essentially impossible. It is, given data that is insufficient, inaccurate and inconsistent, to come up with a useful model.

Interestingly, it is unlikely that the skeptics' squad will make much of this admission, since the models Sen is specifically interested in are judged by whether they correctly inform the placement of oil well drilling efforts.

Nevertheless wells do get drilled. In the past it was a matter of some combination of analysis and intuition. Nowadays, statistics works in there as well.

Does climatology partake of these fundamental problems? Not as much as seismology, really; our observations are relatively accurate and consistent compared to seismic data. They are far from sufficient for long time scale processes, though, and the formalism for using paleodata still leaves much to be desired.

Nevertheless, the future will happen, and we will do something about it. The question at hand for climate modelers and their customers is to what extent the models ought to affect what we do.

Computational geosciences and computational biological sciences are very different from computational engineering in flavor. In engineering (aside from civil and architectural, which partake slightly of our problems) the system is entirely controlled and hence the tradeoffs between model expense and model accuracy are well understood. By contrast, our problem is to establish the right model based on observational data and theory.

This is called "inverse modeling" in some circles. I dislike the name and regret the loss of the appropriate term "cybernetics" to Hollywood and to some rather ill-defined corners of computer science. I propose, therefore, to call what some computational sciences do "meta-modeling", wherein the model itself (and the nature of its relationship to observation and theory) is construed as a legitimate object of study.

It is interesting how well-established this modality of thought is in mineral exploration (where there is, after all, a bottom line to take care of) and how controversial it remains in climate science. I have some thoughts as to why this might be. Some of the problems are directly a consequence of the unfairness of the unfair criticisms to which we have been subjected; this makes a fair assessment of fair criticisms fraught with peril. Others emerge from the history of the field and the social structures in which the work is performed.

It seems obvious to me that if the computational resources applied to climate go up several orders of magnitude, the correct approach to this is not, or at the very least not solely, to build a small number of immensely complex modeling platforms which use all the available computational power. Rather, it is to apply the (expensive but powerful) inverse modeling methodology to explore the model space of simpler models; to improve the expressiveness of model description languages; and to use these tools to build a much larger and more formal ensemble of climate modeling tools.
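As a toy illustration of what "exploring the model space of simpler models" against observations can look like, here is a minimal Python sketch. Everything in it is invented for the purpose: the one-parameter linear forward model, the synthetic "observations", and the brute-force grid search, which merely stands in for the far more capable inverse methods discussed in Sen's talk.

```python
def forward_model(sensitivity, forcings):
    """Toy forward model (an assumption): response is proportional to forcing."""
    return [sensitivity * f for f in forcings]


def misfit(sensitivity, forcings, observations):
    """Sum of squared differences between model output and observations."""
    predicted = forward_model(sensitivity, forcings)
    return sum((p - o) ** 2 for p, o in zip(predicted, observations))


# Synthetic data, made up for this example: a "true" sensitivity of 0.8
# plus small fixed perturbations standing in for observational error.
forcings = [0.5, 1.0, 1.5, 2.0, 2.5]
noise = [0.05, -0.03, 0.02, -0.04, 0.01]
observations = [0.8 * f + n for f, n in zip(forcings, noise)]

# Brute-force search over candidate parameter values, 0.00 to 2.00 in steps of 0.01.
candidates = [i / 100 for i in range(201)]
best = min(candidates, key=lambda s: misfit(s, forcings, observations))

print(f"best-fit sensitivity on this grid: {best:.2f}")
```

The point of the sketch is only the shape of the exercise: the forward model is cheap, so many candidate models can be scored against the data, and the misfit function itself becomes an object of study.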

I also question whether the most policy-relevant models and the most science-relevant models are necessarily the same. I suspect otherwise. Climate modelers are so defended against the accusation of heuristics that they fail to apply the heuristics that might apply in an applied science effort.

Sen presented an interesting taxonomy of inverse methods at his talk. Much more to follow on this subject, hopefully.

Tuesday, February 12, 2008

Computational Silence

Still not sure whether to take Mr. Hughes seriously, even though he echoes some of my points. His latest looks pretty confused at first reading but it's far from my expertise. Maybe Eli will have a look.

That all said, I am not one who says all is right with computational climatology.

My friend JL sends along an article by (not captain) James Quirk that does a good job outlining the quandary and some of the efforts toward a solution. Climate science is particularly backward in adopting these measures. There's a misperception that all GCMs solve the same problem; it is understandable in the public but hard to account for within the field.

The article is called "Computational Science: Same Old Silence, Same Old Mistakes". It appears in a volume on AMR. It is behind a Springer firewall, so if you can't get the PDF, note that Google will display a scan of it.

(Remember, rationally, the less we trust the models the more severe our policy constraints on modifying the system should be.)

Why Pencil Science?

This blog has, at least for now, wandered from its original intent, but its original explanation of purpose is worth saving. I do think computer science is for everybody, not just middle school kids but even climatologists...

This is a blog about the nuts and bolts of computer literacy from a
Pythonic perspective.

I would like not merely to defend the position that computer
programming is "for everybody" (including children, intelligent
adults, and scientists) but also to examine the proposition that
Python is the appropriate vehicle for such a de-professionalization of
programming.

Of course, "computer science is not about computers" but about
formally expressing processes using symbols. The misnomer is
consequential in how most people think about the question of the role
of programming in education.

We don't teach children how to write because we expect them all to be
professional writers. Professional writers are not opposed to literacy
on the grounds that they will lose readership. We do not call the
ability to write "pencil science".

Is writing (pencil science) for everybody? That wasn't clear a
thousand years ago. Can "computer science" be for everybody too?

I'll be reorganizing my online presence soon. Still thinking about what to do...

Thursday, February 7, 2008

Bad month index=36; help, you prawns!

Recall we are trying to get a sensitivity to 2xCO2; the public thinks of this as the primary use case.

Most of what was hanging me up for the past week was building a 6Mb one-time data file (containing climatological mixed layer depth, clamped at 200m; the magic number is somewhat arbitrary and hard-wired, but we suppose others have put thought into it). It would have been nice if I could just download the file. It turned out I needed a C compiler compatible with the Fortran compiler that built netcdf. The magic word was "icc", specifically

setenv CC icc

Once that was done, it turned out I had some other data file to dig up. I had trouble tracking down my NCAR UID/password (as if this data needed to be kept under wraps for some reason). Then I found that the URL specified in the README was incorrect! I eventually tracked the file down with Google and built (huzzah) the heat flux boundary condition. Thought I was done.

Today I tried to submit the job. Well, it's still a use case I've never run. I was hung up for some time on the error message "BNDDYI: Bad month index=36", which is not mentioned in the CAM documentation but is mentioned in the documentation for CCM3.6 (an ancestor)!

It seems the month number should be a number between 1 and 12 (reasonable). No idea where the number 36 came from. I look into my data file and see the following for dates:

time = 116.5, 215, 316.5, 416, 516.5, 616, 716.5, 816.5, 916, 1016.5, 1116, 1216.5 ;

There are twelve of them as needed, but...

Those are indeed months in a peculiar sense. Consider it a puzzle, and here is your clue:

double time(time) ;
time:units = "day as %m%d.%f" ;
time:axis = "T" ;
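For anyone who wants to check an answer to the puzzle: one reading of the units string is that each value packs the date as month*100 + day, with a fractional day. That reading is an assumption drawn from the "%m%d.%f" hint, not something taken from the CAM documentation, and the helper name `decode_mmdd` is made up for this sketch.

```python
def decode_mmdd(value):
    """Split a number read as month*100 + day(.fraction) into (month, day).

    Assumes the "%m%d.%f" units string means exactly that packing.
    """
    month, day = divmod(value, 100)
    return int(month), day


# The twelve time values from the data file, as listed above.
times = [116.5, 215, 316.5, 416, 516.5, 616, 716.5, 816.5, 916, 1016.5, 1116, 1216.5]

decoded = [decode_mmdd(t) for t in times]
months = [month for month, _ in decoded]

for value, (month, day) in zip(times, decoded):
    print(f"{value:>7} -> month {month:2d}, day {day}")
```

Under this reading the values are mid-month dates for months 1 through 12, and none of them yields a month of 36, so the sketch does not explain the error message either.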

Go figure. No clue where the 36 comes from. Charles advises, though, that I should not specify a start and end date for a slab run. So I make no effort to surgically fix the month numbers and remove the start and end date, specifying a duration instead. (Because I specify in days rather than seconds, I use a negative number...)

The 36 BNDDYI message goes away! Apparently the error "month number is 36" really means "you can't specify dates on a slab run".

The model gets considerably further now. It starts to run the first time step and then proffers this droll message a few times

sublimate away all sea ice
something is probably seriously wrong
ice state at dh stop nstep = 0

amid some other unexpected output, and then aborts.

Three weeks and counting. We wanted a result by next Monday and this will take 48 hours to run. Groan.

Somebody else has come to this pass, describing it as "display the seriously error". Note the useful assistance provided by the community. There's also something in Chinese that Google translates as follows:

help you prawns, the slab ocean model CAM operations serious mistake!

-=-=-=-=-=> Turn CCM and CAM counterparts all know, by adding a simple ocean model (slab ocean model) running CAM, but regardless CAM3.0, or CAM3.1, there are the same mistakes can not continue to run, the Daxia Please help!

Any prawns out there have any advice for me?

Update: Running at last! The last hurdle was knowing that you have to replace the file

cami_0000-09-01_64x128_L26_c030918.nc

with the file

cami_0000-01-01_64x128_T42_L26_SOM_c030918.nc

Unlike the mixed layer depths, this is actually distributed in the initial conditions dataset. I am also using a topography file that differs from the standard one. Not sure why; this is based on lore and not knowledge. It's called

topo-from-cami_0000-01-01_64x128_T42_L26_SOM_c030918.nc

I don't know its provenance, but possibly the name is correct. If that's the case, though, it isn't clear why there are two different files at all. Some totally undocumented use case perhaps?

This is actually consistent with the message from "seriously message" guy. It is not documented as part of the procedure for the use case in the documents. There is a tiny clue in the README for defineqflux:

Another executable available (but hopefully unnecessary) in the definesomic
module is (also) called definesomic. It adds necessary SOM fields to an
initial dataset if they are not already there.

If there are fields missing from the initial conditions, you'd think something better than "something is seriously wrong" and lots of numerical output would be possible.
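To make the complaint concrete, here is a minimal Python sketch of the kind of up-front check that would produce a more useful message. It is not how CAM works; the function name and the field names in the example (`TS`, `QFLUX`, `MLDEPTH`) are hypothetical placeholders, and the check operates on plain lists of names rather than on a real netCDF file.

```python
def check_required_fields(available, required, filename="<initial file>"):
    """Raise a readable error naming every required field that is absent.

    `available` and `required` are iterables of variable names; how they
    are obtained from the dataset is outside the scope of this sketch.
    """
    missing = sorted(set(required) - set(available))
    if missing:
        raise ValueError(
            f"{filename}: missing required initial-condition fields: "
            + ", ".join(missing)
        )


# Hypothetical example: the run needs three fields, the file has only one.
try:
    check_required_fields(
        available=["TS"],
        required=["TS", "QFLUX", "MLDEPTH"],
        filename="example_initial_conditions.nc",
    )
except ValueError as err:
    message = str(err)
    print(message)
```

A message that names the file and the absent fields would have pointed straight at the wrong initial dataset, instead of leaving it to lore.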

Not sure how much further to investigate.