Tuesday, January 15, 2008

Language of Choice

Presumably everyone knows that Python is on the rise with a bullet. Of course, now that Python is sexy it will probably attract some weaker programmers, diluting its reputation. Such are the costs of great beauty. Right now, though, the upswing seems to be near its maximum rate.

In the Slashdot article discussing this I saw a reference to the Jargon File's definition of the concept of Language of Choice. FORTRAN is dismissed there as a niche language for scientists, but nothing better is proposed.

So what should the language of choice be for high performance numerical computing?

There are teams working on replacements (Fortress, Chapel, some IBM thing with a forgettable name), but I think much of the high performance work of the future can be done in Python with a code generation strategy. I'm not sure what the target language ought to be for the code generation. Would there be advantages to Fortress or Chapel for this? For contemporary purposes even F77 would work (as I said in the PyNSol papers) but to be honest the F77 target is almost a stunt, just to make a point. I don't expect it would be useful in the long run.

I am currently thinking most of the system should be agnostic as to the target language. If done right, multiple targets can be supported in an architecture-dependent bottom layer. Even Verilog is a candidate. (F90 and onward are not. I'd sooner use BF or Whitespace.)
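To make the idea of a target-agnostic generator concrete, here is a minimal sketch, not the PyNSol design. It assumes a toy expression tree (the hypothetical classes `Var`, `Const`, and `BinOp`) and two illustrative back ends, `emit_f77` and `emit_c`, that differ only in the bottom layer that formats a statement. All of these names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, BinOp]


def emit_expr(e):
    """Render an expression tree as fully parenthesized infix text.

    This part is shared: the toy expressions happen to look the same
    in Fortran 77 and in C.
    """
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return repr(float(e.value))
    if isinstance(e, BinOp):
        return f"({emit_expr(e.left)} {e.op} {emit_expr(e.right)})"
    raise TypeError(f"not an expression: {e!r}")


def emit_f77(target, expr):
    # Fixed-form Fortran: statements begin in column 7.
    return " " * 6 + f"{target} = {emit_expr(expr)}"


def emit_c(target, expr):
    # C: an assignment statement ends with a semicolon.
    return f"{target} = {emit_expr(expr)};"


# The target-dependent bottom layer: one small emitter per language.
BACKENDS = {"f77": emit_f77, "c": emit_c}


def generate(target, expr, backend):
    """Generate one assignment statement for the chosen back end."""
    return BACKENDS[backend](target, expr)


if __name__ == "__main__":
    average = BinOp("*", Const(0.5), BinOp("+", Var("A"), Var("B")))
    for name in BACKENDS:
        print(generate("C", average, name))
```

Everything above `BACKENDS` knows nothing about the target, so adding another language in this sketch means adding one more emitter function. A real system would also need loops, array indexing, declarations, and parallel decomposition, none of which this toy attempts.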

Update: Bryan Lawrence has a nice summary of the issues.

10 comments:

David B. Benson said...

Whatever programming language one actually wishes to write in, I must point out that Standard ML was originally designed to be a Meta-Language in which one would write (what were called in those days) Object Language interpreters. [Nothing whatsoever to do with so-called object-orientation.]

There even has been a MetaML project to facilitate aspects of this.

While Standard ML compilers do not produce the very fastest code, they do quite well on numeric applications and the leading two compilers both have a decent foreign function interface for reaching C/C++ code.

Hmm, next time I meet Carl Hauser (who likes both Python and Standard ML) in the lunchroom, I'll ask him about doing Python in SML...

AK said...

I'd expect that in modeling, like in business programming, you'd be better off with static type definition. You want the compiler to catch as many errors as possible before you start executing test runs. You'll have enough logic errors without adding type mismatches and other issues the compiler could catch.

Of course, it wouldn't hurt to have a dynamic system as a test bed for the logic that's the heart of climate modeling, but once you've tested the chunk of code, I'd think you'd want to move it into a "tight" environment.

Sort of like using JavaScript to test logic strings before coding them in Java.

If you want performance, one thing to do is reduce or eliminate dynamic memory allocation. I'd also suggest not using any argument passing by reference. This would mean no re-entrant function calls (at least none without a pre-planned maximum recursion). Could climate modeling live within those limitations?

My experience from business programming is that languages or environments with lots of fancy "time-saving" bells and whistles sell well to the senior (and sometimes middle) management, but they end up losing more time in testing and failure after installation than they ever save in coding.

If a language that meets your criteria (whatever they are) doesn't exist, you might consider trying to start from scratch. The compiler and utilities could be written in C++/C, as an open source project. In addition to ending up with something much more effective for coding climate models, you'd get the technical open source community involved in climate modeling, making it much less of a mystery for the general public (at least that part of it).

Michael Tobis said...

AK, Python people strongly believe that static typing is overrated as a code integrity mechanism. You have to try the Python way for a while to understand. I definitely am looking for rapid application prototyping, but I am also looking for a direct path to a deployable multi-CPU high performance code without a shift in environment.

Climate models have no recursion, no dynamic allocation, no garbage collection, no re-entrancy, no objects, no nothing you might have learned in a CS program in the last 15 years. You have to see it to believe it. It's very very very different from what you see in commercial software, and in some (not all) ways it is quite primitive, while in others that you have never heard of it is quite sophisticated.

I am completely convinced that it is worth trying the Python way on HPC codes. There is stuff I'm not spelling out but if you track down my few obscure publications you can work out the outlines of it.

Unfortunately I'm not one of those people with boundless energy for software development. (Partly because I like to read and write about climate science and climate policy and related questions, see?) So to work on this idea in practice I need to be funded to do it or else scale back my ambitions considerably.

I'm not averse to people trying other things. What really bothers me is more of the same. We are reaching some pretty hard limits on further progress. It's not that the work to date has not been remarkable, but that the work in prospect seems far less likely to pay off.

Michael Tobis said...

David, there's too much already happening in Python that is too interesting to me for me personally to consider taking up any language that mostly has a theoretical constituency.

I think it would be fine if people applied other approaches to HPC. I have some doubts that the climate community will adopt them; so far we seem especially reluctant to rethink what we do.

I'd be very happy to hear Carl's thoughts on this.

David B. Benson said...

Carl met me in the lunchroom today.

He doubts that another effort to speed up Python code is required, especially for numerical computations. He suspects the data structures offered by Python appeal to you (they certainly do to others). He remains of the opinion that while Python is great for working out ideas in smaller programs, in large ones bugs will lurk due to the lack of static typing.

For me, the thought of a Python-to-Fortress translator is of interest. But I doubt the lack of progress in regional climate modeling is due to choice of programming language or indeed even programming technique. I'll naively opine that some advance in multi-scale methods is what is fundamentally missing...

Michael Tobis said...

First of all, I don't think we should be especially interested in Python runtimes.

Some people believe Python can eventually be performance competitive but I doubt it; there's no intent to improve Python threading, and multicore will kill Python as dead as it will kill Fortran.

I think you are on the right track about where the issues are, but how are we to make progress? How can we embody meaningful multi-scale (and more importantly, multi-physics) simulations as long as building a single component is so damned clunky and the results are so inelegant?

The issues with climate are different from what most of existing HPC cares about in some interesting ways.

David B. Benson said...

Well, Standard ML has an elegant module structure which certainly avoids clunky. (I haven't looked in detail, but I suppose that O'Caml, F# and Fortress all have something similar.) Once one has modularized various PDE solvers, for example, this code never needs be looked at again.

I'll offer three suggestions: (1) Take some (but not a lot of) time looking at the modularization features in those languages. (2) Make up a toy (pencil-n-paper only) language in which you can easily express whatever you need to. (3) Then, arrange to have lunch (or otherwise visit with) the SML/NJ folks a few blocks from you: in order of seniority, Matthias Blume (TTI), John Reppy (UC), Dave MacQueen (UC). The goal is simply to discuss your dilemma(s). I am sure they will be sympathetic and offer some sage advice (from the perspective of the mostly functional programming community). [If you do, be sure to give them my best regards.]

What do you mean by multi-physics?

Michael Tobis said...

You seem to think I'm still at Chicago, but I'm at UT Austin.

I find myself resisting your advice on a couple of levels. Specifically, and this is a core issue, there are no obvious toy problems.

The prototype problem is a coupled atmosphere-ocean system. Nature shows us that this system has interesting dynamics. There are three sets of fundamental physics: the atmosphere, the ocean and the (literal) interface. (Amusingly, once you get the land involved you also get the "littoral" interface but no matter...)

The question of what the most informative model of this coupled system might be depends on all three physics components and various design decisions.

At present, doing this at all is so difficult that only a very small subspace of the potential programs has been explored. For reasons hard to explain here, the direction of the climate community is to make such exploration more rather than less difficult.

What we seek is a platform to yield physical insight. Neither performance nor fidelity is the primary issue.

It's fundamentally a cognitive issue.

The attraction of Python is then that it is accessible enough to the investigator and powerful enough for the platform developer.

David B. Benson said...

Oops! I am behind the times in many respects. Apologies.

I didn't have in mind toy problems, but rather a toy programming language which you find to be more expressive than, say, even Python.

But no matter. Your prompt reply suggests to me, numerically inexpert, that the difficulty lies in expressing the interface physics, not (yet) in programming language expressiveness issues.

David B. Benson said...

Thanks for the link to Bryan. (I found Paul Graham's essay especially useful!)

After glancing yesterday at Python's packages/modules I'll opine that these might be good enough for your purposes. They aren't for mine (in a very different application area, however).

Perhaps climate modelers already know about

Quadtrees

which are used in GIS and which occurred to me whilst thinking about your 'interface physics' on the walk to the office. My naive thought is that the lowest air cell gets divided up, as a quad tree, as does the top ocean cell, until these match well enough to apply the (presumably non-linear) equations of the interactions in the smallest portions of the quad tree. When part of a quad tree has sufficiently similar properties, that portion is coalesced into a single node.

Maybe somebody has already written appropriate code for this data structure in Python. In any case, it is rather easy.
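As a rough illustration of how little code the data structure needs, here is a minimal sketch of a region quadtree. It assumes a square grid of scalar values whose side is a power of two, and a hypothetical tolerance `tol` that stands in for "sufficiently similar properties". The names `Node`, `build`, and `count_leaves` are invented for this sketch, and nothing here models the actual air-sea interface physics.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    x: int                 # column of the block's top-left cell
    y: int                 # row of the block's top-left cell
    size: int              # side length of the square block
    value: Optional[float] = None                   # mean value, leaves only
    children: Optional[Tuple["Node", ...]] = None   # four sub-blocks, or None


def build(grid, x=0, y=0, size=None, tol=0.0):
    """Build a quadtree over grid[y][x].

    Assumes a square grid whose side is a power of two. A block whose
    values differ by at most `tol` is kept as a single leaf, which has
    the same effect as coalescing four similar quadrants into one node.
    """
    if size is None:
        size = len(grid)
    cells = [grid[j][i]
             for j in range(y, y + size)
             for i in range(x, x + size)]
    if size == 1 or max(cells) - min(cells) <= tol:
        return Node(x, y, size, value=sum(cells) / len(cells))
    half = size // 2
    # Order: top-left, top-right, bottom-left, bottom-right.
    children = tuple(build(grid, x + dx, y + dy, half, tol)
                     for dy in (0, half)
                     for dx in (0, half))
    return Node(x, y, size, children=children)


def count_leaves(node):
    """Number of leaf blocks in the tree."""
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)


if __name__ == "__main__":
    grid = [
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 5],
        [3, 3, 6, 7],
    ]
    print(count_leaves(build(grid)))         # exact match required
    print(count_leaves(build(grid, tol=3)))  # looser tolerance, fewer leaves
```

Three of the four quadrants in the example grid are uniform and stay whole, while the varying one is split down to single cells unless the tolerance is loosened.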

My main point is that perhaps climate modelers need to have a greater appreciation of interesting data structures. That might do more to 'unstick' climate modeling than choice of modern programming language.