Tuesday, January 22, 2008

Life in the Climate Trenches: Part 1

I've got not one but two build problems going on. If the rest of my life weren't going well I'd be awfully depressed. I really hate wasting time on this sort of mundane frustration.

Lonestar (the supercomputer) is offline for alternate Tuesday maintenance, so for today I will share build Probleme Numero Un with you. I welcome any skill transfer that can help untangle all this; comments welcome. Rather than polluting the blogspace with innumerable followups I will just provide updates to this entry.

Tomorrow I'll start documenting Probleme Numero Deux: running a standard CAM use case.

It may be that we'll more be documenting my own ignorance than any fundamental problem with the software from some point of view; that will be a learning experience just the same.

You may object that this isn't the sort of thing you want your tax dollars paying PhD's to fumble around with; in that case I would be in agreement with you. In the private sector this would clearly be a systems task and not a scientist task. So I appeal to your mercy as a lousy admin and a potentially productive scientist, albeit of a somewhat eccentirc stripe. The sooner I get these things done the sooner I can be a scientist.

OK, here's the scoop (unix/posix skills required to make any sense out of this):

To participate in NCAR CAM development we need to use their diagnostic package which in turn depends on NCL. Having a legitimate connection to a US research institution, I have long since jumped through the hoops and am registered on the Earth System Grid. So I download the package and all its dependencies.

OK, so the UTIG disk systems are all cross-mounted and so, though I have root on the climate research machine I don't have write privileges on /usr . Our excellent systems folk have set it up so that climate users just define $CLIMATE in their .cshrc which will invoke anything I put (or conceivably somebody else puts) into /usr/local/climate so all I have to do that is nonstandard is insert "climate/" into the path; I make my own ../bin ../lib ../man in there; everyone interested in climate ends up with it in their path, nobody else has their namespace messed up, a very nice approach.

Alas, the lengthy but very clear build instructions do not work (this is standard practice with NCAR products alas; apparently they lack the skills and/or funding to actually make their products portable, though the build instructions are usually very hopeful.) In the present case the snag occurs even before I get to NCAR's stuff.

NetCDF is already available. I have acquired and build jpeglib and zlib as follows:

cd jpeg-6b
make clean
./configure --prefix=/usr/local/climate
make install

which puts five executables into /usr/local/climate/bin and

cd zlib-1.2.3
make clean
./configure --prefix=/usr/local/climate
make install

which puts a library into /usr/local/climate/lib

My current snag is building HDF-4, which in the configure phase always stops at:

checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for compress2 in -lz... yes
checking jpeglib.h usability... no
checking jpeglib.h presence... no
checking for jpeglib.h... no
checking for jpeg_start_decompress in -ljpeg... no
configure: error: couldn't find jpeg library

I have tried various combinations here. What seems to me ought to work is

./configure --prefix=/usr/local/climate --with-zlib=/usr/local/climate \
--with-jpeg=/usr/local/climate --disable-netcdf

but it seems to ignore the prefix altogether. I change --with-zlib to point to a totally bogus address, and yet it finds the zlib stuff anyway. It seems to be ignoring the flags I am setting.

Working hypothesis: I am building jpeg wrong, and some system setting is overriding my value of --with-zlib.

A less satisfying but easier solution could come from downloading binaries. At this point I simply have to swallow my pride and ask which if any of the binaries listed here might work.

Advice would be even more welcome than sympathy.

Update 1/24:

UTIG support advised option 4, which worked more or less smoothly.

Also, from UTIG support:

Oh I didn't see you had your own (up a directory). The command you
were looking for would be

./configure --prefix=/usr/local/climate \
--with-zlib=/usr/local/climate/NCLc3/zlib-1.2.3 \
--with-jpeg=/usr/local/climate/NCLc3/jpeg-6b --disable-netcdf

And I'm idiotically proud of this script, my first conditional in a shell script in a long time:

set hn = `hostname`
if ($hn == 'foo.whatever.utexas.edu') then
echo "climate paths added"
setenv NCARG_ROOT /usr/local/climate/NCLBin/
set path = ( /usr/local/climate/NCLBin/bin/ $path )

So there's finally a happy ending on this one. TACC is still having hardware or configuration problems with Lonestar which is preventing any progress on the other front.


Belette said...

Why not just edit the config script and find out where the zlib / jpeg is being found; and if needed fix it?

David B. Benson said...

First, my deep-felt sympathy. You shouldn't have to do that sysadmin stuff!

As to which of 4--8 might work, you simply match up the binary for the d4esignated chip type with whatever chip type you'll be using. (I assume than Lonestar uses exactly one.)

Bryan said...

My sympathy too. For what it's worth, you have to do this stuff in the UK yourself anyway, not nearly enough systems support for anyone else to do it ...

Michael Tobis said...

David, Lonestar was down for maintenance; this is a local build on a UTIG system.

All, if we were as dishonest as some people think, we wouldn't be trying to do this stuff, and if we were as rich as some people think but not entirely dishonest, we'd have competent specialists to do this stuff.

I think this speaks volumes about how climate research is conducted in practice basically as if it didn't matter very much. Meanwhile, I slog away. Will offer updates, but this particular project is back-burnered because Lonestar is up again.