Friday, Feb 5, 2010
by Bishop Hill
Darrel Ince gets it
Feb 5, 2010 Climate: CRU FOI
Darrel Ince, a professor of computing at the Open University "gets it".
Many climate scientists have refused to publish their computer programs. I suggest that this is both unscientific behaviour and, equally importantly, ignores a major problem: that scientific software has got a poor reputation for error.
Reader Comments (36)
The leaked programs are the work of amateurs - like the rest of "climate science".
For a long time "climate science" was a cushy academic backwater - like Golf Psychology or Gender Studies. Packed with low-voltage plodders who couldn't do science, couldn't get jobs at the Met Office, couldn't write proper computer programs.
They've never discovered any new laws, no real insight into the climate - they can't even keep their own results and data safe. But boy they could wear sweaters and corduroy trousers and leave bits of breakfast in their scruffy beards...
BBC poll shows Climate scepticism 'on the rise'.
"25% of those questioned did not think global warming was happening, an increase of 10% since a similar poll was conducted in November.
...
"...only 26% of people think 'climate change is happening and is now established as largely man-made'"
As noted in another post, I once -- 40 years ago -- worked at the Cornell University computer center as a consultant to the users whose programs "didn't work". While all of these users had good intentions, they did not have the foggiest idea how to write a computer program, nor any idea of what could go wrong with scaling issues and the like (called computability back then): truncation problems, mixed mode, overflow and underflow, etc., etc. In short -- complete gibberish. And terribly structured as well: I saw many 4000-line mainline programs with a thousand "goto" statements. We called it "spaghetti" code back then.
And as noted, about every third program I saw had the infamous "Do 15 J = 1.100" mistake. The compiler never spots that because it is a perfectly legal FORTRAN statement. And the "loop" body executes once, not 100 times, with J left at some random value. (This was FORTRAN IV, but similar issues exist in any computer language. You have to know what you are doing to program computers successfully.)
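For anyone who has not met it, a minimal fixed-form sketch of the typo (the program name is my invention). Because blanks are insignificant in fixed form, the first statement below is not a loop at all: it silently assigns 1.100 to a brand-new, implicitly declared real variable called DO15J, and the "body" falls through and runs exactly once.

C     THE TYPO: A PERIOD WHERE A COMMA SHOULD BE
      PROGRAM DOBUG
      DO 15 J = 1.100
      PRINT *, 'BODY EXECUTED ONCE, J =', J
   15 CONTINUE
C     THE INTENDED LOOP, WITH A COMMA, RUNS 100 TIMES:
C     DO 15 J = 1,100
      END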
Nothing has changed from what I saw in the samples posted in the leaked files. It is all garbage. And as several very experienced software engineers pointed out in posts on this blog, the AGW crowd generally used FORTRAN to do string manipulation, something better done in any of a dozen other languages, such as Perl or C++.
Their work is garbage, absolute and total garbage and it is clear they know it. That is why they are hiding it.
@Jack Hughes
You are being too kind to them. :) But I agree with your points.
I think they got lured into deep water bit by bit.
For most academic research involving writing computer programs and data management there's no question of having specialists designing a system. The researchers have to pitch in and do it. It doesn't matter that much if it's amateurish, because in the scheme of things, the research doesn't matter. If the research is later invalidated, that's what can happen with research.
In this case, I suggest the politics had gradually escalated to the point where they couldn't admit they were wrong. It also migrated from being a one-off to being a service. They ended up bending the methods and data to fit the desired conclusions. Far easier to do that if it's all kept under wraps anyway. If they'd retained a specialist IT team to set up a database with QC disciplines and produce the code with a version control system and quality control, it would have forced them to confront the shakiness of their pronouncements.
Mission creep also happens in the commercial world where you have IT experts. A project starts out and is successful within its objectives. Management finds it useful and bits are bolted on. It gets to the stage where the requirements have developed far beyond the original design. No one ever authorises the project to be redone from scratch with a new design. Band Aid is plastered on Band Aid.
Interesting contrast between the two polls - the BBC one asks if people believe in global warming while the Times poll asks about "climate change". The difference is subtle but I think some people may assume climate change means that the weather changes - of course it does - rather than the loaded interpretation of the term used by the IPCC and other campaign groups, which carries the implicit claim that climate change is driven by mankind.
By some Gramscian process, the AGW proponents have managed to twist the meaning of these ordinary words and this allows them to claim a little victory every time someone accepts the label "climate change skeptic" or "denier" because denying that the climate changes sounds like an absurd position to most people.
It was also interesting that most of the people polled said they had not been influenced in their views by the media coverage of the climategate emails. I think there is a psychological trait in most of us that allows us to change our minds then convince ourselves that what we now believe was indeed what we believed all along. Even, or maybe especially, politicians do this but they have the misfortune to be filmed or quoted saying one thing then contradicting themselves later.
"It was also interesting that most of the people polled said they had not been influenced in their views by the media coverage of the climategate emails."
I'd say that two cold winters and miserable summers have a lot to do with it. The irony of people shivering in Copenhagen moaning about Global Warming wasn't lost.
It's hard to keep people in a state of fear for long. Climate change fatigue sets in.
Things like drowning polar bears have been overplayed and the facts have emerged. The idea is getting about that 'everything is bloody climate change and an excuse for taxes'. Saving the planet is OK when times are good and paying for it is some way in the future. In a recession minds are concentrated.
Media exposure of Climategate hasn't been timely or generally well informed. I guess that for most people, it adds to the impression that there's something to hide in the official position and you don't hide things unless you're trying to kid people.
I agree with don Pablo and cosmic. In the nuclear industry, software was developed and used under very rigorous quality control procedures, with multiple verification and validation. This method was in stark contrast to the work we encountered in universities and government agencies, where quality control was virtually non-existent. Of course these people were more intent on getting results, getting published and getting funding than in making sure they did a quality job. Climategate has revealed incompetent "climate scientists" hiding their inadequate and faulty code, data and methods in order to hide their incompetence.
I think I am being too kind to the CRU scientists.
Jack Hughes' first comment is perhaps something of an exaggeration - but only slightly. I have never regarded the CRU people such as Jones & Briffa as "scientists", far less the "serious scientists" description used by Peter Liss in a recent interview. They are number-crunchers pure and simple, with the emphasis on the latter word. The only point worthy of genuine scientific enquiry to emerge from their studies was the decline in tree-ring width with increasing temperature, and they chose to hide it rather than confront it. Have they no intellectual curiosity at all? They also waste money. Jones must have spent thousands in the fatuous task of trying to prove that UHI does not exist or is of minimal influence. He could have asked any of the Met Office forecasters who regularly, on television, state that "temperatures in rural areas will be 3 to 5 degrees cooler than in towns". They at least have reputations to defend.
Reference the BBC poll aired today on the BBC 6pm news.
It was notable that yet again only Prof Bob Watson of DEFRA was given an interview - no sceptic POV.
The usual BBC approach.
http://news.bbc.co.uk/1/hi/sci/tech/8500443.stm
"Many climate scientists have refused to publish their computer programs."
Scientists such as Roy Spencer? Shall you submit the FOI or shall I?
"Many climate scientists have refused to publish their computer programs."
And many have not.
Any ETA on finding any issues that matter in those?
Just looking through the Fortran programs at random for light entertainment. What a mess. No version control, no naming conventions. It's like my father-in-law's PC - any file can be called anything and be in any directory/folder.
Add in a few problems less visible to the untrained. Like numbers that can - and do - overflow and underflow. This is like having a fleet of cars with mileometers that only store 3 digits. Each car will show a mileage on the dashboard, but you don't know how many times it went round the clock.
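A minimal sketch of the mileometer effect (an invented example; gfortran's kind=2 is a 16-bit signed integer, and although the standard calls overflow undefined, in practice it silently wraps):

  program mileometer
    implicit none
    integer(kind=2) :: odometer   ! 16-bit signed: range -32768..32767
    integer         :: i
    odometer = 0
    do i = 1, 40000               ! drive 40,000 miles...
       odometer = odometer + 1_2  ! ...and wrap straight past 32767
    end do
    print *, 'dashboard reads:', odometer   ! a negative "mileage"
  end program mileometer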
And magic numbers embedded in the code - like PI being stored to a different number of decimals in different files - instead of being defined once in one central file.
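The standard cure, for what it is worth: define the constant once, in a module, and USE it everywhere (module and program names here are my invention):

  module constants
    implicit none
    real(kind=8), parameter :: PI = 3.14159265358979323846d0
  end module constants

  program area
    use constants
    implicit none
    print *, 'circle area, r = 2:', PI * 2.0d0**2
  end program area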
Found a weirdo program that does average sunshine hours per day for a month. In January it divides total hours by 31. Come February and it divides the total by 28.25. Yes. And this is going to save the planet...
They are just D-I-Y bodgers. They had a go at programming and failed. They had a go at statistics - and failed. This is why they are terrified by the quietly-spoken and mild-mannered Stephen McIntyre. They hate him because he can do the sums - and they can't.
They are paranoid. The emails show this. Their behaviour shows it.
Personally I would be very happy with climate science if they released their data and code so folk like Mr McIntyre could do a proper audit job on it.
If there is a problem with warming, skeptical or not, I want to know about it. The problem at the moment is you cannot trust any of their findings because of the antics of Mann et al.
If this is the only change that comes out of this whole affair then that will be a good result for science and I get the feeling that the problem, such as it is, will simply evaporate.
I have to admit that I too - a long time ago - wrote Fortran programs to process climate data. And yes it was pretty rough and ready. And no I would never in a wild hissy fit ever do that again. Programming Fortran is the software equivalent of practicing your tap-dancing in a minefield.
Systems such as R and, what I used later, SAS are a lot safer, but still not foolproof.
My favourite Fortran error was the magical print statement fixing up a bug. People would come up and say "this program keeps on failing, but when I put in a print statement to debug it, it works properly!!" They would then mutter about "position dependent code" and "bloody useless compiler". Some would keep the print statements in and carry on regardless. Some would take my advice and look for the array subscript error that was smashing bits of memory. The print statement trick worked by allocating a bit more data memory that could handle being trashed - I think it was usually in the subroutine call/return area.
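A minimal sketch of that bug class (an invented example; real cases were usually better hidden): an out-of-range subscript quietly smashes whatever happens to sit next to the array in memory.

  program smash
    implicit none
    integer :: a(10), important
    integer :: i
    important = 42
    do i = 1, 11          ! off-by-one: a(11) does not exist
       a(i) = 0           ! the 11th store lands on a neighbour...
    end do
    print *, important    ! ...so this may print 0 rather than 42
  end program smash

Whether the neighbour actually gets hit depends on how the compiler lays the memory out - which is exactly why adding a print statement could shuffle things around and move the damage somewhere harmless.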
Considering the £millions [ "Two entries in the DEFRA science database give some hint. The entry for 1990-2007 puts the sum at £146,275,582, while the next tranche for 2007-2012 stands at £72,536,724." - see here for more info ] that have been ploughed into 'climate research', the Met Office & CRU could have afforded to employ some professional software developers. The results would have been delivered more quickly and more reliably than the half-baked and undocumented programs we have seen. There is no excuse for the incompetence that has been displayed - this is not some under-funded scientific backwater. Unfortunately one suspects that a not insignificant portion of the money ended up financing the smart offices the two organisations have in Exeter and Norwich respectively.
'"Many climate scientists have refused to publish their computer programs."
Scientists such as Roy Spencer? Shall you submit the FOI or shall I?'
The way to deal with the likes of Spencer is not to descend to their level and become involved in a brawl over FOI.
The obvious way forward is for the real experts (CRU, GISS ...) to release their data and methods. We can confidently expect this to be a tour de force of data management and code development, which will be treasured as an object lesson in how to proceed. The swatting of gadflies such as Spencer would be the merest by-product.
Frank, cosmic:
In general, I have found the skeptics to be much more open in publishing their code. This has permitted errors to be found (they are certainly not infallible). I have also found their response to the discovery of errors to be far better than that of the Team.
I remember one incident from Spencer where this happened and he published an erratum with an introduction something like, "I would like to thank so-and-so for spotting this error..."
McKitrick also quickly published an erratum to a paper where Tim Lambert discovered a programming error because the code was made public with the paper. The erratum contained a detailed table showing how each result was changed due to correction of the error.
The difference between this behavior and CRU, GISS, and other Hockey Team members could not be more stark.
BTW OmniClimate has an early report of the Pielke/Ward encounter
http://omniclimate.wordpress.com/2010/02/05/complete-microblogging-of-tonights-pielke-jr-vs-ward-vs-muir-wood-london-debate/
For several years I worked on Linux, an open source operating system, which meant your contribution was not only open for others to read, but you were guaranteed that it would be scrutinized by dozens of the best programmers on this Earth. And they were unmerciful if you were sloppy in your coding. They would not only point out any errors -- GOD FORBID that you missed even the most arcane error -- but they would point out that you wasted machine cycles by using the "wrong" instruction, which is easy to do if you don't know what the compiler is going to do. They did. They knew exactly what the compiler would do because they wrote the damn thing.
I wonder what they would have to say about the code I saw in Harry_Read_me -- and that of the AGW programming geniuses who can make random data go uphill.
And Tilde Guillemet, thank you for reminding me of the "But it works with a print statement" issue. That used to drive them nuts and bring them to my office, where I used to laugh aloud. Exactly correct. They were not checking their subscripts.
I agree with Jack Hughes that the standard of programming in scientific computing is appalling, especially in universities.
Most university scientists, engineers and mathematicians are not interested in numerical analysis and programming and cannot, therefore, guide and assess the work of their graduate students, who seem to do most of the programming. Very few of these students are adequately trained in numerical analysis and scientific programming. Most have taken one introductory-level course that tries to cover everything from floating point arithmetic to differential equations, along with lab sessions that try to teach them programming.
For many years I taught an introductory course in numerical methods (with lots of programming assignments) for graduate students who had undergraduate degrees in business, engineering, mathematics, and science. Also, these students came from a wide variety of universities. I found it depressing to come across (1) a four-year engineering graduate who had written one program: to calculate the volume of a box; (2) a four-year computer science graduate who had never heard of floating-point arithmetic; (3) a PhD student who thought that lectures on rounding errors and convergence analysis were a waste of time ("I have been programming for years without worrying about any of that stuff").
Here are some incomplete notes I have written for students that attempt to demonstrate that programming the simplest problems can cause havoc if you don't know what you're doing. I have added a section on Climategate that examines one of the programs that was causing Ian (Harry) Harris difficulties.
http://www.scribd.com/doc/26135665/Two-Simple-Statistical-Calculations-and-ClimateGate
Note that the first problem in these notes (Sea Surface Heights) is a `climate-change' problem.
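In the spirit of Derek's notes (this particular example is mine, not necessarily his): the textbook one-pass variance formula, var = (sum of squares - n*mean^2)/(n-1), cancels catastrophically in single precision as soon as the data sit on a large offset.

  program badvar
    implicit none
    real(kind=4) :: x(3), s, ss, mean, var
    integer :: i
    x = (/ 10000.0, 10001.0, 10002.0 /)   ! true variance is exactly 1.0
    s = 0.0; ss = 0.0
    do i = 1, 3
       s  = s  + x(i)
       ss = ss + x(i)**2     ! the squares need ~9 digits; we have ~7
    end do
    mean = s / 3.0
    var  = (ss - 3.0*mean**2) / 2.0
    print *, 'one-pass variance:', var    ! garbage: often 0, even negative
  end program badvar

The two-pass version - subtract the mean first, then square - gives 1.0 without drama.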
Typical 'sceptical' site. Thanks!
You quote mathematics, logic and old-science arguments as a defence against an overwhelming consensus that supports multi-governmental findings! Have you no shame?
You'll be telling me next that wind-farms, in UK winters, don't work!
Bishop, if you really want to convince me that you're right, then do me a favour. Put your thoughts into print, into the public domain. Only then may I start to take you seriously!
Oh bugger, you have! :)
Derek,
You are showing what I once studied at Cornell as "computability". It became Numerical Analysis later, but the problems are the same. People just don't understand that computers are limited.
I will have to chuckle a bit at your first example on page two. I looked at it and laughed and laughed, because the first thing I noticed is you did not have "sum = 0.0" before the two loops. This was one of the most common errors I found in my days of fixing bad programs -- uninitialized accumulators. I assume that it actually is in the program, but I would add it to the example, if nothing else but to keep an old retired curmudgeon (me) happy.
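For younger readers, a minimal sketch of the bug (an invented example):

  program accum
    implicit none
    real :: total, x(5)
    integer :: i
    x = 1.0
    ! total = 0.0     <-- the line that gets forgotten
    do i = 1, 5
       total = total + x(i)
    end do
    print *, total    ! 5.0 only if TOTAL happened to start out as zero
  end program accum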
I would like to read your paper when it is finished. It really is quite good and I think anyone who wants to play god with the weather really needs to read it. It shows many of the pitfalls you can trip over in using a computer.
Nice work. I would suggest you put something in there about the fact that many hardware implementations of IEEE floating point merely pay lip service to it in the storage of information, but not in actual calculations. One issue is that some are binary and others are actually hexadecimal, which can eat up as many as three bits in the mantissa. I used to work on an IBM 360 and know what hell they can play.
How long would you give Jones et al in The Dragon's Den?
As I have worked in both Geophysics and Software Engineering, I would say that we end up in an iteratively improving profession. This is largely a results based thing - if you don't perform you either lose work or get paid less. In software dev, it is generally in our interests to share code with the community. We all learn a lot quicker and become a lot more efficient in our disciplines. I think this is a good thing and I think this is why a lot of people enjoy software development. You get to create good stuff, and share in a community.
Geophysics in the Oil Industry is a little different in that there is a proprietary aspect to the data, which is a result of competitive interests. However, there is still a big quality driver in the system.
However, where is this quality feedback loop in the Climate Science department? If we have an industry that is destined to find problems rather than solutions then I find it difficult to see how the industry can iteratively improve its product. It seems that the feedback comes from the IPCC mainly, who pat you on the back if you find data that fits its desired outcome.
How can this ever create a good product?
So, in summary, I think a lot of the issues around quality come down to business process.
Open sourcing the whole thing sounds like an obvious solution, but then would we ever find that elusive hockey stick?
Possibly, but only slightly, off-topic, but might give "Don Pablo", "Derek" and others a wry smile...
One of the craziest bits of code I've ever had to fix... The program worked OK until a minor mod was made to it, at which point it went berserk. Initial examination of the code showed that the mod was perfectly correct. Much more detailed examination found the problem - the lunatic who'd written it originally had been terribly "clever": he had discovered that the compiler would allow negative subscripts and had taken "advantage" of this facility to stack two arrays on top of each other in memory, subscripting down into one and, by letting his subscript go negative, up into the other. The poor sod who'd written the mod was not aware of this (neither was anyone else!) and had inserted a couple of new variable definitions between the two arrays. The net result was effectively to shift the negatively subscripted accesses by a few bytes. Result, chaos. Quick-and-dirty fix: move the new variables out of the space between the two arrays. Proper fix: rewrite the program properly (I'd have added "sack the original programmer", but he'd moved on years ago).
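For anyone who never met this trick, a minimal sketch with invented names: two arrays adjacent in a COMMON block, so that with bounds checking off, B(0), B(-1), ... reach backwards into A. Old compilers let this straight through; a variable subscript keeps modern ones quiet too.

C     A AND B ARE STORAGE-ASSOCIATED AND ADJACENT: B(0) IS A(100).
C     DECLARE ANYTHING BETWEEN THEM AND EVERY SUCH ACCESS SHIFTS.
      PROGRAM STACKD
      COMMON /BLK/ A(100), B(100)
      K = 0
      A(100) = 123.0
      PRINT *, B(K)
      END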
Ah, what fun we used to have. Nowadays you have to work for a climate research outfit to get away with things like that! :-)
Numerical Analysis still gives me nightmares about learning Z. I still think that was an attempt by mathematicians to hijack programming. OK, there were good reasons for it, but when I was doing it there were no Z compilers, so plenty of scope for error between Z proof and implementation.
Coding is not just a problem with academia though. I've seen plenty of examples in business where enthusiastic amateurs end up hacking together business-critical code, with the inevitable maintenance and data integrity issues that follow. In telecoms, the most common application is Excel, and it is used in ways it probably shouldn't be. But people know Excel. MS includes basic programming tools, so it lets people become dangerous. Inventory systems end up in spreadsheets, not databases. I've done the same myself, but I force myself to use stuff like that only to mock up or prototype what I want, then get programmers to do it properly for me so it's documented, scalable, supportable and optimised.
Trouble is, many businesses, and no doubt academia too, see IT as an overhead, so resources are limited. Getting things done properly takes time and money, so typically the enthusiastic amateur's code ends up embedded in business processes until some crisis forces change.
Climatology's probably reached this point. Both GISS and CRU code is nasty. It's not been written by professional programmers, but has grown organically - hence Harry's challenges and frustrations at CRU. Lack of data management compounds the situation, and given the social and financial consequences of their output, this should be unacceptable.
I like the idea of open sourcing. Free the data, create a 'climatology toolbox' of reusable, optimised code modules. People who want to build their own models should then be able to plug & play with trusted content and focus more on their core research, rather than hacking stuff together in languages they're not familiar with, or that are not necessarily the best tools for the job. If they find hockey sticks, great: auditing the code that produced them should be quicker and simpler, to determine whether the output was generated in error or is a real phenomenon.
Andy,
Yes indeed. With commercial and engineering software, there's an overall feedback loop. If the software doesn't work as judged against reality, financial penalties follow swiftly.
emckeng said:
To a bog standard member of the public it looks to me like they have been conducting bogus meta-analysis, using computer software as a cover.
Pogo --
Yep, seen that, particularly in the old days when machines were small. (The IBM 360/65 I used at Cornell had 1 million bytes of memory -- 1/4 meg of "fast" memory and 3/4 meg of Memorex (I think) "bulk" memory. The system used the latter for a swap buffer. As I type, I am sitting next to a dual processor 64-bit system with 8 gigabytes of DIMM and two terabytes of fast hard disk -- I do Photoshop for fun and some profit. And for fun I run an IBM 370 emulator on my desktop, complete with MVS, HASP, etc. etc. So things have really changed. The 370 emulator runs several times faster than the real machine ever did, and the 370 cost many millions of dollars back then. My PC cost about 10 seconds of run time on that beast.)
In any case, the largest partition was a quarter meg, so people did "clever" things like what you described. And when you used overlaying (this was before there was virtual memory) you had to be careful where you left stuff, because if it got overlaid, the computer didn't know if it was the sums you left there earlier or the code from some subroutine. Sometimes the code looked enough like valid floating point numbers for the program to run to completion, giving interesting results.
Oh, those were the days -- THANK GOD FOR INTERACTIVE DEBUGGERS!
OH, by the way -- I highly recommend Derek O'Connor's paper, the URL for which he gave above. There is a really nicely done analysis of what Harry did wrong. I also highly recommend that Frank O'Dwyer read it, so he can understand why so many of us think that the work done by the AGW crowd is BS.
Don Pablo:
As a "old retired curmudgeon" myself, I agree that I should have included the initialization "sum = 0.0" It is there now.
I'm curious about your comment on "computability" becoming Numerical Analysis. Are you sure?
All(?) floating-point processors adhere to the IEEE Floating Point Standard 754-1985, and Intel, AMD, etc., spend a lot of time, money, and effort getting the basic operations such as 1/x, sqrt(x), sin(x), correct. (Intel nearly lost $450M in 1995 because of a bug in its division algorithm.) Unfortunately, software sellers are not as careful and do not adhere to the IEEE standard. Remember, most compiler writers are computer scientists who are not interested in numerical niceties.
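A tiny illustration of those niceties, for anyone who thinks this is angels-on-pinheads stuff: 0.1 has no exact binary representation, so ten of them do not add up to 1.0 in single precision.

  program tenth
    implicit none
    real(kind=4) :: s
    integer :: i
    s = 0.0
    do i = 1, 10
       s = s + 0.1         ! 0.1 is already rounded before we start
    end do
    print *, s, s == 1.0   ! prints something like 1.0000001  F
  end program tenth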
Do you think it is wise, at your age, to be running an "IBM 370 emulator on my desktop complete with MVS Hasp, etc. etc."? I still have nightmares about reading octal core dumps from a CDC 6600. I never did get the hang of it.
Thanks for your kind words on the (incomplete) notes that I posted above. I hope to complete them in the coming week.
Andy:
You ask, "However, where is this quality feedback loop in the Climate Science department?"
Don't worry! In about 90 years, we can evaluate how the centennial predictions used in the 2001 IPCC report did, and make some necessary corrections. In about 900 more years, we may even have a statistically significant set of these predictions, so we can make some more confident corrections.
Patience, patience...
Frank wrote:
""Many climate scientists have refused to publish their computer programs."
Scientists such as Roy Spencer? Shall you submit the FOI or shall I?"
Spencer is on my list of people to request code from. Let me tell you how I proceed.
The central piece of evidence in support of global warming is the instrument record constructed by CRU from 1850 to the present. Like all good engineers I want to start with the longest, best, most compelling piece of data. Surely, given its importance and centrality since the 1980s, the underlying work must be robust. So let's just check that and then move on to the more complicated data. Surely the climate scientists will have everything in order in the most important set of observations. I request that data and the code. They refuse. They actually make up false excuses. They get busted. So, while Dr. Spencer is on my list, first things first. Let's start with the easiest task: creating a database of thermometers, doing some quality control, and creating a spatial average with realistic confidence intervals. This project is a graduate-student-level effort. It does not even require a PhD. How do I know that? Simple: it's in the climategate files. When the scientists were planning to do the next version, they suggested it was a good job for a graduate student.
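To show how modest the core of that task is, here is a sketch of the heart of such a spatial average (the 5-degree grid and all the names are my assumptions; the real work - quality control, missing boxes, confidence intervals - is omitted). Grid-box anomalies are weighted by cos(latitude), because grid boxes shrink towards the poles.

  program gridavg
    implicit none
    integer, parameter :: nlat = 36, nlon = 72    ! 5 x 5 degree boxes
    real(kind=8), parameter :: pi = 3.141592653589793d0
    real(kind=8) :: anom(nlon, nlat), w, wsum, s, lat
    integer :: i, j
    anom = 0.5d0          ! stand-in data; real boxes can also be missing
    s = 0.0d0; wsum = 0.0d0
    do j = 1, nlat
       lat = -90.0d0 + 5.0d0*(j - 0.5d0)          ! box-centre latitude
       w   = cos(lat*pi/180.0d0)                  ! area weight
       do i = 1, nlon
          s    = s + w*anom(i, j)
          wsum = wsum + w
       end do
    end do
    print *, 'global mean anomaly:', s/wsum
  end program gridavg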
When we have finished looking at the global temperature index from top to bottom, it would be a great time to push to have Dr. Spencer release his code. Currently there are two organizations that measure temperature from satellites: UAH and RSS. Those two organizations do exchange information with each other to reconcile differences. In the past, in fact, UAH has worked closely with RSS, and RSS was able to find a bug/error in UAH's approach. To be sure, this cooperation between two competing groups gives me some trust in the results, but full trust will require a code release. My guess is that RSS is never going to release code. Commercial interests. UAH, who knows, but if UAH does release code while RSS does not, then it would seem that everyone should use UAH. If neither releases the code, then their results do not count as science in my book. So please press for release of UAH or RSS code. If they don't release it then people with my principles will simply not consider their results. Next.
In response to the complaint that many scientists have not published their code, Frank responded:
" many [climate scientists] have not [refused to publish their computer programs]"
Does this count as an argument in your mind, Frank? I mean seriously. If there are 100 climate scientists and 98 freely release the code and data while 2 do not, on what logical planet in this universe does it make sense to point at the 98 when a complaint is lodged against the 2? On what planet? Or are you taking confusium pills again?
How does your argument even work, except against itself? I say "look, most climate scientists share their data and code. Look, the most costly code (those million-line GCMs) and the most costly data (the GCM results) are shared freely, BUT these two scientists, Jones and Mann, won't share their code and data". Your response to this complaint is to point out a fact that no one disputes? Of course some scientists share their data and code. THAT IS WHAT makes Jones and Mann's refusal so suspicious. Do you specialize in self-annihilating arguments?
Derek:
I enjoyed your (evolving) notes on numeric issues. I cover some of these issues briefly in a course for non-specialists that I am sometimes asked to teach (Computer Controlled Machinery, primarily for mechanical engineering undergrads).
I always start my explanation of floating-point representations with (decimal) scientific notation, which they are all already familiar with, then introduce computer floating-point representation as the binary version of this. This seems to get everyone up to speed quickly.
I also do most of my examples of how you can get in trouble using decimal scientific notation. One of my first examples is something like (1.234 x 10^6) + (5.678 x 10^4), showing how you lose the "78" and you can't get it back.
In general, I find that starting with an intuitive explanation lays the groundwork for understanding the detailed analysis that follows.
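The binary version of that classroom example, for anyone who wants to watch it happen (a single-precision mantissa spans only about 7 decimal digits, so a sufficiently small addend vanishes entirely):

  program absorb
    implicit none
    real(kind=4) :: big
    big = 1.0e8                  ! float spacing at this magnitude is 8.0
    print *, big + 1.0           ! prints 1.0000000E+08 - the 1.0 is gone
    print *, (big + 1.0) - big   ! prints 0.0, not 1.0
  end program absorb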
One whinge about software, not strictly related to climate computing - just a general comment on why supposedly trained software developers can't get it right.
For a number of years recently I had the icky job of hiring new grads to do software development. They were generally honours graduates and would often have a dual degree, computer science/electronic engineering. They were all highly skilled in Java development.
I kid you not. Out of 10 of these graduates, I think only one didn't come to me to sort out a problem they'd found in their code. It went somewhat like:
Them: This code doesn't work at this place. (Points to a loop statement running 0 to 255.)
Me: You're using a byte-sized loop variable.
Them: So?
Me: In Java, a byte is a signed value with the range -128..127.
Them: Oh.
Now I find it incredible that the stuff I learned in comp-sci 101 just doesn't get taught anymore. There is an almost total lack of appreciation of the underlying computer limitations and capabilities. Things like numerical accuracy, data type range, roundoff, appropriate use of different data types are just not known. I constantly see algorithms using the wrong data types. e.g. Floating point used when it is inherently less accurate and slower than the correct integer implementation.
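And it is not Java-specific. The same trap in Fortran dress (gfortran's kind=1 is a signed byte; the standard leaves the overflow undefined, but on twos-complement hardware it silently wraps):

  program bytewrap
    implicit none
    integer(kind=1) :: b
    b = 127_1
    b = b + 1_1    ! overflows: wraps to -128, not 128
    print *, b
  end program bytewrap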
If modern 'trained' software developers don't know how to do it properly, how can any half-assed climate scientist be expected to do it any better?
- As another aside, none of my Electronic Engineering honours graduates could explain what the transistor 'Beta' value meant. They were probably taught it once, but due to the new 2-week learn/test block method they could simply learn it, regurgitate it, and forget it in preparation for the next block.