JG-C on code availability
Wednesday, Aug 4, 2010 | by Bishop Hill | Climate: other, Journals
John Graham-Cumming argues for the availability of scientific code.
Reader Comments (10)
Scientific code developed with public money should be released (unless there are genuine security implications). I suspect the real reason they won't release the code is that they would be massively embarrassed by the state of it.
Instead of code, what we need is clear math on how they calculated their results. Show us the math so that it can be independently replicated.
There is an old saying in programming. "Every program has at least one unnecessary line of code and every program has at least one bug left in it."
Having spent several years working in the Cornell University computer center as both a statistical consultant and a debugging guru, I can say that Phillip is absolutely right. Their code is a god-awful mess and full of stupid bugs. These people are basically graduate students dependent on making their professor happy. They are not professional coders in the least. I am not surprised when they make stupid statements like "we can't port it to another system." Only God knows what stupid tricks they pulled in their efforts.
@David. The mathematics used for CRUTEM and HadCRUT is very clearly described in the Brohan et al. 2006 paper. You can obtain it from the Met Office here: hadobs.metoffice.com/hadcrut3/HadCRUT3_accepted.pdf
John Graham-Cumming
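To give a flavour of the calculation that paper describes, here is a minimal sketch of an area-weighted mean of gridded temperature anomalies using cos(latitude) weights and skipping empty grid boxes. It is purely illustrative (a hypothetical 5-degree grid and numpy code), not the Met Office's code or the paper's method in full.

```python
# Illustrative sketch only: area-weighted mean of gridded anomalies.
import numpy as np

def area_weighted_mean(anomalies, lat_centres):
    """anomalies: 2-D array (lat x lon) of grid-box anomalies, NaN where no data.
    lat_centres: 1-D array of grid-box centre latitudes in degrees."""
    weights = np.cos(np.deg2rad(lat_centres))[:, np.newaxis]  # area weight per latitude row
    weights = np.broadcast_to(weights, anomalies.shape)
    mask = ~np.isnan(anomalies)                                # only boxes with observations
    return np.sum(anomalies[mask] * weights[mask]) / np.sum(weights[mask])

# Toy example: a 36 x 72 grid of 5-degree boxes with most boxes missing.
lats = np.arange(-87.5, 90.0, 5.0)
grid = np.full((36, 72), np.nan)
grid[10:26, :] = 0.4                    # pretend every observed box shows +0.4 C
print(area_weighted_mean(grid, lats))   # prints 0.4 (up to rounding)
```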
Writing down the math and writing the code are two VERY different things. I spent far too much time 40 years ago on "computability" (now known as Numerical Analysis, I believe) with regard to statistical calculations not to know that having a mathematical formula and a working, properly designed, written and DEBUGGED program are two very different issues.
Show me the code. That is the only thing that matters.
My background is in large software development programs for the US Department of Defense. My initial interest in AGW was related to the climate models. It did not take long to realize that most computer programs related to climate change were not developed to any standards: there are no real design documents, code documentation, test plans/procedures, or any attempt at version control of the software or the data. It has also become pretty clear that the organizations responsible for these programs have no real interest in maintaining the software in an acceptable manner. Responsibility for these programs should be taken over by an organization with a better understanding of software development. There is nothing wrong with hacking together programs when they have minimal impact - however, when the results are driving trillion-dollar decisions... a little more rigour is required.
Is anyone aware of a site where the JG-C article may be found that is not behind a paywall?
Tks,
R Connelly
a little more rigour is required
Sadly, a gross understatement. However, you have outlined the general shortcomings. My concerns are a bit different: understanding the computational limits of a computer. Computers do not have unlimited precision; they are limited by the number of digits in the mantissa of the floating-point representation. For 32-bit floating-point numbers that is a maximum of about 8 significant digits; with 64 bits it is about 15; with 128, a bit more than 32. But when you are summing squares of numbers like 0.00000043 and 123998098.034 you soon run out of them. You must normalize your data up front, otherwise it is garbage in, garbage out. That is true even today with 128-bit precision, and most programs I looked at use only 64 bits. Some actually used 32 bits, which is useless for all but the most carefully written code.
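To make the cancellation concrete, here is a minimal sketch of the problem described above: computing a variance from raw sums of squares versus normalizing (centring) the data first. The data and the numpy code are purely illustrative, not anyone's actual analysis.

```python
# Illustrative sketch only: loss of precision in a naive sum-of-squares variance.
import numpy as np

rng = np.random.default_rng(0)
# Readings with a large offset and a small spread, in the spirit of mixing
# numbers like 123998098.034 with 0.00000043.  True variance is ~1.0.
data = 1.0e5 + rng.normal(0.0, 1.0, size=10_000)

def naive_variance(x):
    # One-pass textbook formula E[x^2] - E[x]^2: the digits that matter cancel.
    n = x.size
    return np.sum(x * x) / n - (np.sum(x) / n) ** 2

def centred_variance(x):
    # Two-pass formula: subtract the mean first, then sum the squares.
    d = x - np.sum(x) / x.size
    return np.sum(d * d) / x.size

for dtype in (np.float32, np.float64):
    x = data.astype(dtype)
    print(dtype.__name__,
          "naive:", float(naive_variance(x)),
          "centred:", float(centred_variance(x)))
# In float32 the naive formula returns garbage (at this magnitude it cannot even
# resolve differences smaller than about a thousand), while the centred version
# stays close to 1.  In float64 the naive formula survives here but has already
# thrown away most of its significant digits, and it gets worse as the offset grows.
```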
Damn few graduate students have any idea what I am talking about. But it is what a large number of really bad theses are based on -- computational garbage. I saw this with Factor Analysis, and it is probably true of most Principal Component Analysis runs as well. They are basically the same code. In fact, most programs which will do one analysis will do the other as well.
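To illustrate why the same arithmetic poisons those analyses: a principal component analysis is, at bottom, an eigendecomposition of a covariance matrix, and that matrix is built from exactly the sums of squares and cross-products discussed above. A minimal sketch with made-up data (illustrative only, not any thesis code):

```python
# Illustrative sketch only: how a badly formed covariance matrix ruins a PCA.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
# Two correlated measurements sitting on large, un-normalized offsets.
latent = rng.normal(size=n)
X = np.column_stack([1.0e6 + latent + 0.1 * rng.normal(size=n),
                     2.0e6 + 0.5 * latent + 0.1 * rng.normal(size=n)])

def cov_naive_f32(x):
    # E[x x^T] - E[x] E[x]^T accumulated in single precision: the risky route.
    x = x.astype(np.float32)
    m = x.shape[0]
    mean = x.sum(axis=0) / m
    return (x.T @ x) / m - np.outer(mean, mean)

def cov_centred_f32(x):
    # Centre in double precision first, then accumulate the cross-products.
    d = (x - x.mean(axis=0)).astype(np.float32)
    return (d.T @ d) / x.shape[0]

for name, c in [("naive  ", cov_naive_f32(X)), ("centred", cov_centred_f32(X))]:
    # The eigenvalues of the covariance matrix are the principal-component variances.
    print(name, np.linalg.eigvalsh(c.astype(np.float64)))
# The centred run recovers component variances of roughly 1.26 and 0.01; the
# naive run produces numbers that have nothing to do with the data.
```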
As for code written by the typical graduate student, good luck getting it even close to correct.
Coding is the domain of properly trained programmers, such as those described by R Connelly. Put it this way: would you have a graduate student perform a heart transplant on you? If not, then why do you trust the results of his program? If yes, you deserve what happens.
Remember that these climatologists are the "cargo cult scientists" that Feynman warned about.
They believed that if they drew enough graphs, did enough simulations, wrote enough papers and held enough conferences then some simple "Climate Laws" would magic into view. Like Boyle's Law or Hooke's Law or Ohm's Law.
Something really simple like (climate) = (CO2) * (pi ^2) * (c ^2)
This explains their dreadful coding and version control ideas. They never saw the programs as anything but an intermediate step - like the wooden headphones and bamboo control towers.
Like the original South Sea cargo cultists, they just don't get it. Some of the basics are missing. There is no global climate. The worldwide "average temperature" has no meaning. Just like global average rainfall or global average wind speed. It's tosh.
Pressing the RESET button on climatology will mean nothing because they are still at square one after so many years.