Friday
Feb 24, 2012
by Bishop Hill
Show us yer code
John Graham-Cumming, Les Hatton and Darrell Ince are all well known around these here parts. John Graham-Cumming was quite prominent after Climategate for his critiques of the standard of CRU's computer code. Darrell Ince wrote an article making similar criticisms. Les Hatton was mentioned as the author of an article criticising the IPCC's handling of hurricane data.
The three of them have just published a new paper in Nature, looking at the issue of code availability and trying to address the question of how we can move to a world in which academic computer code is routinely made available.
Reader Comments (19)
Years ago I went on a course with Les Hatton, and still remember his passion and enthusiasm for his subject matter (at the time, computational geophysics).
Having worked as a software developer for many years since then, I cannot see any reason why source code and test data should not be published into the public domain, especially when it is influencing public policy.
Don't know if this has been posted:
Grantham Research Institute: Climate change and the new industrial revolution
Professor Lord Stern of Brentford delivered the Lionel Robbins Memorial Lectures 2012 on 21, 22 and 23 February. His presentation slides can be downloaded here.
http://www2.lse.ac.uk/GranthamInstitute/Home.aspx
Press Association: Stern warns climate change critics
Lord Stern, whose key 2006 report set out the economic case for cutting greenhouse gases, said science suggested the world was heading for temperature rises of 3C or 4C on the basis of current efforts to stop global warming.
Such rises would cause a "catastrophic rewrite of the relationship between humans and the planet within the lifetimes of those being born today", he will say in the last in a series of lectures on climate change...
Policies which tackle emissions are addressing that failure and are pro-market, he will tell the audience at the Lionel Robbins Memorial Lectures.
And he will say: "Some also suggest that we cannot afford to take action to 'save the planet'. But low-carbon growth will be full of discovery and the costs of many green technologies are falling fast.
"Past industrial revolutions involved transformations that drove two or more decades of strong innovation and growth, with investment flowing to the pioneers."
http://www.google.com/hostednews/ukpress/article/ALeqM5g96cljnzpQlE16ZoOTE2OWtv1d3A?docId=N0691911329923701781A
How mad that the Google Press Association page has a Huffington Post piece, "Why Do Meteorologists Dismiss Climate Change Science?", directly under the above!
As someone involved in software development for 20-odd years, and who has seen the benefits of Open Source, I feel the paper, understandably, has ideas that deal with the reactive rather than the proactive.
What do I mean? Well it places a burden on the "owners" of the code to document, justify ("outside" developers can be psychotically pedantic) and disseminate. When all they really want to do is get on with what they are paid for.
A better solution would be to introduce Open Source methods RIGHT at the beginning. If you are unhappy with the documentation, do not just whine, but improve it. If you like to use 4 spaces for a TAB indent rather than 8, fine: win your argument first, then amend the 200,000 lines of source code.
It is the classic case of responsibility and authority being equally matched, which is one of the principal reasons Open Source flourishes. With this paper, "outsiders" have lots of authority but no responsibility. No one likes those situations.
Whilst I appreciate the paper's constructive desire to improve the situation, it is still dealing with "walls" and how to look over them. Not with breaking them down.
The Team will never go for it
Dear Andrew,
Many thanks for mentioning our article in your blog. It's very unusual for any of my research to reach a wider audience (I've already had two emails about it; this is two more than for most of my papers).
It is worth putting on the record that both referees and editorial staff at Nature were very supportive of our efforts, and the article improved greatly from the refereeing comments that we received.
I shall fully retire in six months' time, and this work and my membership of the recent government review of child care are the two things that I have done that I feel will make a difference.
Regards
Darrell Ince
This is a valuable paper which I am sending around to people I know across a variety of fields.
I hope this idea will become the standard for any field which could be significantly affected.
I look forward to reading Peter Gleick's considered critique (à la Laframboise) as to why such reactionary ideas will only give the Well Funded Heartland Evil Denial Machine yet more ammunition to attack Hard Working Super Duper Ethical 100% Honest You Bet Your Life Climate Scientists with.
And I'd advise the authors to be careful who they send stuff to. In particular be very suspicious of addresses like:
ofcourseimaboardmemeberyounastylittleoikgetonwithitsunshine@genius.gleick
Let me spend ten minutes reading your source code, and I will know you better from that than from reading your diary or journal from cover to cover. Though I regard myself as a scientist and inventor, I spend much of my time writing code. Originally, when I was young, nearly every scientist and aspiring engineer needed to know how to write code in order to harness the amazing computational power of the computer. If not, you were relegated to the computational backwaters of pencil and calculator.
I actually managed to publish the source code of an entire program in an international journal 27 years ago (typeset and proofread). It accompanied another publication in the same journal that dealt with the mathematical and statistical qualities of the computation we were attempting. The reviewers were complimentary, with one saying that the source code was instructional as to how to reduce a theoretical concept to executable code.
The years have come and gone, and I have seen many programming languages, and many programming paradigms, come and go with them. To the uninitiated, source code, though it represents the mother-lode of ingenuity, is a huge pile of detritus, and even to those skilled in the art of writing and reading code, it is mostly ore of poor yield. There are a number of daunting challenges here, and two of the most obvious are:
1. Stability: the languages code is written in continually evolve, and backward compatibility, the assurance that previously written code will function correctly, is problematic (many times, the languages have disappeared entirely).
2. Suitability: in almost all cases, it is impossible to verify (to be certain that the code is error free) and validate (to be certain that the desired functionality is present) any computer code merely by the process of inspection, as the sketch below illustrates.
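To make that second point concrete, here is a toy illustration, written for this comment in Python and invented from whole cloth: a variance routine that matches the textbook formula, reads cleanly, and would sail through inspection, yet quietly fails.

    # Naive variance via E[x^2] - E[x]^2: mathematically correct,
    # numerically treacherous. For data with a large common offset the
    # small true variance drowns in rounding, and the result can come
    # back as zero or even negative.
    def variance(xs):
        n = len(xs)
        mean = sum(xs) / n
        mean_sq = sum(x * x for x in xs) / n
        return mean_sq - mean * mean  # mathematically >= 0; numerically, not always

    data = [1e8 + 0.1, 1e8 + 0.2, 1e8 + 0.3]  # true variance ~ 0.0067
    print(variance(data))  # rounding noise, not 0.0067

Stable formulations exist (Welford's method, for one), but the point stands: the defect is invisible on the page.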
Well it's not just source, but the data and the methodologies that allow you to repeat and verify the results.
Source by itself is of limited use.
As most of the people here know, computer software is an absolutely ideal place to hide cheating.
If the automated source code obfuscators available to coders weren't bad enough, the mindset of many people involved in managing programming activities (think NHS computing, Chinook FADEC) should provide clear warning that there are serious inadequacies in the way much software is designed, built and tested - and that's without any deliberate element of misdirection.
I feel very strongly that the source code of software built at public expense should be published under the provisions of FoI in the same way as other documents.
Where mission-critical decision support is concerned, independent review should be the norm. There is much good work being done - but the typos, lack of arithmetic and logical rigour, plain foolishness, confirmation bias, unwarranted assumptions and dirty hacks that I've seen in workaday code lead to subtle and not so subtle errors ending up as "answers to the inputs".
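Even very cheap, executable sanity checks catch a surprising share of that class of error. A toy sketch in Python, every name invented for the purpose:

    # An anomaly calculation plus the kind of assertion an independent
    # reviewer can actually run: anomalies taken about the series' own
    # mean must sum to (approximately) zero.
    def anomaly(series, baseline_mean):
        return [x - baseline_mean for x in series]

    out = anomaly([10.0, 11.0, 12.0], 11.0)  # 11.0 is the series mean
    assert abs(sum(out)) < 1e-9, "anomalies about their own mean must sum to ~0"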
I would urge anybody curious about this topic to go take a look at Les Hatton's work - it's important.
For a lay person like me, in regard to computer programmes, languages and development - this was a hugely interesting article, and so well written that I could actually 'get' the problems involved.
Thank you!
It would seem to me that this problem could be compared to a situation where some researchers hand in their work written in cuneiform - which someone else must translate into a printable document - while others use pictograms, runes or Chinese characters. Obviously translation errors will creep in!
So might it not be feasible to demand that papers submitted to a publisher must use code from a freely accessible deposit?
After all, it isn't that long ago that publishers asked for papers to be submitted in electronically readable format, instead of in one-sided, double-spaced typewritten form.
@Jiminy Cricket - 7.08 a.m.
I support this very important point which needs to be included in any discussions with the powers that be.
Full pdf (4 pages) at:
http://www.nature.com/nature/journal/v482/n7386/pdf/nature10836.pdf
To Viv Evans,
A large part of the problem is that even if you can read the source code, and even if the code is written up to any constructable standard, it is still largely unintelligible to even the most astute and weathered reader, however long the reader's experience might be. There may be very obvious errors: those can be easily caught. But the influence of even the most obvious error on the final result is almost always impossible to evaluate by mere inspection. I applaud the publishing of source code, but the publishing and sharing of source code, while it certainly implies a level of trust and trust-worthiness, does not ensure correctness, nor does it exclude the possibility of duplicity.
The publishing of source code does not ensure correctness, but it does add to traceability, and therefore accountability. A comment or two, with a date, where code is fixed would be even more illuminating!
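Something as small as this, say (a hypothetical Python fragment, names and dates invented):

    # Correction applied to raw station readings before averaging.
    def apply_station_offset(raw, offset):
        # FIX 2011-11-03: offset was previously applied twice whenever a
        # station appeared in both input files; inputs are now
        # deduplicated upstream before this function is called.
        return raw + offset

A dated note like that turns "the code changed at some point" into an auditable trail.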
@Pluck
I think you're dead right.
The issue of publishing the code is, I think - as you say - an indicator of trust.
It does not guarantee, for instance, that the code which acted on the data to produce the proffered results is indeed the published code... something I've seen happen by accident with "out of version sync" executables.
There is the whole principle of being able to reproduce identical results from the same data. I do wonder at the Met Office's insistence that they'll get better answers with bigger, more expensive computers....
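One cheap defence against that version-sync failure, sketched in Python (the file names analysis.py and input.dat are invented for the example): stamp every output with a fingerprint of the exact code and data that produced it.

    import hashlib

    def digest(path):
        # SHA-256 of the file's bytes: identical input gives an
        # identical fingerprint; any edit changes it.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Written into the header of every results file, so a mismatch
    # between the published code and the code actually run is
    # detectable after the fact.
    stamp = "code=%s data=%s" % (digest("analysis.py"), digest("input.dat"))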
I have long felt that, where "computer errors" are concerned - certainly in the commercial world - software is nowadays pivotal in the commissioning of misdeeds and is, by and large, ignored by those whose job it is to police corporate behaviour. I am aware of a comprehensive campaign of fraudulent billing by a UK utility, for instance, that never made it into the public space. Software makes these things all too easy to perpetrate and also reduces the risk to the perpetrators by minimising the number of folk with direct involvement.
Technically, I think the concept of public code repositories is a potential avenue worth investigating.
If I WANTED to hide something, I could.
I could use "libraries" which are copyrighted to "others".
I can generate code on the fly.
I can reference data in a database which it would be impractical to disseminate.
Cheating ain't difficult.
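A toy illustration of the code-on-the-fly point, in Python, with everything invented for the purpose: the function that does the adjusting never exists as ordinary source, because it is assembled as a string at run time.

    # No function named calibrate appears anywhere in the static source;
    # it is manufactured and exec()'d when the program runs, so tools
    # and reviewers scanning the published files never see its body.
    bias = 0.05  # could just as easily come from a config file or database
    src = "def calibrate(xs):\n    return [x + %r for x in xs]\n" % bias
    exec(src)  # defines calibrate() on the fly
    print(calibrate([14.2, 14.3]))  # every reading nudged up by 0.05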
This is all about trust. The only self-propelling trust model that has been proven to work is Open Source because you only need to work with people you trust.
This Blog is an example of Open Communication. BH needs us as much as we need him. This blog prospers through trust.
Darrell (if you are reading):
I have examined the more specific issue of code and metadata availability at Nature, very much arising from issues put forward here:
[1] The code of Nature: Making authors part with their programs, and
[2] Data availability and consequences in cancer and climate science, and
[3] Data availability in climate science: the case of Jones et al 1990 and Nature
Best
Just wait until they start linking Climate models to models of Global Economics!!!!
How many lines of code are there in the average GCM - a million, or at least a few hundred thousand, I would guess.
As for the paper under discussion here, it is an excellent way forward, and who could argue against it? Well I think we know, with a high degree of certainty, the answer to that, as others have said more eloquently above.
I still remain more worried about Garbage in - Garbage out.