Wednesday, Sep 17, 2014

The OAS and replicability

The news that there is a new learned society for atmospheric scientists is very exciting and I'm sure that everyone at BH wishes those behind the move every success.

The focus is inevitably going to be on the Open Atmospheric Society "throwing down the gauntlet to the AMS and AGU" angle, but I'm also struck by the "throwing down the gauntlet to scholarly publishers" angle, summed up in this important position statement by the OAS regarding its journal:

[There is a] unique and important requirement placed up-front for any paper submitted; it must be replicable, with all data, software, formulas, and methods submitted with the paper. Without those elements, the paper will be rejected.

In the wake of Climategate I tried to interest the Committee on Publication Ethics, the umbrella body for scholarly journals, in the issue of replicability, but after two years of them procrastinating I took the hint and dropped the subject.

My guess at the time was that no learned journal wanted to be the first to demand data and code up front, frightened of scaring away potential authors. What, then, will be the impact of the Journal of the OAS? Will mainstream climatologists simply refuse to go near it? Will JOAS wither and die for lack of papers? Or will people start to look at those who submit to the traditional publishers and ask what it is they have to hide?

It's going to be fascinating.


Reader Comments (84)

A great development! Good luck to these brave souls.

Sep 17, 2014 at 9:08 AM | Unregistered CommenterAnthony Hanwell

It *will* be fascinating.

Many, many scientists (including me and others in my organisation) subscribe to the ideal of complete replicability. For example, we've used things like version control here for a very long time.

Of course, it's only so useful for me to give you all my code if you don't have a supercomputer to run it on.... This kind of problem occurs at smaller scales too - what if the code I give you only works on commercial software? Should my work be rejected? Will the editorial team be keeping archives of all versions of the major open-source programming languages, to check that the code runs?

In my understanding, it is this kind of fiddly detail, rather than a lack of will, which means that most journals don't demand full code/data etc. Many already support it though (check out GMD, for example).
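One partial answer - a minimal sketch only, assuming Python-based analysis code, and not a description of any existing journal's or institution's practice - is for a submission to record the exact environment it was produced with, so reviewers at least know which interpreter and package versions were in play even if they cannot rebuild them:

# environment_manifest.py -- illustrative sketch: capture the interpreter,
# platform and installed package versions that produced a paper's results.
import json
import platform
import sys
from importlib.metadata import distributions  # standard library in Python 3.8+

manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": sorted(
        f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
    ),
}

with open("environment_manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)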

I will be looking forward to seeing how these kinds of issues are addressed in this new journal.

Sep 17, 2014 at 9:14 AM | Unregistered CommenterDoug McNeall

This is a brilliant initiative and I think I'm likely to sign up as an associate.

Now what we need is a similar move in the humanities / social sciences as a counterweight to the burgeoning 'climate communication' and 'nudge' activist inspired "research".

Sep 17, 2014 at 9:35 AM | Unregistered CommenterKatabasis

I echo some of Doug's concerns. I too support availability of code BUT this can be impractical especially in a publication.

I come from a GLP (Good Laboratory Practice) background. In my laboratory all the raw data was archived for posterity; that includes everything from balance calibration records to HPLC and GC-MS outputs. In theory it is possible to reconstruct every single experimental study undertaken since 1980. However, in practice there are technical problems. In the 1980s the majority of data recording was handwritten on paper, and this data is still easily accessible 30 years after it was recorded. As time passed more electronic data recording and manipulation occurred, so that today little data is handwritten. It is not just the software that has changed since 1980 (we do make some attempt to preserve programme details); the basic hardware has been transformed. We have lots of data stored on 5.25" floppy discs, data backed up on PDP/11 hard drives etc., but no longer have all the relevant equipment to read it.....

Sep 17, 2014 at 9:45 AM | Unregistered CommenterArthur Dent

DM says that he (the MO) practises version control - well, isn't that just business as usual in any SW development house? Does he think it's exceptional? Be a shame if it was.

As to the need to have all the code - even though third parties won't have access to supercomputers: it really goes to the point of whether the coders can be trusted. HARRY_READ_ME showed us that code gets changed by so many, so often, that in the absence of good and trusted version control someone needs to keep a log of the actual code run, for comparison purposes, just in case. (BTW: was 'Harry's' code subject to version control at UEA?)

Sep 17, 2014 at 10:01 AM | Registered CommenterHarry Passfield

Arthur - that is no excuse at all - a backup policy that neglects to maintain the hardware environment, or to migrate data onto new media as hardware becomes obsolete, is not a backup policy.

Doug makes some good points, but I don't think running the code is necessarily the major issue; it's the ability to see the algorithms, which can then be verified to be actually doing what they are described as doing. (I've worked on very large IT projects and you may or may not be surprised at how much impact minor errors can have.)

Sep 17, 2014 at 10:01 AM | Unregistered CommenterBarry Woods

If only the new society were the Atmospheric and Oceanic Society.

Since the mass of the oceans is several hundred times the mass of the atmosphere, how can atmospheric scientists alone assess the oceanic impact on climate?

Sep 17, 2014 at 10:29 AM | Unregistered CommenterFrederick Colbourne

Great news. It is a sad indictment of the stage that mankind has reached, thanks to pseudo-science in general and climate science in particular, that the basic tenets of the Enlightenment have to be re-stated 400 years down the line.

Sep 17, 2014 at 10:37 AM | Unregistered CommenterSteve Jones

I hope the IOP is watching and perhaps it will regain some balance/objectivity which it appears to have lost.

The archiving of code with a publication is very important whether you can run it or not. A walkthrough of the code can reveal how well it is constructed, how well it has been maintained and whether the science has been implemented as described. This can determine how much confidence one has in the results from the modelling.

Given how poorly climate "science" develops its code, I am not surprised people are reluctant to publish it.

Sep 17, 2014 at 10:49 AM | Unregistered CommenterCharmingQuark

@Frederick The sun, biosphere and world economies may also have large effects - but appending such factors would make the name of this new society rather unwieldy.

Sep 17, 2014 at 11:01 AM | Unregistered CommenterPete

The phrase 'Trojan horse' in this article sums climate activism up nicely:

http://thehill.com/policy/energy-environment/217866-jindal-presents-energy-platform-to-counter-obama

Sep 17, 2014 at 11:15 AM | Unregistered CommenterSteve Jones

An interesting application of the 'market for lemons' theory, perhaps, arising from the economics of information?

The more questionable papers and research will gravitate to those journals with more lax standards.

Sep 17, 2014 at 11:18 AM | Unregistered CommenterGeckko

Looks good to me.

The data must be available, as must the methods of collection, and such experiments must be repeatable. Repeatability is important in science.

Sep 17, 2014 at 11:48 AM | Unregistered CommenterJohn Marshall

All those requirements on authors should be standard practice for any reputable university or research organisation; they certainly are in engineering companies that have to cater for the sudden departure of key people.

Sep 17, 2014 at 12:55 PM | Unregistered CommenterMikky

I agree with others above. You don't need a supercomputer. What you need is access to all the data and the coding.

Sep 17, 2014 at 1:08 PM | Registered CommenterPhillip Bratby

And in the meantime, Leonardo diCaprio gets a Climate Change role at the UN....

(Sigh....)

Sep 17, 2014 at 1:22 PM | Unregistered Commentersherlock1

I suppose that analysing someone else's GCM code would be a gargantuan task, with or without a supercomputer to play with, but it ought to be manageable in parts. If the software is properly documented and intelligibly structured, I would also suppose that chunks of it could be identified and examined for their integrity and for their implications for a particular investigation by subject matter experts and supportive software experts. If it is not properly documented and structured, and given the complexity, should it be used at all for published works, be they scientific papers or 'merely' prognostications about climate variation?

Sep 17, 2014 at 1:29 PM | Registered CommenterJohn Shade

But back to the new society, the OAS. This strikes me as a very healthy and positive development in reaction to the caving-in of the leadership of so many existing journals and scientific institutions, such as the Royal Society, to the politically orchestrated campaign to demonise CO2. They were tested, and they have failed both science and society in terms of upholding high standards, challenging dogma, and being calm in the face of scaremongering.

Sep 17, 2014 at 1:34 PM | Registered CommenterJohn Shade

How do we know that the OAS is, and will remain, politically independent? Is that not the problem with all the other learned societies? I guess only time will tell.

Sep 17, 2014 at 1:48 PM | Unregistered CommenterH2O: the miracle molecule

This is a bit of a wicked problem. I certainly agree with transparency in methods and data. Any decent paper should make these clear anyway, but publishing code and data will aid reproducibility. But there are limits on how much the publisher needs to aid the enquirer (eg buying them a supercomputer).

I used to (now retired) require that my PhD students would make their code accessible, but not that they would provide a limitless IT support service.

Perhaps there needs to be some proportionality in this, akin to IT security risk analysis. For papers like most of mine over the years, which have had limited impact, nobody is going to care very much. But for papers that have high impact, scientifically but especially politically, there need to be higher standards of transparency and accountability.

How to realise this and give guidance to an Editor? Well a call has to be made regarding the level of politicisation of a paper. If the forces of political unreason are circling (regardless of from which direction) then a paper needs to be made more accountable.

Just some thoughts - not sure how to solve this wicked problem..

Cheers,
Lazlo

Sep 17, 2014 at 1:55 PM | Unregistered CommenterLazlo

Frederick Colbourne laments:

"If only the new society were the Atmospheric and Oceanic Society.

Since the mass of the oceans is several hundred times the mass of the atmosphere, how can atmospheric scientists alone assess the oceanic impact on climate?"

From the main page, first paragraph: http://theoas.org/

"The OAS is an international membership society for the purpose of studying, discussing, and publishing about topics in atmospheric related earth sciences, including but not limited to meteorology, hydrology, oceanography, and climatology. It is open to anyone with an interest at the associate level, but student and full memberships also are offered."

If only Frederick had read up on the society before suggesting a name change.

Sep 17, 2014 at 1:58 PM | Unregistered CommenterAnthony Watts

Excellent initiative from Anthony and friends. Doug McNeall makes some key points (some of which he will probably remember I've raised before). Replication isn't a simple nut to crack, especially when one gets to the GCMs. But the world and its policy makers deserve the state of the art to be pushed much further in this direction.

Sep 17, 2014 at 2:20 PM | Registered CommenterRichard Drake

Lazlo writes:

"Just some thoughts - not sure how to solve this wicked problem.."

The solution is to archive it, which is what we intend to do. While peer review by its very nature may be imperfect, by archiving it and making it available, two things are accomplished:

1. The author(s) stand behind the work strongly by making the replication process and elements transparent
2. Science is self-correcting. If the replication elements have a bug, a faulty premise, or other issues, they will most certainly be eventually called out elsewhere and corrected.

So many papers basically amount to an unsupported opinion that can't be easily challenged because the elements of replication have never been archived.

Sep 17, 2014 at 2:23 PM | Unregistered CommenterAnthony Watts

As has been earlier noted, Doug McNeall's supercomputer objection is not central. Yes, it can be helpful to re-run the exact software on the exact data, but that's only a very limited form of replicability. It will detect outright fraud (e.g. inventing data), which has been a problem in other scientific areas but doesn't appear to exist in climate science. The greater advantage lies with presenting all source data, and providing visibility into the *exact* processing steps.

We've seen, e.g. in temperature reconstruction studies, that the input data can get adjusted without any mention of it in the paper's "Methods" section. Does it make a difference? -- maybe it does, maybe it doesn't. But only provision of the precise data sets used makes it possible to answer that question. Transcription errors can also get introduced.

With respect to the algorithms, while the "Methods" section is typically pretty decent for giving the gist of the processing steps, there's often quite a bit left unexpressed there. E.g. if one writes that a curve has been filtered to remove low-frequency elements -- what type of filter? what length? how are edge effects handled? The only way to find out has been to ask the authors. Not to mention steps which are mis-coded, or are omitted entirely in the description. [Neither of those errors need be deliberate, by the way. Mistakes happen.]
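To make that concrete -- a hypothetical illustration, not taken from any particular paper -- here is the gap between "the series was filtered" in a Methods section and the fully specified step the archived code would pin down:

# illustrative only: every choice below (filter family, order, cutoff,
# zero-phase application, edge padding) changes the result, yet a Methods
# section often records none of them.
import numpy as np
from scipy.signal import butter, filtfilt

def remove_low_frequencies(series, dt_years=1.0, cutoff_years=30.0, order=4):
    """4th-order Butterworth high-pass at a 30-year cutoff, applied with
    filtfilt (zero phase shift; edges handled by its default odd-extension
    padding)."""
    nyquist = 0.5 / dt_years
    b, a = butter(order, (1.0 / cutoff_years) / nyquist, btype="highpass")
    return filtfilt(b, a, series)

annual_series = np.random.default_rng(0).standard_normal(200)
filtered = remove_low_frequencies(annual_series)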

As to version control, good on the Met Office. But as has also been mentioned earlier, this is de rigueur at the point when one realizes that software is a key product of an organization, if not earlier. Once a program progresses from simple data handling and becomes a substantial element of a paper, it ought to be archived, preferably in a standard software configuration management system. [Unfortunately, CMSs, like computer languages and hardware, fall out of use, so this only addresses the issue over the short term.] If one is following a reasonable software development process, the steps of providing source code and data should be fairly painless. [There might be an exception if the source data is monstrous, say terabytes or beyond in size.]

Sep 17, 2014 at 2:38 PM | Registered CommenterHaroldW

HaroldW: My main emphasis on GCMs has been something different from basic replicability - the practical ability to change some parameters, starting conditions or detail of some part of the algorithm in which you gain an interest and see what happens. Earlier in the month Julia Slingo talked about the nine months it takes to run the Met Office Unified Model (GCM) at 1km grid resolution on the most powerful hardware they have. (Starting 1850? Running till 2100? Doug may be able to fill in the blanks there.) So if you make that little change you fancy, you have more than a little wait to see the result! For me that's not open in an important sense. And yet the results are published with great fanfare by the IPCC as a guide for policy makers.

I agree that SCMs (source code management systems, to your configuration management systems?) have had a history of going out of favour, but I think this will stop, to a large degree, with the widespread adoption of Git in the last few years, aided by GitHub, for many different kinds of open source projects. (I'm not saying Git is the best but that the network effect is already in effect. VHS v Betamax and all that.) So Git will be around in forty years in good working order and older repos will automatically be converted to any new formats required by future versions as part of that. That's my prediction, having watched this area for quite a while. It's a relief!

Sep 17, 2014 at 2:54 PM | Registered CommenterRichard Drake

Although I'm glad the OAS has been started up, I'm doubtful it will have any impact on the 97% of climate scientists who are not skeptical. I'm fairly certain that the huge majority of climate-related scientists will avoid this publication like the plague in order to avoid giving it any legitimacy. It will end up publishing submissions from a small subset of skeptics, which will allow the consensus scientists to characterize the Journal as out of step with reality.
Is it really likely that any funded scientist will risk his acceptance in the majority by stepping out to publish in this Journal? I doubt it, but even so I can be hopeful.

Sep 17, 2014 at 2:56 PM | Unregistered Commenterwill

There are solutions for some of the problems mentioned, or at least ways to cut them down a bit.

Standards in computing environment are annoying, but can be worth their cost. If we have 10 standard-ish environments at a time instead of 100, the problems of being able to read and run programs over time become much more tractable. Also the ability to read them over space, in different labs and offices and even homes.

A basic discipline is to make sure that the code supplied is the same version of the code that was used to produce/process the data supplied. That takes time, but it's worth it.
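A minimal sketch of that discipline, assuming the analysis lives in a git repository (a hypothetical script, not any journal's required format): have the processing code refuse to run with uncommitted changes, and stamp the commit and input checksums into a provenance file archived with the results.

# provenance.py -- illustrative sketch: tie archived results to the exact
# code version and input data that produced them.
import hashlib
import json
import subprocess

def git_commit():
    """Return the current commit hash; refuse to run on uncommitted code."""
    status = subprocess.run(["git", "status", "--porcelain"],
                            capture_output=True, text=True, check=True).stdout
    if status.strip():
        raise RuntimeError("uncommitted changes -- commit before producing results")
    return subprocess.run(["git", "rev-parse", "HEAD"],
                          capture_output=True, text=True, check=True).stdout.strip()

def sha256(path):
    """Checksum an input file so the archived data can be matched exactly."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def write_provenance(input_paths, out_path="provenance.json"):
    record = {"code_commit": git_commit(),
              "inputs": {p: sha256(p) for p in input_paths}}
    with open(out_path, "w") as fh:
        json.dump(record, fh, indent=2)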

As time goes by, some very worthwhile practices and considerations will be driven home by this process, becoming engrained in the people. Write your code in such a way that people can read it, for instance, instead of merely stringing stuff together.

Oh, this exercise should be highly productive, over time. Sharing methods at the level of working code with all the steps spelled out, instead of hand-waving explanations that always leave things out because time and attention are limited. No excuses, because the code is the real thing. So is the data.

Sep 17, 2014 at 2:58 PM | Unregistered CommenterRoberto

As someone who urged replicability in paleoclimate studies, I was very frustrated whenever someone talked about "supercomputers", as this was inevitably put forward as an excuse for not providing very small datasets: paleoclimate data was trivial in size and programs were small.

Secondly, one of the purposes of archiving code (as discussed in econometric replication discussions) is to show methodological decisions that the authors did not discuss in their methodological section. Sometimes, these are relevant to diagnosis even if you aren't running the program.

Thirdly, when programs involve supercomputers, I presume that detailed parsing of the papers will be of primary interest to other specialists running supercomputers.

Sep 17, 2014 at 3:22 PM | Unregistered CommenterSteve McIntyre

Richard Drake (2:54 PM)--
I agree that if a program (GCM) really takes 9 months on a super-computer, replication and parameter sensitivity studies are not practical. The implication should be to increase the dosage of salt which accompanies said results.

[And Doug McNeall, if you fill in Richard's blanks, can you also indicate whether the 9 months figure is for a single run? My recollection is that the current practice is to make multiple runs with slightly different initializations. Perhaps that doesn't apply to these very long scenarios.]

Sep 17, 2014 at 3:39 PM | Registered CommenterHaroldW

Steve Mc:

As someone who urged replicability in paleoclimate studies, I was very frustrated whenever someone talked about "supercomputers", as this was inevitably put forward as an excuse for not providing very small datasets: paleoclimate data was trivial in size and programs were small.

Yes, I meant to say that. From the non-sublime (I doubt GCMs will turn out to be) to the ridiculous.

Sep 17, 2014 at 4:02 PM | Registered CommenterRichard Drake

Harold: The first thing that needs to be open is honest debate of these issues and their implications for 'evidence-based' policy making. Until then, yes, pass the salt.

Sep 17, 2014 at 4:06 PM | Registered CommenterRichard Drake

"I suppose that analysing someone else's GCM code would be a gargantuan task"

I do not agree. I have walked through GISS ModelE and was alarmed by what I saw. Very old-style FORTRAN, poorly commented (I reckon the comments are a lot less than 10% of the code; modern professional software would have well over 50%), legacy FORTRAN code presumably dating from when they had card readers, cloud modules that look very ropey, etc.

I certainly would not rely on anything produced by it.

Sep 17, 2014 at 4:49 PM | Unregistered CommenterCharmingQuark

Steve Mc:

Secondly, one of the purposes of archiving code (as discussed in econometric replication discussions) is to show methodological decisions that the authors did not discuss in their methodological section. Sometimes, these are relevant to diagnosis even if you aren't running the program.

I agree. Often climate scientists use IDL (ugh!), or some other expensive proprietary language. But in most cases the principal use of the code is to understand what exactly the authors have done and what decisions they have made, not to check that running the code generates a replica of the study's results. No software support is required, although any ancillary data files that the code reads are needed, as well as the main data. In my experience, it is rare for a methods section, even when supplemented by further details in supporting material, to provide full details of all the key steps and decisions involved.

Sep 17, 2014 at 4:52 PM | Unregistered CommenterNic Lewis

How often is 20 year old work up for replication or re-run? I ask after resurrecting a Sun system stored in 94. The power supply on the SPARCstation IPX was dead. I replaced it with a SPARC 10, and was able to make the original applications run by jimmying the hostid to match the older machine. All of the peripherals worked, drives, cartridge tape drive, DAT tape drive, but alas, all but one of the QIC cartridge tapes had failed drive bands. I was able to move the one good band from tape to tape and extract most of the vital data. The 8mm DAT tapes all worked fine - no problem with any of them.

The hard part was lost passwords for proprietary software, even from publishers still in business. Two of them had no records of our licenses, and the password generators they had used were long gone. In one case the password access algorithm was buried deep in a start-up script and was susceptible to discovery; in the other, no hope.

So again, who would want to get deeply into 20 year old code?

Sep 17, 2014 at 5:21 PM | Unregistered Commenterjferguson

It cannot but be helpful to have all the code - IDL, Fortran or whatever other horrors lurk within. (There always are horrors. Not just in climate science. When Dijkstra coined the term The Humble Programmer he did something necessary as well as good!)

The code - all the code, including any scripts used to run it - should be provided, with the full history of changes using a SCM (which should therefore show which script was used to run which version of the code - none of the Harry_read_me.txt-type guessing game). That's the absolute minimum to be expected. Then there's data. Nic and Steve have invaluable experience in the nitty-gritty of that, in very different areas of the climate scene. But none of us surely disagree about full disclosure of code and the benefits it can bring, even when it's neither practical nor possible to run, let alone replicate.

Sep 17, 2014 at 5:24 PM | Registered CommenterRichard Drake

jferguson:

So again, who would want to get deeply into 20 year old code?

Depends if it's dead or not. Mostly it is. One of my favourite blog posts of the last ten years is The World’s Oldest Source Code Repositories by Robin Luckey in August 2007, which begins

Most software doesn’t survive very long. The hard truth is that more than 80% of the open source software being written today will be forgotten in a few years.

For those projects that do succeed and thrive, the developers typically decide at some point that they need a new source control system. For many reasons (lack of time, lack of tools, or a simple desire to start fresh), most projects simply throw away their development history at this point and start again.

All of which means that most source control repositories are lucky to survive more than a couple of years.

However, there’s a class of meticulous, responsible, (obsessed?) programmer that somehow manages to keep the same thread of development alive and unbroken for decades.

Can you guess which the three oldest projects are, without looking? And are today's programmers of GCMs sufficiently meticulous, responsible and obsessed? :) Whatever our answer to that, there's no question that their programs are alive in the sense meant above.

Sep 17, 2014 at 5:34 PM | Registered CommenterRichard Drake

Re. Doug's comments. 'Revision control' just means that the authors manage their versions; it does not mean that the public who pay for the research can see the details of the calculation or the input data. So this comment is a diversion. The supercomputer comment is also irrelevant. Anyone who wants to understand a method need only read the code, or run it for a small amount of time, and will need nothing like a supercomputer to do that. Additionally, in fifteen years' time today's supercomputers will be equalled by desktop machines, so if you want to improve the science of the field, then make all the code public. You will likely find all sorts of advances in understanding occur as a result. E.g. someone might find a way to make a prediction which was not already coded into the input of a GCM - which would be a revolution. If you stand by your comment, release your code that ran on the Cray-2 (which is equivalent to a modern iPad): http://apple.slashdot.org/story/12/09/17/203232/apple-ipad-2-as-fast-as-the-cray-2-supercomputer

And if revision control has been used for a 'very long time' - why is it so very difficult for people to determine how raw temperature data 'adjustments' have been made?
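There is nothing technically difficult about making such adjustments auditable. A minimal sketch - hypothetical station data and a hypothetical adjustment, not any agency's actual procedure - of an adjustment applied and logged so that anyone can see what was changed and why:

# illustrative only: apply a documented station adjustment and keep an
# audit trail of what was changed, by how much, and for what reason.
import json

def adjust_station(raw_temps, offset_c, reason, log_path="adjustments.json"):
    """Apply a constant offset (e.g. for a recorded station move) and append
    the decision to a human-readable log file."""
    try:
        with open(log_path) as fh:
            log = json.load(fh)
    except FileNotFoundError:
        log = []
    log.append({"offset_c": offset_c, "reason": reason, "n_values": len(raw_temps)})
    with open(log_path, "w") as fh:
        json.dump(log, fh, indent=2)
    return [t + offset_c for t in raw_temps]

# hypothetical usage: a recorded 1987 relocation introduced a +0.3 C bias
adjusted = adjust_station([14.1, 14.3, 13.9], -0.3,
                          "station relocated May 1987; parallel readings show +0.3 C bias")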

Sep 17, 2014 at 5:39 PM | Unregistered CommenterZT

ZT: Temperature data is another thing again. The past management of that - both data and the code used to manipulate it - seems to have been abysmal. But Doug is not as far as I know responsible for any of that. And the BEST team seems to have cleaned up the process considerably in the direction of complete openness, even if Judy Curry and others may have some questions about how they've chosen to clean the data! Much more Anthony's area of course.

Sep 17, 2014 at 5:45 PM | Registered CommenterRichard Drake

A step toward providing an exemplary forum.
But until Nature Climate Change forces a corrigendum to O'Leary's Figure 2 on the sudden Western Australia sea level rise at 119 millennia BP, and until Science retracts Marcott's abomination with a 20th century hockey stick manufactured by grossly changing previously published proxy dates (about as clear a case of scientific misconduct as there could be), progress will not have been made toward cutting out the cancer in climate science.
In both cases, I have posted the clear evidence to the journals and authors. Nothing in 18 months.
Both cases are exposed in detail in the new book, now at the publishers. Sanitizing sunlight about to shine.

Sep 17, 2014 at 5:58 PM | Unregistered CommenterRud Istvan

RD: Doug's comments were quite broad - "Many, many scientists (including me and others in my organization) subscribe to the ideal of complete replicability. For example, we've used things like version control here for a very long time." - hence I think it is entirely fair to point out that the use of revision control in certain areas of climate science has been abysmal (as you mention). If BEST is 'in the direction of complete openness' rather than 'is completely open', both in data and code, then I guess BEST is not good enough. (I am not too familiar with how BEST is doing things - but the issue is really not too complicated: raw data (site histories, location of thermometers, temperatures as measured by the thermometer - in the raw form - images of log books, and transcribed temperatures, etc.) and the processing algorithm should be public and available for inspection.) The same should be true of drafts of papers, IPCC reports, email correspondence, requests to delete incriminating messages, etc.

"Many, many scientists (including me and others in my organization) subscribe to the ideal of complete replicability."...and then proceeding to make excuses does not inspire confidence that climatology has any sense of reproducibility in science.

Sep 17, 2014 at 6:27 PM | Unregistered CommenterZT

Part of my day job is code review of financial systems. I'm currently working on delivering a big insurance application: 60+ C# projects and over 1500 stored procs. I can confirm that there is no need to run code to review it. In fact it is sometimes an advantage not being able to step through. You can be blinded to the underlying code quality by the dazzle of the running code ticking all the functional boxes.

It's very easy to identify coders' styles too. I know who the kludgers and pros are ;)

Sep 17, 2014 at 6:28 PM | Unregistered Commenterclovis marcus

Richard Drake,
I couldn't guess. FWIW another challenge of running 20 year old code, particularly in a UNIX environment is recreating the twenty year old UNIX environment. I had a lot of trouble deciding how far forward to patch the 93 4.1.3 OS. I finally built up two drives with one patched to mid '94 and the other up to 2000. It turned out not to matter so I ran with the 2000 patches.

I think if I'm comprehending some of the other comments, looking at the code might get the job done without actually running it. I suppose for the sorts of things I'd like to believe I'm sharp at, I could quickly divine whether the coder knew what he was doing, and if not assume that running it wouldn't change my mind.

Sep 17, 2014 at 6:49 PM | Unregistered Commenterjferguson

ZT: I haven't studied BEST's practices in detail, so my words included some wiggle room! But it would be unfair to take my ignorance as proof that 'BEST is not good enough' - good though the pun is. I have the strong impression from Mosher's writings that they have, as I already said, significantly cleaned up the process. Please delve in and come back with a fuller report if you suspect my judgment is faulty on that.

Sep 17, 2014 at 7:26 PM | Registered CommenterRichard Drake

clovis and others: Everyone agrees that you can learn a heck of a lot from reading code, if you have sufficient time to do so (meaning, in most cases, that someone is paying). One of the threads that most amused me on this recently was on Quora: How does interning at Twitter compare to Facebook or Google? The top answer, from November 2012, is from a guy who seems to have interned everywhere so you can't fault that. But what he wrote made me laugh. Here's one excerpt:

All three companies are great for this, but in different ways. Google is a good place to start learning strong principles of software engineering, but you have less responsibility for the code you write (compared to Facebook and Twitter) as it generally goes through a tedious reviewing process from which no doubt interns may learn a lot, but other people share some of the responsibility as well. Also, the development cycle used to be somewhat longer on the projects I was working on at Google and bugs could be caught before they reached production.

Isn't that terrible for the poor young hacker? Followed later by this:

Productivity: Google has by far the best developer tools to work with. On the other hand, their development cycle can be quite slow because of factors like readability reviews, reviews from randomly chosen SREs, etc. At Facebook and Twitter, getting code shipped is more important, so if you break something, you fix it, end of story.

Ah, readability reviews. Pain to the hotshot intern but they just clinched the vote for Google from this user. I always thought the influence of once-programmer-of-Lex and chairman of the board Eric Schmidt may have been profound. I at once gave him credit for this excellent piece of culture as I read it.

Which means I'm very much in agreement about open code being a necessary condition for climate goodness. But is it sufficient? Even granting your point that sometimes the ability to run code can divert the inexperienced (and I'd argue they must be inexperienced for this to happen) there are surely other times when a very complex piece of software cannot possibly be understood without the ability to change elements, run the code (or a subset) and see what happens. All of my experience screams that this is needed with very complex systems. So what are we to do about the nine month runtimes?

Sep 17, 2014 at 7:41 PM | Registered CommenterRichard Drake

Richard,
2 days was the longest runtime I had any exposure to. This was code to divine the meaning of the previous week's business and was run over the weekend, with output in 'execuread' format first thing Monday. We put progress alerts in it to squawk anomalies as they could be detected, so we could fix a run if at all possible. But 9 months? What happens if it bombs 7 months out? Have you worked on anything like this?

Sep 17, 2014 at 8:08 PM | Unregistered Commenterjferguson

I particularly like the fact that the Open Atmospheric Society is to be "cloud-based".

Come back Svensmark, all is forgiven!

Sep 17, 2014 at 8:13 PM | Unregistered Commentergraphicconception

When Mickey Mann published his groundbreaking Hokey Schtick, overturning decades of received wisdom in paleoclimate, did research groups around the world rush to replicate it? Of course not. We had to wait ~6 years for Steve McIntyre, the retired mining engineer, to fight tooth and nail for the data and do the work on his own time.

Every climatologist knows that all this replicability stuff is agitated for by 'deniers' as a means to undermine the cause by snapping hockey sticks and other crud before they make it to the front page of the next IPCC report.

The only research climatologists have ever shown any interest in replicating or criticising is stuff that goes against their 'noble cause'. Making code/data available isn't going to change the culture within the field or keep junk science out of the headlines, just make it slightly easier for bloggers to play the game of memetic whack-a-mole after the fact.

Sep 17, 2014 at 8:13 PM | Unregistered CommenterJake Haye

jf: In a word, no!

In slightly more words, I still have to record my notes on Julia Slingo's talk at the IoP and my reflections thereon. When I have done so this issue will, I'm sure, loom large. It is an extraordinary situation. But, as a programmer, I do share some of the excitement Dame Julia obviously felt. But no, who apart from the coders of GCMs has this kind of runtime to contend with?

Sep 17, 2014 at 8:15 PM | Registered CommenterRichard Drake

Richard Drake and jferguson

Re the 9 months: is it actual runtime or an overall time frame?

I recall a discussion with Richard Betts when the MO's Decadal Forecast morphed into a 5 year forecast. The reason given was time restrictions on "the" supercomputer, as it had to carry out the requisite meteorological runs on a daily basis.

I have no relevant programming experience, just wonder if it is an overall gestation period rather than actual runtime?

Sep 17, 2014 at 8:55 PM | Registered CommenterGreen Sand

Haha, gestation period is very good.

I took it to mean elapsed time allowing for other shorter jobs being run on the same supercomputer on a predictable basis. But this was the kind of detail I hoped to get into - eventually - on the discussion thread on Slingo's excellent presentation at the IoP. I'd like a calm atmosphere when I try to do that, not something we always achieve in such areas!

Sep 17, 2014 at 9:35 PM | Registered CommenterRichard Drake
