Bishop Hill blog - The OAS and replicability

Bishop Hill

The OAS and replicability

Sep 17, 2014

Journals

The news that there is a new learned society for atmospheric scientists is very exciting and I'm sure that everyone at BH wishes those behind the move every success.

The focus is inevitably going to be on the Open Atmospheric Society "throwing down the gauntlet to the AMS and AGU" angle, but I'm also struck by the "throwing down the gauntlet to scholarly publishers" angle, summed up in this important position statement by the OAS regarding its journal:

[There is a] unique and important requirement placed up-front for any paper submitted; it must be replicable, with all data, software, formulas, and methods submitted with the paper. Without those elements, the paper will be rejected.

In the wake of Climategate I tried to interest the Committee on Publication Ethics, the umbrella body for scholarly journals, in the issue of replicability, but after two years of them procrastinating I took the hint and dropped the subject.

My guess at the time was that no learned journal wanted to be the first to demand data and code up front, frightened of scaring away potential authors. What, then, will be the impact of the Journal of the OAS? Will mainstream climatologists simply refuse to go near it? Will JOAS wither and die for lack of papers? Or will people start to look at those who submit to the traditional publishers and ask what it is they have to hide?

It's going to be fascinating.

Reader Comments (84)

Guys,
I'm sceptical about a 9 month run.

I wonder if there ever was a 9 month run where you load up the machine, hit the switch and it rumbles on for 9 months.

They must mean something else. My late father-in-law got interested in fusion mechanics and devised a Fortran program for computing particle orbits in a tokamak-like field which took two days to run on an Apple II. He didn't have a UPS and used to really yell if someone took the masking tape off the switch to the socket the machine was plugged into. He also wasn't too pleased when, after two days, the results indicated an idiotic input error. He used to say that the iterations were killing him. I would think a 9 month run would be similar, only much, much worse.

Sep 17, 2014 at 9:35 PM |

jferguson

As I understand it, climate models are just big state-transition functions run multiple times. You create a random weather state from some distribution, then you step it to the state an hour later, then an hour after that, and so on up to a couple of hundred years. You record at least a smaller set of summary statistics as you go. Then you start it again on another initial state, and so on. After a few hundred runs you get an idea of the spread and statistics of the behaviour.
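A minimal sketch of the ensemble idea just described, assuming a toy model: `step` is a hypothetical stand-in for real model physics (each value simply relaxes towards a fixed level) and is not taken from any actual GCM. It starts many runs from random initial states, steps each one forward while recording a summary statistic, and reports the spread across runs.

```python
import random
import statistics

def step(state, dt=1.0):
    """Advance a toy 'weather state' by one time step.

    Hypothetical stand-in for a real model's physics: every value simply
    relaxes towards 15.0 at a rate of 1% per step.
    """
    return [x + dt * 0.01 * (15.0 - x) for x in state]

def run(initial_state, n_steps):
    """Step a single run forward, recording a summary statistic (the mean) each step."""
    state = initial_state
    means = []
    for _ in range(n_steps):
        state = step(state)
        means.append(statistics.fmean(state))
    return means

def ensemble(n_runs, n_steps, seed=0):
    """Start many runs from random initial states; return mean and spread of their final summaries."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_runs):
        initial = [rng.gauss(14.0, 2.0) for _ in range(10)]  # random initial 'weather'
        finals.append(run(initial, n_steps)[-1])
    return statistics.fmean(finals), statistics.stdev(finals)

mean, spread = ensemble(n_runs=100, n_steps=500)
print(f"ensemble mean {mean:.3f}, spread {spread:.3f}")
```

Fixing the random seed makes the whole ensemble repeatable, which is the property the rest of the thread is arguing about.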

For the purposes of completely replicating a particular result in its entirety, you would indeed need to reproduce the whole run. But for the purposes of checking the physics, you could just run it for a single timestep, or for a few days' worth of timesteps. Long runs should generally archive their state at intervals. For example, if you archive the state at the end of every simulated year, then if the system bombs out you only have to go back to the last save point. If some odd feature turns up that you're not sure about, you can re-run just that one year, starting from the saved state at the beginning of the year. And it means you can spot-check the run by picking years/runs at random and just checking that each sample gets to the next save point correctly.
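A minimal sketch of the save-point scheme, assuming a hypothetical deterministic `step` function and one archived state per simulated year. `spot_check` re-runs randomly chosen years from their saved starting states and confirms that each lands exactly on the next save point. Exact equality is only a fair test when the re-run uses the same code, compiler and hardware settings (see the floating-point discussion later in the thread).

```python
import random

STEPS_PER_YEAR = 12  # hypothetical: one step per simulated month

def step(state):
    """Hypothetical, deterministic toy physics standing in for one model time step."""
    a, b = state
    return (0.9 * a + 0.1 * b, 0.95 * b + 0.05 * a + 0.01)

def run_year(state):
    """Advance a state through one simulated year."""
    for _ in range(STEPS_PER_YEAR):
        state = step(state)
    return state

def long_run(initial, years):
    """Run for many years, archiving the state at the end of every year."""
    checkpoints = [initial]
    state = initial
    for _ in range(years):
        state = run_year(state)
        checkpoints.append(state)
    return checkpoints

def spot_check(checkpoints, samples, seed=0):
    """Pick years at random and check each one reproduces the next save point."""
    rng = random.Random(seed)
    for year in rng.sample(range(len(checkpoints) - 1), samples):
        if run_year(checkpoints[year]) != checkpoints[year + 1]:
            return False, year  # first year that failed to reproduce
    return True, None

archive = long_run((1.0, 0.0), years=200)
print(spot_check(archive, samples=10))  # prints: (True, None)
```

If the system bombs out, `long_run` could equally be restarted from `archive[-1]` rather than from the beginning.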

In practice, I'd think you'd want to operate a grandfather-father-son style of backup policy on it. Every state gets archived and backed up. But so you don't run out of disk space, after a while you start re-using the storage, only keeping every n'th state more permanently.
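A minimal sketch of a grandfather-father-son style retention rule for yearly save points. The three tiers and their sizes (every state from the last 10 years, every 10th state from the last 100 years, every 50th state permanently) are illustrative assumptions, not a recommendation.

```python
def states_to_keep(saved_years, current_year,
                   keep_all=10, mid_window=100, mid_every=10, long_every=50):
    """Return the saved years whose checkpoints should be retained.

    'Sons': every state younger than `keep_all` years.
    'Fathers': every `mid_every`-th state younger than `mid_window` years.
    'Grandfathers': every `long_every`-th state, kept permanently.
    Storage for anything else can be reused.
    """
    keep = []
    for year in saved_years:
        age = current_year - year
        is_son = age < keep_all
        is_father = age < mid_window and year % mid_every == 0
        is_grandfather = year % long_every == 0
        if is_son or is_father or is_grandfather:
            keep.append(year)
    return keep

kept = states_to_keep(range(0, 201), current_year=200)
print(kept)
```

Out of 201 yearly states, this policy keeps 22, while any year can still be reached by re-running forward from the nearest kept state before it.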

This sort of backup has the advantage that you can 'fast forward' the run for more detailed investigation using a less expensive computer. You do the 100 x 100-year runs on the big computer, then you scan through those to find the events you're interested in - El Nino years following snowy winters, say - and can then run just those few months on a smaller, slower computer with better instrumentation to log the specific statistics you're interested in - deep ocean temperatures/currents in the North Pacific, say. That would strike me as a useful facility to have. Particularly interesting sections could be sampled more frequently - at the end of every simulated day, say, instead of every month or year.
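A minimal sketch of the 'fast forward' idea, assuming a hypothetical toy `step` in which every seventh year receives extra forcing; those forced years stand in for the events of interest. The big run archives a checkpoint and one cheap summary per year, `find_events` scans the summaries, and `replay` re-runs a chosen year from its checkpoint with detailed per-step logging switched on.

```python
STEPS_PER_YEAR = 12  # hypothetical: one step per simulated month

def step(state):
    """Hypothetical toy physics: x decays each step, but is forced in every 7th year."""
    t, x = state
    forced = (t // STEPS_PER_YEAR) % 7 == 3
    return (t + 1, 0.8 * x + (1.0 if forced else 0.0))

def run_year(state, logger=None):
    """Advance one simulated year; call `logger` on every step only if one is given."""
    for _ in range(STEPS_PER_YEAR):
        state = step(state)
        if logger is not None:
            logger(state)
    return state

def big_run(initial, years):
    """The expensive run: keep a checkpoint and a single summary (yearly peak of x)."""
    checkpoints, summaries = [initial], []
    state = initial
    for _ in range(years):
        values = []
        state = run_year(state, logger=lambda s: values.append(s[1]))
        checkpoints.append(state)
        summaries.append(max(values))
    return checkpoints, summaries

def find_events(summaries, threshold):
    """Scan the cheap summaries for years worth a closer look."""
    return [year for year, peak in enumerate(summaries) if peak > threshold]

def replay(checkpoints, year):
    """Re-run just one year from its saved start state, logging every step in detail."""
    detail = []
    run_year(checkpoints[year], logger=detail.append)
    return detail

checkpoints, summaries = big_run((0, 0.0), years=30)
events = find_events(summaries, threshold=4.0)
print(events)  # prints: [3, 10, 17, 24]
print(len(replay(checkpoints, events[1])))  # prints: 12
```

The replay only needs the one checkpoint and the code, so it could run on a much smaller machine than the original run.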

However, the replicability is not primarily to be able to do the calculation again, so much as to be able to determine later exactly what calculation was done. 'Harry' had that problem - even though he had the actual software, data, and compilers that were originally used, the software made the mistake of asking the user for manual inputs, which were not logged. Ian Harris had to guess what had originally been input. Being able to re-run the code is a guarantee that there is no ambiguity, which is the primary aim.
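A minimal sketch of the logging that would have spared 'Harry' the guesswork, assuming interactive prompts are routed through one function. `RecordedInput` and `process` are hypothetical names. In record mode every prompt and answer is logged so the log can be archived with the results; in replay mode the archived answers are fed back in, so a re-run asks nobody anything.

```python
import json

class RecordedInput:
    """Wrap interactive input so every prompt and answer is logged (hypothetical helper).

    Record mode: answers come from `ask` (e.g. the built-in input) and are logged.
    Replay mode: answers are read back, in order, from a previously saved log.
    """

    def __init__(self, ask=input, replay_log=None):
        self.ask = ask
        self.replay = list(replay_log) if replay_log is not None else None
        self.log = []

    def __call__(self, prompt):
        if self.replay is not None:
            if not self.replay:
                raise RuntimeError(f"no recorded answer left for prompt {prompt!r}")
            saved_prompt, answer = self.replay.pop(0)
            if saved_prompt != prompt:
                raise ValueError(f"prompt mismatch: {saved_prompt!r} != {prompt!r}")
        else:
            answer = self.ask(prompt)
        self.log.append((prompt, answer))
        return answer

def process(get_input):
    """Hypothetical processing step that, like the code Harry inherited, asks the user."""
    threshold = float(get_input("Threshold? "))
    return [x for x in [0.5, 1.5, 2.5] if x > threshold]

recorder = RecordedInput(ask=lambda prompt: "1.0")  # stands in for a person typing
first = process(recorder)
saved = json.dumps(recorder.log)  # archive this alongside the results

replayer = RecordedInput(replay_log=json.loads(saved))
print(first, process(replayer) == first)  # prints: [1.5, 2.5] True
```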

It is also worth noting that simply archiving the code is not enough. You also have to archive the tests and checks you did to prove that the code is correct, in whatever sense you mean it, explaining how it works and why you think it's correct. Again, this is not simply to enable someone to repeat the checks, but to be totally unambiguous about what checks were done, and hence what gaps might be left unchecked that flaws could sneak through. That helps the next round of scrutineers to tell where to concentrate their efforts.
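A minimal sketch of archiving checks alongside code, using Python's standard `unittest` module. `diffuse` is a hypothetical toy model step; the point is that each test's docstring says what the check demonstrates, and the class docstring states plainly what was not checked.

```python
import unittest

def diffuse(field, k=0.1):
    """Hypothetical toy step: explicit diffusion on a periodic 1-D grid."""
    n = len(field)
    return [field[i] + k * (field[i - 1] - 2 * field[i] + field[(i + 1) % n])
            for i in range(n)]

class DiffuseChecks(unittest.TestCase):
    """Checks archived with the code, each saying what it does and does not show.

    Not checked here (a known gap): numerical stability for k > 0.5,
    and agreement with any analytical solution.
    """

    def test_conserves_total(self):
        """Diffusion should only move the quantity around, never create or destroy it."""
        field = [0.0, 1.0, 4.0, 2.0, 0.5]
        self.assertAlmostEqual(sum(diffuse(field)), sum(field))

    def test_uniform_field_is_steady(self):
        """A uniform field has nothing to diffuse, so one step should leave it unchanged."""
        self.assertEqual(diffuse([3.0] * 4), [3.0] * 4)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiffuseChecks)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Anyone auditing later can see at a glance both what was verified and where the unchecked gaps are.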

And if you've done no checks, just say so. That way, everybody knows what they're dealing with.

Auditors should not simply be faced with a tangled mass of undocumented, inscrutable source code to decipher. There should ideally be a human-readable explanation describing how the reader could in principle go about checking it and assuring themselves that it works. Coders should bear in mind that the person most likely to be needing it is themselves, a few years down the line when somebody starts asking awkward questions. Consider it as a helpful letter written to your future self, who is in trouble and needs your advice. Will you help?

Sep 17, 2014 at 9:55 PM |

Nullius in Verba

Yep, NiV.

There should ideally be a human-readable explanation describing how the reader could in principle go about checking it and assuring themselves that it works.

Wherever possible through readable code and tests, not comments. That's why I liked those Google readability reviews. Code, tests and explanations of tests never start out that way. Having a culture of readability improvement is to acknowledge that. I certainly agree that archiving should be organised so that you don't have to go the full nine yards (or nine months) to check something out. All the same, I think we need new definitions of openness for such gargantuan works of coding and computation.

Sep 17, 2014 at 10:13 PM |

Richard Drake

I am surprised by two things in Doug McNeall's post. First, it is highly unlikely, in my view at least, that the publication of the code is in any way related to the replicability of the experiment. Whether or not you have a supercomputer, running the same code on the same compiler would surely give the same results. No? Perhaps the software engineers on this thread could comment.

I would have thought that people asking to see the code would be looking at it for possible bugs and errors, or even for some new statistical technique that always gives the required answer - that's been known before, I believe.

The other point I picked up on was Doug telling us that they've used version control in the Met Office for a "very long time". What? Version control is the very backbone of engineering work; Brunel used version control. In my day we were taught to write down what each software module was doing, then write the code. If subsequent changes were made, they were explained in a note at the end of the software saying what the changes were, and a new version number was generated - pretty much as had been done in engineering projects since the time of Brunel. Is Doug saying there was a time when the Met Office didn't do this most basic of engineering disciplines?

There is no excuse that can be made for not providing access to the original code used to support the conclusions of a scientific paper, none. That the climate science community has refused, and in some cases continues to refuse, to let its critics see the code which helped it draw its conclusions is inexcusable and indefensible.

Let's hope we see more transparency for the future.

Sep 17, 2014 at 10:33 PM |

geronimo

"Wherever possible through readable code and tests, not comments."

Both. Humans find some forms of presentation easier to understand, and those forms can't always be expressed easily in text, let alone in code; and human methods are often not very machine-efficient. Computers are fundamentally different from humans. There is an inherent risk in trying to do two not-quite-compatible things at once with the same bit of content - you wind up doing neither of them very well.

I know what you mean though. But I'd say use whatever methods work best in context. The rules are only there to make you think before you break them.

Sep 17, 2014 at 11:01 PM |

Nullius in Verba

So where does the "proprietary/commercially-sensitive information" stance fit into all this? This could be a genuine dilemma in some situations, notwithstanding my opinions of whether a program is of real value or not. Didn't CRU/MetOffice use this justification at some point in the past?

Sep 17, 2014 at 11:14 PM |

michael hart

"Of course, It's only so useful that I give you all my code, if you don't have a supercomputer to run it on...."

Whilst I am quite sure Doug doesn't mean it, it does have a hint of

Sep 18, 2014 at 12:09 AM |

Green Sand

“All those requirements on authors should be standard practice for any reputable university or research organisation...” --Mikky

That will be contemporaneous with porcine aerobatics.

“And are today's programmers of GCMs sufficiently meticulous...? :) Whatever our answer to that, there's no question that their programs are alive in the sense meant above.” --Richard Drake

Alive? They're simply teeming.

“How often is 20 year old work up for replication or re-run? I ask after resurrecting a Sun system stored in 94.” --jferguson

A heaping load of well-deserved kudos is headed your way.

“I do not agree. I have walked through GISS ModelE and was alarmed by what I saw.” --CharmingQuark

Why don't we see this sort of post-mortem detailed on a skeptic blog somewhere?

“The first thing that needs to be open is honest debate of these issues and their implications for 'evidence-based' policy making. Until then, yes, pass the salt.” --Richard Drake

Pass an entire salt mine. Please.

Sep 18, 2014 at 12:34 AM |

jorgekafkazar

CharmingQuark> "modern professional software would have well over 50%"

I doubt you'll find much agreement on that. With that many comments you'll find that the comments and the code quickly diverge during maintenance. Once they have diverged they are just noise and/or misdirection.

On the OAS, there is a paper by Anthony Watts that has been waiting to be published for years. Maybe it can be the first and flagship paper for the new journal.

Sep 18, 2014 at 12:48 AM |

Raff

jorgekafkazar:

“I do not agree. I have walked through GISS ModelE and was alarmed by what I saw.” --CharmingQuark
Why don't we see this sort of post-mortem detailed on a skeptic blog somewhere?

Exactly what I thought on reading the original comment.

Raff:

CharmingQuark> "modern professional software would have well over 50%"
I doubt you'll find much agreement on that. With that many comments you'll find that the comments and the code quickly diverge during maintenance. Once they have diverged they are just noise and/or misdirection.

Nullius makes some deeper points about the limits of text, and the need to break the rules to enable human understanding, both of which are really important to take on board, but you've nailed a key drawback about both text and diagrams that claim to describe code here.

Just as an anecdotal point, I was being walked through some Ruby code in the last month as part of a new piece of work, and after a while I said to the guy who'd written most of the system we were looking at, "I really like how infrequent your comments are and how useful they are when you make them." He stopped in surprise and said he appreciated someone saying that. You don't get that kind of feedback very often as a programmer, that you may be on the right track, he said with feeling. I meant what I said too - it's rare to find the right balance struck in this area. If you manage to do so it's a great gift to the next person coming along. Open source is not always good at this, in either direction: comments can be too many or out of date, or missing at mission-critical, hard-to-understand or simply smelly moments in the code (Kent Beck's famous term, and the most important case for commenting). We're still learning about this. My emphasis is always to make the code more readable first, then comment if it's still not easy to understand. Even a "WTF" is worth inserting sometimes to let someone else know they're not alone in finding some section hard. The FIXME, OPTIMIZE, TODO standard is politer but can itself become too rigid, like everything. Write exactly the right amount of tests in all areas (ha - easy!) and break the rules, in search of maximum understanding, as NiV says.
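A minimal sketch of one way to keep the FIXME, OPTIMIZE, TODO convention useful rather than rigid: a small scanner that lists tagged comments so they get reviewed instead of forgotten. It is a hypothetical helper, assumes `#`-style comments as in Ruby or Python, and will also match a `#` that happens to sit inside a string literal.

```python
import re

TAGS = ("FIXME", "OPTIMIZE", "TODO")  # the conventional annotation markers
PATTERN = re.compile(r"#\s*(" + "|".join(TAGS) + r")\b:?\s*(.*)")

def find_annotations(source):
    """List (line number, tag, note) for every tagged comment in some source text."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            found.append((lineno, match.group(1), match.group(2).strip()))
    return found

sample = """def mean(xs)
  # TODO: handle empty arrays
  xs.sum / xs.size # FIXME integer division for integer arrays
end
"""
print(find_annotations(sample))
# prints: [(2, 'TODO', 'handle empty arrays'), (3, 'FIXME', 'integer division for integer arrays')]
```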

Sep 18, 2014 at 2:20 AM |

Richard Drake

A physics-based equation, with only two drivers (both natural) as independent variables, explains measured average global temperatures since before 1900 with 95% correlation, calculates credible values back to 1610, and predicts through 2037. The current trend is down.

Search “AGWunveiled” for the drivers, method, equation, data sources, history (hindcast to 1610) and predictions (to 2037).

Search “consensusmistakes” to find out why thermalization makes CO2 change NOT a driver.

Sep 18, 2014 at 6:03 AM |

dan pangburn

dan pangburn
Do you mean this http://agwunveiled.blogspot.fr/ or have I just read the wrong article?

Sep 18, 2014 at 9:49 AM |

SandyS

Raff:

" CharmingQuark> "modern professional software would have well over 50%"

No, I am afraid that is done in the real world. As a professional modeller I can assure you there is much agreement on this, although lapses due to time constraints may occur. It is the divergence of the associated documents from the code that is the serious problem, and therefore commenting in the code is the best way of tracking its development.

Sep 18, 2014 at 10:08 AM |

CharmingQuark

As in another thread, I think the problem here is that two areas are being conflated.

If I write a program for science research then I don't strictly have to be exact to the letter about what software program I used to create a result. I would describe this in my method and hopefully do a good job. I may decide to be more exact, describing what programming language I used or compilers etc., but this would be extra.
If someone wanted to replicate what I did, all they would need is my data and methods. The emphasis would be on whether what I said was in the ballpark rather than being right to 3 dp, so to speak.

Now if what I produced in a paper was used for policy or for some sort of engineering verification then I need to be way more strict and incorporate basically what most people here have said.

And this is the trouble with climate science. It is being used as if it were of engineering standard, yet it is only of scientific standard. And I don't think the protagonists appreciate this in any shape or form.

Sep 18, 2014 at 10:30 AM |

Micky H Corbett

Micky: Exactly. The phrase that came to me overnight was "Is this paper policy-ready?"

Sep 18, 2014 at 11:22 AM |

Richard Drake

That's a good phrase, Richard. To go further, the question should be asked why ANY scientific papers should be used as the basis of policy or action without some verification first.

For climate science the meme that 97% of scientists agree that the present climate change is caused by man would become "who cares? If a team of engineers doesn't think the work is up to scratch and applicable then you can think what you like"
Even if it were true, it would be irrelevant for practical purposes.

With regards to the journal, if the work may have policy consequences then definitely ask for a higher standard of data and method.

Sep 18, 2014 at 12:13 PM |

Micky H Corbett

This takes me back to Steve Mc's comment MHC. Those paleo papers where the authors were far less than open with code and data were obviously not policy-ready yet the hockey stick was the poster child of the IPCC in 2001. And, as Steve said, this was trivial to put right, and this has still not happened in the worst cases. How much the new Society can change this situation I don't know. But in the end the penny will drop with a new generation of policy makers.

Are reports of GCM results, including by the IPCC, policy-ready? In a word, no. What would make them policy-ready is a fascinating as well as an important question but it shouldn't, as Steve said, be used as a smokescreen for the far simpler and more blatant cases of non-transparency he's uncovered again and again since emailing Michael Mann in 2002.

Sep 18, 2014 at 1:10 PM |

Richard Drake

CharmingQuark> "No I am afraid that is done in the real world. "

Many things that are done are a matter of personal preference (or company coding standards), not hard, universal rules (and it depends upon what one considers a "comment" and what "noise"). The answer to "what is the right code/comment ratio" is always: "well, it depends...". Code can easily be over-commented.
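To illustrate how much a code/comment ratio depends on definitions, a minimal sketch using Python's standard `tokenize` module. The counting rule here (a line is a 'comment line' if it holds a `#` comment and a 'code line' if it holds any other token, with docstrings counted as code) is an assumption; change the rule and the ratio changes.

```python
import io
import tokenize

# Token types that are layout or bookkeeping rather than code.
LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER, tokenize.ENCODING}

def comment_ratio(source):
    """Return (comment lines, code lines) for a piece of Python source."""
    comment_lines, code_lines = set(), set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])
        elif tok.type not in LAYOUT:
            code_lines.add(tok.start[0])
    return len(comment_lines), len(code_lines)

sample = '''def area(r):
    # circle area
    return 3.14159 * r * r  # approximate pi
'''
print(comment_ratio(sample))  # prints: (2, 2)
```

Whether that three-line function counts as "50% comments" or "two-thirds comments" depends entirely on how the line with both code and a trailing comment is classified.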

I don't suppose anyone else is in the slightest bit interested in code/comment ratios so please just accept that the world is not black and white, right and wrong.

Sep 18, 2014 at 1:56 PM |

Raff

Raff: On the contrary, NiV and I are obviously interested :)

I totally agree that software development cultures vary greatly on this. May the best culture win is what I say. And I'll stick my neck out and say that this is totally and centrally on topic for this thread about Anthony's OAS. It's time we got into the nitty-gritty of what climate openness really means. No other open source effort would diminish the importance of test suites, code comments and other documentation. People in the 'real world' know that this stuff really matters. It does in climate publishing too - though as Steve and Nic have intimated there's a spectrum of size and complexity from paleo/TCR studies through to properly reporting the results of GCMs. One size does not fit all.

Sep 18, 2014 at 2:42 PM |

Richard Drake

"who apart from the coders of GCMs have this kind of runtime [9 months] to contend with"?

Feynman tells of the Manhattan Project that a team of [human] computers would produce answers to three problems in 9 months. [Cf. p.48 in this version.] He also talks about how they invented techniques to recover from errors without starting over, and perhaps the first example of multi-threading.

Sep 18, 2014 at 2:51 PM |

HaroldW

Great story Harold, of which I knew only a little. Similar is what Kristen Nygaard told us at the British Computer Society's Object-Oriented Programming & Systems Group around 1985, our first major meeting. As Wikipedia says:

Nygaard worked full-time at the Norwegian Defense Research Establishment from 1948 to 1960 - in computing and programming (1948–1954) and operational research (1952–1960).

In 1948 the Norwegians didn't have the money for a real computer so Nygaard began sitting at a desk acting as a CPU, alongside many others. The experience of actually being a CPU led him, he was convinced, to invent object-oriented programming, with Ole-Johan Dahl, with the release of Simula-67 in 1968. (Software deadlines, nine month gestation periods for simulation and naming mistakes again!) Alan Kay picked up those ideas in the formation of the Learning Research Group at Xerox PARC in Palo Alto from 1970 and gradually Smalltalk and other dynamically-typed object languages emerged.

In the pub with a Ruby development team in the Old Street area the day before the London Olympics began in 2012 someone suggested we went round and each say the most famous person we'd met. (It wasn't me that suggested this, promise!) The guy before me said he'd once met Richard Feynman on the train. That blew the rest of us away, quite rightly! Some of this magical history is worth recounting.

Sep 18, 2014 at 3:33 PM |

Richard Drake

Whether you have a supercomputer or not running the same code on the same compiler would surely give the same results. No?

No. Floating-point numbers and arithmetic are implemented differently on different platforms. Packed decimal support is, IIRC, non-existent outside of S/360 and its descendants. Unless all numbers are represented -- and can be manipulated -- as character strings of arbitrary precision, results would not be guaranteed, even given identical input.
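A small illustration of the point, in Python: floating-point addition is not associative, so anything that legitimately changes the order of operations (a different compiler, optimisation level or parallel reduction, for instance) can change the answer even with identical code and input. Decimal arithmetic avoids this particular discrepancy, though Python's `Decimal` still rounds at its context precision (28 significant digits by default), so it is not a universal cure.

```python
from decimal import Decimal

a, b, c = 0.1, 0.2, 0.3

# The same two additions, grouped two different ways.
left_first = (a + b) + c
right_first = a + (b + c)
print(left_first, right_first, left_first == right_first)
# prints: 0.6000000000000001 0.6 False

# The same sums in decimal arithmetic, built from strings so that no
# binary rounding happens on the way in.
da, db, dc = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
print((da + db) + dc == da + (db + dc))
# prints: True
```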

Sep 19, 2014 at 4:48 PM |

Akatsukami

I guess I've got to be the one to say it, because no one else will.

"Climate Change" goes out of fashion. The hype goes away. Journals like this one start demanding real scientific standards from contributors.

So some researchers, despite the shortage of funds in a post- "Saving the Planet" academia, start doing real research on the climate.

And it turns out it really is changing catastrophically...

Sep 19, 2014 at 7:45 PM |

Uncle Gus

Great thread, guys, and I say that as someone you'd probably consider an "alarmist".

I also work in computer modeling (materials science). In my experience, in-house code in academia is often badly written, patched together, but usually reasonably correct. This makes sense, given the incentives: we pay professors to publish reproducible results, not to create high-quality, readable code. Plus, grad students in scientific fields typically don't have much CS training, and they're the ones writing the code.

"Replication" in science typically means that someone can take the Methods section of your paper and reproduce the general data backing up your conclusions. They may have to write their own code or build their own lab equipment, but.. that's on them. As long as the Methods section is thorough enough to explain what you did and how you did it, then it passes the standards of science.

So if an academic group develops some new code and publishes on it, you realize that their incentives are to get as many papers out of that code as possible. If they publish the code, it makes it much easier for someone to scoop them on those subsequent papers.. so why would they publish the code? And if they had to, why would they make it easy to use?

All this to say: if you want academic groups to publish their code, and if you want it to be readable and easy to use, you need to incentivize them to do so. And it has to be good incentives. Not just stick, but carrot, or they'll find ways around it.

Sep 19, 2014 at 9:03 PM |

Windchasers

I also like the conversation about "policy-ready" science. If science is to be used to support policy, we should endeavor that the work is not just replicable, but as open and transparent as possible. Which would mean full publication of all source code, scripts, inputs, data, etc.

The reason we didn't see that in climate science? Because it was a scientific field, and it operated just like other scientific fields. I.e., we'll give you the information you need to go out and reproduce the work yourself, but we're not going to give you our code or our data. That's not what "replication" means.

In this regard, the field has come a long way, and is much more open than it used to be. (And it's far more open than my own field is currently. "Give you our code? Pffffft. But here are some textbooks that explain the numerical methods we use. Have fun.")

But really, can we work towards this standard for all science that's used for policy: full transparency? Regardless of whether the policy in question is about economics, or environment, or engineering?

I long to see the day when "policy-ready" science has its own special stamp of approval, and politicians are afraid to cite any science that can't be readily checked by you or me. Let's raise the bar on the standard of evidence used in government.

Sep 19, 2014 at 9:12 PM |

Windchasers

"So if an academic group develops some new code and publishes on it, you realize that their incentives are to get as many papers out of that code as possible. If they publish the code, it makes it much easier for someone to scoop them on those subsequent papers.. so why would they publish the code?"

There could be several reasons.

In theory, the primary reason ought to be to advance scientific progress. Lots of people say that's what they're doing it for, and are often the sort to sneer at 'commercialism' outside academia. But just as a scientist and a human being it's a good thing in itself, and it's a big part of the justification for being given public money. What are you doing standing in its way?

Second, because you don't own it. The people paying for it do - and in the case of publicly funded science, that means everybody. You're going to take my money to build your software, and then not even let me see it? Go do that with your own money!

Third, because it's a powerful part of the scientific method. Scientific credibility is founded on surviving the challenge of your peers, so the more challenges you survive, the more credibility you have. (And conversely, the harder you make it to reproduce the less credibility and influence you have.) It helps catch errors faster, it encourages more people to check, if you make the checking easy for them. People build on it, and build your discovery into something bigger. Instead of the little-known inventor of an obscure lemma in some academic backwater, you are the founder and father of an entire scientific discipline with thousands working on it.

Fourth, because it builds dependence. The UNIX operating system did not become so ubiquitous and powerful because it was any good (at first); it became ubiquitous and powerful because it was given away free to universities, where the next generation of computer scientists and engineers were learning their trade. Getting a large user base for your software puts you in an influential position in the field. You get to decide where the field goes next by what capabilities you build into the next generation of the software. People become reliant on you, and in turn will defend you if the continued provision of the software they depend upon is threatened.

Fifth, because you can get a slice of the fame, when people credit you for the use of your software in their work. Every researcher hopes that other people will cite their publications. Make it a condition of use that people cite you if they use your software. It's cheap for them, and nobody wants to be accused of plagiarism, so they'll likely do it. And when your name and the name of your product are plastered over every result published in the field, you again become a powerful figure in the community, a big name, and will be invited to all the best conferences.

Sixth, because it develops a community spirit in which other people will share their code with you, and there are a lot more of them! You give one program to ten other people, and get ten programs back in exchange for it. Profit!

Seventh, because you can sell it. If the software is really that useful that other groups could generate lots of results out of it cheaper than by developing their own code, then sell it for a slice of their profits, save yourself the effort, and spend your time instead on developing something new. Paywall software does have its problems - it makes people less willing to replicate your work and so you lose credibility somewhat. But many people feel that the money makes up for it :-), and it does mean that people can replicate your work if they want to enough. In fact, the more people you can get to do so the more money you make, which encourages a healthy openness to criticism.

Eighth, because virtually everything you know and virtually every bit of software you use was written or developed by other people and given to you. (Or sold to you for far less than it cost to develop.) How dare you use their works for your own benefit and give nothing back? Did you invent calculus? Or did you get it from a book somebody else wrote, or from a teacher who had to work hard for their expertise? Your career is built on knowledge and techniques that other people gave you, although keeping the knowledge secret would make their skills rarer and more valuable. You win the race by being faster and stronger than everyone else, not by being better at crippling your competitors and holding them back.

Ninth, because criticism is a good teacher. If you keep your software private, neither anybody else nor you yourself will ever find out about your mistakes. But if you publish it, you will have the flaws and problems pointed out to you, from which you can learn and become a better coder. Long experience teaches lessons you'll not get in any school, and will keep you ahead of the game when the youngsters start snapping at your heels. But only if you learn from your mistakes, which means you've got to make some and get caught. The more you catch, the faster you can learn, the farther ahead you'll be.

And tenth, because if you really think other people scooping you is a real possibility, one that could easily be prevented by making it harder for them to perform the same calculations, then you can simply wait before publishing the first results until you think you've squeezed out all the benefit. There's no need to rush it. Or if you think they're already close on your heels and soon to publish, then publishing your methods yourself will make little difference.

Good scientists follow the scientific method for good reason. It works. It grants enormous and very practical advantages. It's not just a bunch of abstract holier-than-thou principles written by stuffy Victorian gentlemen in old books, like some sort of Marquis of Queensberry rules for the academic fight. It's sad that it's not taught formally in schools as it should be - you're supposed to catch it by a sort of osmosis, from your more experienced colleagues.

Sep 19, 2014 at 11:04 PM |

Nullius in Verba

A new society for blogscientists. Let Eli fix that for you.

Sep 21, 2014 at 11:47 PM |

Eli Rabett

Eli,

Are you saying only blog scientists believe in replicability? Do you?

Sep 22, 2014 at 6:08 PM |

Nullius in Verba

NiV, let me give responses. (And I don't necessarily disagree with you here.. I'm just picking apart stuff, trying to see the pros and cons, weighing the strength of the arguments).

#1) Yes, advancing science should be our goal. But scientists aren't paid just to "advance science", but to publish.
If I'm paid to publish, and my ability to publish is hurt by releasing my code, then there's no way I'll release my code. I'd be hurting myself.

#2) No, neither the funding agencies nor the citizens own the code. Legally, it's usually owned by either universities or the professor / author.
#3) True, but if I can build credibility without releasing code, that's even better.
#4) True, but if the metric that determines funding is publications, then getting moral support doesn't matter much.
#5) True, but what's even better than citations? Getting those big publications.
#6) Who needs other people's code?
#7) This seems contrary. If I give away my code, I can't sell it. I've already destroyed my own market.
#8) Really, a moral argument? See #1 again, which outweighs the appeal to altruism.
#9) You assume my code has significant flaws, and/or that we can't get them worked out privately.
#10) True, and the tactic of waiting to publish is sometimes used. But the truth is, often we don't know how far behind other groups we are. So we publish as fast as we can, but we also hold on to whatever advantages we have.

#1 is the most important point here. As long as I'm being paid primarily to publish original, repeatable, influential work, anything that interferes with that is right out.

If you want to change that, then you need to change the performance metrics that determine funding. There's no way around that.

Sep 23, 2014 at 1:17 AM |

Windchasers

"If you want to change that, then you need to change the performance metrics that determine funding. There's no way around that."

Agreed.

And I'd say that the new rule ought to be that it doesn't get funded if it's not replicable, because it's not science.
That would sort it.

Sep 24, 2014 at 6:19 AM |

Nullius in Verba

I assume you mean independent replication of results.

So, how would you enforce that? Would you only apply that to computationalists, or are you going to make sure that everyone's work is replicable before it can be published? That seems a pretty high bar for experimentalists.

If I tell you the methods I used, the work should be replicable, whether it's computational work or not. Though, like I said, that may require you to set up your own lab or write your own code. And if my work is not replicable, my reputation and my funding will already suffer, as a result. I don't know that you need to add penalties beyond what already exist; science is already self-correcting in that regard.

Since what you're suggesting seems to already be part of the funding process (non-replicable results tend to hurt one's chances for funding), I feel like I'm probably misunderstanding you. Can you explain?

Sep 24, 2014 at 6:54 AM |

Windchasers

jorgekafkazar - Re: Model E - I recall bender at CA years ago offering commentary on the Model E code, and there is still a CA post linking to Dan Hughes's blog, which now dead-ends:

http://climateaudit.org/2007/02/13/this-fix-in-the-giss-code/

Over the years others have looked into the internals; Isaac Held's blog springs to mind as a decent place to look for more, as does the ModelE page at GISS:

http://www.gfdl.noaa.gov/blog/isaac-held/2011/02/17/1-introduction/

http://www.giss.nasa.gov/tools/modelE/

Windchaser - check out the EU's policy work on data transparency for publicly funded research:

http://ec.europa.eu/research/science-society/index.cfm?fuseaction=public.topic&id=1294&lang=1&cookies=disabled

Sep 24, 2014 at 8:27 AM |

not banned yet

"I assume you mean independent replication of results."

I mean reported in enough detail that the reader can tell exactly what was done, what was seen, what calculations were performed, what precautions and checks were made, what alternative explanations were considered. Providing data and code makes at least the calculation part of it absolutely unambiguous.

The idea is that you are not just reporting the conclusions, you are presenting the evidence as well. By publishing, you are in effect asking other scientists to check your working - that your conclusions really do follow from the evidence. The more people check it, the more credibility the result has, so you want to make it as easy as possible for people to do so.

Consider Feynman:

"But there is one feature I notice that is generally missing in cargo cult science. That is the idea that we all hope you have learned in studying science in school--we never say explicitly what this is, but just hope that you catch on by all the examples of scientific investigation. It is interesting, therefore, to bring it out now and speak of it explicitly. It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated.
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
In summary, the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgement in one particular direction or another."

"If I tell you the methods I used, the work should be replicable, whether it's computational work or not. Though, like I said, that may require you to set up your own lab or write your own code."

In principle, yes, but do you know how hard it is to describe your methods precisely enough to do so? OK, so you "smoothed the data", but what filter weights did you use? What length? How did you handle missing values? How did you handle the end points? How did you calculate the error bars? Did you assume Gaussian errors? Independence? Homoscedasticity? Which version of the data set did you use? Which particular series? Why did you pick those particular series, and not any others? What adjustments were made to it? Did you centre, or de-trend it first? Was the sampling uniform? Did you delete any outliers or bad data? How did you decide on the filter bandwidth - was it based on what looked good or was there an objective criterion? How did you decide what filter to use? Was it before or after you had seen the data? Were there any bugs in the code you used? How do you know?

By the time you have given enough information for someone to know exactly what you mean by "smoothed the data" and write their own code to replicate it, you might as well have just provided the code. This goes even more so for more complicated algorithms.
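To make the "smoothed the data" point concrete, here is a minimal sketch in Python. The function name `moving_average`, the toy series and the three end-point rules are illustrative assumptions, not anyone's published method. All three rules are defensible readings of "a 3-point centred moving average", they agree in the interior, and they disagree exactly at the end points, which is often where the interesting recent values sit.

```python
def moving_average(data, window, ends="truncate"):
    """Centred moving average (hypothetical helper, for illustration only).

    ends="truncate": output only points where the full window fits.
    ends="shrink":   near the edges, average whatever points are available.
    ends="reflect":  mirror the series at each end before averaging.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(data)
    if ends == "truncate":
        return [sum(data[i - half:i + half + 1]) / window
                for i in range(half, n - half)]
    if ends == "shrink":
        out = []
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            out.append(sum(data[lo:hi]) / (hi - lo))
        return out
    if ends == "reflect":
        if n <= half:
            raise ValueError("series too short to reflect")
        # Mirror without repeating the edge value: [x2, x1, x0, x1, x2, ...]
        padded = data[half:0:-1] + data + data[-2:-half - 2:-1]
        return [sum(padded[i:i + window]) / window for i in range(n)]
    raise ValueError(f"unknown end-point rule: {ends!r}")


series = [1.0, 2.0, 4.0, 8.0, 16.0]
for rule in ("truncate", "shrink", "reflect"):
    print(rule, [round(v, 3) for v in moving_average(series, 3, rule)])
# truncate [2.333, 4.667, 9.333]
# shrink [1.5, 2.333, 4.667, 9.333, 12.0]
# reflect [1.667, 2.333, 4.667, 9.333, 10.667]
```

Note that the last smoothed value is 12.0 under one rule and about 10.667 under another. A methods section that just says "smoothed" leaves the reader to guess which, while the code settles it.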

Consider, for a specific example of what I'm talking about, the case of HARRY_READ_ME.TXT. Have a look at the published peer-reviewed paper that describes the database (Mitchell and Jones 2005) and then have a look at Harry's attempts to replicate what was done, and tell me if you think the paper gives sufficient information/description.

http://www.anenglishmanscastle.com/HARRY_READ_ME.txt
http://onlinelibrary.wiley.com/doi/10.1002/joc.1181/abstract

20 pages is quite long for a paper, but even with that much space it is impossible, I think, to describe what happened in enough detail for someone to just "write their own code". Do you think anyone ever did? Or did everybody else just use the database without checking it?

Sep 24, 2014 at 6:52 PM |

Nullius in Verba

notbannedyet,
I definitely support open access science, particularly for scientific articles which inform public policy.

Nullius in verba,
I love 95% of what you said. I think it's almost completely spot-on... except for these parts:

"By the time you have given enough information for someone to know exactly what you mean by "smoothed the data" and write their own code to replicate it, you had might as well have just provided the code."
Not generally true. Writing code, particularly efficient code, is not so easy. I can tell you "I applied a forward Fourier transform using the FFTW library" in, well, ten words, but for you to work through the FFTW documentation, write your own code, and test it... if you have FFTW experience, that will probably take you at least hours, or if you're not an experienced coder, days.
So it goes for most of the code, even the "straightforward" stuff.

"Providing data and code makes at least the calculation part of it absolutely unambiguous."
Sorry, but I actually LOLed when I read this.
My experience with academic codes is that they're badly commented and hard to understand. Sometimes they're spaghetti code - basically impossible to understand if you didn't write it. In those cases, it's actually easier to write your own. (I know; I've rewritten such codes before).

I'll agree with you that often, methods are not fully disclosed. That's a problem, and should be remedied. The journals need to raise their standards there.
However, you should realize that providing code is no substitute for a thorough, well-written, well-described methodology. I can write up a bad description of my methodology, and if I also give you a badly-commented code, you'd be hardly better off than you were without it.

Let me give an example. A few months ago, I got an email from a graduate student I'd never met, asking if my graduate advisor or I had released any public versions of the code we used. I told him we hadn't, but sent him some references which did a good job of explaining our methodology. A few weeks passed, and he sent me a copy of his newly written code, asking me to help him find an error. The code was completely hopeless - no comments, no naming consistency, heck, no subroutines, and nested loops going more than 6 deep.

No way I'm going to try to debug that. Nosirree.

It's likely he'll eventually find his error, and I wish him the best. However, as a reasonably experienced coder, I could rewrite the code he was trying to write in probably half as many lines, far more clearly, and in less time than it would take me to find the bug in his code.

Sep 25, 2014 at 12:46 AM |

Windchasers
