A call for reproducible research
Statistician Victoria Stodden makes a call for scientists' data and code to be shared.
Many people assume that scientists the world over freely exchange not only the results of their experiments but also the detailed data, statistical tools and computer instructions they employed to arrive at those results. This is the kind of information that other scientists need in order to replicate the studies. The truth is, open exchange of such information is not common, making verification of published findings all but impossible and creating a credibility crisis in computational science.
Federal agencies that fund scientific research are in a position to help fix this problem. They should require that all scientists whose studies they finance share the files that generated their published findings, the raw data and the computer instructions that carried out their analysis.
The ability to reproduce experiments is important not only for the advancement of pure science but also to address many science-based issues in the public sphere, from climate change to biotechnology.
I think we need a law in this country that says that scientific findings for which data and/or code are not available should not be allowed to inform public policy.
Reader Comments (21)
As much as I am opposed to more government regulation over our citizens, this is one law I would wholeheartedly support. If such a law had been in force (and enforced) we probably would not have had such egregious examples of shoddy science as the "hockey stick" or Lonnie Thompson's refusal to share his data. I think if such a law were enacted there should be severe penalties for non-compliance. Say, for example, a ban on receiving federal grants for a period of 5 years. Just my thoughts on this matter.
One thing to highlight is that this isn't a general problem in the rest of the computing industry. I would say that over the 15+ years I have been involved nothing has been pushed more in the IT world than testing frameworks and code review, to improve reliability in the software you are using.
This makes it two bits of good news in one day! Canada's Minister of Natural Resources is "waging war" on green radicals. Although methinks that the folks at CRU (not to mention the inner circle of the IPCC) are probably none too happy to read either of these articles ... they'd probably much prefer to focus on the latest and greatest from UEA: a "proposal" for "radical constitutional reform" which does not augur well for voices of reason on your side of the pond.
U.K. or Canada: radical constitutional reform or war on green radicals?
[and if this comment actually makes it through, it will make it three bits of good news for me, today!]
A war against Green radicals, I'd volunteer for that! And the reason is right there in Read's daft proposal that we should go to the trouble of electing a government and its agenda, then take 12 people off the streets (presumably eco-loons) who could overturn the democratic mandate on a whim. It is a clear demonstration of how disconnected these fanatics are from real life.
Bish, your proposal would be: (a) impossible to enforce; (b) fundamentally anti-democratic. Much better the sanction of grant removal for data/code hoarders.
"Federal agencies that fund scientific research are in a position to help fix this problem. They should require that all scientists whose studies they finance share the files that generated their published findings, the raw data and the computer instructions that carried out their analysis."
Agreed
Staggering that this is not the norm in climate science.
"I think we need a law in this country that says that scientific findings for which data and/or code are not available should not be allowed to inform public policy"
In principle, yes, but, more importantly, that it cannot inform public OPINION until it has been proven reproducible by an INDEPENDENT group/person using a different dataset. In other words, safeguards need to be in place to protect the IPU, but the principal method(s) should be made available asap.
Very difficult to explain and achieve.
Something very similar already happens in the pharmaceutical industry. The only difference is the data gets submitted to the regulatory agency, rather than put in the public domain.
All raw data and program code are submitted, as well as the statistical analysis. All computer systems used to collect, process or analyse study data have to be managed using IT industry standard practices, with systems validated and well documented. There are very strict rules about handling raw data and maintaining its integrity.
The rules and standards are already there.
With regard to the original post, the real issue becomes the integrity of the raw data. Opening up the data and code leaves one main avenue for fraud: falsifying the raw data.
Proving that data is legally non-repudiable is not easy. If the data was recorded on paper, did each recording get signed by the person collecting the data? Did they sign any paperwork saying that their signature was legally binding? Was the data collected under the control of a quality management system (ISO 9001 etc.)? And so on. Recording raw data digitally creates even more questions.
I wonder if anyone has performed a site audit in Antarctica?
Has anyone investigated the quality systems used in collecting raw climate data? If no QMS was used, audited, etc., then how can the data be trusted?
Surely the mechanisms for this already exist? It simply requires a rule that all proposals for publicly funded contracts must give an undertaking to produce all of the data and code. If the contractor does not supply data and code at the end of the contract they are not paid. The only exemptions could be for intellectual property rights, which could be negotiated on a case by case basis, but there surely could never be a case where climate "science" could claim such an exemption?
So time to dump the UAH satellite record then?
Bish:
Spot on. Such was my gut feeling the moment I read it. And after reading Neil McEvoy:
Disagree vehemently. The Bish's proposal reflects justifiable outrage that the body politic has been so corrupted by the deceitfulness of climate science. These aren't normal times - and such a law would give a clear signal of that. I'm sure that where there's a will there'd be a way to make examples both of those guilty of non-disclosure and those lobbying for policies based on their findings, especially money-making scams on the backs of the poorest.
One problem is that everything in research is ultimately connected, in some way, to much else. Pick any research finding and then look at the prior researches upon which it is based, and the prior-prior researches upon which each of those researches is based, etc. Research is interwoven. Restricting a law to research that informs public policy is not practical, because most research indirectly helps to support some research finding that informs public policy. Hence, data/code for all research should be available, I think.
Stodden suggests openness in all government-funded research. That is a help, but I think still not enough, because substantial research is privately funded. My suggestion is to also have a Code of Conduct for research journals. Adherence to the Code, by any journal, would be voluntary. To encourage researchers to only publish in adhering journals, do the following: when someone submitted an application for research funds, to a government agency, citations in the application that were to research published in non-adhering journals would be downweighted—essentially, treated as unpublished.
I think we need a law in this country that says that scientific findings for which data and/or code are not available should not be allowed to inform public policy.
Exactly. And I think that the data that is made available should also not be just the 'value-added' (adjusted) data, it should be both the original unadjusted data AND the massaged data.
Richard Drake
I sympathise but don't you think that there is considerable danger in legislating on the back of "justifiable outrage"? Think Dangerous Dogs Act as the prime example and Labour's 'vetting and barring' legislation.
I agree that climate scientists in the UK (and that primarily means the clique at UEA) have got away with murder on the strength of not being required to publish their data, and I certainly agree that the public are entitled to see the results of publicly-funded research. But as I argued in the 'War on FOIA' thread, you need to define pretty tightly what would fall within the ambit of legislation, especially at what stage and in what form data are fit for human consumption.
I would like to see a genuinely objective oversight body which can examine research results before they are passed to government (and I can't stress 'genuinely' enough!) with an absolute bar on researchers advising government on their own research.
(Mind, that said, I would like to see much less lobbying of all kinds going on. It's not only the idiocies of renewable energy that are making tidy profits for the few at the expense of the many.)
I think Douglas has a good point with a financial incentive not to publish in journals which don't set the highest standards for providing data sufficient to replicate (or not) but the journals themselves also need monitoring. You can have the highest standards possible for how you publish, but if you have different standards for who you publish you could be just as badly off.
Hmm, does Victoria have an opinion on Beenstock & Reingewertz?
To have any chance of new laws people like Huhne would have to be removed first......10 days to go according to the CPS before Xmas!
Mike, I don't have long but I think this is the polar opposite of the dangerous dogs situation, where a very few innocents meet a grisly end, the tabloids fume and heavy-handed legislation gives more power to bureaucrats for very little gain. In this case the impact of climate policies hits everyone, especially the poorest. It is outrageous that such policy is based on 'science' which refuses to surrender code and data to all, including critics. There's not nearly enough outrage, which is why some well-aimed legislation will have vast symbolic value.
Indeed, the moment I read this I felt there are things we should be pushing legislatively:
1. repeal the Climate Change Act
2. cap the increases in energy bills due to carbon reduction policies in the nearer term than 1.
3. this proposal of Andrew's.
That means I think it's very important. The symbolic value of what a government commits to is of great importance.
There actually is a law like this in the US. It is called the Data Quality Act, and it attempts to make sure that all of the data that is used in analyses for public policy purposes by the US govt is open and available for review. Unfortunately, it is widely ignored by US govt agencies, especially the progressive ones.
Wiki actually has an accurate cite, although the description carries a bit of "spin".
Richard Drake
Yes, by all means repeal the Climate Change Act. My point was entirely about the introduction of new legislation on the back of what you rightly described as justifiable outrage. The usual result is quantity instead of quality and five years down the line we'll either have to do it again or parliament will refuse to because ministers might lose face.
The Climate Change Act was, in its own slightly different way, an example of what I meant.
"Rush to judgment", I think, is another way of putting it.
But better supervision and a demand for transparency? Yes, please.
Bish,
To give you an example of something very similar in concept, the US Constitution's Bill of Rights includes within the 6th amendment the right of the accused to confront his accuser and to cross-examine witnesses (including physical evidence). In the UK, think of the evidentiary rule against admission of hearsay.
People who will be affected by a change in policy should have the right to examine the evidence by which the state seeks to justify the change. It should not be hard to implement. If someone in the public demands to see the evidence, the state should have to make it available. If that involves scientific studies, make it all available. [Note, this includes more than publicly funded work.]
Why would it be any more difficult than FOIA compliance?
A more spot-on alternative is to require the release of all data and/or code for any scientific finding that feeds into a finding used for the formulation of public policy.
The EPA should be required to release this data for their CO2 and all other scientific findings that they use to justify their rule making as part of the regulatory regime.