Uniform priors and the IPCC
Last week, I posted about a comment Nic Lewis had written at RealClimate. In that comment, Lewis had spent some time discussing a study by Aldrin et al, and noted that its findings were distorted by the use of a uniform (or "flat") prior. Although Gavin Schmidt did not respond directly to this point, one commenter pushed the question of the validity of the uniform prior approach a little further.
Graeme:
I thought James Annan had demonstrated that using a uniform prior was bad practice. That would tend to spread the tails of the distribution such that the mean is higher than the other measures of central tendency. So is it justified in this paper?
This elicited a response from a statistician called Steve Jewson (a glance at whose website suggests he is just the man you'd want to give you advice in this area):
Following on from the comments by Nic Lewis and Graeme,
Yes, using a flat prior for climate sensitivity doesn’t make sense at all. Subjective and objective Bayesians disagree on many things, but they would agree on that. The reasons why are repeated in most text books that discuss Bayesian statistics, and have been known for several decades. The impact of using a flat prior will be to shift the distribution to higher values, and increase the mean, median and mode. So quantitative results from any studies that use the flat prior should just be disregarded, and journals should stop publishing any results based on flat priors. Let’s hope the IPCC authors understand all that.
Nic (or anyone else)…would you be able to list all the studies that have used flat priors to estimate climate sensitivity, so that people know to avoid them?
RC regular Ray Ladbury then chimed in with this:
Steve Jewson,
The problem is that the studies that do not use a flat prior wind up biasing the result via the choice of prior. This is a real problem given that some of the actors in the debate are not “honest brokers”. It has seemed to me that at some level an Empirical Bayes approach might be the best one here–either that or simply use the likelihood and the statistics thereof.
To which Steve Jewson replied:
Ray,
I agree that no-one should be able to bias the results by their choice of prior: there needs to be a sensible convention for how people choose the prior, and everyone should follow it to put all studies on the same footing and to make them comparable.
And there is already a very good option for such a convention…it’s Jeffreys’ Prior (JP).
JP is not 100% accepted by everybody in statistics, and it doesn’t have perfect statistical properties (there is no framework that has perfect statistical properties anywhere in statistics) but it’s by far the most widely accepted option for a conventional prior, it has various nice properties, and basically it’s the only chance we have for resolving this issue (the alternative is that we spend the next 30 years bickering about priors instead of discussing the real issues). Wrt the nice properties, in particular the results are independent of the choice of coordinates (e.g. you can use climate sensitivity, or inverse climate sensitivity, and it makes no difference).
Using a flat prior is not the same as using Jeffreys’ prior, and the results are not independent of the choice of coordinates (e.g. a flat prior on climate sensitivity does not give the same results as a flat prior on inverse climate sensitivity).
Using likelihood alone isn’t a good idea because again the results are dependent on the parameterisation chosen…you could bias your results just by making a coordinate transformation. Plus you don’t get a probabilistic prediction.
When Nic Lewis referred to objective Bayesian statistics in post 66 above, I’d guess he meant the Jeffreys’ prior.
Steve
ps: I’m talking about the *second* version of JP, the 1946 version not the 1939 version, which resolves the famous issue that the 1939 version had related to the mean and variance of the normal distribution.
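The coordinate-dependence point can be made concrete with a toy calculation. The sketch below is not taken from any of the studies discussed: it assumes a single hypothetical Gaussian measurement of 1/S (the values `Y_OBS` and `SIGMA` are made up), for which the Jeffreys' prior works out as proportional to 1/S^2 in S and flat in 1/S. It compares the posterior median of S under a prior uniform in S with the median under that Jeffreys' prior, doing the latter calculation in both coordinates and at two different upper cut-offs.

```python
import numpy as np

def median_on_grid(x, density):
    """Median of an unnormalised density tabulated on an increasing grid."""
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(x)  # trapezoid rule
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return float(np.interp(0.5, cdf / cdf[-1], x))

# Hypothetical toy numbers, not taken from any study:
Y_OBS, SIGMA = 0.4, 0.15   # observed value of 1/S and its Gaussian standard error
S_MIN = 0.5                # lower edge of the grid for S

def posterior_medians(s_max):
    """Posterior medians of S for different priors, with S cut off at s_max."""
    s = np.linspace(S_MIN, s_max, 20_001)
    lik_s = np.exp(-0.5 * ((Y_OBS - 1.0 / s) / SIGMA) ** 2)
    flat_in_s = median_on_grid(s, lik_s)               # prior uniform in S
    jeffreys_in_s = median_on_grid(s, lik_s / s**2)    # prior proportional to 1/S^2

    # Same analysis in the coordinate lam = 1/S, where this Jeffreys' prior is flat.
    lam = np.linspace(1.0 / s_max, 1.0 / S_MIN, 20_001)
    lik_lam = np.exp(-0.5 * ((Y_OBS - lam) / SIGMA) ** 2)
    jeffreys_via_lam = 1.0 / median_on_grid(lam, lik_lam)  # map the median back to S
    return flat_in_s, jeffreys_in_s, jeffreys_via_lam

flat10, jeff10, jeff10_lam = posterior_medians(10.0)
flat20, jeff20, jeff20_lam = posterior_medians(20.0)
print(f"cut-off 10: uniform-in-S {flat10:.2f}, Jeffreys {jeff10:.2f} (via 1/S: {jeff10_lam:.2f})")
print(f"cut-off 20: uniform-in-S {flat20:.2f}, Jeffreys {jeff20:.2f} (via 1/S: {jeff20_lam:.2f})")
```

In this toy setting the Jeffreys' result is the same whichever coordinate is used and barely notices the cut-off, whereas the uniform-in-S median is higher and shifts noticeably when the cut-off is moved from 10 to 20. Real studies have more complicated likelihoods, so this illustrates the mechanism only.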
Nic Lewis was happy to concur and to provide a list of flat-prior studies.
Steve, Ray
First, when I refer to an objective Bayesian method with a noninformative prior, that means using what would be the original Jeffreys’ prior for inferring a joint posterior distribution for all parameters, appropriately modified if necessary to give as accurate inference (marginal posteriors) for individual parameters as possible. In general, that would mean using Bernardo and Berger “reference priors”, one targeted at each parameter of interest. In the case of independent scale and location parameters, doing so would equate to the second version of the Jeffreys’ prior that Steve refers to. In practice, when estimating S and Kv, marginal parameter inference may be little different between using the original Jeffreys’ prior and targeted reference priors.
Secondly, here is a list of climate sensitivity studies that used a uniform prior for their main results when estimating climate sensitivity on its own, or that, when estimating climate sensitivity S jointly with effective ocean vertical diffusivity Kv (or any other parameter, like those two, in which observations are strongly nonlinear), used uniform priors for S and/or Kv.
Forest et al (2002)
Knutti et al (2002)
Frame et al (2005)
Forest et al (2006)
Forster and Gregory (2006) – results as presented in IPCC AR4 WG1 report (the study itself used 1/S prior, which is the Jeffreys’ prior in this case, where S is the only parameter being estimated)
Hegerl et al (2006)
Forest et al (2008)
Sanso, Forest and Zantedeschi (2008)
Libardoni and Forest (2011) [uniform for Kv, expert for S]
Olson et al (2012)
Aldrin et al (2012)
This includes a large majority of the Bayesian climate studies that I could find.
Some of these papers also used other priors for climate sensitivity as alternatives, typically either informative “expert” priors, priors uniform in the climate feedback parameter (1/S) or in one case a uniform in TCR prior. Some also used as alternative nonuniform priors for Kv or other parameters being estimated.
Steve Jewson again:
Sorry to go on about it, but this prior thing is an important issue. So here are my 7 reasons for why climate scientists should *never* use uniform priors for climate sensitivity, and why the IPCC report shouldn’t cite studies that use them.
It pains me a little to be so critical, especially as I know some of the authors listed in Nic Lewis’s post, but better to say this now, and give the IPCC authors some opportunity to think about it, than after the IPCC report is published.
1) *The results from uniform priors are arbitrary and hence non-scientific*
If the authors that Nic Lewis lists above had chosen different coordinate systems, they would have got different results. For instance, if they had used 1/S, or log S, as their coordinates, instead of S, the climate sensitivity distributions would change. Scientific results should not depend on the choice of coordinate system.
2) *If you use a uniform prior for S, someone might accuse you of choosing the prior to give high rates of climate change*
It just so happens that using S gives higher values for climate sensitivity than using 1/S or log S.
3) *The results may well be nonsense mathematically*
When you apply a statistical method to a complex model, you’d want to first check that the method gives sensible results on simple models. But flat priors often give nonsense when applied to simple models. A good example is if you try and fit a normal distribution to 10 data values using a flat prior for the variance…the final variance estimate you get is higher than anything that any of the standard methods will give you, and is really just nonsense: it’s extremely biased, and the resulting predictions of the normal are much too wide. If flat priors fail on such a simple example, we can’t trust them on more complex examples.
4) *You risk criticism from more or less the entire statistics community*
The problems with flat priors have been well understood by statisticians for decades. I don’t think there is a single statistician in the world who would argue that flat priors are a good way to represent lack of knowledge, or who would say that they should be used as a convention (except for location parameters…but climate sensitivity isn’t a location parameter).
5) *You risk criticism from scientists in many other disciplines too*
In many other scientific disciplines these issues are well understood, and in many disciplines it would be impossible to publish a paper using a flat prior. (Even worse, pensioners from the UK and mathematicians from the insurance industry may criticize you too :)).
6) *If your paper is cited in the IPCC report, IPCC may end up losing credibility*
These are much worse problems than getting the date of melting glaciers wrong. Uniform priors are a fundamentally unjustifiable methodology that gives invalid quantitative results. If these papers are cited in the IPCC, the risk is that critics will (quite rightly) heap criticism on the IPCC for relying on such stuff, and the credibility of IPCC and climate science will suffer as a result.
7) *There is a perfectly good alternative, that solves all these problems*
Harold Jeffreys grappled with the problem of uniform priors in the 1930s, came up with the Jeffreys’ prior (well, I guess he didn’t call it that), and wrote a book about it. It fixes all the above problems: it gives results which are coordinate independent and so not arbitrary in that sense, it gives sensible results that agree with other methods when applied to simple models, and it’s used in statistics and many other fields.
In Nic Lewis’s email (number 89 above), Nic describes a further refinement of the Jeffreys’ Prior, known as reference priors. Whether the 1946 version of Jeffreys’ Prior, or a reference prior, is the better choice, is a good topic for debate (although it’s a pretty technical question). But that debate does muddy the waters of this current discussion a little: the main point is that both of them are vastly preferable to uniform priors (and they are very similar anyway). If reference priors are too confusing, just use Jeffreys’ 1946 Prior. If you want to use the fanciest statistical technology, use reference priors.
ps: if you go to your local statistics department, 50% of the statisticians will agree with what I’ve written above. The other 50% will agree that uniform priors are rubbish, but will say that JP is rubbish too, and that you should give up trying to use any kind of noninformative prior. This second 50% are the subjective Bayesians, who say that probability is just a measure of personal beliefs. They will tell you to make up your own prior according to your prior beliefs. To my mind this is a non-starter in climate research, and maybe in science in general, since it removes all objectivity. That’s another debate that climate scientists need to get ready to be having over the next few years.
Steve
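The normal-distribution example in reason 3 is easy to check numerically. The sketch below makes several assumptions that the comment does not spell out: a flat prior on the mean, the posterior mean used as the point estimate of the variance, and the 1/σ² prior taken as the comparison (the usual noninformative choice for independent location and scale parameters). Under those assumptions the posterior mean of the variance is SS/(n-5) with a flat prior on the variance and SS/(n-3) with the 1/σ² prior, where SS is the sum of squared deviations from the sample mean. The function `posterior_mean_variance` is a hypothetical helper that confirms those closed forms by direct integration.

```python
import numpy as np

def posterior_mean_variance(ss, n, k):
    """Posterior mean of the variance v for n normal data values.

    Assumes a flat prior on the mean and a prior v**(-k) on the variance
    (k=0 is the flat prior, k=1 is the 1/v prior). The marginal posterior
    is then proportional to v**(-(n-1)/2 - k) * exp(-ss / (2*v)).
    Integration is done on a uniform grid in u = log(v).
    """
    u = np.linspace(np.log(ss) - 12.0, np.log(ss) + 12.0, 48_001)
    v = np.exp(u)
    log_post = (-(n - 1) / 2.0 - k) * u - ss / (2.0 * v)
    w = np.exp(log_post - log_post.max()) * v   # density times dv/du
    return float(np.sum(w * v) / np.sum(w))

n, true_var = 10, 1.0
rng = np.random.default_rng(0)
data = rng.normal(0.0, np.sqrt(true_var), size=(20_000, n))   # 20,000 samples of 10 values
ss = np.sum((data - data.mean(axis=1, keepdims=True)) ** 2, axis=1)

unbiased_estimate = ss / (n - 1)       # standard sample variance
flat_prior_estimate = ss / (n - 5)     # posterior mean, flat prior on the variance
inverse_prior_estimate = ss / (n - 3)  # posterior mean, 1/variance prior

print("closed form check:", posterior_mean_variance(9.0, n, 0), 9.0 / 5.0)
print("average estimates (true variance is 1):")
print("  sample variance      ", unbiased_estimate.mean())
print("  flat prior           ", flat_prior_estimate.mean())
print("  1/variance prior     ", inverse_prior_estimate.mean())
```

Because the expected value of SS is (n-1) times the true variance, the flat-prior estimate averages about 9/5 = 1.8 times the true variance for n = 10. The 1/σ² prior's posterior mean also sits above the unbiased estimate, at about 9/7, but by much less.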
I wonder how many of the flat prior studies will make it to the final draft of AR5? All of them?
Reader Comments (87)
@HaroldW (Jan 25, 2013 at 4:37 PM)
One can see the effect of priors in figure 4 of Foster et al. (Science 2002) [free registration required], which comments
That should be Forest et al, not Foster et al. (Vol. 295, 4 January 2002, p113-117)
I fully agree with Paul Mathews 11:08 am
….it's immediately obvious to me that a uniform prior is daft for a continuous unbounded variable. For example the prior of Hegerl et al 2006 is that the pdf is const at 0.1 for all sensitivities up to 10 then suddenly drops to zero. And they don't even seem to have tested the dependence on the cutoff point.
I was not aware of the Jeffreys' Prior. A brief visit to Wikipedia was disappointing for all the examples of Jeffreys' Priors resulted in uniform distributions!
I had an Aha! moment with the six-sided die example. Naturally the prior for a six-sided die should be uniform for the integers 1,2,3,4,5,6 and zero for all other numbers. EXCEPT, what if the die in addition to being possibly loaded was also mis-manufactured? When I was a kid, I possessed a factory-second die that had a seven (Six superimposed with one) and two Fives and I think two Threes. Clearly with this knowledge, the prior distribution for any die should have non-zero probabilities for 0 and seven, but just as clearly the prior probability for seven ought to be much smaller than six.
Getting back to the Hegerl paper. Assuming absolutely no knowledge of prior research, it is absurd to give a uniform prior distribution where 2 is as probable as 10, but 11 is impossible. However, given the amount of research that has been published, it is equally absurd to propose a prior distribution where a sensitivity of 2 is as probable as 10.
Uniform priors need to pass the "sniff test": is P(x=a) more, less, or equally likely as P(x=b)? (for various a,b pairs.)
Maybe I'm one of those subjectivists… being a geophysicist, that wouldn't be surprising.
oneillpt (5:47 PM) -
Ugh. You're quite correct. I try quite hard to keep Forest & Forster separate, and then I go and mess it up that way. :(
Thanks for the correction.
DaveA
"Steve Jewson is presently at the "Shock" stage of the climate debate awareness sequence."
I think Steve is well aware of what has been going on. The real problem is that apart from him and me there seems to be hardly anyone involved in climate science who has a good understanding of Jeffreys' prior, or of applying such a noninformative prior so as to achieve objective inference. Steve's co-author and Jeffreys' prior advocate Dan Rowlands has unfortunately moved out of academia.
James Annan has highlighted problems with uniform priors, but if I understand him correctly he is at heart a subjectivist Bayesian - someone who thinks of probability as being a personal degree of belief.
Nic:
My assumption from a distance.
It doesn't seem too far-fetched by now to read this as a fight for the basic integrity of the academy across the board. Kudos to Jonathan and Paul for their enlightening contributions and, even more, for stopping the rot in person, Doug and Nic providing external call to arms. An outstanding BH thread.
For those interested in learning about statistics, UC Berkeley are running a free online course - Intro to Stats:
https://www.edx.org/courses/BerkeleyX/Stat2.1x/2013_Spring/about
Be quick though - the course starts next Wednesday (30th Jan)...
A uniform prior may not be great, but at least it's better than having a lumpy posterior...
When I heard that David MacKay was to act as an advisor to the DECC, I, naively, assumed he'd be lending his weight to the proceedings in a manner similar to the quoted RC exchange. A victim of almost Nursian credulity, it was not to be.
Simon Abingdon, I did read much of the VS thread when it first came out, and the incomprehension of most of his opponents is a sight to behold. That said I tend to think that rather too much is made of some of these arguments by people on both sides of the debate.
simon abingdon,
I do not know who VS is. The 'Visiting Statistician' tag comes from Josh #14, at
http://bishophill.squarespace.com/blog/2010/3/26/josh-14.html?currentPage=2#comments
VS kept reiterating the same points, as I recall. He/she was good at ridiculing Eli, Tamino, Dhogaza and all the attack-dogs but never seemed to reach a conclusion as to the best way to analyse the series. And then he asked a question at some stats forum and got a brush-off. The incapability of anyone wedded to the consensus to see that maybe they should consider other models was, however, very enlightening to me. It was perhaps my break-through moment - these team guys do not know what they are talking about, was how it seemed to me.
@DaleC (Jan 26 12:33AM)
Thanks for the clarification.
Please ignore--just a test to see if I can post.
Hi Andrew,
I cannot make "the" case for uniform priors but I can make a case.
In order to begin, one needs to have some idea as to the intended purpose: "Why is the data being presented?"
In the case of things IPCC such is never particularly clear to me. I suspect the purposes are numerous.
If the intention is to express purely evidential information from the current experiment or study, that which can be inferred from the data at hand, one could consider presenting likelihood functions as opposed to pdfs. One could then present a desired prior and the resultant pdf.
A likelihood function and its corresponding pdf with a uniform prior, although quite distinct concepts, look the same. If all of the components of the IPCC figure were likelihood functions, and it was labelled as such, that would have made a lot of sense. Unfortunately not all of them were if I recall correctly.
Were they all likelihood functions one would be comparing eggs with eggs, evidence with evidence. If any two were evidentially independent, e.g. derived from different epochs, one could combine them by multiplying them together, which is something that one shouldn't do with pdfs.
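The point about multiplying likelihoods but not pdfs can be checked with a few lines. In this sketch the two "studies" are made-up Gaussian measurements of 1/S and the 1/S^2 prior is just one illustrative non-uniform choice. Multiplying the two posteriors counts the prior twice, while multiplying the likelihoods and applying the prior once does not.

```python
import numpy as np

def normalise(x, f):
    """Scale f so that it integrates to one over the grid x (trapezoid rule)."""
    area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return f / area

s = np.linspace(0.5, 10.0, 5_001)     # grid of sensitivities (illustrative range)
prior = normalise(s, 1.0 / s**2)      # any non-uniform prior would do here

def gaussian_likelihood(y_obs, sigma):
    """Likelihood of S from a hypothetical Gaussian measurement of 1/S."""
    return np.exp(-0.5 * ((y_obs - 1.0 / s) / sigma) ** 2)

lik_a = gaussian_likelihood(0.45, 0.20)   # two made-up, evidentially independent studies
lik_b = gaussian_likelihood(0.35, 0.25)

post_a = normalise(s, prior * lik_a)      # each study's published pdf
post_b = normalise(s, prior * lik_b)

correct = normalise(s, prior * lik_a * lik_b)     # prior applied once
naive = normalise(s, post_a * post_b)             # prior applied twice
repaired = normalise(s, post_a * post_b / prior)  # one copy of the prior divided back out

print("naive product matches:   ", np.allclose(naive, correct))
print("repaired product matches:", np.allclose(repaired, correct))
```

With a uniform prior the naive product and the correct combination coincide over the grid, which is one way of seeing why a uniform-prior pdf and a likelihood function look the same. The repair step only works when the prior behind each published pdf is known.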
If the studies had all provided likelihood functions, plus some particular choice of prior and a resultant pdf, what should be presented in the IPCC report?
The prior could be objective, subjective, or the pdf from some previous study that the authors deemed to be evidentially independent, perhaps a pdf from a study also in the report. In the last case some data would be presented more than once in different forms.
I do not know if authors receive any guidance as to how to present data with a view to its inclusion in the report. If they did, presenting just the likelihood, or including the likelihood, would be quite sensible.
Uniformity between the studies would have been achieved if the prior had been 1/S throughout or any other prior throughout. That is true but might be difficult to agree upon. The uniform prior could have had the appeal of blandness, inoffensiveness.
If they all shared a common prior and that were the uniform prior we would at least have consistency and utility. Given that all the priors were uniform substituting a different choice of prior would be trivial. If they all had different unspecified priors, changing the priors would not be possible.
I come back to the question of purpose. To the degree that studies are being compared, the use of a common prior, and that being the uniform prior is quite sensible. If the purpose is to attach some meaning to the studies in combination then a common prior would not be unreasonable. But what should it be?
Being as the report is a collective effort, I wonder if it would have been possible to agree on anything other than the uniform prior. That is no good reason to do so but it could have been the only practical choice.
Alex
Good (but also bad) to be reminded of VS's heroic efforts to educate climate scientists about some statistical basics. Good in that it was great to see someone at last tackling the nonsense, bad because I am reminded of the utter incomprehension with which his efforts were greeted. It was as if their minds were too simple to grasp that it would be possible for OLS to be a fundamentally inadequate approach. Perhaps it is too much to expect from a discipline in which one of the leading lights doesn't know how to operate Excel.
Unfortunately there is every chance this effort to enlighten the discipline will suffer the same fate.
Simon, I intend to read the VS take down. Do you happen to know who VS is?
Hector Pascal@: "In 1948, "continental drift" was a beautiful theory for which there was no known physical basis." Indeed, but the question is what is a suitable reaction by a scientist to a phenomenon for which there is some decent evidence, but no useful hypothesis for the cause. Jeffreys essentially said that because he could see no cause, he ruled out the phenomenon. That's rather like someone arguing that because he can think of no other cause for the world getting warmer 1850 - 1998 then the cause 'must be' manmade CO2. Essentially it attributes too much importance to a failure of human imagination. A better response, in my view, would have been "I shan't accept the theory of Continental Drift until I can see a plausible explanation of how it could come about". Better yet might have been "The observations supporting the theory are rather impressive but there's still an absence of explanation: by what mechanism could this have happened?" One consequence of Jeffreys' attitude is clear in hindsight: he happened to be wrong on an issue on which armies of schoolboys knew better.
Alexander Harvey
"If they all shared a common prior and that were the uniform prior we would at least have consistency and utility."
The AR4 Figure 9.20 climate sensitivity (S) PDFs were all stated to use a uniform-in-S prior; the IPCC even changed the Forster & Gregory 06 results from using the objectively correct prior – almost exactly 1/S^2 (uniform-in-1/S) – to a uniform-in-S prior basis. Unfortunately they didn't realise that Gregory 02 also used a virtually uniform-in-1/S implicit prior, so they didn't change its PDF. Hence the IPCC had to issue a correction, when I pointed the error out.
I'm afraid that using the same prior for all studies doesn't provide utility or consistency in their results - that is a misconception. The form of the Jeffreys' prior (or other noninformative prior) depends on both the relationship of the observed variable(s) to the parameter(s) and the nature of the observational errors and other uncertainties, which determine the form of the likelihood function. So a uniform-in-S prior might be noninformative for one study, whilst a uniform-in-1/S prior might be noninformative for another study. For one-parameter instrumental-period studies, the latter is generally the case.
@Bob (Jan 26 3:40PM)
No Bob, I'm afraid I don't and I don't know anybody who does.
Hi Nic,
I was aware, as I indicated, that they hadn't been consistent.
The question as to whether they could have been is also doubtful. For some studies they could have reduced the curves to likelihoods. In effect they treated them as likelihoods for they renormalised them over the interval [0,10] which doesn't make much sense if they were considered to be pdfs.
There seems to have been a number of weird things going on with their treatment of the data for which arguments about a choice of prior may just be a proxy.
As I recall, some of the distributions for S were improper, were not integrable, hence the truncation, I am not sure that a reference prior would fix that, but they could improve it. If the first is true the confidence intervals would have been a bit dubious.
Studies that had the same statistical model could have been compared on a like to like basis, some I think didn't, and perhaps shouldn't have appeared in the same figure.
I think the greater problem was that it ended in a fudge. More people may think that it is that the presentation is deceptive.
Let us say that they attempt to apply some alternative prior. To do so they would surely need to reduce the data to likelihood functions prior to application, the latter being that which they tried and failed to do.
I think that two arguments are getting conflated, one that the IPCC tend to make a consensual botch of things in the view of many outside observers, and the other that there is no consistent way of combining or comparing the data they had. Some of the studies might have benefitted from a different choice of prior, but would that have really increased the collective consistency or utility? Quite possibly the opposite, some could have used reference priors, some expert priors, some posteriors from previous studies.
Did they do well? Not really; but I do doubt that altering priors would have improved things.
Finally, given that the problem is simple, we are only trying to infer a distribution for a parameter that expresses the ratio of temperature to flux or vice versa, how many different reference priors would you envisage? If you can see that there be many then that makes a difference.
How many different reference priors do you think there should be?
Mostly one would expect to use 1/S; for others it may not be possible to change the prior. How would that be any more consistent or useful?
Alex
Simon, just wondering if this is VS http://www.vanderbilt.edu/econ/faculty/cv/ViscusiCV.pdf. He writes like an American, do you agree?
Hi Alexander,
"Finally, given that the problem is simple, we are only trying to infer a distribution for a parameter that expresses the ratio of temperature to flux or vice versa, how many different reference priors would you envisage?"
Where climate sensitivity (S) is the only parameter being estimated, the reference prior will in all cases be the Jeffreys' prior, and is unique. If a direct estimation method is used (such as ordinary regression, or taking a ratio of the estimated changes in temperature and in {flux net of forcing}), the reference/Jeffreys' prior will be very close to 1/S^2 in form, for an instrumental-period study. That is because fractional uncertainty in changes in {flux net of forcing} is far larger than fractional uncertainty in changes in temperature.
If however an indirect estimation method is used, involving comparing observations with values simulated by a forced climate model at varying parameter settings (see Appendix 9.B of AR4 WG1), the Jeffreys' prior is likely to be different. The form of the Jeffreys' prior depends on both the relationship of the observed variable(s) to the parameter(s) and the nature of the observational errors and other uncertainties, which determine the form of the likelihood function.
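For readers wondering where the 1/S^2 form comes from, here is a sketch of the one-parameter calculation under a simplifying assumption that is mine rather than a description of any particular study: the data reduce to a single estimate y of 1/S with Gaussian error of known standard deviation σ (this stands in for the case where the uncertainty in flux net of forcing dominates). The Jeffreys' prior is the square root of the Fisher information I(S).

```latex
\begin{align*}
y \mid S &\sim \mathcal{N}\!\left(\tfrac{1}{S},\, \sigma^{2}\right),
\qquad
\log L(S) = -\frac{(y - 1/S)^{2}}{2\sigma^{2}} + \text{const}, \\[4pt]
\frac{\partial}{\partial S}\log L(S) &= -\frac{y - 1/S}{\sigma^{2} S^{2}}, \\[4pt]
I(S) &= \mathbb{E}\!\left[\left(\frac{\partial \log L}{\partial S}\right)^{2}\right]
      = \frac{\mathbb{E}\!\left[(y - 1/S)^{2}\right]}{\sigma^{4} S^{4}}
      = \frac{1}{\sigma^{2} S^{4}}, \\[4pt]
\pi_{J}(S) &\propto \sqrt{I(S)} \propto \frac{1}{S^{2}}.
\end{align*}
```

In the coordinate λ = 1/S the same calculation gives I(λ) = 1/σ², a constant, so the Jeffreys' prior is uniform in λ. Transforming that uniform density back to S multiplies it by |dλ/dS| = 1/S^2, which recovers the result above and is the sense in which "1/S^2" and "uniform-in-1/S" describe the same prior. With a different likelihood, as in the indirect model-comparison methods, the Fisher information and hence the prior would take a different form.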
dearieme - and the Darwin/Wallace theory of natural selection just eased itself into acceptance, didn't it. It didn't need a "bulldog" to help it on its way. And how long did it take for a potential mechanism - Mendel's genetics - to be dredged up from obscure literature? Without checking, I think it took until the 1920s before any mechanism for natural selection was widely agreed upon. Plate tectonics is following the same timescales, isn't it?
Bob and simon, the VS saga is recorded in Bart Verheggen's blog here here and here.
And here.
Jan 27, 2013 at 12:53 AM | Alex Heyworth
-------------------------------------------------------------
What an awesome collection of exchanges ... where is VS today ? LOL, Scott Mandia and I have something in common, 'Stats 1 for science' ... do I have to dress up in superman lycra now as well ? Fortunately, I recognised that with 'Stats 1 for science' I wasn't going to cut the ice and left it there.
Alex Heyworth, yes, most interesting exchange. I was just wondering who VS is. http://www.vanderbilt.edu/econ/faculty/cv/ViscusiCV.pdf
Hoi Polloi, you are right. The mystery continues for now.
I followed the first of Alex's links and it cost me an afternoon. VS had the patience of a saint; I selfishly hope he returns to the subject.
Bob--
Doubtful that your Vanderbilt candidate is VS--he has too much engagement with EPA and air pollution whereas VS was pretty clear about not knowing a lot about climate science per se.
But I too would like to know who he was, just so I could learn at the feet of the master.
I remember reading the 'VS' thread. Definitely VS was Dutch (or Dutch-speaking). My impression at the time was that he or she could be anybody with some reasonable knowledge of time-series analysis. My thought was that he/she might be a PhD student or postdoctoral researcher, working in some area related to statistics. It is also worth noting that the whole issue of the mean temperature series as being a 'random walk' was not a new one in that thread. Douglas Keenan has, I believe, been suggesting this for some time. Going back some time, Steve McIntyre has been discussing long-term persistence, a related concept, in a number of posts. As I understand it, it does look as if the temperature time-series shows long-term persistence. This does make interpretation of trends very tricky. But on its own, I don't think it provides a killer argument against AGW.
VS is indeed Dutch and I met him once. He is an econometrician. He is working on a paper that is analysing temperature time series. I hope this will come out this year.
Incidentally, the Beenstock article which was a central part of the VS thread has now been published, see
http://www.earth-syst-dynam-discuss.net/3/561/2012/esdd-3-561-2012.html.
There is also an interesting paper by Terence Mills in the journal of Cosmology here.
http://journalofcosmology.com/ClimateChange112.html
There are a couple of irritating typos leading to missing equations, but the gist is there. He just tries to break the actual time series of global temperature down into trend and cycle using standard modern methods. What he finds is that temperature is best represented as a stochastic cycle plus a random walk, i.e. no deterministic trend.
Thanks for the confirmation Marcel. I certainly hope VS will return to the discussion somewhere sometime, as I found his contributions always highly educational and entertaining.
I was reading the abstract for Using multiple observationally-based constraints to estimate climate sensitivity (Annan and Hargreaves, 2005)
This stood out
This makes a uniform prior of 0-18.5 deg C seem even more unrealistic, surely?
It would be nice to know what difference the choice of prior makes.
In what way would CAGW theory be different? How would the rationale be weakened, and to what extent?