Thursday, Mar 27, 2014

On consistency

In the wake of the Press Gazette "debate", I was watching an exchange of views on Twitter between BH reader Foxgoose and Andrea Sella, a University College London chemist who moves in scientific establishment and official skeptic circles.

Sella was explaining how persuasive he found the observational record of climate:

Think like a scientist! Temperature is only a proxy. Energy balance is real issue & C19 physics is alive and well.

Like Warren Buffett you mustn’t be affected by shorter term fluctuations.

As I said, don’t just look at surface temps. Look at sea level and global ice mass too. All part of same.

I gently inquired of Sella whether there had been any statistically significant changes in these records, to which he responded:

1.56 ± 0.25 mm over a century isn’t significant?

I then pointed him to Doug Keenan's one-page article about why you can't decide about significance just by looking at a graph.

This brought about the following rebuke from Richard Betts.

Your 'statistical significance' argument is silly:

If we seek to understand this physics, it’s not likely that statistics will play much of a role. Thus, climate modelers have the right instinct by thinking thermodynamically. But this goes both directions. If we have a working physical model (by “working” I mean “that which makes skillful predictions”) there is no reason in the world to point to “statistical significance” to claim temperatures in this period are greater than temperatures in that period.

Why abandon the physical model and switch to statistics to claim significance when we know that any fool can find a model which is “significant”, even models which “prove” temperatures have declined? This is as nonsensical as it is suspicious. Skeptics see this shift of proof and rightly speculate that the physics aren’t as solid as claimed.

This brings me neatly back to where we started. Andrea Sella points to the observational records and implies that I should draw conclusions from them.  My response is that if you think we can learn something from them, show me how the change is significant. If you think statistical significance is a flawed concept and that we should be examining the congruence of observations with the output of physical models then do not wave temperature graphs in front of my nose. Tell the public loudly and clearly that we can learn nothing from observational records on their own and that we need a physical model. Then tell us why your physical model is sound despite estimating a value for aerosol forcing that is at material variance with observations, and despite it producing estimates of warming that vastly exceed observations.

And you absolutely must not, as the Met Office has done, tell the GCSA that warming is "significant" without any statistical foundation. Do not, as the Met Office has done, tell Parliament that warming has been "significant" without any statistical foundation either. Doing things like this will leave you having to beat an embarrassing retreat to a position of "we don't use statistical models", directly contradicting your earlier pronouncements. It will also leave you in the tricky position of having to explain whether you think the IPCC is "silly" for using statistical significance, or indeed whether your own employer is "silly" too.
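To make the point concrete, here is a minimal sketch (my own illustration in Python, not Keenan's or the Met Office's calculation) of why "significance" depends entirely on the statistical model you assume. It generates series containing no trend at all, only persistent AR(1) noise, fits a straight line, and applies the naive t-test that assumes independent errors; the test declares a "significant" trend far more often than the nominal 5% of the time.

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi, trials = 150, 0.9, 2000           # 150 "years", strong persistence (assumed figures)
t = np.arange(n)
tc = t - t.mean()
false_alarms = 0

for _ in range(trials):
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(1, n):                 # AR(1) noise, zero underlying trend
        y[i] = phi * y[i - 1] + e[i]

    yc = y - y.mean()
    slope = (tc @ yc) / (tc @ tc)         # ordinary least-squares slope
    resid = yc - slope * tc
    se = np.sqrt(resid @ resid / (n - 2) / (tc @ tc))
    if abs(slope / se) > 1.98:            # ~5% two-sided critical value, 148 df
        false_alarms += 1

print(f"nominal false-alarm rate: 5%, actual: {100 * false_alarms / trials:.0f}%")
```

Allow for the persistence in the error model and most of that spurious "significance" disappears; change the assumed noise model and the verdict changes with it, which is exactly why a graph on its own settles nothing.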


Reader Comments (92)

Ah yes, found another post on this from Michael Tobis

I have been doing some thinking about how to tell real experts from fake experts. A single sound bite (on a topic on which the listener is inexpert) reveals nothing. You have to look at the whole record of the speaker and the speaker's close allies to determine who is an expert and who is playing one on TV. The "tell" is coherence. The core problem is that people who pay insufficient attention and people who don't themselves understand coherence have influence, especially in a democracy.

Mar 27, 2014 at 7:14 PM | Unregistered CommenterEli Rabett

I second Richard Drake in his appreciation of Doug Keenan's post. I thought it admirably clear, and it taught me a few things I hadn't realised.

Mar 27, 2014 at 7:17 PM | Unregistered CommenterMichael Larkin

"The core problem is that people who pay insufficient attention and people who don't themselves understand coherence have influence, especially in a democracy."

If only we could get rid of THOSE people and sort it all out between experts, coz that always turns out so well.

Mar 27, 2014 at 7:35 PM | Unregistered Commenterrhoda

Maurice says the rabbit's only apparent contribution to this thread is in terms of the incoherent consistency that's been accompanying him for too long a time.

Alternatively, he's posted to the wrong blog. Rabbit years fly quickly.

Mar 27, 2014 at 8:09 PM | Registered Commenteromnologos

Mar 27, 2014 at 2:06 PM | Douglas J. Keenan

To enable useful study of climate systems by computer simulations, somebody must first describe THE CLIMATE SYSTEM in physical models that can then be translated into computer models. Judging by the output of the current set of computer models, I do not believe anyone has an appropriate set of such physical models.

Computer simulations of climate systems at this moment have as much to do with reality as Game of Thrones, or A Song of Ice and Fire. Very absorbing to some, of course, especially when done on a supercomputer.

Mar 27, 2014 at 9:44 PM | Registered CommenterAlbert Stienstra

//
"Independent replication, by different people, using their own code and methods, on different data (if you can get it) proves something."
//

"...proves something..." ??

Such as?

Mar 27, 2014 at 9:46 PM | Unregistered Commenternot banned yet

For the educational benefit of Tobis and Rabett, an expert is someone who has a reputation of being correct more often than not. Pessimistic and ultimately inaccurate speculation doesn't qualify. Multiple contradictory speculations just demonstrate collective idiocy. Far more useful to just admit you don't know - then we won't get everyone preparing for a drought instead of the coming flood, etc.

But then seeing a rising trend, pretending to know what causes the rise and then predicting it will continue to rise based on some spurious correlation isn't clever at all - especially when it doesn't, and most especially when your linear, two-variable simplification of a multivariate, chaotic, nonlinear system can't account for simple cooling events. Seeing a rising trend and predicting it will peak and/or plateau would have been more clever, but then only the skeptics did that, and mainly because we've smelt this type of BS many times before. But of course even a humble coin-toss has more utility than the collective predictive wisdom of climate science.

Mar 27, 2014 at 10:02 PM | Unregistered CommenterJamesG

It's worth reading Doug McNeall's post "A brief observation on statistical significance" from a few weeks ago:


This has been on my mind for a while.

I think the observation is best summed up as:

If somebody asks if something is statistically significant, they probably don’t know what it means.

I don’t mean to offend anyone, and I can think of plenty of counter examples*, but this is borne out of long observation of conversations among both scientists and non-scientists. Statistically significant is sometimes used as a proxy for true, and is sometimes muddled with significant or meaningful or large. In climate, it also gets confused with caused by human activity.

Even those that have done lots of statistics can forget that it only tells how likely you are to see something, given something you think probably isn’t true.

It’s one of those horrible, slippery concepts that won’t stay in the brain for any length of time, so you** have to go over it again, and then again, every time to make sure that yes, that’s what it really means, and yes, that’s how it fits in with your problem.

No wonder people don’t know what it means.

*This is just a personal observation, but there is data out there. I’m sure people will provide counter examples in the comments.

** And by you, I really mean I.

Mar 27, 2014 at 10:02 PM | Registered CommenterRichard Betts
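Doug McNeall's definition can be made concrete with a toy simulation (my own example, not his, with an assumed observed value): a p-value is simply the fraction of outcomes at least as extreme as the one observed, generated under the very null hypothesis you suspect is false.

```python
import numpy as np

rng = np.random.default_rng(1)
observed_diff = 0.4          # assumed figure: difference between two 10-sample means

# simulate the null hypothesis: both samples drawn from the same population
a = rng.standard_normal((100_000, 10)).mean(axis=1)
b = rng.standard_normal((100_000, 10)).mean(axis=1)
null_diffs = np.abs(a - b)

p_value = (null_diffs >= observed_diff).mean()
print(f"p = {p_value:.3f}")
# A small p-value does not mean the difference is large, meaningful, or
# human-caused -- only that it would be unusual if the null model were true.
```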

Eli,

...how highly rated redundancy is in science. You don’t really believe anything seriously before it has come from several independent sources.

I totally agree - pity the IPCC completely failed to seek ANY independent corroboration of Mann's manipulations and scientific abhorrences before splashing his spliced and tortured hockey stick all over their Third Assessment Report. We'd have been in a different world without that disgraced and disgraceful icon of disinformation, wouldn't we?

Mar 27, 2014 at 10:04 PM | Registered Commenterflaxdoctor

Richard Betts,

It's also worth emphasising that there's nothing magical about the 5% probability level. It's only a convention that a 1 in 20 chance is accepted as the limit for evidence that datasets are from different populations.

Mar 27, 2014 at 10:14 PM | Registered Commenterflaxdoctor

Richard Betts - that's good stuff from Doug McNeall, it rings true. But the misunderstandings that are rife about statistical significance do not mean it is to be dismissed. They do mean that great care is required with the word 'significant' - more care than was displayed in those quotes from your Met Office in the main post, and in sundry shallow ventures such as the Doran and Zimmerman survey, the one that did so much to give us that 97% statistic that stinks so much.

Mar 27, 2014 at 10:26 PM | Registered CommenterJohn Shade

@ Richard Drake, 6:55 PM

Glad we agree!


@ Richard Betts, 10:02 PM

Statistically significant is sometimes used as a proxy for true, and is sometimes muddled with significant or meaningful or large. In climate, it also gets confused with caused by human activity.

If we choose our statistical model to represent natural variation—as is usual—and we find some observations that are statistically significant, then how would you interpret that? Put another way, our assumption is that the model represents natural variation and we have observations that lie outside the expected range of the model; so what led to those observations?

I know of only two ways to interpret those observations: either something very unusual happened just by chance or there was some non-natural variation—which would presumably be human activity.

Mar 27, 2014 at 10:47 PM | Unregistered CommenterDouglas J. Keenan

But what of the converse? We look at recent experience and find nothing which has not happened before, during the times when natural was the only variation there was. Is there a statistical method to examine whether what is claimed by the hypothesis can be true in the absence of unprecedented happenings, or to establish an expected frequency of such happenings if the hypothesis were true?

Nothing much is happening, is it? If it were we would not be in the realm of making terminological disputations the central ground of the argument.

Mar 27, 2014 at 10:53 PM | Unregistered Commenterrhoda

When you measure the temperature with a thermometer, the model you make is that you are sticking the sharp end into something which has a homogeneous temperature all round for the time the mercury settles (i.e. is a dynamical system).
Which is, all of it, already false.

Hur Hur Hur

Sorry... back to work.

Mar 28, 2014 at 1:07 AM | Unregistered Commenterptw

rhoda - one of the best-guarded non-secrets of climate change science is that there cannot possibly be much happening right now.

There is no scientific hint, let alone evidence, of anything outside natural variability for one or two decades to come. This is consensus science of the kind that 100% of people who understand even the most basic aspects agree on.

So in a sense all this talk about the "observational record" is, from the point of view of AGW/CAGW, bunk. It's too early to tell.

Mar 28, 2014 at 1:08 AM | Registered Commenteromnologos

Think like a scientist! Temperature is only a proxy.
Colour me stupid, but I always thought that temperature was a measurement of heat. Now, it is a proxy? A proxy of what? Heat? How else is heat measured?

1.56 ± 0.25 mm over a century isn’t significant?
Presumably, this is the sea level he is talking about (though 1.56 what? Metres, or millimetres, or something else?); is he seriously suggesting that sea levels can be measured to that degree of accuracy (± 0.25 mm), particularly over a century? How can he be so sure of the accuracy of the measurement 100 years ago? Perhaps he has satellite records…

Rhoda is right; nothing much is happening. Yet, we seem to be getting bogged down in a morass of statistics; why not use Occam’s Razor; the simplest solution is most likely to be the correct solution? If it has happened in the past for reasons unknown but surmised, what is unusual about it happening now? This gives us the opportunity to observe the event, and to attempt to establish the factors that might have some effect. Alas, with what little data we do have, we have leapt to a massive conclusion (“We are all about to D-I-I-E!”) based on very little evidence (a few centuries of very limited measurements of any kind). Perhaps we should spend more time considering quite why the portents of doom are so readily accepted by those who should show more caution (a.k.a., "scepticism").

Mar 28, 2014 at 2:53 AM | Unregistered CommenterRadical Rodent

Temperature is, if you will, a measure of average thermal energy. You can define it from Boltzmann, you can define it from the second law and you can define it from the ideal gas law. You can also argue about what thermal energy is, but remember that Boltzmann and Ehrenfest committed suicide.

Mar 28, 2014 at 3:21 AM | Unregistered CommenterEli Rabett

Sometimes I have to smile when I remember a small but very important piece of advice given to me decades ago by a much older and wiser colleague: 'pooling ignorance is singularly unhelpful and unenlightening'. This was at about the time when 'population expert' Ehrlich, now a fairly newly-made FRS, and his mad buddies were doing their first 'end of the world is nigh' gig, shouting about imminent and dangerous cooling of the earth, overpopulation, running out of resources, etc.
How often can the ignorant, over-educated spreaders of ignorance work at widening our pool of ignorance without losing their credibility?
The general level of credulity seems to rise in tandem with the general level of education!

Mar 28, 2014 at 4:12 AM | Unregistered CommenterAlexander K

I really just don't get it. It's a bald-faced scam. All this splitting of hairs merely supports the orthodoxy, the voices of authority. Legitimizes them.

http://s6.postimg.org/jb6qe15rl/Marcott_2013_Eye_Candy.jpg

Mar 28, 2014 at 5:09 AM | Unregistered CommenterNikFromNYC

Thank you, Eli, though I am not too sure what you are saying; is “average thermal energy” another term for “heat”? How is it measured, if not with a thermometer?

I am fully with you, NikFromNYC; it is a scam, and we should be trying to spread that message rather than splitting hairs as to how the statistics measure up (and, as any fule kno, statistics conclusively show that 85% of all statistics are made up on the spur of the moment). The original scare was about the increasing surface air temperatures; now that it is evident that the surface air temperatures are not increasing, they seek the heat elsewhere – then find it in places never before looked at (the deep of the deep blue sea – though it is black as night down there), with few instruments of unproven record, and declare a rise of 2/100ths of a °C (or K, if you prefer), as if it is, like, actually measurable. You couldn’t make it up – oh… wait… they did!

Mar 28, 2014 at 7:35 AM | Unregistered CommenterRadical Rodent

"I was watching an exchange of views on Twitter between BH reader Foxgoose and Andrea Sella, a University College London chemist who moves in scientific establishment and official skeptic circles."

By 'official skeptic circles' I think you mean amateur groups who spend their time writing articles about how magic isn't real and how a bigfoot footprint was probably faked? Well, if they can't arbitrate on the latest scientific research, who can? ;-)

Mar 28, 2014 at 8:31 AM | Unregistered CommenterWill Nitschke

Maurizio, if it is supposed to be hotter now than for *insert number here* centuries, we will see that in record temps. Unless nighttime lows are higher giving a higher average without a visible effect on the daytime. Where are the records? Not the kind of per day per location records Americans are accustomed to seeing, but absolute records? If they cannot be expected, what is all the fuss about? The case of the CAGW crowd is not merely 'something is going to happen' but 'something IS happening'.

I'd be surprised if you could get the warmists who come here to agree that nothing much is happening.

Mar 28, 2014 at 9:04 AM | Unregistered Commenterrhoda

@eli rabett:
You state, "The "tell" is coherence. The core problem is that people who pay insufficient attention and people who don't themselves understand coherence have influence, especially in a democracy."
But people who have been wrong - disastrously - their entire careers, like Paul Ehrlich for instance, are considered 'coherent' and certainly have influence.

Mar 28, 2014 at 9:16 AM | Unregistered Commenterhunter

Independent replication, by different people, using their own code and methods, on different data (if you can get it) proves something.

I agree with Eli here - as a software engineer with almost 40 years' experience, I once commented that the demand that all researchers must release their code was misguided - what they should do is document the algorithms used in detail, and let others decide if they are the appropriate algorithms, and implement them themselves if they want to verify the results.

I got a very negative response from others when I made this comment.

Simply re-running someone else's program doesn't really tell you much, and it is so easy to read someone else's code and assume it does what the author intended without spotting the bugs.

Mar 28, 2014 at 10:04 AM | Unregistered Commentersteveta_uk

Climate may in some fundamental sense be deterministic. If you know which are the relevant input (exogenous) variables, can measure them with sufficient accuracy and solve the relevant (partial differential) equations with precision, then you may have an explicit formula for measures of climate.

However that seems so far off at present, that statistical methods will continue to be required. I should add that classical probability, as used in statistics, may usefully be thought of as a theory about lack of precise information. If one knows the exact 'state of nature' (in the sample space), then the value of a random variable in classical probability is (fully) determined. But we do not know this state of nature and so probabilities come into play.

Mar 28, 2014 at 12:15 PM | Unregistered Commenterbasicstats

steveta

What nonsense you write there.

It might be that a reviewer is only sceptical about 10 lines of the author's code, so you expect him to rewrite and redo the whole effort, possibly years of the author's work, in order to find out?

Transparency is what eliminates problems.

Mar 28, 2014 at 12:33 PM | Unregistered Commenterptw

Excellent and educational discussion - thanks for this post, Bish.

The take home message for me is that a lot of "climate science" replicates the hockey stick - grafting different (and sometimes contradictory) methodologies together without acknowledging it.

Thanks also to those who have put time and effort into comments to educate the statistically challenged among us.

Mar 28, 2014 at 2:33 PM | Registered Commenterjohanna

ptw - thanks for demonstrating my point for me.

Any software professional knows the value of a code review. You don't just review the 10 lines you happen to be sceptical about - you review every single line, and several people do the same.

As we know from the Harry read-me files, many of the people in scientific research do their own coding, and very rarely are they qualified or competent to do so.

Mar 28, 2014 at 2:44 PM | Unregistered Commentersteveta_uk

rhoda - the case for AGW/CAGW is a case for future climatic changes, not past and not present. If our scientist friends were less reticent they would say this openly, instead of confining it to IPCC predictions for the 2080s.

Nothing, nothing, nothing could possibly be happening right now that is outside of natural variability. If it were happening, all models would be uselessly wrong and all climate scientists a bunch of ignoramuses. As per SREX:

Is the Climate Becoming More Extreme? [...] None of the above instruments has yet been developed sufficiently as to allow us to confidently answer the question posed here.

[...] Projected changes in climate extremes under different emissions scenarios generally do not strongly diverge in the coming two to three decades, but these signals are relatively small compared to natural climate variability over this time frame. Even the sign of projected changes in some climate extremes over this time frame is uncertain.

Everybody looking for non-natural extremes right now is uninformed, a liar and/or worse.

Mar 28, 2014 at 2:52 PM | Registered Commenteromnologos

Doug and Bish: I think you have performed a valuable service by forcing the powers that be to admit that their detection and attribution of warming relies on GCMs, rather than on an analysis of the historical data, which is unwarranted in a chaotic system like our climate. Now the issue should be the reliability of the GCMs, not statistical models. Why? Science usually makes progress through the use of physical models/hypotheses, because it is hard to draw useful conclusions about anything by purely statistical means. Continuing to nitpick about statistical models - when we have been informed they aren't the real issue - could be counter-productive.

Imagine we look at a time series for an object falling through the air (on a windy day, which will add some chaos to the data). From physics and mathematics, we realize that a quadratic model for height vs time makes better sense than a linear one, and that the model needs to be modified in some situations to include a term for air resistance proportional to the square of the first derivative of height with respect to time. Could we ever analyze this relatively simple problem from a purely statistical perspective? In most scientific situations I am familiar with, progress is made by creating and testing hypotheses (physical models), not by purely statistical analysis of data. One can usually find multiple models that can be fit to any particular data set, and we usually prefer the simplest when it fits reasonably well. (A linear AR1 model for global warming doesn't meet this requirement.) We then rule out other simple alternative models by experiment, but we never prove a particular theory is correct. If you are aware of reasons why this generality is wrong, I'd love to read a post on the subject.
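Frank's falling-object example can be sketched in a few lines (assumed numbers, with a crude noise term standing in for the wind): the physics tells you which family of curves to fit, and the statistics then only has to judge the residuals.

```python
import numpy as np

rng = np.random.default_rng(7)
g, h0 = 9.81, 100.0                         # gravity (m/s^2) and drop height (m)
t = np.linspace(0, 4, 40)
h = h0 - 0.5 * g * t**2 + rng.normal(0, 1.5, t.size)   # crude stand-in for gusts

for deg, label in [(1, "linear"), (2, "quadratic")]:
    coeffs = np.polyfit(t, h, deg)          # least-squares polynomial fit
    resid = h - np.polyval(coeffs, t)
    print(f"{label:9s} fit: residual std = {resid.std():.2f} m")
# the quadratic residuals fall to roughly the 1.5 m noise level; the linear
# fit leaves large structured residuals, because the physics is quadratic.
```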

Lorenz published a prophetic paper in 1991 that accurately predicted the situation we find today, entitled "Chaos, Spontaneous Climatic Variations and Detection of the Greenhouse Effect". The key section, Part 4 on Greenhouse Warming, is only TWO PAGES long. It illustrates the difference between statistical models and physical models and PREDICTED that statistical models would be unable to detect significant warming in 2000; a prediction that is still true today (since it hasn't warmed since). Then Lorenz defines the conditions under which detection and attribution with physical models might be scientifically legitimate. No one's opinion on the proper methods for detecting forced change in a chaotic system should carry more weight than Lorenz's, and his paper was published before the pressure to provide useful information to policymakers degraded scientific standards.

"This somewhat unorthodox procedure [relying on GCMs] would be quite UNACCEPTABLE if the new null hypothesis [climate model output without anthropogenic forcing] had been formulated after the face, that is, if the observed climatic trend had directly or indirectly affected the statement of the hypothesis. This would be the case, for example, if the models had been TUNED to fit the observed course of the climate. Provided, however, that the observed trend has in NO WAY entered the construction or operation of the models, the procedure would appear to be sound."

http://eaps4.mit.edu/research/Lorenz/Chaos_spontaneous_greenhouse_1991.pdf

Have models met these requirements? Ask yourself this question: would any government-funded climate model in a highly competitive scientific environment have survived several decades of development if it hadn't been tuned to reproduce the 20th-century warming? How can they all make similar predictions of current and future anthropogenic warming when they make such different regional predictions for the future? It's now clear that almost all models over-emphasized aerosols to produce a hiatus in warming around 1960, especially the high-ECS models. In hindsight, that hiatus looks a lot like today's hiatus, and the 1920-1940 warming looks much like the 1975-1998 warming. The models appear to have been tuned mostly to match the latter warming and to interpret the former as unforced variability.

Climate models contain dozens of parameters that describe sub-grid processes like precipitation, cloud formation, and heat flux between cells. Ensembles of simplified models have shown that these parameters interact in surprising ways and that systematic tuning of one parameter at a time leads to a local optimum, not a global optimum. Comparing simplified ensembles with the current climate failed to uncover an optimum set of parameters. The uncertainty in these parameters is systematically ignored by the IPCC. When parameter uncertainty is properly accounted for, the range of projections for future climate that is compatible with our understanding of climate physics will probably be so wide as to be meaningless for policy development, leaving the field to the low sensitivities obtained from energy balance models.

Mar 28, 2014 at 2:53 PM | Unregistered CommenterFrank

steveta

You keep spouting nonsense for someone with 40 years of software experience.

Nobody can ever examine ALL the code in a software system of any size. Don't tell me you only believe in all or nothing, lol.

I can well imagine a Steve McIntyre, or 1000 other interested people, who would like to see the scripts showing how the spaghetti graphs that led to the alarmist conclusions were constructed.

You do not need to go through all of it for that, but you need all of it to be available.

Only bureau rats in 70s software environments would insist that THIS can only be viewed by Harry, THAT only by John, etc.

Anyway, ALL of modern software development (OO, agile, Linux, ...) goes against the grain of what you drivel.

Mar 28, 2014 at 3:32 PM | Unregistered Commenterptw

@Douglas J. Keenan


If we choose our statistical model to represent natural variation—as is usual—and we find some observations that are statistically significant, then how would you interpret that? Put another way, our assumption is that the model represents natural variation and we have observations that lie outside the expected range of the model; so what led to those observations?

I know of only two ways to interpret those observations: either something very unusual happened just by chance or there was some non-natural variation—which would presumably be human activity.

There's a third option - that it was wrong to assume that the statistical model represents natural variation.

We cannot tell what is natural variation and what is non-natural change just by looking at the observations, because both things are mixed in together. We don't have observations of a global climate with no human influence (or indeed a global climate in which human influence is the only factor). Therefore we have to apply some physical understanding in order to estimate which is which. This understanding is what is missing from purely statistical approaches.


@Frank

GCMs are not tuned to match the observed patterns of change over the 20th Century - it would be too computationally expensive. A GCM takes months to simulate a century of climate, even on a supercomputer.

It's not too surprising that the models differ in the future more than in the past, because the changes projected for the future are larger and hence this accentuates any small differences in the past.

The IPCC does not systematically ignore parameter uncertainty; in fact, quite the opposite. There are several large studies which explore uncertainties in the parameters, e.g. climateprediction.net. And yes indeed, this does result in large uncertainties in future projections, but that does not make these things useless to policy. Contrary to what seems to be a common view around here, policy does not require confident predictions of the future - it just requires an assessment of risks.

Just because you can't say for certain whether something will or will not happen, this does not necessarily mean it can be ignored until you are certain.

As I've said on other threads, I find Nic Lewis's projections of future warming rates to be over-confident and overly certain. Having said that, I do still prefer Nic's approach to Doug Keenan's purely statistical one - at least Nic is looking at physical processes and using these to make an estimate of future change (even though I disagree with his estimate).

Mar 28, 2014 at 8:43 PM | Registered CommenterRichard Betts

@ Richard Betts, 8:43 PM

I am against using “purely statistical approaches”. That has been my primary point for years. It is the IPCC, and many climate scientists, who use purely statistical approaches. I have been arguing against that. Since you indicated that you have read some of my work, I am puzzled that you would claim otherwise. I am glad, however, to see that you are getting much closer to agreeing with my position.

There is no third option, such as you suggest. If someone is going to claim that there is some statistically-recognizable human influence on climate—as the IPCC and many climate scientists do—then they must base that claim on a statistical model of natural variation. If they do not have a statistical model, then they cannot draw inferences.

Mar 28, 2014 at 9:14 PM | Unregistered CommenterDouglas J. Keenan

Richard: Thanks for the reply. I do believe models have been tuned to match the historical record, but not as a direct result of the tuning process itself. Let's suppose we have two tunable parameters, A and B, and a choice of two sets of observations, P and Q (say TOA radiation and precipitation), with which to tune these parameters. If we optimize A first and B second with respect to observation P, we might get a model with a TCR of 1; but if we optimize B first and A second, the TCR could be 2. If we optimize vs observation Q, we get two more models with different TCRs. In this situation, where the data doesn't provide a clear direction, the tendency for modelers will be to gradually work their way (intentionally or unintentionally) towards a model that matches: the historical record, the early consensus that ECS was between 2 and 4, and the crucial output from other groups. I agree with you that the groups don't have the ability to test every possible combination of parameters for the best fit to the historical record, but they will tend to prefer tuning strategies that result in models that match their expectations and stay away from models that will produce trouble. This provides a sensible explanation for why the CMIP3 models with higher ECS also had higher sensitivity to aerosols, while those with lower ECS had lower sensitivity to aerosols. This is strong evidence that models haven't been developed completely independently from the historical record, thereby invalidating their use for D&A according to Lorenz.

As I understand the work done by Stainforth and the others in climateprediction.net, they have experimented with ensembles with at least 8 variable parameters and then tried to identify the best parameter sets by comparing the output of these ensembles to 8 different observables. Not only were they unable to identify any optimum parameter sets, they were unable to narrow the range for any one parameter by showing consistently inferior performance within part of its tested range. The range of warming for the full ensemble (I don't know if this technically qualifies as ECS or TCR) ranged from less than 2 degC to greater than 11 degC. The IPCC never mentioned that "parameter uncertainty" could be this large. The parameters interacted in unanticipated ways and the authors expressed skepticism about arriving at a global optimum for any model by tuning parameters one at a time. The Stainforth models used a slab ocean to make the computations practical, so they didn't probe how changing the parameters for heat flux in the oceans would affect their results. In this multidimensional wilderness with multiple local optima and no unambiguous way to correctly tune a model, a model will evolve under selective pressure from expectations rather than data (IMO).
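Frank's point about order-dependent, one-parameter-at-a-time tuning can be mimicked with a toy cost surface (my own construction, nothing to do with any real GCM tuning procedure): when parameters interact and the surface has several basins, coordinate-wise tuning settles into whichever local optimum the starting point happens to favour, not the global one.

```python
import numpy as np

def cost(a, b):
    # Himmelblau's function plus a small tilt, so its four basins differ in depth
    return (a**2 + b - 11)**2 + (a + b**2 - 7)**2 + 0.5 * a

grid = np.linspace(-5, 5, 2001)

def tune_one_at_a_time(a, b, sweeps=20):
    for _ in range(sweeps):
        a = grid[np.argmin(cost(grid, b))]   # hold b fixed, tune a by grid search
        b = grid[np.argmin(cost(a, grid))]   # hold a fixed, tune b by grid search
    return a, b, cost(a, b)

for start in [(0.0, 0.0), (-4.0, 4.0), (4.0, -4.0)]:
    a, b, c = tune_one_at_a_time(*start)
    print(f"start {start} -> (a, b) = ({a:+.2f}, {b:+.2f}), cost = {c:.2f}")
```

The three starting points end in three different basins with different final costs, even though each individual tuning step was locally optimal.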

Here is why I believe that the IPCC is hiding the problem of parameter uncertainty. AR4 WG1 Section 10.1 says this about parameter uncertainty (but they don't use this term):

"Many of the figures in Chapter 10 are based on the mean and spread of the multi-model ensemble of comprehensive AOGCMs ... Since the ensemble is strictly an ‘ensemble of opportunity’, without sampling protocol, the spread of models does not necessarily span the full possible range of uncertainty, and a STATISTICAL INTERPRETATION OF THE MODEL SPREAD IS THEREFORE PROBLEMATIC. However, attempts are made to quantify uncertainty throughout the chapter based on various other lines of evidence, including perturbed physics ensembles specifically designed to study uncertainty within one model framework, and Bayesian methods using observational constraints."

Everywhere else, the authors simply ignore the fact that statistical interpretations are problematic, and simply pretend that the range of model output (which includes initialization uncertainty) provides a valid confidence interval. Numerous statements in the SPM, particularly confidence in human attribution and projections of future warming, depend on a statistical interpretation of the range of model output. No caveats are mentioned.

Mar 29, 2014 at 10:42 AM | Unregistered CommenterFrank

Frank, very useful contribution putting technical flesh on what at my level is only malformed suspicion of other folks' excessive certainty.

Mar 29, 2014 at 11:14 AM | Unregistered Commenterrhoda

Hmm, seems I'm always late to these discussions. Oh well, while I'm here I might as well have a rant ;-)

There are so many... odd things in this thread. Firstly, the discussion about what statistical significance is. Tests of statistical significance are simply tools. Briggs' criticism of significance is more to do with his strange engagement with Bayesian vs. Frequentism, a debate which can be likened to two carpenters arguing over whether the hammer or the saw is a better tool. It is a pointless debate, as both tools have their uses, and to say one is better than the other is entirely meaningless.

On the next point. Many people create a false dichotomy of comparing "statistical" methods with "physics" methods. There is no distinction here. Physics necessarily requires statistics at its heart. People seem to think that deterministic methods are somehow "physics" and statistical methods come from some other alien world. In fact, most deterministic physical laws - take the ideal gas law as an example - are merely restating the *statistical* behaviour of the random motion of atoms in a gas. So the idea that somehow the physics is distinct from the statistics is very peculiar indeed.

Back to significance. Significance in itself is meaningless without some hypothesis that we are testing. And note there are many hypotheses that we may choose to test. For example, I might ask whether the globe has warmed, irrespective of what has caused that warming. In this case, I could look at the temperature data and note a warming. But even if no warming had happened, the observations might show a warming due to the limited accuracy of measurement devices, sampling limitations, etc. etc. It is important to know how likely it is that random chance alone could have caused the observations - and if it is possible that random chance alone could cause it, we cannot claim there is strong evidence of warming. In this way, I would argue Doug McNeall's definition is quite poor:

Even those that have done lots of statistics can forget that it only tells how likely you are to see something, given something you think probably isn’t true.

This isn't really right. It doesn't matter whether you think something probably isn't true or not. The last seven words of Doug's are really a distraction and potentially confusing. It is how likely we are to see the observations from random chance alone, whether the hypothesis under test is right or not.

The key is that the statistical test must be informed from underlying physical relationships. This is not hard to do. For time series analysis, you must account for distributions and autocorrelation functions (or power spectral density, as I tend to prefer to think of signals in frequency space than the time domain). These are what we derive from first principles.

As an example, if I am building a radio receiver (such as that in your mobile phone or wi-fi router), I know the limit of measurement is governed by thermal motion of electrons in the receiver. I know from first-principles physics what the power in the fluctuations will be, the distribution and the power spectral density. From this I can derive a statistical model of the sensitivity of my receiver, which in turn can tell me the limits in terms of bandwidth, range, error rate etc. of my system. All from statistics, since the noise behaviour cannot be described deterministically.
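The noise floor described here really does fall straight out of the physics. A back-of-envelope sketch (assumed temperature and bandwidths) using the Johnson-Nyquist result P = kTB:

```python
import math

k = 1.380649e-23               # Boltzmann constant, J/K
T = 290.0                      # standard reference temperature, K
for B in (1.0, 20e6):          # 1 Hz, and a 20 MHz Wi-Fi-style channel (assumed)
    p_watts = k * T * B        # thermal noise power in bandwidth B
    p_dbm = 10 * math.log10(p_watts / 1e-3)
    print(f"B = {B:>10.0f} Hz  ->  thermal noise floor = {p_dbm:6.1f} dBm")
# about -174 dBm/Hz, and about -101 dBm over 20 MHz: the statistical limit on
# receiver sensitivity, derived entirely from first-principles physics.
```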

The same applies to climate. In the simple example above (has it warmed?) we can determine the limits of the measuring equipment, the sampling etc, and ask whether it has warmed or not. (For the record, I think the evidence is strong that it has warmed). A second question might be whether the warming is beyond natural variability of climate. In order to do this latter assessment, we need a new error model since natural variability of climate is higher than our ability to measure it. We need to know all the things I mentioned for the radio receiver - distributions, power spectral density, etc. etc., and these need to be derived from an understanding of the physics.

The problem with climate is that we don't have an understanding of the physics that allows us to do this. While climate scientists focus on the perceived consequences of GHGs, very little effort has gone into doing the ground work of understanding the basic statistical properties of natural variability of climate and what drives it. The GCMs are a joke in this regard as they do not capture the statistical properties of climate.

The reason we don't have such arguments about radio receivers is that the groundwork was done, by scientists who understood the symbiotic relationship between statistics and physics, and who did the work necessary to understand these complex systems. We have these arguments about climate because we are still very much in the dark about what natural variability really is, and what are the principle underlying causes.

Mar 29, 2014 at 6:03 PM | Unregistered CommenterSpence_UK

Spence: An excellent reply. (I'm always late to these discussions too, but thought takes time.) When you apply statistics to your radio receiver, you are starting with a physical model and using statistics to help you understand the meaning of the signals you observe. When you lack physical understanding or a good hypothesis, time series can still be analyzed by various statistical models (AR1, ARMA, random walk etc.) without any understanding of the source of the noise in the signal you are trying to detect*. Under Doug's prompting, the Met Office's semi-admission that they can't assess the "statistical significance" of 20th century warming by purely statistical methods is a clear demonstration of the limitations of purely statistical models and the need for reliable physical models. The problem is that physical models with tunable parameters can be intentionally or unintentionally fit to the historical record - which contains unknown proportions of signal and noise. This is why I strongly recommend the Lorenz paper mentioned above: it illustrates the dilemma facing climate science today: purely statistical models aren't practical and the amount of parameter uncertainty in the output of climate models is unknown. Our policymakers desperately need projections with usefully narrow confidence intervals, and modelers' desire to produce such projections conflicts with a candid discussion of the real problems of interpreting model output statistically. Systematic observations of climate change and climate forcings may be providing an answer, but a TCR and an ECS don't have the same impact on policymakers as the high-resolution projections from climate models.

* Statisticians often create artificial data with specific types of signal and noise in them and study how to resolve them without added information. Scientists may have some understanding of the physical sources of noise in their data and they can use that information to select a statistical model that is relevant to their problem. In the case of temperature, it's clear (to me at least) that a random walk statistical model is totally inappropriate. Even if the earth experienced a runaway greenhouse effect and has apparently been a "snowball earth" in the past, physics will not let the earth randomly walk indefinitely far from today's climate. Negative Planck feedback wins eventually.

Mar 29, 2014 at 9:39 PM | Unregistered CommenterFrank

Frank, although we are largely in agreement on most points, I think there is still an important distinction in our approaches.

I still do not see a distinction between a "statistical" approach and a "physical" approach. In my radio receiver example, the thermal motion of the electrons giving rise to the noise is completely unpredictable; it cannot be directly modelled or predicted. The only way to represent the physical characteristics of these motions is with statistics. So I do not see myself starting with a physical model and adding statistics; it is almost the other way around: the statistical model is at the heart of my physical model, and my physical model does not even exist without statistics.

The idea of statistics being at the heart of physics is an important one (and indeed the basis of modern interpretations of thermodynamics, through e.g. statistical thermophysics, or quantum mechanics). And we can see this elsewhere. For example, the modern calculation of ECS and TCR which you refer to is only meaningful on the assumption that internal climate dynamics are Markovian. That is, that natural climate dynamics have an AR1-like structure. So even in these seemingly deterministic quantities, we find they are underpinned by statistics, not only that, but statistics that may be inappropriate for the system under analysis.
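As a rough illustration of why the Markovian assumption matters (my own toy sketch, with assumed parameters), compare how far multi-decadal means wander for an AR(1) process and for a long-memory (power-law) process of the same variance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, block = 4096, 50                          # "years", and a ~50-year averaging block

def ar1(phi=0.6):
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x / x.std()

def long_memory(beta=0.8):
    # crude spectral synthesis: power ~ 1/f^beta with random phases
    f = np.fft.rfftfreq(n)
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (-beta / 2)
    x = np.fft.irfft(amp * np.exp(2j * np.pi * rng.random(f.size)), n)
    return x / x.std()

for name, x in (("AR(1)", ar1()), ("1/f^0.8", long_memory())):
    means = x[: n // block * block].reshape(-1, block).mean(axis=1)
    print(f"{name:8s} spread of {block}-sample means: {means.std():.3f}")
# the long-memory series produces much larger excursions in 50-sample means,
# so an AR1-like model of "natural variability" understates how far the
# unforced system can wander.
```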

Mar 30, 2014 at 11:53 PM | Unregistered CommenterSpence_UK

Spence_UK: My understanding (which is probably limited) of the difference between statistical models and physical models (to which statistical analysis is applied) was crystallized by this Lorenz paper, which I mentioned above:

http://eaps4.mit.edu/research/Lorenz/Chaos_spontaneous_greenhouse_1991.pdf

The paper is relatively short, easy to understand and probably better than any response I can write. Lorenz doesn't use the terms "physical models" and "statistical models", the terms used in the Keenan/Met Office debate about the arbitrary choice of a statistical model (linear AR1) for detecting significant warming in the temperature record and their actual reliance on AOGCMs (physical models). In the first part of his paper, Lorenz illuminates the impracticality of detecting greenhouse warming ONLY from the short historical record by comparing this record to the unforced (internal) variability demonstrated by chaotic systems (that aren't being externally forced). Using the historical record alone, detection involves a statistical comparison of the period before and during forcing, and Keenan points out that detection of statistically significant warming depends on the type of noise present (i.e. the statistical model used; linear AR1 by the IPCC). Then Lorenz discusses applying "theory" and "models" to the detection-of-warming problem, which is what the IPCC/Met Office does, using physical models (aka AOGCMs) to detect and attribute warming to GHGs.

If you wish to read the paper and continue the discussion, I will be glad to reply. I seriously doubt anything I could write at this point would be as illuminating.

Mar 31, 2014 at 5:53 AM | Unregistered CommenterFrank

Frank, Lorenz's perspective is also closely aligned with mine. The paper is very good and insightful considering when it was written. He notes that chaotic systems can exhibit persistence across multiple scales, and that it can be incredibly difficult to distinguish between external forcing and highly persistent internal variability. He also dislikes looking at trends and prefers looking at shifts in mean (something I agree with).

In order to distinguish natural variability and external forcing, you need to identify invariants of the chaotic attractor - something which we see little effort being expended by climate scientists. Without characterising these invariants, we have absolutely no idea what natural variability might look like.

Such characterisation will almost certainly be founded on statistics, since it is unpredictable and can only be described using statistics, but must be statistics rooted in the physical properties of the climate system.

Mar 31, 2014 at 2:37 PM | Unregistered CommenterSpence_UK

"Such characterisation will almost certainly be founded on statistics, since it is unpredictable and can only be described using statistics, but must be statistics rooted in the physical properties of the climate system."

I have this slight suspicion that statisticians see problems as framed in statistics in the way that, to a hammer, everything looks like a nail. In the absence of a physical model which works, what we really have is a physical framework and a set of observations. What we need to do (as it seems to an Oxfordshire housewife) is to use the observations to establish the limits, if any, of the natural variation, then create a physical model which can reproduce that. We are nowhere near that position now, so a statistical approach can't work. No good physical model, nowhere near enough observations to validate it with. This just is not the time to go the statistical route in terms of sorting the signal from the noise. The issue may be a wonderful opportunity for statisticians to debate, but it is not yet useful for anyone else to do anything but look on in wonder. We need another century of close observation. We need to be measuring the forcings rather more cleverly than we seem to be now.

Mar 31, 2014 at 3:34 PM | Unregistered Commenterrhoda

Spence: I like this paper because it represents an effort by THE expert on the subject to describe the detection and attribution problem at the time of the FAR - before the climate science community was under pressure to make definitive statements to support legislation restricting emissions. In 1991, Lorenz imagined what the situation would have been like in 2000 if strong warming continued for a second decade (as it did) after about a dozen decades which showed modest temperature change on the decadal time scale. He said: "Certainly no observations have told us decadal-mean temperatures are nearly constant under constant influences." (top of page 452). Well, the hiatus has proven that temperatures can remain flat for more than a decade despite increased forcing, proving that Lorenz was right to ascribe no meaning to the two decades of strong warming at the end of the century. Without an understanding of long-term persistence and unforced variability in our climate system, applying simple statistical models (such as a linear AR1 fit to the historical record to detect "significant" warming) was inappropriate to Lorenz almost a quarter of a century ago. Yet the IPCC found evidence of a detectable human influence on climate five years after Lorenz's paper, when the SAR was written!

However, Lorenz does provide qualified support for the strategy of assessing unforced variability on a decade-to-century time scale using AOGCMs (physical models) and thereby gaining information about the likelihood that the observed late 20th century warming can be explained (all or partially) by unforced natural variability. If the historical temperature record were a millennium long, there would be enough information about unforced decadal variability to assess the significance of the warming in the past few decades. AOGCMs have provided us with roughly a millennium of unforced climate variability - that is how we were told that 15-year hiatuses were rare, and why everyone was predicting around 2010 that the current hiatus would have ended by now. Well, 15-year hiatuses are rare if ECS is 3 degC, but they certainly will be more frequent if ECS is 1.5 degC. And if you have tuned your AOGCM to reproduce the 20th century record of warming (which could have been enhanced by 50% by unforced variability), your model ECS will be 3.0 degC instead of 1.5 degC. This is precisely why Lorenz insisted it would be unacceptable to use AOGCMs that had been tuned to fit the historical record. When you read about what we have learned from randomly perturbed parameter ensembles, you wonder about the arbitrariness in the process by which sophisticated models are tuned and the opportunities for biases to creep in. The correlation between high ECS and high sensitivity to aerosols certainly supports suspicions that the historical record has somehow biased the tuning process.

Back to the difference between "statistical models" and "physical models". "Statistical models" look at the historical record alone to understand the significance of recent warming over the last few decades. We would need a historical record a millennium long, not a century long, to understand the significance of recent warming. "Physical models" attempt to assess significance by calculating how much warming we should have seen and how much unforced variability might have enhanced or suppressed the expected signal.

Mar 31, 2014 at 8:42 PM | Unregistered CommenterFrank
