BEST, bad, worse
Via Nic Lewis and Frank Bosse comes a link to a page at the Berkeley Earth Surface Temperature project (BEST). Richard Muller and his team have compared their results to the output of a series of GCMs, and the results are not exactly pretty, as one of the headlines explains:
Many models still struggle with overall warming; none replicate regional warming well.
As Nic explains in an email:
...the only GCM to match the BEST 20th century land trend accurately is inmcm4. It's almost spot on, whereas all the other models are out by at least 10% apart from GISS-E2-H, and far more in many cases, with the most sensitive models often showing the lowest increases (I suspect due to very high aerosol forcing and maybe low GHG forcing and/or high ocean heat uptake). The models that get the land trend most wrong are HadGEM2-ES and a model based on it, ACCESS-1.3, CSIRO-Mk3-6 and MIROC5 - all* highly sensitive models that underestimate the land trend by a factor of 2 or more - a factor of nearly 5 out in CSIRO-Mk3-6's case!
I'm unsure why BEST say that GISS-E2-H performs best for the overall trend: it is 2nd best after inmcm4 and far less accurate. Anyway, GISS-E2-H is only a bit more sensitive (ECS) than inmcm4: 2.3 vs 2.1 C, against a model average of 3.4-3.5 C.
Richard Muller once wrote a very successful book called Physics for Future Presidents; this is a man who has been at the sharp edge of the science/policy interface and at the very highest levels of government to boot.
So in the light of what BEST have found, I wonder what he would tell future presidents about GCMs. Do they have any role in the policy process at all?
*In a further email, Nic corrects himself, noting that although MIROC-ESM is the most sensitive model of all (ECS 4.7 C), MIROC5 is not actually highly sensitive, having an ECS of ~2.9 C.
Reader Comments (32)
Unbundling "the ensemble" is a long overdue step in the right direction.
Today is #BackClimateAction, a governmental push in the UK for climate-related policy, recruiting scientists to do propaganda for their paymasters (well, that's how I see it - they might claim otherwise).
I have asked Richard B for the umpteenth time what use his regional climate models are in guiding UK climate policy, when there is no evidence that any regional climate model has any skill, or is going to have any skill in the foreseeable future.
I shall report the answer (ahr ahr)
The quantity usually compared is the ANNUAL MEAN GLOBAL temperature, a truly awful single-number per year statistic from which many ludicrous headlines are generated.
ANNUAL: Nothing perceives an annual average, much more revealing are seasonal averages. It only takes milder winters to produce headlines like "Hottest Year Evah", like yeah, milder winters are really scary.
MEAN: Nothing perceives the mean temperature during a day, most people only care about the maximum, a few care about the minimum. It only takes warmer nights to produce the headline above, like yeah, warmer nights are really scary.
GLOBAL: Nothing perceives the global average temperature. It only takes a bit of regional warming to produce the headline above, like yeah, warmer (but still frozen) polar regions are really scary.
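The arithmetic behind the "milder winters" point can be checked directly. The sketch below uses invented monthly values for a hypothetical mid-latitude station (not real data), warms only December, January and February by 1 C, and compares the change in the annual mean with the change in each seasonal mean. For simplicity, December is taken from the same calendar year rather than the preceding one.

```python
# Invented monthly temperatures (deg C), January..December, for a hypothetical
# mid-latitude station; purely illustrative, not taken from any dataset.
BASELINE = [2.0, 3.0, 6.0, 9.0, 13.0, 16.0, 18.0, 18.0, 15.0, 11.0, 6.0, 3.0]

# Meteorological seasons as month indices (0 = January). DJF uses December of
# the same calendar year here, a simplification for the illustration.
SEASONS = {
    "DJF": [11, 0, 1],
    "MAM": [2, 3, 4],
    "JJA": [5, 6, 7],
    "SON": [8, 9, 10],
}


def annual_mean(months):
    """Plain average of the twelve monthly values."""
    return sum(months) / len(months)


def seasonal_means(months):
    """Average of the three months in each meteorological season."""
    return {name: sum(months[i] for i in idx) / len(idx) for name, idx in SEASONS.items()}


def warm_winter(months, delta):
    """Return a copy of the series with only the DJF months raised by delta."""
    warmed = list(months)
    for i in SEASONS["DJF"]:
        warmed[i] += delta
    return warmed


if __name__ == "__main__":
    warmer = warm_winter(BASELINE, 1.0)
    print(f"annual mean change: {annual_mean(warmer) - annual_mean(BASELINE):+.2f} C")
    before, after = seasonal_means(BASELINE), seasonal_means(warmer)
    for name in SEASONS:
        print(f"{name} change: {after[name] - before[name]:+.2f} C")
```

A 1 C winter-only warming shows up as a 0.25 C rise in the annual mean, with no change at all in the summer statistics.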
"So in the light of what BEST have found, I wonder what he would tell future presidents about GCMs. Do they have any role in the policy process at all?"
I think Richard Muller should put this work forward as a parallel session topic at the Our Common Future Under Climate Change Conference. The deadline is the end of this week:
http://www.commonfuture-paris2015.org/
How "Our Common Future" looks should depend on the quality of the foundations...
Some US presidents are reported to have consulted psychics. But maybe their prophecies were better than those of GCMs?
"NASA's "GISS-E2-H" model appears to replicate overall warming best, but is among the worst models at replicating regional trends. China's "FGOALS-g2" appears to be the best performer at replicating regional trends, but still over/under-predicts the warming trend by about 0.5C per century almost everywhere."
Well, a sensible person would be more inclined to agree with a model that gets local temperatures more correct, rather than a model (GISS) which is totally crap for regions and achieves the global result correctly only by utterly roasting the Arctic. Of course the BEST dataset is a complete misnomer - until they reconcile the satellite data that everyone knows are the best available recons since 1979, then imo Berkeley Earth have the WORST recon. If you truly have faith in models then you must surely suspect the temperature reconstructions, 70% of which come from sea surface temps that were largely guessed prior to 2003.
In his continuing swings between 'I was a sceptic' and 'I was never a sceptic', Dick Muller no doubt wants to keep his options open for when everyone realises that the GCMs are next to worthless.
But, but, Stephen Mosher has been telling us for years that the models are the BEST we have, therefore we should use them.
Right, like germ-infested barbers' razors used to be the best we had, so using one of them to cut into us was the best option. When in fact we would have lived longer and almost certainly more comfortably (with a bit of laudanum) if we had done nothing.
In Figure 9, many of the models stop at the year 2000. The model average after this time (2000 to 2010) appears to be based upon very few models. Is something being covered up here?
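When ensemble members drop out part-way through, a multi-model average silently changes composition. The sketch below uses three hypothetical models with invented anomalies (not BEST's Figure 9 data), two of which stop in 2000, and reports alongside each yearly mean how many models actually contributed to it.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical annual anomalies (deg C) keyed by year; model names and values
# are invented to illustrate the effect, not taken from BEST's Figure 9.
RUNS: Dict[str, Dict[int, float]] = {
    "model_a": {y: 0.02 * (y - 1990) for y in range(1990, 2011)},  # runs to 2010
    "model_b": {y: 0.03 * (y - 1990) for y in range(1990, 2001)},  # stops at 2000
    "model_c": {y: 0.01 * (y - 1990) for y in range(1990, 2001)},  # stops at 2000
}


def ensemble_mean_by_year(
    runs: Dict[str, Dict[int, float]], years: Iterable[int]
) -> List[Tuple[int, Optional[float], int]]:
    """Average over whichever models report each year, and record how many did."""
    rows = []
    for year in years:
        values = [series[year] for series in runs.values() if year in series]
        mean = sum(values) / len(values) if values else None
        rows.append((year, mean, len(values)))
    return rows


if __name__ == "__main__":
    for year, mean, n in ensemble_mean_by_year(RUNS, range(1998, 2004)):
        print(f"{year}: mean {mean:+.3f} C from {n} model(s)")
```

After 2000 the "ensemble mean" is simply model_a on its own, so any difference in its behaviour from the dropped models appears as a step in the average rather than as a change in the models' projections.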
Mikky puts his finger on a very important point, one which Lindzen has mentioned from time to time. Since the mild overall warming comes almost entirely from warmer winters in the North, mainly Siberia, who cares, as it is obviously nothing but beneficial? Greenland of course is still only as warm as it was in the 1930s, so all they are really worrying about is that Siberians, Inuit and most Arctic animals are finding life a bit easier (yes, even polar bears, who eat almost anything and are able to vary their diet a bit, even venturing into towns, where the real danger for them lies).
Endorsing JamesG endorsing Mikky,
If Siberia has a long-term trend of warmer winter nights, ceteris paribus, the globe is warming, right?
But catastrophe? I don't see it.
Thalidomide was thought to be the best we had at controlling morning sickness. There may even have been a consensus.
Oh dear.
Indeed, thalidomide is a very efficacious drug, and not just for morning sickness. It was the perversion of science that was and is the problem.
@johanna; Totally correct, thalidomide was developed as an anti-depressant (and much more effective than the existing ones at that time).
Its off-license use for morning sickness in pregnancy was largely due to lazy medical professionals who, without the appropriate training in pharmacology and toxicology, saw it as a cheap way to deal with what was, for them, an inconvenient problem, without realising the risks they were exposing their patients to.
It's like the 'climate scientists' who expound the CAGW religion but have no knowledge of the basic sciences that should be the foundation of climate science. And, btw, thalidomide is still in use as an effective anti-cancer and anti-HIV drug.
Please go and read some of the literature on Thalidomide before making further incorrect comments. Its use for nausea, for which it was extremely useful, had nothing to do with incompetent or lazy scientists, but was because initially it was a better drug than those then available. It was incredibly safe because the toxic dose was so high that it was not possible to commit suicide by overdosing, which was a frequent occurrence with alternative drugs.
Its teratogenic properties were unknown until deformed children were born. At this point terrible mistakes were made, and rapacious drug companies that were reluctant to forgo income were largely to blame, but this had to do with commerce and little to do with science. The Thalidomide disaster could not have been foreseen, or predicted. It was a Black Swan event. However, the effects could have been minimised if action had been taken sooner.
A similar drug would not now make it to market.
@Arthur: Sorry, but your comments are not correct; I worked in reproductive and developmental toxicology for over 20 years. My comments are accurate and true. Your comment that 'the Thalidomide disaster could not have been foreseen, or predicted' is rubbish - it was not developed as an anti-nauseant, and its use as such (by doctors, not scientists) was off-license.
Also, please be aware that the automatic requirement to test new drugs for teratogenic effects in two mammalian species, which was introduced as a result of thalidomide, was dropped almost a decade ago when an international review found it unnecessary.
Omnologos
Maybe you've not got round to it, but just in case you've forgotten your promise to report my answer (which came very soon after your post above), it is here.
I said:
But anyway you didn't actually ask the question the way you put it above - you just said:
I don't think RCMs are particularly used by DECC for climate mitigation policy. They are however used for the advice provided for Defra for adaptation - this is via UKCP09 and the Climate Change Risk Assessment. But it's not just one model - a large ensemble is used, to help assess the range of uncertainty.
Nic Lewis,
The BEST / CMIP5 comparison is very interesting.
However, where did you find the HadGEM2-ES comparison? I can only find HadGEM2-AO (which is the Atmosphere-Ocean model without the Earth System components that HadGEM2-ES has).
Bishop Hill,
How does Nic's remark about 'models that underestimate the land trend by a factor of 2 or more' square with one of your favourite phrases, that 'the models are running too hot'?
NASA's "GISS-E2-H" model appears to replicate overall warming best
Thanks to the dead hand of Dr Doom, meanwhile there continues to be no statistically significant warming, despite the models claiming there must be, while the 101 excuses piled up by alarmists go bust one by one. And the claims 'it's worse than we thought' look madder by the day.
Richard Betts:
"I don't think RCMs are particularly used by DECC for climate mitigation policy. They are however used for the advice provided for Defra for adaptation - this is via UKCP09 and the Climate Change Risk Assessment. But it's not just one model - a large ensemble is used, to help assess the range of uncertainty."
Please can you provide references in support of this statement, including any benchmarking reports relevant to the UK? I have looked at the UKCP09 site, but it is not clear how these models are assessed against reality, rather than by their discrepancies from values generated by the "Multi Model Ensemble". Thank you.
http://ukclimateprojections.metoffice.gov.uk/22530
(As a btw - the YouTube video on "the science behind the projections" shows as "not available" despite several attempts.)
Richard, I take it you fully endorse Julia Slingo's statement about last year's Somerset floods that "the evidence suggests there is a link to climate change"
http://www.bbc.co.uk/news/uk-politics-26084625
Yet according to the Met Office's own records, this is not the case.
http://notalotofpeopleknowthat.wordpress.com/2014/11/23/rainfall-patterns-in-the-south-west/
Or was it "models" that were informing her grossly misleading statement?
I take it our RCMs can tell how tall our grandchildren will be?
Another Side Effect of Climate Change and El Niño Events? Shorter Kids.
Richard Betts,
I initially misread the HadGEM2 version shown by BEST, but I had subsequently checked that my statement was valid. I calculated the 1900-1999 annual global mean trend as 0.0174 C/decade for HadGEM2-ES, against 0.0372 C/decade for HadGEM2-AO. The global trend per BEST is 0.068 C/decade; HadCRUT4 is slightly lower at 0.063 C/decade. Whilst these are global, not land, trends, it seems inconceivable that the HadGEM2-ES version could have a higher land trend than the AO version when its global trend is under half that of the AO version.
I can also answer your query to BH. "The models are running too hot" is a phrase usually applied to the satellite era (1979 on). Over 1979 to 2013, or even 1979-2005, most of the models – including HadGEM2-AO and -ES – do run too hot. But over 1900-1979 HadGEM2-AO and -ES run far too cool, warming globally at only 0.021 and 0.0065 C/decade respectively, against 0.060 and 0.055 C/decade per BEST and HadCRUT4 respectively. That matters more to the 1900-1999 trend than the over-fast warming post-1979 does. (I suspect excessively negative aerosol forcing is principally to blame for the inadequate warming up to 1979.)
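The trend comparisons above rest on ordinary least-squares slopes and simple ratios. The sketch below shows that calculation under the assumption that "trend" means an OLS fit to annual global means (the comment does not spell out the method). The four numbers are the ones quoted in the comment, not recomputed from CMIP5 or BEST files.

```python
# Trends (C/decade, 1900-1999, global annual mean) as quoted in the comment;
# they are not recomputed here from the underlying model or observational data.
QUOTED_TRENDS = {
    "BEST": 0.068,
    "HadCRUT4": 0.063,
    "HadGEM2-AO": 0.0372,
    "HadGEM2-ES": 0.0174,
}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (units of y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def per_decade(slope_per_year):
    """Convert a slope in C/year to C/decade."""
    return 10.0 * slope_per_year


if __name__ == "__main__":
    es_over_ao = QUOTED_TRENDS["HadGEM2-ES"] / QUOTED_TRENDS["HadGEM2-AO"]
    best_over_es = QUOTED_TRENDS["BEST"] / QUOTED_TRENDS["HadGEM2-ES"]
    print(f"ES/AO trend ratio: {es_over_ao:.2f}")
    print(f"BEST/ES trend ratio: {best_over_es:.1f}")
```

With the quoted figures the ES/AO ratio is roughly 0.47 (hence "under half") and BEST's global trend is roughly 3.9 times HadGEM2-ES's.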
I have a couple of questions in return:
Can you explain why the AO and ES versions of HadGEM2, and the CC version (which I see now has an even lower 20th century global trend of 0.0066 C/decade), all produce very different 20th century global trends in the Historical experiment? As this involved greenhouse gas concentrations rather than emissions being specified, aren't the main carbon cycle etc. feedbacks of the ES and CC versions inoperative?
Can you also explain why the Met Office archives monthly CMIP5 data in a uniquely non-standard format? I refer to the data files mainly ending in November but, depending on the experiment, with some ending in December. Those for models from all other groups consistently end in December. Moreover, unlike multiple data files for models from all other groups, which are contiguous from one time segment to the next without overlap, one or two of the Met Office data files start at an earlier date than the previous file ends. And one HadGEM2-ES data file has only a single month's data, unlike any other model file. Why? These abnormalities make processing of CMIP5 data time-consuming, prone to error and difficult to automate. Why can't the Met Office, as a leader, set a good example rather than being a law unto itself and showing no regard for data users? Providing a single large netCDF file for each simulation would avoid difficulties in future.
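The overlap and single-month problems described above can be handled by a merge step that orders segments by start date, keeps the first copy of any month covered twice, and refuses to paper over gaps. The sketch below is hypothetical: each segment is modelled as a plain list of ((year, month), value) records rather than being read from real netCDF files, and keeping the first copy of an overlapping month is an assumed policy, not a CMIP5 convention.

```python
from typing import List, Tuple

Month = Tuple[int, int]        # (year, month 1-12)
Record = Tuple[Month, float]   # one monthly value; a stand-in for a netCDF time slice


def month_index(ym: Month) -> int:
    """Map (year, month) to a single running month count."""
    year, month = ym
    return year * 12 + (month - 1)


def make_segment(start: Month, n: int, value: float) -> List[Record]:
    """Hypothetical helper: n consecutive months from start, all set to value."""
    first = month_index(start)
    return [(((first + k) // 12, (first + k) % 12 + 1), value) for k in range(n)]


def merge_segments(segments: List[List[Record]]) -> List[Record]:
    """Concatenate segments in start order, dropping overlapping months and failing on gaps."""
    merged: List[Record] = []
    last = None
    ordered = sorted((s for s in segments if s), key=lambda s: month_index(s[0][0]))
    for segment in ordered:
        for ym, value in segment:
            idx = month_index(ym)
            if last is not None and idx <= last:
                continue  # month already covered by an earlier segment: keep the first copy
            if last is not None and idx != last + 1:
                raise ValueError(f"gap in time axis before {ym}")
            merged.append((ym, value))
            last = idx
    return merged


if __name__ == "__main__":
    seg_a = make_segment((2000, 1), 12, 1.0)  # Jan-Dec 2000
    seg_b = make_segment((2000, 11), 5, 2.0)  # Nov 2000-Mar 2001, overlaps seg_a
    seg_c = make_segment((2001, 4), 1, 3.0)   # a single-month file
    merged = merge_segments([seg_b, seg_c, seg_a])
    print(len(merged), merged[0][0], merged[-1][0])
```

The files are deliberately passed out of order to show that the sort, not the argument order, determines the result.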
So have I got this right?
The models predicted a low-latitude tropospheric hotspot phenomenon where the effect of radiative forcing should be greatest in terms of energy. They found substantially less than expected.
The models also predicted a polar phenomenon where the effect of radiative forcing should be greatest in terms of temperature. They found substantially less than expected.
So it seems the models got it wrong in the hot regions of the world and the models got it wrong in the cold regions of the world. Is there a single reason for them all to be wrong in the same direction? Is it a coincidence that the models got it wrong in the regions most affected by phase changes in water? Time for some new models?
Relating to JamesG at 12:42 regarding Siberia:
Some years ago there was a story (sorry, no reference) of temperature readings from Siberia being reported with a cold bias. The idea was that by reporting colder, rather than actual, temperatures there would be more fuel sent to the outposts. Then when fuel became a non-issue there was no need to low-ball the temperature. Further, a related story was that many of the smaller cold-reporting places dropped out of the network. Taken together the story concluded that the warming was created artificially and not real.
Can anyone comment on this? Thanks.
"So in the light of what BEST have found, I wonder what he would tell future presidents about GCMs. Do they have any role in the policy process at all?"
He doesn't think very highly of models. Ahem.
The comparison with models wasn't really complete. Rohde was working on a couple of things: first, some very long datasets that went back to before 1750, and then the current stuff. At the time I was working on regional models for the US, looking at the heat wave question. More on that if folks want; the results are in a student project for the UHI group (expected mortality etc.).
Nic wrote to me about the inconsistencies and I don't have an answer, as the project was dropped to focus on addressing some of the issues with accuracy at small scales: the Google project needed resolution down to 1 km, and some of my regional work wanted data at a finer resolution than 1 degree. Since we do a fit at large spatial scales, the question always becomes: does that process oversmooth? Anyway, the comparison with GCMs was back-burnered. I'd be happy to spin it up again. My goal was to select the best GCM, but to go about it in a way that hasn't been done before.
And then I'd look at ECS for those models.
You have to be careful because not all the models use all the forcings. Does that surprise you?
Some models are flat out wrong; that is, they have results which appear non-physical. An example would be a model where one grid cell cooled and its surrounding grid cells warmed. While we know warming doesn't have to be homogeneous, we can be pretty confident there aren't mysterious places on the planet that cool dramatically while their neighbors all warm. Hmm, hard to make sense of that from a thermo perspective.
"But, but, Stephen Mosher has been telling us for years that the models are the BEST we have, therefore we should use them."
A subtle misunderstanding on your part. I take it as a given that policy makers are interested in making policy.
Let us say for example one is interested in setting a policy on land development in flood prone areas.
A policy maker has every right to use whatever information they damn well please to use. They can use the information given to them by developers, from residents, from fortune tellers, from their party leader. They are free to use any and all of this and weigh it in whatever haphazard or 'rational' way they want. They are also free to ask scientists for their help informing the policy. A scientist who says "I dunno, I have no clue where sea levels will be" can be ignored. A scientist who wants to model future sea level as a simple extrapolation of the past is also welcome to provide that information. And if you ask for the best science, you're probably going to be pointed to a GCM or RCM.
I'm not in a position to say what the policy maker should do with all these inputs. They will do what they want to do. They will weigh the information as they see fit. If they ask me which source is best, I'll tell them that an RCM will probably be better than an extrapolation from the past, but please check that. And further, that they can use the results as guidance and not as definitive answers. The model may predict 1 m, but that doesn't determine in any logical fashion what the policy should be. It's just data, and nothing follows from it until you apply a value judgement.
Next:
"Of course the BEST dataset is a complete misnomer - until they reconcile the satellite data that everyone knows are the best available recons since 1979, then imo Berkeley Earth have the WORST recon. If you truly have faith in models then you must surely suspect the temperature reconstructions, 70% of which come from sea surface temps that were largely guessed prior to 2003."
Obviously this writer hasn't read the guys who produce datasets from brightness at the sensor. Mears of RSS is on record stating that land records are more accurate. A simple look at the history of adjustments in the satellite records should give you pause. Also check the error bars. And if you really want to get down and dirty, we can discuss the ATBD of every data source. Short version here: a sensor in a satellite measures brightness at a pixel (well, actually voltage at a pixel). This is turned into a brightness AT the sensor. Now comes the problem. To get from brightness at the sensor to temperature at a given height in the atmosphere, you have to make a boatload of assumptions, and you have to run a radiative transfer model to back out the effects of atmospheric gases on the transmission of the energy that arrives at the sensor. In short, to trust the output of RSS or UAH you have to trust radiative physics. If you question radiative physics, then you can't trust the numbers of UAH or RSS.
All that said, we have some cool comparisons with RSS and UAH that we presented back at AGU. At that time I asked UAH if they could produce higher spatial resolution, since the core satellite records come at 25 km (as I recall), but Roy said that wasn't something he wanted to do. Hmm. So I switched to doing comparisons with AIRS, which gives me air temperatures at the surface. That is a much better test than comparing TLT to the surface. We do pretty well against AIRS.
An ongoing project is comparing to MODIS LST since 2000. Terabytes of daily data; don't expect quick answers.
"A policy maker has every right to use whatever information they damn well please to use. They can use the information given to them by developers, from residents, from fortune tellers, from their party leader. "
Nope - subtle misunderstanding on your part. Try reading here for example:
http://www.publications.parliament.uk/pa/cm201012/cmcode/1885/188502.htm#a4
Next:
"If they ask me which source is best, I'll tell them that an RCM will probably be better than an extrapolation from the past, but please check that."
What evidence would you present to support your "probably" and where would you suggest they check? Can you provide any benchmarking references for RCM outputs?
Steve Mosher,
Recently at WUWT you quoted Mears as saying the land records were more "reliable". I asked you for clarity of meaning ("Accurate? Precise? Representative?"), because your reference didn't include the word reliable. You gave me abuse in reply.
Now you quote Mears as saying they are more "accurate", which is one of the definitions I asked you about.
It is not clear to me how, and why, you are asserting that the land-based records should be considered superior to the satellite records, if I interpret you correctly. You are making it hard for me to even ask the question. I begin to wonder why you won't be more precise.
Nov 25, 2014 at 10:44 PM | Unregistered Commenter Don Keiller
Don, you could have added that two recent court cases undermine Dame Julia's hypothesis even further!
Re: BEST - at WUWT, Willis has done some interesting work on buoy temperatures which is also awaiting some clarification from Steven.