
Testing two degrees


One of the questions I would have liked to ask at the Cambridge conference the other week related to a graph shown by John Mitchell, the former chief scientist at the Met Office. Although Mitchell did not make a great deal of it, I thought it was interesting and perhaps significant.
Mitchell was discussing model verification and showed his graph as evidence that they were performing well. This is it:
As you see, the data is actually derived from the work of Myles Allen at Oxford and examines how predictions he made in 2000 compare to outturn.
The match between prediction and outturn is striking, and indeed Mitchell was rather apologetic about just how good it is, but this is not what bothered me. What I found strange was that the prediction (recalibrated - but that's not the issue either) was compared to decadal averages in order to assess the models. As someone who is used to Lucia Liljegren's approach to model verification, I found the Allen et al method rather surprising.
The two assessments are obviously very different - Lucia is saying that the models are doing rather badly, while Allen (Mitchell) et al are saying that they are doing fine. It seems to me that they cannot both be correct, but as a non-statistician I am not really in a position to say much about who is right. I have had some email correspondence with Myles Allen, who is quite certain that looking at sub-decadal intervals is meaningless. However, I have also read Matt Briggs' imprecations against smoothing time series, and his fulminations against smoothing them before calculating forecast skill.
We really ought to be able to agree on issues like this. So who is right?
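As a toy illustration of why the choice matters, here is a sketch (entirely my own construction, not Allen's or Lucia's actual method; the trend size, noise level, and random seed are assumptions for illustration) that scores a trend-only forecast against synthetic annual data, first year by year and then after decadal averaging:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1960, 2010)                    # 50 synthetic years
trend = 0.017 * (years - years[0])               # ~0.17 C/decade underlying trend
obs = trend + rng.normal(0, 0.2, years.size)     # annual "obs": trend + weather noise
forecast = trend                                 # a forecast capturing only the trend

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_annual = corr(obs, forecast)                   # skill scored on annual values

obs_dec = obs.reshape(5, 10).mean(axis=1)        # decadal averages before scoring
fc_dec = forecast.reshape(5, 10).mean(axis=1)
r_decadal = corr(obs_dec, fc_dec)

print(f"annual r = {r_annual:.2f}, decadal r = {r_decadal:.2f}")
```

Averaging removes most of the year-to-year noise, so the decadal comparison almost always looks better - which is essentially Briggs' complaint: part of the apparent skill is an artefact of the smoothing, and the effective sample size has shrunk from fifty points to five.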
Reader Comments (127)
@Rob B
Having in a past life worked for the NZ Met Service, I can tell you that every long haul flight is given a folder containing a customized and comprehensive briefing for exactly that flight. NZ Met Service supplies these to a number of airlines, not only Air NZ, for flights worldwide.
At the time I was working there (2004) they were working with an airline to open a route from Singapore to London, overflying the Himalayas, by providing them with detailed forecasts for winds and turbulence in the area. All previous flights on that route had taken a more indirect (longer) southerly path.
>They would really risk not carrying enough fuel based on a weather forecast??
Absolutely, provided there are suitable airports to divert to and refuel in the event of a fuel shortage. Such planning happens routinely, and actual diversions are rare in practice.
The idea that models use the laws of physics is typical of the rhetoric that someone like Olivier Talagrand would use. Yet Pierre Morel, founder of the LMD lab, debunked this misleading statement very convincingly in a conference a few years ago.
Of course, for modelers it is tempting to say they use the laws of physics and then conveniently reply to any critic that he is therefore against the laws of physics...
Richard Betts
A prediction that is failing up to now according to MSU satellite data:
anomalies sorted http://tinyurl.com/6faqca2
time series http://tinyurl.com/5sf3ur5
The prediction may be correct, of course, if you use heavily "homogenised" GISS data or reprocessed raobcore data, but never with the raw data.
By the way, HadCM3 is still flat in the past, even compared to secretive CRUTEM, we haven't heard a good explanation for that yet.
I am glad you mention the jet stream. That is something that NWP models are good at, up to about five days if one is optimistic. In the current state of GCMs, claiming that because the forecast is reliable for tomorrow it will be reliable in 50 years' time is pushing the limits a bit.
You know very well that we could build a long list of uncertainties in the GCMs that by far exceeds that of the anthropogenic radiative forcing; in those circumstances you cannot seriously expect us to take GCM predictions as valid.
Surface-atmosphere radiative transfer calculations are computationally expensive, especially iterative solutions. Whatever approximation you use gives an error that exceeds the anthropogenic radiative forcing (from now on ARF, which is only about 1.7 ± 1 W/m^2).
Snow and ice albedo parameterizations are awful in most GCMs; get the albedo wrong and the error is an order of magnitude bigger than the ARF.
Snow cover evolution is simulated in most GCMs with prehistoric degree-day approaches; the errors are far, far bigger than the ARF.
Water vapour content goes exponentially with temperature; all GCMs get the temperature wrong, therefore the wrong water vapour saturation, and therefore the wrong condensation, cloud formation, convection, CAPE, etc. The errors pile up beyond anything useful.
And don't mention the oceans...
I could keep on going, but you know the list better than I do.
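The water-vapour point can be made concrete with the Clausius-Clapeyron relation. The sketch below is my own illustration: the Magnus-form coefficients are a common WMO-style approximation, and the 2 C bias is an assumed figure, not a claim about any particular model.

```python
import math

def sat_vapour_pressure_hpa(t_celsius):
    # Magnus-form approximation to saturation vapour pressure over water (hPa)
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

t_true, bias = 15.0, 2.0                   # assumed true temperature and model bias (C)
e_true = sat_vapour_pressure_hpa(t_true)
e_model = sat_vapour_pressure_hpa(t_true + bias)
rel_error = (e_model - e_true) / e_true
print(f"{100 * rel_error:.0f}% error in saturation vapour pressure")
```

Roughly 6-7% per degree of warm bias, so a 2 C temperature error inflates the saturation vapour pressure by on the order of 13%, and that error then propagates into the condensation, cloud, and convection schemes.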
Patagon, I think you may be responding at cross-purposes to Richard. He was defending GCMs against Theo's earlier charge that they have no scientific value at all.
He advanced two arguments: 1. GCMs do make specific predictions that can in principle be falsified. You suggest that one such prediction, that he had mentioned, is on track to be falsified. If that happens, I sense that Richard will accept this - he is aware, as a careful scientist, of the limitations of the GCMs in terms of the physics they attempt to describe, the computational accuracy with which they do so, and the parameters for boundary conditions. 2. GCMs have at least some skill that is obvious to aeroplane navigators: over a very short period, they predict weather correctly enough for pilots to plan their flight path.
These are two parsimonious claims that I for one completely accept. It doesn't stop me doubting that the current GCMs get predictions of climate in 30 years wrong. And it doesn't stop me wondering - as you suggest - whether they are going to get a lot better at predicting such things in the near future.
Sorry - that should have been "... doesn't stop me doubting that the current GCMs get predictions of climate in 30 years right."
Jeremy,
I accept your argument to a point. I agree that GCM are valid scientific tools.
Richard states in his answer to Theo above:
"At longer timescales (decades) we are reasonably happy with projections of long-term global mean temperature, albeit with fairly large uncertainties, and are even also happy with some of the general long-term regional precipitation trends as long as we can establish that there are credible physical mechanisms behind them."
That is where my comment comes in. We cannot downplay GCM limitations and uncertainties in future climate projections just because they work fine within a time horizon of a few days.
Patagon
On your plot comparing 2 HadCM3 simulations with CRUTEM, I think you have used HadCM3 runs driven only with anthropogenic forcings (GHGs and aerosols). Am I right?
If you look at runs which also include natural forcings (sun and volcanoes) you will see there is a better fit to the observed multi-decadal variability.
(Incidentally, if you also look at the runs with only natural forcings and not anthropogenic forcings, the fit is very poor in recent decades - the model does not reproduce the warming trend)
Jeremy Harvey
Thanks - yes I would indeed accept it if the statement I quoted above turned out not to be confirmed by observations. There is an ongoing process of model development and of course we aim to make ongoing improvements. And as you say this is a real challenge, but we think we're going in the right direction....
To come back to your earlier point about whether we could use the aerosol forcings to tune the historical simulations: yes, I guess in principle we could, but we don't. One reason is that we use standardised historical emissions datasets used by all models in the Coupled Model Intercomparison Project, so if everyone changed the emissions in order to get their models to agree with obs then it would not be possible to fairly intercompare the models (which is an important part of understanding their strengths and weaknesses)
Funnily enough when we included the biophysical effects of land cover change in 20th century simulations with HadCM3 (well actually it's atmosphere-only version HadAM3) this actually made the simulation worse in some areas. It's a common problem with these models that when you think you've improved them in some way, it turns out worse overall (so yes there was clearly some compensation of errors going on). This is another reason why it's not worth trying to fiddle things, it's just too hard!
BTW HadCM3 is now quite old. Since then we've had HadGEM1 and HadGEM2, and are working on HadGEM3 with the specific aim of improving regional forecast capability.
Richard,
That only means that the models are incomplete, not that the prescribed anthropogenic forcing is correct, especially because that additional forcing is not only anthropogenic but requires substantial help from water vapour feedback. This value is also prescribed, as the model simulations of clouds and specific humidity distribution are not very realistic.
Given the many problems in the simulations of GCMs, a few of which I have highlighted before, we also need to rule out equifinality if we are going to trust the models' output.
Patagon
Feedbacks (WV and others) are not "prescribed", they are emergent properties.
Don't get me wrong, I fully recognise that GCMs have lots of problems. I just think that they are not completely worthless...:-) Clearly you disagree, and that's fine - I'm glad you think it's a worthwhile endeavour to try to make them more useful (although we disagree on the baseline for "more"!)
Richard,
"Feedbacks (WV and others) are not "prescribed", they are emergent properties."
Then they are selected, if you prefer, as water vapour brings an important negative feedback through cloud formation that is not very realistic in current GCMs
The chart I linked to before was the 20th century experiment 20C3M, I guess that should contain both natural and anthropogenic forcings.
Don't get me wrong either, I don't think they are worthless, I do a good deal of atmospheric modelling for work myself. It is just that we are not there yet.
I think models are most useful when they fail: they indicate the points that we do not understand well. That is why I disagree with the popular and political worship of climate models, and to be honest, the majority of climate scientists are not very critical of this "worship" either.
HadGEM3 - are the output data available somewhere (apart from badc.nerc, which is restricted to UK academic research)?
Patagon
If you tell me the exact source for the data you used I'll check whether it was "ALL" or "ANTHRO".
You are right that there is not even criticism of the use of model output in blind faith. A fair amount of climate impacts studies do this - the desire to try to improve this is one of the main reasons I've moved more into that area in recent years.
To bring in another HSI quote:
"Have we, as a community, become better in rejecting such claims? I'm afraid we have not" - Hans von Storch, quoted on page 388.
Garth Paltridge also makes similar points. They are both right. Climate scientists should be more willing to challenge those who overstate the certainty of doom. We need to accept the large uncertainty and figure out how to deal with the non-zero risk - it's the differing approaches to that which lead to real or perceived allegiances and a tendency towards an unhelpfully polarized debate.
Sorry, in the above post I wrote "not even criticism" but I meant "not enough criticism"
Richard,
Glad to see you agree with Garth Paltridge. I fully recommend "The Climate Caper", which you may know already.
The data come from the climate explorer:
climexp.knmi.nl > monthly scenario runs > UKMO HadCM3 > tas (2 runs)
sorry, it should be
climexp.knmi.nl > monthly scenario runs > UKMO HadCM3 > 20c3m > tas (2 runs)
So, I checked the HadCM3 runs provided to AR4 under A1B.
There is a huge amount of variability in this model - as much as I have seen in any of the models.
HadCM3 monthly - 1900 to 2100 versus HadCRUT3 actuals.
http://img855.imageshack.us/img855/8677/hadcm32100.png
And then HadCM3 along with the multi-model means under FAR, TAR, and AR4 versus Hadcrut3 (starting at the time when the predictions were made).
http://img855.imageshack.us/img855/2673/hadcm3ipcc.png
Pretty hard to check HadCM3's reliability versus observations when the model anomalies are +/-0.6C within an individual year.
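A back-of-envelope way to see the problem: taking roughly half of the +/-0.6 C spread quoted above as a 1-sigma interannual noise level (an assumption on my part), the trend fitted to any single decade of annual anomalies is itself very uncertain. A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n_years, n_trials = 0.3, 10, 10000    # assumed noise level and window
years = np.arange(n_years)

# Fit a linear trend to pure noise, many times, and look at the spread of slopes
slopes = [np.polyfit(years, rng.normal(0.0, sigma, n_years), 1)[0]
          for _ in range(n_trials)]
trend_noise = np.std(slopes) * 10            # convert C/year to C/decade
print(f"1-sigma trend noise: {trend_noise:.2f} C/decade")
```

The spread comes out at around 0.3 C/decade - larger than a forced trend of roughly 0.2 C/decade - so over a single decade the model-versus-observations comparison is dominated by noise.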
Richard Betts writes:
"My colleagues Smith et al (2007) made the very specific prediction that "at least half of the years after 2009 [up until 2014] are predicted to be warmer than 1998, the warmest year currently on record." Maybe I am lost in the nuances, but I don't understand why this cannot be tested and shown to be either right or wrong within the next few years."
This could be formulated as a statistical hypothesis. As such, conditions for its falsification could be specified. I am talking about objective probabilities, not Bayesian statistics which belong to game theory rather than science. Can you provide the statistical formulation and the conditions for falsification?
As it is written, it is simply a hunch. Ask yourself this question: if the hunch proved to be conclusively false because no years were warmer, then what would be falsified besides the hunch? Nothing, in my opinion. Yet when predictions made from genuine hypotheses - hypotheses which have been reasonably well confirmed and used in science - prove false, not only the predictions but the hypotheses themselves are falsified. They can no longer be used in science. A risk has been taken and a loss suffered, all for the gain of knowing that we were wrong. What risk is taken in your hunch and what scientific gain would be made if the hunch proved false?
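For what it's worth, one possible statistical formulation (entirely my own construction, not Smith et al's) treats the five years 2010-2014 as Bernoulli trials: under a no-warming null hypothesis, each year exceeds the 1998 record with some small probability p0, and "at least half" corresponds to at least 3 exceedances out of 5:

```python
from math import comb

def prob_at_least_k(n, k, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_years, k_needed = 5, 3          # "at least half" of the five years 2010-2014
for p0 in (0.1, 0.2, 0.3):        # assumed null exceedance probabilities
    p_val = prob_at_least_k(n_years, k_needed, p0)
    print(f"p0 = {p0}: P(>=3 of 5 exceed 1998) = {p_val:.3f}")
```

If p0 is small, observing three or more record-beating years would be strong evidence against the no-warming null, while observing none would count against the warming prediction; whether this is the formulation the authors intended is exactly the question being asked.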
"Also, testing the predictive capability on shorter timescales, are you flying transatlantic anytime soon? If so then keep your fingers crossed that our GCM can correctly forecast the position and speed of the jet stream, because your airline will be relying on it to decide where to route your plane and how much fuel to carry."
This is a wonderful example. My guess is that over a long period of time airlines and their consultants developed a highly sophisticated set of rules of thumb about the behavior of the jet stream. Your GCM enables you to embody all these rules of thumb in one neat system that is automatic. But you have not formulated your rules of thumb as hypotheses and your system is not science. Which is not to say that it is not important.
More later...
Hi Patagon,
I've had a dig around and it looks like I was right, you did use HadCM3 runs that only included anthropogenic forcings not natural forcings. However it is easy to see why you did not realise this, and naturally assumed you were using an "all forcings" run.
If you go to climexp.knmi.nl -> monthly scenario runs
then in the top paragraph click on PCMDI to reach:
http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php
and then follow:
model documentation -> UKMO_HadCM3
and scroll down to section V to find 20C3M; you will see the radiative forcings listed as:
"anthropogenic greenhouse gas forcing (as specified in the IPCC 1995 report, to give IS92a-like forcing variations, see Table 1a in Johns et al., 2003); + sulphate aerosol direct + indirect forcing (via calibrated delta-albedo; see Johns et al, 2003 Appendix 1A); sulphur chemistry without natural DMS & 3D SO2 background emissions, (ie. anthropogenic SO2 emissions surface and high level only, taken from a personal communication with Steve Smith and Nakicenovic et al 2000.). Tropospheric/stratospheric ozone (reconstruction for the period 1858 – 1970)."
So no solar or volcanic forcings, which is what would be needed to get a better fit early in the 20th Century.
However I think I, like you, would have naturally assumed at first that the runs labelled "20th Century" would be the ones that represent the best attempt to reproduce the 20th Century climate, ie: include all forcings including natural as well as anthropogenic (given that runs like this do exist).
So why did they only include the anthro-only runs here? Possible reasons could include the following (although I'm only guessing here - I wasn't directly involved in CMIP3):
(a) maybe not all modelling groups in the intercomparison exercise were set up to do all forcings, so for consistency all groups might have been asked to only include the forcings that all groups could do
(b) maybe they wanted it to be consistent with the future projections, which don't include natural forcings (volcanoes cannot be predicted!)
(c) maybe it was viewed as a sensitivity study specifically on historical anthropogenic forcings, as opposed to an attempt to reproduce the 20th Century climate and do detection and attribution.
However as I say I don't really know, I wasn't involved in the intercomparison (I only helped build part of one of the models and then use it for our own studies).
It looks to me as if the "all forcings" runs can be requested via BADC:
http://badc.nerc.ac.uk/view/badc.nerc.ac.uk__ATOM__dataent_12024019658225569
Although as someone hinted at earlier this may not be trivial. I realise this statement will be like a red rag to a bull on this blog so please don't shoot the messenger.... :-)
But anyway I think that at least partly answers your original question about why the agreement between the model and obs was so poor in terms of multi-decadal variability before the mid-20th Century.
BTW yes I've read Paltridge's The Climate Caper and quite enjoyed it. Like HSI there's lots of useful and interesting insight in there even if one doesn't agree with the whole thing.
Richard Betts writes:
continued...
"This is actually the same model used for climate projections, just run at shorter timescales. We are clearly happy with its skill in quite some detail on the timescale of days. At longer timescales (decades) we are reasonably happy with projections of long-term global mean temperature, albeit with fairly large uncertainties, and are even also happy with some of the general long-term regional precipitation trends as long as we can establish that there are credible physical mechanisms behind them. However as we have discussed there are also many cases where we are not happy with the skill of the model and have much more work to do. But we are at least now at a stage where we can attempt near-term (interannual) forecasts, evaluate them, and improve the model (and hindcasting is also important here)."
Your use of this model seems to be providing important services. It is valuable. However, you write: "are even also happy with some of the general long-term regional precipitation trends as long as we can establish that there are credible physical mechanisms behind them."
If you are going to call something a scientific prediction then you should have a physical mechanism behind it. Ideally, your hypotheses will describe the natural regularities that make up your physical mechanism and your predictions will be deduced from those hypotheses. What you predict will be seen as instances of the natural regularities that your hypotheses describe. It's just like the amateur astronomer directing his telescope at Venus and saying "Yep, she's in quarter phase, right on Kepler's program." Mendel did not understand mechanisms of heredity, that had to wait for Crick and Watson, but he developed genuine hypotheses based on Objective Probabilities. That is good science too. But the rest of what you describe is not science.
You write: "However as we have discussed there are also many cases where we are not happy with the skill of the model and have much more work to do." Good for you. You have the humility of the scientist. You are willing to say that there is much in the area of climate prediction that your GCM cannot do at this time.
The people who present themselves as leaders in the field of climate science, such as Hansen, Schmidt, and many others are absolutely unwilling to do this. They adamantly refuse to acknowledge that their models fail as substitutes for physical hypotheses in even one case. Paradoxically, they will then say that their models have been saved from falsification because they just discovered that China's output of aerosols now over-rides contributions from manmade CO2. Yet in saying this they admitted another failure, namely, their models did not account for China's output of aerosols. Climate science is not ready for prime time. It is in its infancy. But Hansen, Schmidt, and others adamantly deny this and give testimony to Congress in support of gazillions of taxpayer expenditures to fight something for which they have no scientific explanation. They are not practicing as scientists.
Patagon writes:
"Richard,
"Feedbacks (WV and others) are not "prescribed", they are emergent properties."
Then they are selected, if you prefer, as water vapour brings an important negative feedback through cloud formation that is not very realistic in current GCMs."
This is an absolutely crucial point. Computer models consist of two things: computer code and the numbers generated in a given run of the model (aka, a simulation). When you say that "feedbacks are emergent properties" what you should be saying is "I interpret these numbers as feedbacks." And what is the basis of the interpretation? It is your interest in the code, your interest in making sense of all the numbers generated, and your interest in feedbacks. None of these interests are made explicit in your model and, therefore, none are explicit in your science. Hansen and Schmidt practice with no awareness of this point at all. They believe that whatever they see in their model is hardcore, well-justified science.
Richard Betts,
I want to thank you for sharing some of your interests in GCMs and especially for your remarkable candor. Your candor puts you right at the top of the list for me. I have great respect for your work and your models. I managed computer models for many years. They are very valuable to corporations in situations where decisions must be made and science has not yet conquered the territory. They are valuable to science too, but analytically not synthetically.
Richard Betts writes:
"Funnily enough when we included the biophysical effects of land cover change in 20th century simulations with HadCM3 (well actually it's atmosphere-only version HadAM3) this actually made the simulation worse in some areas. It's a common problem with these models that when you think you've improved them in some way, it turns out worse overall (so yes there was clearly some compensation of errors going on). This is another reason why it's not worth trying to fiddle things, it's just too hard!"
From an insider's perspective, you have made my position as eloquently as I could have. Your models have no account of land use change. Don't you agree that they should have? Pielke, Sr., and I do. And if one is going to claim that his GCM embodies climate science today, as Hansen and Schmidt do, then is he not making a false claim? My conclusion is that advocacy of the CAGW position by some people rests on claims that they know to be false.
If everyone on the CAGW side, and I do not mean to place you there, spoke with your candor then there would be no debate about CAGW and no hysteria. We could all get on with the science. (Of course, there would no longer be a role for the IPCC.)
Thanks Theo, it's been good discussing this with you all.
I completely agree with you that the biogeophysical effects of land use change should be included in climate models and climate policy, in fact I was one of those pushing for that some years ago - one of my papers on this is here
I was responsible for assessing this in IPCC AR4 - I was a lead author on WG1 chapter 2 "Changes in atmospheric constituents and in radiative forcing", and wrote section 2.5 on "Anthropogenic Changes in Surface Albedo and the Surface Energy Budget"
I also contributed to chapters 7 and 9 on this topic.
And as it happens I have published with Roger Pielke Snr on this, in this paper in Phil Trans and also this paper submitted to Wiley reviews
It's worth noting that even though including land use change in the model was initially problematic, we didn't take it back out again - it is included in our runs for AR5 with the new model HadGEM2-ES (and indeed it is in most other models I believe). Will be interesting to see what happens with all this... :-)
Also, I meant to add that Jim Hansen was actually also one of the first to look at the effects of land use change in GCMs, see Hansen et al (1997) . Sadly though it took over a decade for this process to be included in the mainstream simulations used by IPCC for detection & attribution and future projections - I think because of the focus on processes which are regarded as important because of their relatively large global mean radiative forcing, rather than their overall effect on the climate system through other influences on the surface energy and moisture budgets (through which land use may make a particularly large contribution at regional scales, which is of course arguably more important than global mean temperature!)
Richard Betts:
Unfortunately, as with so much supposedly very important data, the access is restricted to those pages.
Patagon
Yes, I thought that might be the case. You can request access though - have you tried? (Or will you?) I'd be interested to understand how much of a barrier this is to constructive development of the science.
Richard Betts writes:
"Sadly though it took over a decade for this process to be included in the mainstream simulations used by IPCC for detection & attribution and future projections - I think because of the focus on processes which are regarded as important because of their relatively large global mean radiative forcing, rather than their overall effect on the climate system through other influences on the surface energy and moisture budgets"
You are a wealth of information. My suspicion is that the leaders in using models for climate research are fixated on heat transports caused by radiation from the sun and care little for anything else. It strikes me that they are trying to come up with the definitive "Gaia Model" of radiation exchange in the Earth-Sun system. This fixation causes them to overlook any natural process that does not fit well in the Gaia Model or does not enhance the glory of the Gaia Model. Pardon me for gloating, but I think it is rather telling that Hansen seems to be retreating to the position that aerosols will save the Gaia Model from "falsification." I think the "Gaia Model approach" is "bass ackwards." If you do not have physical hypotheses that describe the natural regularities such as ENSO, then you really do not know the identity of the phenomena that make up your subject matter, and you have no idea about forcings and feedbacks because they exist only in the natural processes and apart from the phenomena of radiation. I think Hansen just admitted as much when he appealed to China's aerosols to bring his model into line with observed temperatures.
I regret that I must gloat a bit more, not with regard to you, but with regard to Hansen. We have been told for years that the Gaia Modelers have a bead on the question of Earth's albedo. Then, when pushed into a corner, Hansen brings up the effect of China's aerosols to save his projections. Yet doing so states loudly and clearly that they had no bead on albedo. So, one wonders what is in the Gaia Model and what isn't. Is it just a matter of Hansen's convenience? And the code will not settle this because of modelers' tendency to say such things as "ENSO is an emergent phenomenon." The beauty of having physical hypotheses is that you cannot fiddle with them in this manner.