Thursday, Nov 6, 2014

Maslin's morass

Professor Mark Maslin, a climatologist from University College London, has written an article for The Conversation to mark the publication of the Synthesis Report of AR5. In it he makes some remarkable claims, for example:

We have tracked significant increase in global temperatures of 0.85°C and sea level rise of 20cm over the past century.

(Don't think so)

Changes in precipitation are also expected to vary from place to place. In the high-latitude regions (central and northern regions of Europe, Asia and North America) the year-round average precipitation is projected to increase, while in most sub-tropical land regions it is projected to decrease by as much as 20%, increasing the risk of drought.

(Don't think so)

I wonder if he is going to try to make a defence of his article. If you head over to the Conversation, do stay polite and on topic. Several BH regulars are already there.


Reader Comments (90)

Maslin has responded to some of the comments.

In response to me he has used the pathetic tobacco company smear.
I have responded, but I expect my comment will soon be deleted.

On a "Conversation" with Lewandowsky & Pancost, a lot of comments have been deleted and the thread has been closed.

Nov 7, 2014 at 8:47 AM | Registered Commenter Paul Matthews

Nov 6, 2014 at 2:34 PM | Richard Betts

It seems that, in your apparent attempt to divert from the issue at hand (i.e. Maslin's claims), you point your accusatory finger at others without any specific evidence to substantiate your hand-waving.

The mileage of others may certainly vary; however, I don't find (what increasingly seems to be your pattern of irrelevant) diversions from the primary matter at hand to be particularly helpful or informative.

Perhaps instead of exercising your (IMHO) far too frequent - and admitted - practice of "skimming" you could put on your dedicated IPCC Lead Author hat and pay greater attention to the actual subject at hand, in this particular instance, Maslin's claims.

But perhaps that's too much to ask and/or expect of a representative of the "jewel in the crown, of British science and global science".

Nov 7, 2014 at 9:18 AM | Registered Commenter Hilary Ostrov

Another point about the constant citation of "20th century warming".

How exactly does the increase in CO2 concentration that only became material in the second half of the 20th century cause the rise in temperatures that occurred in the first half of the 20th century?

Some sort of climatological equivalent of rational expectations or something?

Nov 7, 2014 at 10:00 AM | Unregistered Commenter Geckko

Richard (Nov 6, 5:11 pm), you claim 'Of course they are inconsistent. Either you think that the observations tell us something (like Nic Lewis) or you don't (like Doug Keenan). You can't have it both ways!'

But this is a false dichotomy, on my reading of the situation, which is this: Keenan has merely observed that, in the absence of a credible statistical model for a series of observations, there is no way to establish statistical significance for some observed change. This does not mean that the observed change is not statistically significant – it might well be, but we haven't figured out how to establish that using conventional methods. Thus Keenan is not claiming that the observations do not tell us anything. At most, he is noting that the variability of past observations is complex, and that care is required. We might, for example, tentatively assume that the change we have observed does indeed reflect a process change and proceed from there. We ought not to forget that we are being tentative, and that ought to encourage further analysis. The tentative nature of any assumption about the observed change should also be made prominent in any summary report for, for example, 'policy makers'. I do not think the IPCC have done this, and I incline to explaining this as due to bias on their part. Political bias.
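As an illustrative sketch of this point (not Keenan's own analysis; the series length, trend and noise parameters below are all invented), the same fitted trend can be judged against two different assumed noise models and give two different verdicts on "significance":

```python
# Illustrative only: whether a trend is "statistically significant" depends on
# the noise model assumed. Same synthetic data, two assumptions about the noise.
import numpy as np

rng = np.random.default_rng(0)
n = 130                                   # roughly a century of annual values

# Synthetic series: a small trend plus strongly autocorrelated (AR(1)) noise
phi, sigma = 0.6, 0.1
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = phi * noise[i - 1] + rng.normal(0.0, sigma)
t = np.arange(n)
y = 0.001 * t + noise

# Ordinary least-squares trend and its residuals
slope, intercept = np.polyfit(t, y, 1)
resid = y - (slope * t + intercept)

# Standard error of the slope assuming independent (white) noise
se_iid = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((t - t.mean())**2))

# Crude AR(1) adjustment: shrink the effective number of independent values
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
n_eff = n * (1 - r1) / (1 + r1)
se_ar1 = se_iid * np.sqrt(n / n_eff)

print(f"fitted trend             : {slope:.4f} per step")
print(f"t-statistic, white noise : {slope / se_iid:.2f}")
print(f"t-statistic, AR(1) noise : {slope / se_ar1:.2f}")
# The second t-statistic is roughly half the first: the data are identical, but
# the verdict on "significance" changes with the assumed statistical model.
```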

Lewis does not address the statistical significance of recent changes in the temperature time series, but he does address methodological questions in how that series has been used, most notably with regard to Bayesian analysis. He drew attention to a use of uniform priors which essentially led to higher limits on climate sensitivity for that reason alone. This was an elementary statistical mistake which once spotted ought to have been corrected in some prominent fashion, not least to recipients/participants in summary reports for policy makers. I do not think the IPCC have done this, and I incline to explaining this as due to bias on their part. Scientific bias.
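On the priors point, a toy sketch (not Lewis's actual calculation; the likelihood below is invented purely for illustration) shows how the choice of prior alone can move the upper bound on an estimated sensitivity S:

```python
# Toy illustration: the same (invented) likelihood for a sensitivity S gives a
# different 95% upper bound depending on whether the prior is uniform in S or
# uniform in 1/S. Not a climate calculation, just the statistical point.
import numpy as np

S = np.linspace(0.1, 10.0, 2000)          # candidate sensitivity values on a bounded range

# Invented likelihood: the data constrain 1/S (a feedback-like quantity) fairly
# well, which leaves a long upper tail in S itself.
like = np.exp(-0.5 * ((1.0 / S - 1.0 / 3.0) / 0.1) ** 2)

def upper_95(prior):
    post = like * prior
    cdf = np.cumsum(post) / np.sum(post)
    return S[np.searchsorted(cdf, 0.95)]

uniform_in_S = np.ones_like(S)            # uniform prior on S over [0.1, 10]
uniform_in_recip = 1.0 / S**2             # prior uniform in 1/S, i.e. non-uniform in S

print("95% bound, prior uniform in S  :", round(upper_95(uniform_in_S), 2))
print("95% bound, prior uniform in 1/S:", round(upper_95(uniform_in_recip), 2))
# Identical data, different priors, noticeably different upper bounds - the
# kind of effect the comment above attributes to the choice of a uniform prior.
```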

Now I am not a professional in climate matters – it is a part-time study for me, and I may well have misconstrued either or both of these aspects of the work of Keenan and Lewis, or perhaps you think I should give more attention to other aspects. In either case, I would very much appreciate (and benefit from) being corrected.

Nov 7, 2014 at 11:08 AM | Registered Commenter John Shade

Only in loser science would someone dismiss the concept of 'statistical significance'.

Nov 7, 2014 at 11:09 AM | Unregistered Commenter hunter

Dear Micky H Corbett,

Thanks for the mention.

I'm certainly interested in uncertainty of temperature measurements (and estimates of global temperature) and many other authors have been over the years. The start and end point for all these analyses has always been (and always needs to be) the real world and actual measurements of the real world.

Statements about the accuracy of measurements are, as you note, theoretical - all statements are - but the same is true for your rule of thumb estimate of 0.5 to 1 degree. We should believe statements only in so far as they agree with the actual data.

For example, someone could assert that the noise in a particular sensor was 10 degrees, but if I look at measurements from that sensor and the standard deviation is only 0.1 degrees, I would be inclined to believe that the assertion was incorrect, or at least incomplete. The point is that statements about errors in measurements have observable consequences in the real world and we can, we should and we do, test them.
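A minimal sketch of that kind of check (invented numbers, not John Kennedy's own code): compare a claimed noise level with the scatter actually seen in repeated readings of a stable reference.

```python
# Illustrative only: a claimed error statement has observable consequences,
# so it can be checked against real readings.
import numpy as np

rng = np.random.default_rng(1)
claimed_noise = 10.0                          # assertion: the sensor noise is 10 degrees
readings = 20.0 + rng.normal(0.0, 0.1, 500)   # repeated readings of a stable 20-degree source

observed_sd = readings.std(ddof=1)
print(f"claimed noise: {claimed_noise} degrees, observed scatter: {observed_sd:.2f} degrees")
# If the observed scatter is a hundred times smaller than the claim, the claim
# (at least as a statement about random noise) is incorrect or incomplete.
```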

Your caveat is therefore incomplete; it really ought to say:
"The estimated uncertainties in the temperature anomaly are theoretical and based on a number of assumptions which have been repeatedly tested using actual data over the years and found to be sound. Nevertheless, one should avoid stepping on the toes of philosophers by allowing that, at most, they have not been found to be unsound and that there is - there always is - a small possibility that something has been overlooked."

If you do have an analysis that comes to a contrary conclusion, I'd be genuinely interested in reading it. I'm also interested in the engineer's perspective on this. It gets mentioned in discussions from time to time, but no one ever gets into the details.

I'm interested in the details and in the data, not so much in rules of thumb.

John

Nov 7, 2014 at 11:11 AM | Unregistered Commenter John Kennedy

Hi John

When you make a temperature measurement, having a very accurate thermometer helps a lot to obtain the instantaneous measurement from the point of view of the thermometer. The problem is that this point of view may not be the same as the environment you measure.

Here's an example:

Thermocouples are used in a lot of engineering and science experiments or verification tests. I ran a test on a plasma engine (a T6 ion thruster) which, when operating, dumps about 5 kW of electrical energy into the plasma (I worked for nearly 10 years as a propulsion scientist). This heats the metal casings, causing wires and connectors to get hotter.

Standard practice is to place thermocouples at various locations to get an idea of the temperature profile on the device, and how this relates to temperature-sensitive components - things like high-voltage wire casings that are vacuum compatible but not indestructible. The thermocouples are usually stated to be accurate to around 0.1 degree, although the uncertainty grows as the temperature gets higher. This level of accuracy is more than we need - typically within a degree is good.

We perform the test many times and notice that there is a spread in the data. We can control the thruster environment very well using algorithms and feedback, and we can try to minimise variation in the measurement conditions. Yet we often see a spread of temperatures with differences in the range of 5 or 10 degrees at worst, and typically 3 to 4 degrees.

The interesting thing is that you often see a variation in the same thermocouple over a long duration test.

From characterising and investigating, it was found that thermocouple placement techniques (using Kapton or potting) add uncertainty. Rather than try to hypothesise about or solve what becomes a very intensive problem, the stated uncertainty is increased. This is partially because of time and cost, partially because in space you only get one shot, so we tend to err on the side of caution rather than be over-optimistic, and partially the realisation that we may be reaching a theoretical limit on repeatability. Plus we had already employed NPL to build us a thrust balance with cutting-edge resolution, and another impossible problem might have been too much!

We realised that, because the environment in the vicinity of the thermocouple has intrinsic uncertainty, a temperature accuracy of 0.1 degrees would be a stretch even though we can control the temperature environment by controlling the plasma. And just to reinforce the point: that's when we can control the environment to a very high degree.

We also make the same measurements the same way each time with the same equipment, thus reducing error.

So this is what I'm saying about temperature measurements - they are supposed to accurately reflect the environment in which they are placed but to do so requires rigorously sticking to the same method, controlling and minimising outside factors, unknown factors (as your work often goes into) and ensuring consistency over the whole data set.

An easier and logically consistent approach is to recognise that an accuracy of 0.1 degrees cannot be achieved in the real world, as most if not all engineers and experimental scientists would tell you.

You can present theory with assumptions that have been tested (although you need to watch the Central Limit Theorem ) but you must be sure that it also includes consideration of metrology and not just of what is theorised to be the underlying physical behaviour. Or that averaging techniques have included influences that you don't know about.

Of course the problem is that temperature anomalies become less useful. But you can't fight with experimental reality.

Nov 7, 2014 at 12:29 PM | Unregistered Commenter Micky H Corbett

Hi Micky H Corbett,

That's interesting and I understand the general point, but your post is about plasma and generalities, not how they relate to the specific problem of estimating uncertainty in historical and contemporary air or sea-surface temperature measurements.

Like I said, I'm interested in the details and in the data, not so much in rules of thumb.

John

Nov 7, 2014 at 1:43 PM | Unregistered Commenter John Kennedy

John

Very true, and a lot of the data collected was during acceptance and qualification tests, so getting specifics about that would require going back and getting that data. Considering I don't work for that company anymore, it would be hard.

However my point was that even in the best controlled circumstances, it is hard to achieve the accuracy of the measuring instrument. And this can often be because of unknowns in the environment.

So what do you do? You consider repeatability as opposed to theory. If you had to stake your life on the result, would you rather it had more error but was repeatable across multiple methods and variations in environment (in other words, something that passed environmental qualification), or would you choose a theory that could only be approached under exacting and specialised conditions?

You could do tests on actual temperature sites, check calibrations, in fact you could talk to NPL about it (I still think Dave Hindley works there - you could talk to him about setting up a campaign) to produce characterisation data moving forward. But for previous data it's going to be hard.

Or you could use a rule of thumb. Nobody would fault you for that.

But if you need 0.1 degree accuracy to justify using temperature anomalies, the answer isn't found by massaging data. It's found by pointing out the obvious: the measurement approach wasn't designed to give you that accuracy.

As a last point, the measurement techniques we used on ion thrusters and the error analysis we did (stating where we realistically could not reduce uncertainty, but mostly taking the worst case) led to the delivery of the GOCE mission ion propulsion system - the very same one that was used on the satellite to counteract drag so that the gradiometer on board could map the Geoid to the desired accuracy. The Geoid that was subsequently used to estimate ice loss in Antarctica, an indicator of climate change.

If we didn't apply this very realistic approach, all subsequent data would be further compromised. The quality of the Geoid data is a testament to the quality of the engineering. That's also why I don't envy your task, but I realise the knock-on effects that presenting data can have.

Nov 7, 2014 at 2:04 PM | Unregistered Commenter Micky H Corbett

[Snip - venting]

Nov 7, 2014 at 3:34 PM | Unregistered Commenter Bernd Felsche

Dear Micky H Corbett,

NPL and metrologists are involved in the design of new measurement systems. They're also getting more involved in the characterisation of the old observing system. As you say, "for previous data it's going to be hard". Hard questions are interesting and I'd like metrologists, engineers and statisticians to get more involved in understanding these problems, or at least to share their expertise.

So, how accurate is the historical observing system actually?

You state that "at best it's more like 0.5 to 1 degree". Where do these numbers come from?

John

Nov 7, 2014 at 5:27 PM | Unregistered Commenter John Kennedy

Statistical significance suddenly became irrelevant when the lack of warming became statistically significant.
Just another example of how the warmist dogma squirms and shifts in order to deal with inconvenient realities.
It's like the way the pause always has to be about 5 years longer than it is at any given point for it to be significant.

Nov 7, 2014 at 6:31 PM | Unregistered Commenter NW

Micky H Corbett

The different global temperature datasets - GISS, NCDC, HadCRUT4 and BEST - share a common basis in recorded temperatures from stations, ships, etc.

They use different methods to compile the global figures, but agree that the confidence limits for their global temperatures are about +/- 0.1C.

On what basis do you state that the confidence limits may be +/-1C?

Nov 7, 2014 at 7:12 PM | Unregistered Commenter Entropic man

not banned yet, hunter,

See here by a climate scientist:

If somebody asks if something is statistically significant, they probably don’t know what it means.

and here by a sceptic:

It is time, now, right this minute, for the horrid term statistical significance to die, die, die … Nobody ever remembers what it means, and, with rare exceptions, almost everybody who uses it gets it wrong.

Nov 7, 2014 at 7:33 PM | Unregistered Commenter Richard Betts

On what basis do I say that the temperature measurements are accurate in reality to 0.5 to 1 degree?

Well, having spent 10 years doing it and measuring it. And also watching other experienced engineers and scientists. There are also international standards for qualification and acceptance, like ESA standards and RTCA documents (DO-160 comes to mind).

But it also comes from the measurement systems themselves. Analysis of older measurement systems may help, but the only way to really get an idea is to recreate the methods and see what types of variation you get. So do I have proper evidence that these errors are less or more for, say, buckets? No. Do you? I haven't seen that. I've only seen reanalysis and fitting of modelled data fields. They have their place, but you need to be making measurements and doing characterisation. It's good work, I'm not dissing it. It's just not enough to validate their use.

For example, John, you even state in some of your papers that bucket measurements have an estimated error of 1 to 1.5 K. That could be repeated easily and, to be honest, it feels like the right ballpark - maybe larger if you were conservative. But there is no way that the bucket was designed to provide 0.1 degrees. The operators arguably had good technique, but were they told that they had to get the accuracy to a certain number? Which means all bucket measurements have that error, and you lose the 1940s to noise.

That alone should give you pause to think.

Take modern measurements. People often pooh-pooh Anthony Watts' work in simply auditing measurement sites in the US, but the fact is that this is exactly what needs to be done. If anything it shows what can be done if people focus on empirical characterisation. I can do all the theory I want, but I've seen with my own eyes temperature variations on the microscale even under well-controlled conditions.

So that's the reason I ask the question and make the statement. If I had to express my scientific and engineering opinion I'd say about 0.5 to 1 degrees is probably a good estimate. I think others on here would agree with that at least as a reasonable ballpark.

I'm happy to get involved with doing experiments on this by the way. I think it would be very useful.

To be honest as well, it feels like an exercise in data mining is going on so that a reasonable estimate can be made and temperature anomalies have some use. But that's not a reason to present this data as any more than an interesting exercise.

I'll continue to read some of your papers in the meantime John.

Nov 7, 2014 at 7:39 PM | Unregistered Commenter Micky H Corbett

Dear Micky H Corbett,

Again, that's interesting, but you are, once again, only expressing an opinion. It may be a well-founded opinion (in fact, I'm sure it is, given your experience), but it is only an opinion about what the value should be. I'm sure that, if you were asked by an engineer about the uncertainty in a particular measurement, they would never let you get away with your statements:

"If I had to express my scientific and engineering opinion I'd say about 0.5 to 1 degrees is probably a good estimate. I think others on here would agree with that at least as a reasonable ballpark."

Where does this number actually come from? Did you sit down and calculate it, or is it simply a feeling?

John

Nov 7, 2014 at 8:10 PM | Unregistered Commenter John Kennedy

John

That estimate comes from applying a bit of common sense as well as experience.

If I was asked by an engineer how accurate a temperature measurement from a thermocouple placement was, I would answer that, as a ballpark, it'll be within 2 degrees, based on standard characterisation practice and what qualification specifications typically state. I'd then go about testing that, but I'd know it wouldn't be 10 degrees out unless my placement was bad - or 0.2 degrees, as that accuracy is a little too good for realistic situations. I'd try to get it down if temperature was a problem, but it's like asking me what a safe speed is to drive your car around a typical street. As a ballpark - order of magnitude, if you like - I don't need to be an engineer to say it's around 20 miles an hour. It's not 80.

If I was asked how accurate a pyrometer measurement was, it'd have a lot more variation. I think off the top of my head it's about 50 degrees.

That's all.

Nov 7, 2014 at 9:03 PM | Unregistered Commenter Micky H Corbett

The best liquid thermometers are calibrated to an accuracy of 0.1C and cost £200. Cheaper instruments may be accurate to 1C. In fact, accuracy is less important to the trend analysis than resolution and reliability. A particular station thermometer may read 1C low, but will generate readings with a resolution of 0.1C for years. Over a large number of stations individual inaccuracies will balance out.

The global average derived from these readings may represent the actual global temperature to +/-1C, but anomaly data used to study the trend will still have 0.1C resolution. Since it is the trend we are interested in, this is sufficient.

The confidence limits are calculated from the variability of the data. Since the different datasets are calculated from the same pool of observations, the confidence limits tend to be similar. For the early measurements the limits are +/-0.1C for GISS and HadCRUT4. For recent data GISS shows about +/-0.06C.

Nov 7, 2014 at 11:34 PM | Unregistered Commenter Entropic man

A brighter iris gleams in the eye of the burnished thermometer.
==========

Nov 8, 2014 at 6:38 AM | Unregistered Commenter kim

EM

If a particular station reads "low", how did you determine that? You would have needed to calibrate against a source.
So how many times did you calibrate? Did the sensor have a discontinuous drift, or was there a gradual variation away from the previous calibration? Can you pinpoint when it happened?
Do you only have one sensor, rather than 3 or more in the same place? Are you measuring any other parameters that can be related to temperature, so that you could estimate it by other means?

And on and on. When you have a resolution or stated accuracy of 0.1 degrees, this means the instrument will produce that accuracy under environmental conditions similar to those in which it was calibrated. Change those conditions and the uncertainty goes up, since after all you are trying to use your measurement equipment as a proxy for the environment around it.

I read John's 2013 submitted paper about uncertainty. In it there's a statement that systematic errors can be reduced by using multiple data sets. That's not exactly true, and you have to be careful not to assume that systematic errors act like random errors that you can say are truly random (or at least random enough for your purposes).

The other thing is that if I have 10 readings from different thermometers placed 10 cm apart, I could reasonably average them, depending on how I've characterised their local environment - in other words, minimised variation.

If I have the same thermometers 10 miles apart, how can I characterise the total environment in order to say the measurement error is still 0.1 degrees? Because that is what you say when you talk about averaging trends. If I had 10 sets of my 10 thermometers I could have a greater confidence in the individual measurements. It would be up to me how to consider spatial averages.

BTW an anomaly would have the same or larger error as the original data set, as you haven't eliminated the errors of the measurements when you take away the mean. If you present a reduced error you have assumed random errors, the CLT, etc., and most importantly you are saying that it represents the environment in which it was measured to a greater accuracy than you originally measured it. You have to justify this.

This is the problem. People aren't keeping control of their assumptions when creating anomalies, because the concept itself, when compared to the available datasets, means you will have to do some manipulation of noise. Also, people seem to be getting lost in the maths rather than realising that each measurement is supposed to be a proxy for its environment. When you give an estimate of uncertainty you also make a statement about other variables and how they may contribute to the measurement you make.

Nov 8, 2014 at 9:39 AM | Unregistered Commenter Micky H Corbett

Micky H Corbett

Rather than address each of your points individually, which would take longer than I have available, may I refer you to the World Meteorological Organisation Technical Regulations.

These address the design of a weather station and its operation, from the basic Stevenson Screen on up. They also consider issues such as quality of equipment and calibration.

Many weather observations are taken by amateurs (I've taken my own share while manning the tower at my local airfield before it went automatic), but they operate under a worldwide code of practice which addresses most of your concerns.

Nov 8, 2014 at 10:49 AM | Unregistered Commenter Entropic man

Micky H Corbett

Incidentally, I distinguished between accuracy and resolution. The accuracy of a temperature sensor is a function of its calibration. If it reads 14C when the actual temperature is 15C, that is a problem of accuracy. Expensive thermometers are calibrated to an accuracy of 0.1C, cheap ones to 1C or less.


Resolution is the ability to repeat a measurement or detect a change. Thus most thermometers can resolve a temperature change of 0.1C. This has implications for the use of the data.

If you compare data from the current dataset with a paleo ensemble of temperatures 25,000 years ago, accuracy is the problem. The two figures may be 9.1C+/-1C and 14.1C+/-1C. The means differ by 5C, but the confidence limits constrain the difference to a minimum of 3C and a maximum of 7C, i.e. 5C+/-2C.

If you compare different years from the modern record, the calibration of the system remains constant. Any change in the average comes from the difference in temperature readings, which is resolution-limited. Thus a change from 14.1C+/-0.1C to 14.6C+/-0.1C would be a change of 0.5C+/-0.2C.

Nov 8, 2014 at 11:14 AM | Unregistered Commenter Entropic man

Richard Betts - do you have any responses, preferably in your own words, to somebody who does know what statistical significance means?

Nov 8, 2014 at 11:38 AM | Unregistered Commenter not banned yet

EM

Apart from being pedantic, you are missing the key element: when you take a measurement you assume that it represents the environment. Your error, however you define it, is a representation of all the things you don't know. What you are doing is assuming that the measurement alone is satisfactory to represent the process. That's okay, it's a typical mistake made by people who don't do qualification and verification as a job.

But don't let me tell you this: NPL have a nice document about it:

Beginner's Guide to Measurement

Nov 8, 2014 at 12:11 PM | Unregistered Commenter Micky H Corbett

"Thanks Nic, that's a fair point - but I don't think this was the Bishop's point. He was going back to the tired old debate about 'statistical significance' again, which is a red herring! The data show that recent decades were warmer than the late 19th Century, so (as you rightly say) the world has warmed."

I'm guessing Richard is reading the word "significant" in Maslin's claim "We have tracked significant increase in global temperatures of 0.85°C and sea level rise of 20cm over the past century." in the everyday, non-statistical sense of "big"?

"Also, 'significance' doesn't seem to bother you when you talk about the 'pause'. If century-scale warming is not 'significant' then why even bother discussing temperatures over the last decade or two?"

It depends whether you're claiming to *have* a validated statistical model of natural background, or you're trying to *validate* models that have been proposed. The point of the pause is that it is outside the confidence bounds for the climate models, falsifying them. So you can't use the climate models to make reliable predictions.

It's another demonstration that we don't have a validated model we can use to test significance and make reliable predictions. That doesn't mean that the pause *isn't* due to natural variation - a short-term downward excursion from an upward trend. Without a validated model of the statistics of 'natural variation', you can't tell.

Nov 8, 2014 at 12:12 PM | Unregistered Commenter Nullius in Verba

Do herrings redden with age?

Met Office position 8 November 2012:
//
8 Nov 2012 : Column WA225

Statistical (linear trend) analysis of the HadCRUT4 global near surface temperature dataset compiled by the Met Office and Climatic Research Unit (table 1) shows that the temperature rise since about 1880 is statistically significant.

http://www.publications.parliament.uk/pa/ld201213/ldhansrd/text/121108w0001.htm#12110877000303
//

Nov 8, 2014 at 12:47 PM | Unregistered Commenter not banned yet

Richard

If we go back to Doug's toy example of coin tosses coming up heads three times, the question he posed was: should we explain this as (a) a double-headed coin or (b) luck? Note that the question was not whether we could tell if the coin had turned up heads three times or not.

Similarly, when looking at global surface temperatures, it is not a question of whether the temperatures have gone up. The temperature records agree that they have. Instead it is a question of the explanations we can give for this in statistical/mathematical terms.

Nov 8, 2014 at 12:53 PM | Registered Commenter Bishop Hill

And is warmer all that bad anyway? As this summer's results show, warm UK weather suits us rather well.

http://wattsupwiththat.com/2014/11/07/warmest-year-brings-record-harvests-for-uk/

There have been no heatwaves; on the contrary it has been a very pleasant summer, ranking 15th warmest since 1910.

When the statistics come in later this month, we are likely to see premature winter deaths much lower than last year. Meanwhile, a mild winter has enabled everybody to save on energy costs.

But the biggest news story of the lot has been the fantastic news that agricultural yields and output have hit record highs, thanks to the mild, wet winter, early spring and sunny summer.

Nov 8, 2014 at 1:48 PM | Registered Commenter Breath of Fresh Air

Dear Micky H Corbett

There’s a certain irony in pointing out that EM is pedantic and then pointing him/her towards the NPL documents about measurement which are famously pedantic.

There seems to be some confusion between an error and an uncertainty and what we're all talking about here. When you ask how EM knows that a particular station reads "low", I understand it to mean you are asking how (s)he knows what the error is, i.e. the difference from some "true" or nearly true value. On the other hand, the uncertainty is an estimate of the likely distribution of errors *of that kind*. That's still a somewhat loose definition; the GUM has the full-on definitions:

http://www.bipm.org/en/publications/guides/gum.html

If we know what the errors are, that’s great, but usually we don’t, so we have to estimate the uncertainty.

In my 2013 paper, I pointed out there’s a difference between errors that are systematic and affect all measurements of a particular type and errors that are systematic and affect a single ship or buoy. There will nearly always be some component of the errors that is random and which can be reduced by averaging. Other components will reduce only if you average across measurements from different ships. Yet others will never reduce because they’re common to all measurements. I think your statement oversimplifies what I presented in my paper. You’re not the first person to make this mistake, so I assume it’s some deficiency in my writing skills that has led to this serial misunderstanding.

I'm not sure I follow your examples regarding averaging and anomalies. I've tried to clarify below my understanding of the situation and how it seems to differ from yours.

Regarding averaging, you focus on getting a better estimate of a point value (10 readings placed 10 cm apart), but this is a different problem from that of estimating global temperature (as you clearly appreciate). For example, say I take those 10 thermometers and measure the temperature at 10 disparate locations. If we label the measured temperature at each point Oi equal to a combination of the true temperature Ti and an error Ei (which could depend on a whole bunch of stuff) then we have

Oi = Ti + Ei

If we average these together we get

(O1 + O2 + … + O10)/10 = (T1 + T2 + … + T10)/10 + (E1 + E2 + … + E10)/10

Which is a combination of what we want to know – the average of the Ti – and a composite error term – the average of the Ei. It doesn’t actually matter whether the Ti are in the same place or at wildly different locations. Whether you get a more accurate estimate of the average depends entirely on how the Ei behave. If they are random, then you’ll get a better estimate of the average. On the other hand, if the locations are different, you’ll not have a better estimate of the individual Ti.

What if we’re calculating an anomaly? Say, for the sake of argument we have two measurements O1 and O2. We take the average and subtract it from each observation, so the anomaly A for the first observation is:

A1 = O1 – (O1 + O2)/2

Expanding this out:

A1 = T1 + E1 – T1/2 – T2/2 – E1/2 – E2/2 = T1 – (T1+T2)/2 + (E1-E2)/2

How large the error/uncertainty is in A1 depends on how E1 and E2 are related.

Clearly, if the error is constant, then E1 and E2 are the same and the error in the anomaly is exactly zero (this is true no matter how many observations go into the calculation). In the real world, it will be a lot more complex than that, but your statement, that an anomaly would have the same or larger error, is not universally true as I understand it.

If E1 and E2 are random errors and we assume that their standard deviations are the same, then the uncertainty in the anomaly is equal to the standard deviation divided by the square root of two, i.e. it's a bit smaller. As the number of observations used to calculate the anomaly increases, the reduction in the uncertainty (if we assume random errors) becomes smaller. For a land station, the climatology will be based on hundreds of measurements, in which case, contrary to what you stated, the "random error" case is one in which the error in the anomaly is not reduced relative to the original observation.

How real observations behave when averaged and used in later calculations depends a lot on how the Ei behave. If we imagine that the Ei are some function of the environment, say weather – air temperature, humidity, wind speed, solar radiation etc. – which they surely are, then the degree of correlation between the Ei will depend on how closely related the weather is at two points. At antipodean points they are unlikely to be strongly related, nor at distances or times separated by more than the typical scales of weather systems. The most dangerous cases are those, as you mentioned, where there is a long slow creep over time, or abrupt undocumented changes in calibration. As we analyse more data, these kinds of errors become more prominent.
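A small Monte Carlo sketch of the algebra above (invented numbers, not the commenter's own code) makes the two limiting cases concrete: a constant error cancels exactly in the anomaly, while independent random errors leave the anomaly with roughly the single-observation uncertainty and the average itself with a much smaller one.

```python
# Illustrative only: O_i = T_i + E_i, an average over 30 values, and the anomaly
# of the first value relative to that average.
import numpy as np

rng = np.random.default_rng(2)
trials, m = 50_000, 30
T = rng.uniform(10.0, 20.0, size=(trials, m))       # true values

# Case 1: the same constant (systematic) error on every observation
O = T + 0.7
err_anom = (O[:, 0] - O.mean(axis=1)) - (T[:, 0] - T.mean(axis=1))
print("constant error, anomaly error sd:", err_anom.std())        # ~0: cancels exactly

# Case 2: independent random errors with standard deviation 0.5
E = rng.normal(0.0, 0.5, size=(trials, m))
O = T + E
err_anom = (O[:, 0] - O.mean(axis=1)) - (T[:, 0] - T.mean(axis=1))
err_mean = O.mean(axis=1) - T.mean(axis=1)
print("random errors, anomaly error sd :", err_anom.std())        # ~0.5 * sqrt(1 - 1/30)
print("random errors, error sd of mean :", err_mean.std())        # ~0.5 / sqrt(30)
```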

John

Nov 8, 2014 at 2:12 PM | Unregistered Commenter John Kennedy

Nic Lewis - please can you highlight the point in your joint paper with Judith where you "make an estimate of future warming?"

I read it when you published and I don't recall you doing as Richard Betts claims, but it is a while ago. I don't have time for a close reading right now and, as you didn't disagree with Richard that you "make an estimate of future warming", I wonder if my memory of the paper is wrong.

If you have made an estimate of future warming I would be interested in the exact wording because my understanding of Judith's position is that there is too much uncertainty in our current understanding of climate to make any claims about the future.

Thank you.

Nov 8, 2014 at 2:35 PM | Unregistered Commenter not banned yet

John

Your equation is wrong.

It's Oi = Ti ± Ei

Work that through your calcs and you see errors don't cancel like you say.

Nov 8, 2014 at 2:47 PM | Unregistered Commenter Micky H Corbett

Secondly, listen to your language. You are assuming that there must be an underlying relationship with errors. As a scientist you have to state that when you present the data. It's up to me to decide whether I can use that approach, or to take the worst case if I have a situation where I need to.

Nov 8, 2014 at 2:52 PM | Unregistered Commenter Micky H Corbett

Dear Micky H Corbett,

The true value is T.
The observed value is O.
E, the error, is fixed by the two.

O = T + E

Not plus or minus. I think you are confusing uncertainty and error. It would be plus or minus if I were considering the distribution of errors, but I wasn't.

What do you mean exactly by "underlying relationship with errors"?

John

Nov 8, 2014 at 3:46 PM | Unregistered Commenter John Kennedy

"See here by a climate scientist: If somebody asks if something is statistically significant, they probably don’t know what it means."

Doug defines it as: "how likely you are to see something, given something you think probably isn't true," which is technically correct, but incomplete. It's better defined as how unlikely it would be for you to see what you saw, assuming the position you're trying to disprove. Or to put it another way, how unlikely the observed outcome would be if you were wrong. Such a definition makes clearer your motivation for asking.
(Strictly, it should be the unlikeliness of the observation, or something even further from what would be predicted if you were wrong. But we'll try to keep it simple.)

But the issue here isn't the definition of significance - the issue is that in order to calculate it you need to be able to work out how likely things would be for a given state of affairs - requiring a validated model. While it's true that people often do misunderstand the number, the problem here is that we cannot even get as far as calculating a number to misunderstand. Not validly, anyway.

You also need to know how unlikely the observed outcome is if you're right, to understand if you've got any useful evidence. But that's usually even harder to calculate.

Nov 8, 2014 at 3:48 PM | Unregistered Commenter Nullius in Verba

John

The basic idea when making any measurement is that you have a true underlying reality and that which you measure. This is how we were taught in first-year physics, and it's how NPL consider their metrology. Now, yes, errors and uncertainty have slightly different meanings, but for general usage you are trying to determine whether what you measure is representative of the "true" value, or T.

So in the basic definition your value O is equal to T plus or minus a difference. This is the complete mathematical description:

O = T ± E. Or: the value of O is within E of the value of T. You don't know if it is above or below, and you cannot assume to know whether this relationship will vary over time or that you will notice it varying. The most conservative approach is to assume this case.

So now say I have 3 different sets of measurements: O1, O2 and O3, all with associated errors or unknowns which are expressed as systematic errors.

I want to get their mean, so I calculate (O1 + O2 + O3) / 3 = O_mean. Now I need to consider the errors, as all I have is a set of 3 errors.
Do I add them all up as the worst case (this is often done in the space industry)? Do I minimise them and try to reduce my error?

A good assumption and middle ground is to root mean square them. Say the errors are all similar and we end up with O_mean ± E.

I then take an anomaly: this takes one measurement and subtracts the mean. The error in the simplest case is the RMS combination of the two, so now we have 1.4E. We have this because we are taking a derivative of the data and we are trying to be conservative.

I fit a trend line to this anomaly, calculating a gradient. Once more the error goes up if we adopt the RMS approach. So now we have 2E as the error.

Now at all stages of this I have used a simple explanation and assumptions, all of which are quite reasonable. At no stage do we ever see systematic errors (undefined unknowns) lead to a reduction in error, unless we believe that we can combine the Es together, or we learn some more information and realise we have a fixed offset (O = T + E). That often comes from calibration.

I am also assuming a rectangular distribution, and I know that often these errors are an educated guess rather than a precise mathematical derivation. I trust them initially and use the above. This will form the basis of any data set I present, and as you can see it becomes obvious that the initial deviation from the true value - the error, or whatever you want to call it, E - needs to be minimised as much as possible if I choose to derive relationships from that data.
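For concreteness, here is the arithmetic of those combination rules side by side (a sketch with an invented per-measurement error E, not a claim about any particular dataset); the point is how much the final number depends on which rule is assumed.

```python
# Illustrative only: three equal measurement errors E combined under different
# conventions, then propagated into an anomaly (one value minus the mean of 3).
import numpy as np

E = 1.0
n = 3
errors = np.full(n, E)

rms_rule    = np.sqrt(np.mean(errors**2))       # RMS of equal errors: stays E (the rule used above)
worst_case  = np.sum(errors) / n                # linear sum then divide by n: also E
independent = np.sqrt(np.sum(errors**2)) / n    # quadrature for independent errors: E / sqrt(3)

anomaly_conservative = np.sqrt(E**2 + rms_rule**2)   # ~1.4 E, the figure quoted above
anomaly_independent  = E * np.sqrt(1.0 - 1.0 / n)    # ~0.82 E if the errors really are independent

print(rms_rule, worst_case, independent)
print(anomaly_conservative, anomaly_independent)
# The spread between these numbers is the crux of the disagreement in this
# thread: the combination rule is an assumption, and it drives the answer.
```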

If someone can convince me that I'm being too conservative and the situation is better than I thought, then by presenting that argument they can do so, and it may indeed prove relevant and correct. But in the first instance I don't presume to know.

This approach was reiterated to me by NPL when I was calibrating thrust measurements and has been used by me and others throughout my career. Always start off in a conservative case. Always assume that errors are symmetric about a value but you don't know or can't know how to resolve it. Start from there and try to improve.

In other words, be sceptical.

You might believe that there is a mathematical way around this, John, but I haven't ever managed to convince any engineer of it. And that's the rub: if temperature anomalies were passed through a peer review with engineers, the first thing they would notice is the spread of values and their resolution. They would get a bit uncomfortable with such low errors - as is evidenced by other people on here.

Now, rather than continue this, I'll digest some of your papers a bit more and might contact you at the Met Office - it is, after all, just down the road from me. And again, I may be missing some bit of mathematical wizardry which may prove useful, so I'll keep an eye out for that.

I appreciate that you came on here to chat.

Nov 8, 2014 at 6:51 PM | Unregistered Commenter Micky H Corbett

@ Richard Betts, Nov 7 at 7:33 PM

Concerning your first link, to a blog post, you previously cited that same post in a comment you made on Mar 27 (at 10:02 PM). I gave a rebuttal that same day (at 10:47 PM). You gave a rejoinder the next day (at 8:43 PM). I then gave a surrejoinder (at 9:14 PM). My surrejoinder points out that you were effectively agreeing with what I have been saying. There were no further comments at that time. If you have something to say further to that discussion, certainly I am interested.

For convenience, here is my rebuttal comment again (the first paragraph is a quote from the blog post that you linked to).


      Statistically significant is sometimes used as a proxy for true, and
      is sometimes muddled with significant or meaningful or large.
      In climate, it also gets confused with caused by human activity.

If we choose our statistical model to represent natural variation—as is usual—and we find some observations that are statistically significant, then how would you interpret that? Put another way, our assumption is that the model represents natural variation and we have observations that lie outside the expected range of the model; so what led to those observations?

I know of only two interpretations for those observations. One interpretation is that something very unusual happened just by chance (perhaps because the significance level was not set conservatively enough). The other interpretation is that there was some non-natural variation—which was presumably caused by human activity.

That rebuttal was also left as a comment on the blog that you link to.

Concerning your second link, also to a blog post, I left a rebuttal comment on that blog, albeit in a later post. I sent you an e-mail about that, on March 27. For convenience, here is my e-mail again.

Regarding your tweet 448966343078215680
     @aDissentient Your 'statistical significance' argument is silly: http://wmbriggs.com/blog/?p=8061
note that I left a rebuttal comment on Matt’s blog, at that time.

I have also now left a related comment on BH:
http://www.bishop-hill.net/blog/2014/3/27/on-consistency.html#item20833099

That last link is to a comment on the post in which we had the above-linked discussion. Here is a quote from the comment: “an event is significant if it is unlikely to be due to random variation in our chosen model, i.e. it is outside the range of what we believe would be reasonably expected to occur under natural variability”.

Additionally, note that significance levels and confidence intervals are just different ways of presenting the same thing: an event is significant at the 5% level iff its occurrence lies outside the 95% confidence interval (in general). So, are you going to campaign against using confidence intervals?

The point made in the comment by Bishop Hill @ 12:53 PM is also valid and clear.

Nov 8, 2014 at 10:34 PM | Unregistered Commenter Douglas J. Keenan

"The basic idea when making any measurement is that you have a true underlying reality and that which you measure."

Generally, you have a physical process that generates the true values of the underlying reality, the true values themselves, a measurement process that describes the physics of the measurement, and the observed outcome of that measurement. The measurement is generally a random process outputting a value with a distribution related to the true value, this relationship being defined by the physics of both the quantity observed and the measurement process. The idea is to get as much information as possible about the true value, knowing only the measurement outcome and how its distribution is related to the true value.

If you don't know the distribution, the measurement tells you *nothing* about the true value. You observe the value '6': is that plus or minus 0.1, or 1, or 100, or 1000, or 1,000,000, or what?

But the distribution of the error is not the same thing as the error. If the distribution is +/-10 then one time we might get an observed value of 7 when the true value is 5. The true error is +2, which is a single sample from a random variable with a spread +/-10. +2 is the error, +/-10 is the distribution of the error. They're not the same.

But generally, you don't know that the true value is 5. All you know is that you observed 7 and the spread is +/-10, so all you can work out about the true value is that it has some value between -3 and 17. You don't know the actual error - if you did, you'd be able to subtract it and calculate the true value of the quantity being measured exactly, with no error.

Note that there are no constraints on the form of the distribution - in particular, we can't assume that it is symmetrical centred on the true value. It might be biased, anywhere from 3 below to 10 above the true value, for example. A well-designed measurement process will usually have a distribution centred on zero, but not all do. If the distribution is heavily skewed, then the mean, mode, and median are all different, and cannot all be centred on the true value.

"I want to get their mean so I calculate (O1 + O2 + O3) / 3 = O_mean. Now I need to consider the errors as all I have are a set of 3 errors."

You don't want the mean, as such, what you want is the true value of the physical quantity being measured. Under some special circumstances, the mean of several instances gives a more accurate estimate (the distribution of the error in the mean has a narrower spread) than the individual measurements.

"A good assumption and middle ground is to root mean square them."

This is the appropriate action when the error distributions have a statistical property called 'independence'. When the distributions are independent, then the standard deviation (spread) of the mean is the root-mean-square of the standard deviations of the contributors. The spread narrows as you average more values, but the mean of the errors doesn't change. If all the inputs have a non-zero mean, the result of averaging will too. It would be highly surprising if the mean was *exactly* zero, and so there is a fundamental limit to how much extra accuracy averaging can buy.

But in more general cases, assuming independence is unsafe. Just as the mean is rarely exactly zero, neither are measurements ever exactly independent. The appropriate action is now more complicated - the variance of the average is now the average of the elements of the covariance matrix for the joint distribution of all the measurements. Doing the root-mean-square will give the wrong value, and will generally underestimate the error, although mathematically it is just as easy for it to be overestimated (depending on whether the errors are correlated or anti-correlated). In the normal positively-correlated-errors case, the error spread never shrinks below a non-zero residual minimum.

That, incidentally, is why some people take the sum of errors instead of the RMS - if you don't know the errors are independent, then the sum of errors is less likely to underestimate the error. Although I should say that an underestimated error is sometimes as bad as an overestimated one, so one should only do this if the relative costs of mistakes lean heavily on that side.
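A small simulation of the correlated-errors point above (invented numbers, not the commenter's own code): with independent errors the spread of the mean shrinks like 1/sqrt(N), but a component common to every measurement puts a floor under it.

```python
# Illustrative only: averaging beats down independent noise but cannot touch an
# error that is shared by all the measurements.
import numpy as np

rng = np.random.default_rng(3)
trials = 5_000
sigma_indep, sigma_common = 1.0, 0.3

for N in (1, 10, 100, 1000):
    mean_indep = rng.normal(0.0, sigma_indep, size=(trials, N)).mean(axis=1)
    common = rng.normal(0.0, sigma_common, size=trials)     # same error added to every measurement
    print(f"N={N:4d}  independent only: {mean_indep.std():.3f}   "
          f"with common component: {(mean_indep + common).std():.3f}")
# The second column tends to sigma_common (0.3), not zero: the non-zero
# residual minimum described above.
```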

"I also am assuming a rectangular distribution and I know that often these errors are an educated guess rather than a precise mathematical derivation."

Error distributions are not usually rectangular - the only one that is even approximately rectangular/uniform is the rounding error, and that's a classic case of correlated errors where repeating measurements and averaging doesn't help, since it always gets rounded the same way. People are more likely to assume that errors are Normally distributed, which is certainly far more common, but by no means universal either. Anyone who takes error analysis seriously ought to devote some efforts to *testing* these assumptions: normality, independence, identical distributions. Worse, there are some error distributions that don't even have finite means, or standard deviations - averaging doesn't get you anywhere. If you haven't tested it, you're operating on hope.

Nov 8, 2014 at 11:04 PM | Unregistered Commenter Nullius in Verba

NiV

Reading your post I realise I should have clarified a few things:

I was talking about when you are trying to achieve say 0.2 degrees resolution but your technique can only get you 1 degree or worse.

Secondly the mean was used because we were talking about anomalies.

But yes everything else you say is part of the process. The fact that I wrote down what I was assuming meant you could ask the things you did.

Nov 9, 2014 at 8:44 AM | Unregistered Commenter Micky H Corbett

Richard Betts,
Nice quotes, thank you for finding them.
Context is everything, or at least quite a bit.
The first quote does not conflict with my point, if you read its context:
https://dougmcneall.wordpress.com/2014/02/03/a-brief-observation-on-statistical-significance/
"I don’t mean to offend anyone, and I can think of plenty of counter examples*, but this is borne out of long observation of conversations among both scientists and non-scientists. Statistically significant is sometimes used as a proxy for true, and is sometimes muddled with significant or meaningful or large. In climate, it also gets confused with caused by human activity.

Even those that have done lots of statistics can forget that it only tells how likely you are to see something, given something you think probably isn’t true.

It’s one of those horrible, slippery concepts that won’t stay in the brain for any length of time, so you** have to go over it again, and then again, every time to make sure that yes, that’s what it really means, and yes, that’s how it fits in with your problem.

No wonder people don’t know what it means.

*This is just a personal observation, but there is data out there. I’m sure people will provide counter examples in the comments.

** And by you, I really mean I."

And the other quote you provided was from an even larger context, so I will just quote the summary, because the author seems to agree with you that,
"Which is the correct model? I don’t know, and neither do you. The only way we can tell is when one of these models begins to make skillful predictions of data that was not used in any way to create the model. And this, no climate model (statistical or physical or some combination) has done."
https://dougmcneall.wordpress.com/2014/02/03/a-brief-observation-on-statistical-significance/

"Statistical significance" is a term that is subject to misuse of course. But to deny it is important is to miss a lot more than the frustration of those who would like to see it used correctly. And to avoid using it when its abuse by those who wish to ram ill conceived, poorly executed, unsuccessful climate obsessed policies on the world is tragic. You yourself say the models are not good for policy making purposes. Why do you tolerate your colleagues telling politicians and private industry leaders the opposite in such silence and instead divert with issues like this?

Nov 9, 2014 at 12:44 PM | Unregistered Commenter hunter

I've read up on some of the Met Office uncertainty calculations now and will move this to a discussion post. I've also seen how the terminology is used, so will continue in that vein in the discussion (the confusion of uncertainty and errors and so on).
There appears to be a fundamental assumption underpinning the error calculations that leaves me a bit uncomfortable, and I think that's what my "feeling" about the stated uncertainty was. I'll explain more in the post.

Nov 9, 2014 at 4:10 PM | Unregistered Commenter Micky H Corbett
