
Discussion > Temperature anomaly measurement uncertainties

On a previous post, the Maslin one, I expressed some doubt as to the small uncertainty often quoted for temperature anomalies. This came from, I guess, a "feeling" that the analysis was a bit too sunny-day: it wasn't taking into account the difficulties associated with drift and the lack of calibration and ongoing characterisation of measurement processes.

Now, I wasn't too clear in how I described this, mixing terminology. So I went hunting for some Met Office papers to help clarify my position and doubt, and sure enough I found one that best explains what I'm talking about. I don't presume to be right about this; it's just that I feel something major has been missed, though it may appear subtle.

In the paper Reassessing biases and other uncertainties in sea surface temperature observations measured in situ since 1850, part 1: measurement and sampling uncertainties by John Kennedy et al, in section 3 there is an error model. I will reproduce what I can here (without symbology) but you can read it at the link:


Consider an SST measurement O(i,j) taken by ship i at point j that has been converted to an anomaly from a reference climatology. This observation is a combination of the true SST anomaly at that point, T(i,j); a random measurement error, M(i,j), that is different for every observation; and a constant offset for measurements from that ship, the micro-bias error B(i). Here, "ship" is used to refer to any single entity - be it ship, drifting buoy, or moored buoy - that takes SST measurements. O(i,j) can thus be written as:

O(i,j) = T(i,j) + M(i,j) + B(i)

M(i,j) has mean zero and standard deviation σ_m(i) and is different at each point j. B(i) is drawn for each ship, i, from a sample with mean zero and standard deviation σ_b(i). A mean of zero assumes that the bias adjustments discussed in Part 2 of the paper have adjusted successfully for the mean bias for each measurement type.


The model assumes that the Ms and Bs are independent of the Ts, and that the Ms are independent of the Bs.

Equation (11) gives the squared standard error of a grid-box average anomaly as:

σ_error² = σ_m²/n + σ_b²/m + σ_s²(1 − r_bar)/n

where n is the number of observations, m is the number of ships contributing them, and σ_s is the standard deviation of the anomalies themselves.
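To make the shape of equation (11) concrete, here's a minimal sketch in Python. Only the formula itself comes from the paper; the sigma values, n and m below are invented for illustration:

```python
import math

def grid_box_uncertainty(sigma_m, sigma_b, sigma_s, r_bar, n, m):
    """Standard error of a grid-box average anomaly, following the
    form of equation (11): measurement noise and sampling variance
    shrink with n observations, the micro-bias with m distinct ships."""
    var = (sigma_m ** 2) / n + (sigma_b ** 2) / m + (sigma_s ** 2) * (1 - r_bar) / n
    return math.sqrt(var)

# Illustrative (invented) values: 1.0 K measurement noise, 0.5 K
# ship-to-ship bias spread, 1.2 K anomaly variability, r_bar = 0.6.
print(grid_box_uncertainty(1.0, 0.5, 1.2, 0.6, n=100, m=10))  # ~0.2 K
```

Note that the middle term only falls with the number of distinct ships, not the number of observations, which is why the per-ship bias B(i) behaves so differently from the random error M(i,j).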


So as you can see, the idea is that anomaly, measurement error and bias variations decrease with more data. The bias itself is like what Entropic Man described on a previous post: an accuracy offset that doesn't contribute to the overall trend.
This appears to be the case and I don't have a problem with this idea.

My problem is that bias drift is not taken into account. Bias drift is normally removed or minimised by regular calibration and characterisation of the measurement process, the intention being to reduce bias variation enough that the bias can be treated as a fixed offset, as above.
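A toy simulation (all numbers invented) of why drift matters more than a static offset: a constant bias cancels out of a least-squares trend, but an uncorrected linear drift adds straight to it.

```python
import random

random.seed(0)
years = list(range(50))

def fitted_slope(series):
    """Ordinary least-squares slope of series against years."""
    n = len(series)
    xbar = sum(years) / n
    ybar = sum(series) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(years, series))
    den = sum((x - xbar) ** 2 for x in years)
    return num / den

noise = [random.gauss(0, 0.1) for _ in years]
true_anomaly = [0.01 * t for t in years]  # invented 0.01 K/yr true trend

# Static 0.5 K bias vs an undetected 0.005 K/yr sensor drift:
offset_series = [a + e + 0.5 for a, e in zip(true_anomaly, noise)]
drift_series = [a + e + 0.005 * t for a, e, t in zip(true_anomaly, noise, years)]

print(fitted_slope(offset_series))  # ~0.010 K/yr: offset cancels in the trend
print(fitted_slope(drift_series))   # ~0.015 K/yr: drift adds straight to it
```

The drift is indistinguishable from signal in a single series, which is why it has to be caught by calibration or characterisation rather than by averaging.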

But with SST measurements and especially large sets of varied data using the bucket approach, that's going to be hard. It also may be fruitless. I'd like to see this assumption stated clearly when SST measurements are presented and let other people decide if it is valid. I believe Steve Mc also talked about this some years back.

I know from my own experience that calibrated devices can often drift quite severely and suddenly within days of calibration, something that is often caught because it affects some other parameter you are trying to measure. Often the solution is to increase the uncertainty in the overall measurement process. Sometimes the problem is in the process itself, so multiple measurements are affected.

So this is the reason why I expressed some doubt about that temperature anomaly accuracy and suggested that, without such metadata or knowledge of the measurement process, it may be better to give a cruder temperature uncertainty. There's nothing sinister here and I'm not trying to be stubborn for the sake of it - there's just something a bit too simplistic about how the uncertainty is reached.

If other experimental data is out there then I'd be happy to change this view.

Nov 9, 2014 at 6:59 PM | Registered Commenter Micky H Corbett

Dear Micky H Corbett

I don't think it's generally a good idea to give people reading lists, but if you are looking at the uncertainty analysis, and if you haven't already...

You probably ought to read part 2 of the paper, which looks at uncertainties associated with the biases and systematic drifts and so on. There's a copy here:

Part 1 is a prelude to that. We need to get a reasonable handle on the "easy" bit of the uncertainty to help understand the less tractable bits. If you're reading part 2, then you should probably look at the review on measurement bias written by Kent et al. which was in Wiley Interdisciplinary Reviews (googling usually throws up a free copy, but for some reason I can't find it at the moment) and my review paper on uncertainties in SST more generally. Copy here:


Nov 9, 2014 at 7:40 PM | Unregistered Commenter John Kennedy

No, John, thanks for that. I'll take some time and look at it.

Nov 9, 2014 at 7:51 PM | Registered Commenter Micky H Corbett

Sorry, I'm back and forth between the old thread and this new one. I want to clarify a few things.

First, the aim is not to make a data set that achieves a desired uncertainty (e.g. 0.2 K). The aim is to quantify the uncertainties of the data and of quantities derived from the data. The estimated uncertainties should be realistic - that is, neither optimistic nor pessimistic. There's an obvious danger in underestimating uncertainties, but there's also a danger in overestimating them, because that could lead you to disregard data that are actually good enough for a particular purpose.

Second, how accurate can a temperature measurement get? When you talked about measuring plasma temperature, you mentioned the difficulty of getting reliable measurements. There are systems which measure SST, such as Argo, that claim accuracies better than a tenth of a Kelvin, and the best space-based instruments get close to 0.1 K with drifts measured in mK/year. Going further back in time, if you read papers discussing the design of dedicated meteorological buckets, it's clear they hoped to get within tenths of a K of the true SST.

Nov 9, 2014 at 7:58 PM | Unregistered Commenter John Kennedy

Hi John

I agree entirely. I made a bit of a pig's ear of explaining the simplified accuracy approach on the other thread. Nullius in Verba was quick to point out problems, and rightly so. What I should have done is be a lot more specific. I've had to work with NPL to measure physical thrust on a thrust balance and compare this to "electrical thrust", which is what comes from the circuit measurements. You use this to correct the electrical thrust into a predicted "real" thrust, which is what the control system uses.
After considering all the uncertainties in the components of the measurement equipment, including drifts - and since we can control the plasma very well, so as not to introduce volatility when measuring - you come to a value of uncertainty for the measurement. The process considers independence of errors and things like that. I'd forgotten about that before.

Then the issue becomes a sampling one. If you can only do 3 sets of measurements (due to time and cost), is it worth trying to minimise the variation further? As a pessimistic approach you often continue to use the calculated accuracy, as it adds margin. Not so good if you need to reduce uncertainty, as you can imagine.

I guess what I'm looking for, and I'm sure others have studied it, is the case where a process or measurement has been given an accuracy value by some sort of calibration, and a user then rounds the measured value to it. Or where there is an engineering tolerance on, say, an inlet temperature sensor because it isn't completely thermally isolated from its surroundings.

I will say this though: having looked at some of the analysis I don't think that anyone is intentionally trying to produce a small uncertainty value. So at a glance the quoted uncertainty in the temperature record does appear to be a best estimate. I just have that sceptical itch that needs scratched but I'll do it by looking over the papers.

Nov 10, 2014 at 6:57 AM | Registered Commenter Micky H Corbett

Micky H Corbett

Keep on scratching! :-)

You might be interested to know of another bias that went unnoticed for a long time. During the early part of WW2, one of the largest generators of sea temperature data was the Royal Navy, using the bucket method.

After 1941 the US Navy became a big contributor using temperatures taken from the engine room cooling water intake.

While the bucket method tended to under read, the intake readings tended to be too high. If this is not accounted for, you might think that WW2 increased sea surface temperatures. :-)
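A back-of-envelope illustration of this point, with invented offsets: if buckets read, say, 0.3 K cold and engine intakes 0.3 K warm, merely changing the mix of measurement types produces an apparent jump with no real change in SST.

```python
import statistics

# Toy illustration (all numbers invented): buckets read 0.3 K cold,
# engine intakes 0.3 K warm. True SST is constant; only the mix of
# measurement types changes across 1941.
true_sst = 15.0
bucket_reading = true_sst - 0.3
intake_reading = true_sst + 0.3

pre_1941 = [bucket_reading] * 100                          # mostly RN buckets
post_1941 = [bucket_reading] * 30 + [intake_reading] * 70  # USN intakes dominate

jump = statistics.mean(post_1941) - statistics.mean(pre_1941)
print(jump)  # ~0.42 K of apparent "warming" from the changed mix alone
```

Without metadata recording which method each reading used, this step would be indistinguishable from a real change in the ocean.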

Nov 10, 2014 at 10:20 AM | Unregistered Commenter Entropic man

So after reading (and continuing to read back and search), it's clear that trying to estimate bias drift in bucket and engine-intake measurements is quite a problem.

One thing that I noticed was that there is an assumption that bucket measurements may have a cold or warm bias once on deck (Parker, 1985 and Kent 2006 are some examples). I sailed across the Atlantic in 2005 and had quite a few sea water "showers" - hauling water in from a plastic bucket. It was 32 degrees on deck but I wonder just how fast 5 litres of water could heat up especially if a thermometer was dipped and held in the middle of the bucket.

Has anyone done any experiments for the bucket method? I see there was a study of air temperature measurements on board ships, as well as tests of IR sensors for satellites, so I'm wondering whether this has been done.

Nov 13, 2014 at 6:22 PM | Registered Commenter Micky H Corbett

If air temperature is higher than water temperature you get an over read. If water temperature is higher than air temperature you get an under read.

If the water and air temperatures are the same, then evaporation gives you an under read.

Nov 13, 2014 at 8:43 PM | Unregistered Commenter Entropic man

Dear Micky H Corbett,

There are a bunch of old papers here:

There have been some studies of heat loss from buckets in wind tunnels and so on. Also check out Folland and Parker 1995, if you haven't already:

Some of the specially designed buckets hold quite small water samples.


Nov 13, 2014 at 11:01 PM | Unregistered Commenter John Kennedy

For Micky - some of these might be of interest:

Nov 14, 2014 at 12:12 AM | Unregistered Commenter not banned yet


Thanks. It really is an attempt to measure "the surface". It would make sense then that air temp affects the measurement.

John, I'll continue to search. Cheers.

Nby, thanks for the link.

Nov 14, 2014 at 7:10 AM | Registered Commenter Micky H Corbett

I'm bumping this thread because, with a bit of digging, I'm understanding how the bucket bias correction has come about. Scientifically, the theory-model-experiment account of how uninsulated canvas buckets cool depending on local conditions (wind speed and, most importantly, relative humidity) seems to be established as a good first estimate. However, the problem is that the global temperature anomaly record should require a higher standard of characterisation, and I'll briefly explain why.

The relevant refs are:
Folland 1991
Folland 1995
Met Office HadSST3 refs

The 1991 paper is the actual model and confirmation of the basic idea by performing tests on ships. Have a read when you can. The Met Office refs include 1938 Admiralty handbooks, which recommend some sort of thermalisation process before taking a measurement.

So here's why I think, from a scientific and engineering standpoint, this bias correction (and possible bias) shouldn't be quoted the way it is:

1) The original set of experiments doesn't seem to have any measurement control or calibration. One of the sensors was ignored and a thermistor was used. Thermistors have more uncertainty than thermocouples or thermometers, but in any case you could use a thermometer to recalibrate the thermistor. If this was done, it's not obvious.

2) Only one type of bucket was used - it would be better to use different bucket types to see the spread of data and match them to experiment.

3) There are no follow-up or accompanying recent tests in controlled environments, i.e. extensive characterisation. Hopefully this is wrong and the Met Office has been doing ongoing tests. The reason I'm concerned is that the Folland model-to-data comparison is 23 years old; the correction is 23 years old and doesn't seem to have been updated. There's nothing wrong with the data in terms of demonstrating that this could be a significant effect, but ethically, if it is applied to a data set used for policy, it requires that a much higher engineering standard be applied.

4) Thermalisation - for those ships that may have pre-thermalised thermometers according to the handbooks, i.e. take a sample, dip the thermometer in, wait, then discard and get a new sample for the actual measurement - the cooling effect may be exacerbated. If the initial sample cools and the thermometer then cools while in the air (smaller thermal mass), the difference between thermometer and new sample may be larger and produce a larger bias correction. Of course I don't know this, but characterisation of the "measurement process" is needed to catch more complicated measurement conditions that reflect the actual process more closely. This directly feeds into the bias uncertainty, which may not have a Gaussian distribution.

Also, the estimated corrections, dependent on location, can be tested in a lab and then confirmed on ships. After all, the dominant effect is relative humidity, which can be controlled pretty easily in most clean-room set-ups. A whole look-up table of corrections could and should be created.
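The look-up table idea could be prototyped very simply. Below is a sketch with placeholder correction values (not measured data) on a wind-speed × relative-humidity grid, read off by bilinear interpolation:

```python
# Hypothetical bucket-cooling correction table: rows = wind speed (m/s),
# columns = relative humidity (%). The values (K) are placeholders only;
# a real table would come from controlled lab and shipboard tests.
wind_axis = [0.0, 5.0, 10.0]
rh_axis = [40.0, 70.0, 100.0]
corrections = [
    [0.05, 0.03, 0.00],  # calm
    [0.15, 0.10, 0.02],  # moderate wind
    [0.30, 0.20, 0.05],  # strong wind
]

def bracket(axis, v):
    """Index i and fraction f such that v lies between axis[i] and axis[i+1]."""
    v = min(max(v, axis[0]), axis[-1])  # clamp to the table's range
    for i in range(len(axis) - 2, -1, -1):
        if v >= axis[i]:
            break
    return i, (v - axis[i]) / (axis[i + 1] - axis[i])

def bucket_correction(wind, rh):
    """Bilinear interpolation of the correction table."""
    i, fi = bracket(wind_axis, wind)
    j, fj = bracket(rh_axis, rh)
    top = corrections[i][j] * (1 - fj) + corrections[i][j + 1] * fj
    bot = corrections[i + 1][j] * (1 - fj) + corrections[i + 1][j + 1] * fj
    return top * (1 - fi) + bot * fi

print(bucket_correction(7.5, 55.0))  # mid-grid: 0.1875 K with these placeholders
```

The point is that once such a table exists, applying and auditing the correction is trivial; the hard, and so far missing, part is the controlled experimental programme that would populate it.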

So in summary, what seems to have happened is that a pretty decent scientific result has been applied to an engineering data set but without the subsequent experimental characterisation and validation that is needed. Again hopefully I'm missing a whole other raft of papers and data that says it has been done.

Do I think it's the Met Office's fault if this hasn't been done? No, maybe not. But if you read through the papers and other references and then think about the billions spent on climate change using this data set, you'd probably want some independent confirmation and validation of the effect to be done so maybe a consideration of the engineering side of things wouldn't go amiss.

I'd recommend the Met Office spend some of that £97 million on testing. Hopefully they have already.

The bias correction and temperature anomaly uncertainties are a nightmare, as there is little metadata, as John notes in his papers. A bit more actual testing would make a world of difference, and perhaps bring sceptics and non-sceptics together a lot more, because I don't think there is a lot of trust in the temperature anomalies, what with the other adjustments that go on.

Just my two cents. Happy to be shown otherwise.

Dec 15, 2014 at 8:31 PM | Registered Commenter Micky H Corbett

Micky H Corbett: my own understanding of sea surface temperature measurement was that it was properly done with a specially designed rubber bucket, its thick shell reducing loss or gain in temperature of the contents until measurement. Quite how the “anomalies” could be calculated has to be suspect; the potentials for error are so wide and varied, from not leaving the bucket long enough in the water to balance its temperature, to the length of the haul to the deck (usually the bridge, which could be 40 metres up), to delays in actually reading, incorrect reading, and improperly-kept thermometers. I would argue that there are probably too many variables involved on each ship, and the logistics of investigating each and every situation would be prohibitive, for it to be described as “scientific”. As for engine intake temperatures: the intake could be anywhere between 2 and 20 metres below the surface, neither depth being “surface”; also, the measurement would be electronic, with no imperative for it to be as accurate as is really required (hey, what’s 0.5 or 1°C when compared with the temperatures in the engine?). So, how do they determine the anomaly in this case? Perhaps the only solution would be to accept each and every reading as read.

Dec 15, 2014 at 10:39 PM | Registered Commenter Radical Rodent


You should go and read some of the Admiralty stuff. It's very interesting, and clear that they did intend to get a decent measurement. Commercial ships I don't know about, but Navy vessels probably took a pretty strict view on it worldwide.
However, my point is that some degree of characterisation can be done, even if in the end it widens the uncertainty. We are talking about hauling buckets of seawater, maybe over some distance - it doesn't require specialised equipment. It would also be worth testing processes such as the Admiralty recommendations. Yes, in the end it may not be as fruitful as hoped, but this data set has such a big influence on policy that it should be tried.

Dec 16, 2014 at 6:36 AM | Registered Commenter Micky H Corbett

Commercial vessels appointed as met. ships will follow similar procedures to those of the naval vessels, but they do have other constraints that naval ships might not have, less man-power being one, larger freeboard being another. Another unquantifiable anomaly is that - certainly for commercial vessels - the measurement is taken with the ship moving through the water, the bucket thus trailing in the disturbed water alongside or behind the ship. As I said, it is probable that these anomalies will never be quantifiable, and it might be better to take the readings as read. The result IS data, and might be able to display trends, if not values.

Dec 16, 2014 at 4:24 PM | Registered Commenter Radical Rodent


I hear you and I do agree. My original stance was that maybe it wasn't possible to resolve the uncertainty down to the values stated.

But the Met Office did propose a model. They did analyse the data and make a stab at an adjustment. And the adjustment (temperature anomalies went up) has sat there for about 20 years. It has had direct implications on AGW and associated policy.

Taking the situation as it is, and having read through some of the papers, I think the uncertainty is still a work in progress, and that the adjustment model needs to be characterised to a much higher standard if this data set is to be applied to policy. Because of this, I'm not sure the uncertainty can be justified on the basis of one or two experiments and then models thereafter.

It needs to be qualified in the engineering sense - and subject to some controlled chaos to boot to get a good feeling for most if not all operational conditions. Much like any hardware or process does before use.

It's like someone made a plane wing, did one test on it then said it was ready for flight. You'd probably want a bit more rigorous testing to be done and a good drop of realism applied to the results.

And like you say, maybe after doing this, the limit to how much the uncertainty and bias drift could be minimised would be better understood. As I said above, I hope I'm wrong and this isn't the only real data set demonstrating the cooling effect - one experiment is a very sparse basis.

Dec 16, 2014 at 8:31 PM | Registered Commenter Micky H Corbett

One problem now is that it would appear that fewer and fewer commercial vessels are being used for met. observations, presumably because of the greater number of ARGO buoys, as well as satellite measurements, so the data will be getting more and more sparse. On the basis of what I said before, it would be logical for the measurements to be adjusted upwards (the surface being warmer than the deeper water that tends to get mixed with it – or even those from the engine intakes). Quite how these measurements will be adjusted and “homogenised” is beyond me.

Dec 16, 2014 at 8:49 PM | Registered Commenter Radical Rodent

I might have posted on this before; sorry. I have experience of throwing a bucket overboard from a yacht with a metre of freeboard in a flat sea. It was hard work, and any mistake meant I would have been overboard. Translate that to commercial shipping: 3 metres of freeboard plus, travelling at 7 knots - no way I would do that. And doing it in gale-force conditions? Don't make me laugh. All the measurements before the intake pipe are highly uncertain... or crap, whichever you prefer. Either way, don't rely on the records.

Dec 16, 2014 at 10:26 PM | Unregistered Commenter diogenes

The “official” bucket is smaller than the standard bucket, being basically a closed-end 3-inch tube of thick rubber, holding about a litre. Ship’s rails tend to be more robust than those of yachts, so the seamen can brace themselves against that, should they need to; also, the deck will probably be more stable, too, and getting dry after a drenching would not be quite the same problem as it would be on a yacht, either. It would not be too difficult to obtain the required sample, no matter what the freeboard, or the speed (mind you, at 25 knots, the bucket may well be airborne more often than submerged).

As you say, the engine intake pipes, which seem to be the more commonly-accepted reading of the “sea surface” temperatures nowadays, are suspect, to say the least.

Dec 16, 2014 at 11:07 PM | Registered Commenter Radical Rodent