Some oddities in HadSST
Reader John McLean emails with details of some surprising finds he has made in the Hadley Centre's sea-surface temperature record, HadSST. John is wondering whether others might like to take a look and confirm what he is seeing. Here's what he has found:
1 - Files HadSST3-nh.dat and HadSST3-sh.dat are the wrong way around.
About 35% down web page https://crudata.uea.ac.uk/cru/data/temperature/ there's a section for HadSST3. Click on the 'NH' label and you go to https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-nh.dat, which has 'nh' in the file name. But based on the complete gridded dataset that data file is for the Southern Hemisphere, not the Northern. The two sets are swapped. The links to named files are correct but the content of those files is wrong, likely due to errors in the program that created these summary files from the SST3 gridded data.
2 - The ASCII file containing observation counts per grid cell has records in the wrong order.
On the above page click on the "HadSST3" link and go to the Hadley Centre page, then from there to the "download page" (http://hadobs.metoffice.com/hadsst3/data/download.html) and you'll see mention of an ASCII file of observation counts for each grid cell.
The data in that file is in the wrong sequence. The HadSST3 gridded data has records for each month in sequence from 90N to 90S but the observation count data runs from 90S to 90N.
(I found this when I discovered lots of SST data with no corresponding observation count and then lots of grid cells with observation counts but no SST data. I've created a crude map of cells that contained SST data in January 2000 and it displays with the NH at the top, so it's not the SST data file that's wrong; it's the counts. When I flipped the data in each month into 90N to 90S order the SST data always had corresponding observation counts and there were no cells with an observation count but no data.)
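The flip test described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each month is held as a list of latitude rows, None marks a cell with no SST value, and a count of 0 marks a cell with no observations. The function names and the toy 4 x 2 grids are invented for the example and are not part of any HadSST tooling.

```python
def mismatches(sst_rows, count_rows):
    """Count cells where only one of SST value / observation count is present."""
    sst_only = 0    # SST value but zero observations
    count_only = 0  # observations but no SST value
    for sst_row, count_row in zip(sst_rows, count_rows):
        for sst, count in zip(sst_row, count_row):
            has_sst = sst is not None
            has_obs = count > 0
            if has_sst and not has_obs:
                sst_only += 1
            elif has_obs and not has_sst:
                count_only += 1
    return sst_only, count_only


def flip_latitudes(rows):
    """Reverse the row order, e.g. from 90S..90N to 90N..90S."""
    return rows[::-1]


# Toy month: 4 latitude rows x 2 columns, SST stored north to south.
sst = [[0.3, None], [0.1, 0.2], [None, None], [None, -0.4]]
# The same month's counts, stored south to north (the suspected problem).
counts = [[0, 7], [0, 0], [12, 3], [5, 0]]

print(mismatches(sst, counts))                  # as downloaded
print(mismatches(sst, flip_latitudes(counts)))  # after flipping the rows
```

If the mismatch totals drop to zero only after flipping, the two files use opposite row orders; if they are already zero, the files agree as downloaded.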
3 - The ASCII observation count file contains unreadable fields because they overflowed.
Since about 2002 it's not unusual to find cells for which the observation count is '*******', meaning that the count is greater than 9999.00. There's no way for the user to know what value should be in that field.
I suspect the problem arises because the file was written by a Fortran program, since that language fills a field with asterisks when the value doesn't fit. Using a real number (i.e. with decimal places) makes no sense because one can't make half an observation or 0.19 of an observation. I don't know why the fields can't be a 7-digit integer, but they're not. (Could it be to cater for the language R?)
(I'm a bit suspicious about values in excess of 9999.00 because, over a 30-day month, that's an average of about 14 observations per hour, or roughly one every 4.3 minutes! Probably 75% of cells with these observation counts are along the western or eastern US coast, but the other 25% aren't. Is it a cluster of Argo buoys??)
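The observation rate implied by an overflowed field depends on the length of the month. A quick check in Python, assuming the field overflows once the count reaches 10,000 (the constant and function names are chosen for this example):

```python
HOURS_PER_DAY = 24
OVERFLOW_AT = 10_000  # assumed: first whole count too large for the field


def rate(days_in_month, count=OVERFLOW_AT):
    """Return (observations per hour, minutes between observations)."""
    per_hour = count / (days_in_month * HOURS_PER_DAY)
    return per_hour, 60 / per_hour


for days in (28, 30, 31):
    per_hour, gap = rate(days)
    print(f"{days}-day month: {per_hour:.1f} per hour, one every {gap:.2f} min")
```

These are minimum rates for a single 5-degree cell, since the true count behind an overflowed field could be much larger.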
He adds:
I see from my notes that the instances of '*******' in a field, by year, were as follows:
2002: 2
2003: 1
2004: 5
2005: 14
2006: 17
2007: 103
2008: 143
2009: 178
2010: 177
2011: 111
2012: 127
2013: 153
2014: 147
2015: 136
(or at least that's how the files downloaded after the January 2016 update had things).
This list might help anyone confirm the existence of these overflowed fields.
The HadSST3 observation count files won't be used by many people; maybe I'm even the first, if no-one else has hit the problems. I found them because I was investigating the temperature and coverage impact in each month of grid cells with few observations.
I think a fair question is whether Hadley Centre publishes other flawed data on SST or anything else because it looks like there's no in-house verification that software does what it's supposed to do.
Reader Comments (107)
What's all this mean in English?
In English, they have mixed up the Northern and Southern hemispheres, or in Anglo Saxon they do not know their arse from their elbow. And it's twice over as they also think up is down ;)
There was never a drill hole that went from surface to sky in all my mineral exploration years.
What is wrong with the standard of "easy" science these days?
Geoff
Come back Harry. All is forgiven.
In the software industry, mistakes happen frequently and sometimes a minor error can look horrendous to the user(customer), but it's an easy fix.
What matters is not the error; it's how fast it is fixed and how honest the explanation is.
Did this happen because they were trying to fudge the results and hope no one noticed?
Has anyone thought of just asking John Kennedy about this? Maybe it is a mistake, but maybe not (I've had a quick look at the data files and it's not obvious to me what the problem is)
Upside down data? No problem! Ask Mikey Mann!
Nonsense! This data is all provided by peer-reviewed scientists - so it can't be wrong!
What are 'Reader John McLean's' qualifications? If he has no peer reviewed papers, or is not a practising approved climate scientist, who is he to say that 97% of real scientists are wrong? This is obviously a smear campaign being started by highly-paid Big Oil disinformation specialists... and, in any case, no matter what the data says, it doesn't affect our conclusions....
/sarc - in case anyone needs it... :)
I do hope that Anthony picks up on this. I'd love to read Willis's take on it. The thing is, had the error not been spotted how many government diktats would have been born on the back of it?
Make one kinda proud of a great British institution! Trust me, I'm a Climate Scientist!
This is the body that has repeatedly & consistently told us through the UNIPCC that the Sun has no significant effect upon Earth's climate, yet recently had the gall to announce that we could be getting colder NH winters as a result of significantly reduced Solar activity! I do wish they'd make up their minds which it is! No credibility!
Mailman. What's it mean in English?
CRU and co. still can't code.
C'mon, it's just through the looking glass. We already know where we are.
===================
The https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-nh.dat file looks fine to me. Either it's already fixed or the author of the post is mistaken. Verified by checking the netcdf gridded fields; NH anomalies are warmer in recent years than SH anomalies, which is correct.
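For anyone wanting to repeat that kind of check, here is a rough Python sketch of hemispheric means from one month of gridded anomalies. It uses plain cosine-latitude weighting, which may differ from however the published .dat files are actually produced, and the function name and north_first flag are made up for this example.

```python
import math


def hemisphere_means(grid, north_first=True):
    """Area-weighted NH and SH means of a lat x lon grid of anomalies.

    grid: list of latitude rows; None marks a missing cell.
    north_first: True if row 0 is the northernmost band.
    """
    step = 180.0 / len(grid)
    sums = {"NH": 0.0, "SH": 0.0}
    weights = {"NH": 0.0, "SH": 0.0}
    for i, row in enumerate(grid):
        centre = 90.0 - (i + 0.5) * step   # band centre if rows run north to south
        if not north_first:
            centre = -centre
        w = math.cos(math.radians(centre))  # cell area shrinks toward the poles
        key = "NH" if centre > 0 else "SH"
        for value in row:
            if value is not None:
                sums[key] += w * value
                weights[key] += w
    return {k: (sums[k] / weights[k] if weights[k] else None) for k in sums}


# Made-up 4 x 2 grid, warmer in the northern rows.
grid = [[0.8, 0.8], [0.6, None], [0.4, 0.4], [0.2, 0.2]]
print(hemisphere_means(grid, north_first=True))
print(hemisphere_means(grid, north_first=False))
```

Reading the same grid with the wrong row order simply swaps the two hemispheric means, which is the kind of symptom the post describes.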
I think that EternalOptimist and ATTP have probably got it right, whether the available download data is the same as used for HadSST is not clear, first step clarify what the situation is. If it is an error then it won't be the first time something like this has escaped.
DOES IT REALLY MATTER IN THE END ANYWAY when you average all the numbers together to formulate the so-called Index? It's not a real number anyway; it's all relative and all the parameters are at the discretion of the expert. After all, they have all the experience and everyone else is a heretic.
Alan: CRU and co. still can't code.
Thinking how much it takes to be a good software engineer, I'm not slightly surprised by missing output quality checks. If no-one is using those files, it is no surprise they are broken.
GIGO^2
Azimuth - from North or South Pole?
Altitude, Azimuth, and Line of Position Comprising Tables for Working Sight ... Table IV page 155
A possible source of such confusion is that sometimes Azimuth has been reckoned from the South Pole in astronomy and satellite observations, instead of from the North Pole as in navigation.
Stanford defines Azimuth:
Azimuth Wikipedia
NOAA has historically inverted longitude and time zone definitions:
oh cmon !
97% of scientists have taken a deep look at this or they would never have come to any conclusions
Actually, since most of the data analysis programs were probably written by CRU they know how the data is stored and just leave it in that format. It probably made the arithmetic easier to program. But anyway, they should at least have info in the header of the file describing the layout for future historians if nothing else and for current users out of courtesy. Most likely the usual bumbling. Good programmers are hard to find.
As far as the infilled fields, a programmer guru I remember said: never write a procedure that has to handle any outside data that doesn't check that everything coming in is in the proper format so it doesn't cause problems, all calculations and results that go outside the procedure have correctly formatted results, all counters, registers, flags, files, and codes are properly set, and any errors that might occur are flagged.
Those kinds of errors in programming have been all over the place and are why there have been so many, many hacker exploits via buffer overflows, range errors, improper program error recoveries, etc.
We know that quality procedures are alien to "climate science".
I note Geoff Sherrington's comment that he never had an exploration drill hole go into the sky. Drill underground and there are plenty of drill holes going up as well as down. +ve dips are as significant as -ve dips.
adulterating data is a warmista's 2nd nature
When the climate scientists say it looks OK to me....... Start to worry.
@John McLean
I just did a quick look at the number of '*******' in the had.SST3.1.1.0. number of observations zip.
They are there, no doubt. But I get different numbers to you.
These were produced by a vb script checking each of the 72 fields by year by month. any '*******' gave a +1
myYear SumOfMissingCount
1855 13
1858 8
1863 19
1869 5
1876 1
1877 13
1878 5
1884 19
1890 15
1891 9
1893 6
1897 10
1909 10
1913 11
1920 7
1925 14
1935 9
1938 8
1939 1
1941 3
1942 6
1945 2
1951 21
1956 18
1968 10
1969 1
1975 11
1980 9
1981 15
1982 1
1985 1
1990 12
1997 12
2002 1
2003 1
2004 15
2005 13
2006 10
2007 91
2008 109
2009 130
2010 148
2011 67
2012 88
2013 94
2014 135
2015 107
2016 20
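A tally like the one above can be reproduced in a few lines of Python. This is a sketch rather than the script used for the table: it assumes the file has already been split into (year, rows) pairs, one per month, and the helper names are invented for the example. Counting asterisks instead of matching whole tokens means two overflowed fields printed side by side with no space between them are still counted as two.

```python
from collections import Counter

OVERFLOW = "*" * 7  # seven asterisks, as quoted in the post


def overflow_fields(row_text):
    """Number of overflowed fields in one text row."""
    return row_text.count("*") // len(OVERFLOW)


def overflows_by_year(months):
    """months: iterable of (year, list_of_row_strings) pairs."""
    totals = Counter()
    for year, rows in months:
        totals[year] += sum(overflow_fields(row) for row in rows)
    return totals


# Made-up sample rows, not real HadSST3 data.
sample = [
    (2007, ["   1.00*******  12.00", "   0.00   3.00   4.00"]),
    (2007, ["**************   5.00"]),
    (2008, ["   2.00   0.00   9.00"]),
]
totals = overflows_by_year(sample)
for year in sorted(totals):
    print(year, totals[year])
```

Whether a tally counts individual fields or only rows containing an overflow changes the totals, so it is worth stating which was used when comparing numbers.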
I can't see any problem with NH and SH. HadSST3-nh.dat (and -sh) is just a file of monthly averages. The numbers in the file correspond to the familiar graphs shown. NH (-nh) temperatures are higher, as expected. E.g. the 2015 average for NH was 0.737; for SH it was 0.425. The files were last updated 8 March, so I don't think there is a recent change. It looks to me as if John McLean may have been reading the netCDF gridded file wrongly.
The CRU NH data seems to agree entirely with the Met Office data here.
@ nvw
> ... Drill underground and there are plenty of drill holes going up as well as down
Yes, sure, all mining geologists know that, including Geoff. It's trivially obvious.
But Geoff's point, as I read it, is that such drillholes as you refer to tend to stop once they surface.
The only drillhole I've ever observed that did indeed continue into thin air for a few metres was a supposedly horizontal hole that control survey was lost on and it finished literally drilling into a wombat hole - much to the inhabitant's intense irritation
Nvw,
With respect you made a misleading comment about my drill hole example.
I did specify a collar at the surface, to exclude underground.
Surface is reasonably where land meets sky.
Why did you go to the effort of misrepresenting me?
Geoff
OK. But aside from the fact that the data is totally wrong...
can't drill up where the surface meets the sky....hmmm
https://www.youtube.com/watch?v=qW28i9SkUlg
This article got me curious, so I went and grabbed the 3 ASCII files that were mentioned at the HadSST website ('...number_of_observations.zip', '...median.zip', and '...measurement_and_sampling_uncertainty.zip'). I did not bother (yet) with any of the NetCDF files.
After a very cursory overview of those files (I have no 'R' chops), I came to the following conclusions:
- the meta-data for these files appears to be very minimal (may be more available if I were to read ALL the support docs/papers, but ...)
- the 'median.zip' and 'measurement....zip' files didn't look obviously out of scope
- the 'number_of_observations.zip' file was indeed curious:
- - Yeah, it did look like the file format was legacy FORTRAN-sourced, and nobody has bothered to update it for integer-scale content (one wonders where this stuff is going to get used).
- - The positioning of the overflow markers (field content '*******') seemed to have some significance relative to the particular grid-cells that they represent (beyond their meaning of integer value > 9999.00). At least in the first several years that the overflow-markers occurred, they appeared against the same few grid-cells (I'm still mucking about in the data). Additionally, the neighboring grid-cell values for the same latitude also contained 'large' counts (upwards of 40%-80% of max value). The fact that these overflow values didn't start until late 2002 makes me wonder if the source data for these grid-cells was due either to Argo float transmitters that were improperly nattering as they phoned home, or to some particular problems in the source data (which appears to be ICOADS, if I followed the HadSST website documentation correctly), such as certain data sets being entered multiple times. I have seen multiple-entry happen before in other data systems, so it wouldn't overly surprise me, but I'm open to other explanations. One of these days, I'm gonna learn enough R to build a grid-cell-count-histogram over a world map to see if there is some way to simplify chasing down these kinds of data problems.
- - Depending on how this file gets used in the HadSST computations (or is it just produced for 'historical reasons'?), I could see the overflow markers (and any other associated data issues) impacting any number of statistical inferences in the SST products. As Steve McIntyre and Willis E have said on any number of occasions, intimate knowledge of one's detail data can be very useful.
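The per-cell tally mentioned in the list above does not need a map to get started. Here is a small Python sketch that counts, for each grid cell, the number of months in which its field overflowed. It assumes fixed-width fields of 7 characters with no separators (inferred only from the 7-character overflow marker) and rows with no leading padding or trailing newline; the real file layout should be checked before relying on it, and the function names are invented for the example.

```python
from collections import Counter

FIELD_WIDTH = 7               # assumed fixed field width
OVERFLOW = "*" * FIELD_WIDTH  # overflow marker as quoted in the post


def split_fields(row_text, width=FIELD_WIDTH):
    """Cut a fixed-width row into fields without relying on spaces."""
    return [row_text[i:i + width] for i in range(0, len(row_text), width)]


def overflow_cells(months):
    """months: iterable of lists of row strings, one list per month.

    Returns a Counter mapping 1-based (row, col) to the number of
    months in which that cell's field overflowed.
    """
    tally = Counter()
    for rows in months:
        for r, row_text in enumerate(rows, start=1):
            for c, field in enumerate(split_fields(row_text), start=1):
                if field == OVERFLOW:
                    tally[(r, c)] += 1
    return tally


# Made-up 2 x 3 grids for two months, not real data.
months = [
    ["   1.00*******   3.00", "   0.00   0.00   8.00"],
    ["   2.00**************", "   0.00   0.00   9.00"],
]
tally = overflow_cells(months)
for cell, n in tally.most_common():
    print(cell, n)
```

Slicing by width rather than splitting on whitespace matters here because adjacent overflowed fields would otherwise merge into one long run of asterisks.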
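# Load packages for gridded rasters, NetCDF access and map outlines
library(raster)
library(ncdf4)
library(maps)
# Download the HadSST3 median NetCDF file (binary mode)
hadsst <- "https://crudata.uea.ac.uk/cru/data/temperature/HadSST.3.1.1.0.median.nc"
download.file(hadsst, destfile = "hadsst.nc", mode = "wb")
# Read the "sst" variable as a multi-layer brick, one layer per month
SST <- brick("hadsst.nc", varname = "sst")
# Plot layer 1000 and overlay coastlines to check the orientation
plot(raster(SST, layer = 1000))
map("world", add = TRUE)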
data is fine.
How about this.
Some guy says he read the data and it was upside down
I am skeptical of his claim. I want to see his code.
Above you can see how to get the netcdf which looks just fine.
netcdf is typically used in CDRs. It's a standard self-documenting format.
@nvw, did you read the comments in the youtube video you linked to? Consensus is that it was a planned drill. Why else would there just happen to be a group of men standing around. What were they waiting for? Godot or the drill to come out of the ground?
Meanwhile at WUWT, Anthony Watts wrote his article (archived here), as if he was certain the data were wrong. Of course Anthony didn't bother checking for himself, he wouldn't know how. As well as his headline of "Friday Funny: more upside down data", he wrote:
I wonder what CRU will have to say about this one that has been discovered? It’s bigger than just a single point on Earth.
Anthony wrote his article at least an hour after ATTP's comment, so he should have suspected that it was John McLean who was wrong, not CRU. He probably thought: why let a potential denier meme go to waste?
Source
There will now be a flurry of retracted accusations and associated apologies.
Not.
This 'mix-up' is not as serious as you think; after all, they have the 'result they want' before any data is considered.
And they have shown great skill in 'making the data' fit their needs.
But frankly, given the constantly poor personal standards and awful professional practice, with a near total lack of good scientific practice seen within climate 'science', such mistakes are not news at all.
Steven Mosher: You referenced <- "https://crudata.uea.ac.uk/cru/data/temperature/HadSST.3.1.1.0.median.nc"
But McLean referenced https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-nh.dat
Are you saying they are one and the same?
Phil C
You (quite rightly) expect to see a retraction when something is wrong.
Can you tell me when James Hansen will be making retractions of his hopelessly wrong predictions such as "the west side highway underwater" and his laughable ABC scenarios?
When will UNEP be making an official retraction of their ridiculous "50 million climate refugees by 2010" statement?
Will 'Climate Czar' John Holdren be retracting his 1985 prediction that there will be 1 billion climate-related deaths by the end of this decade?
I could quote loads more busted alarmist predictions, but quite frankly I can't be bothered as there are almost too many to mention.
Mind you, warmists such as Stephen Schneider seem to think it's okay to exaggerate when you're fighting for the cause:
Because he didn't make any such prediction. He was asked what would happen in 40 years if CO2 doubled. Good to see you promoting standard science denier memes though.
As far as Stephen Schneider's quote is concerned, you should read this. However, don't let me stop you from promoting these standard science denial memes. I'd hate to take them away from you; you might have nothing left.
Heh, the Mad, the Bad, and now, the Ugly.
==============
I do not think John McCain owes anyone any apologies.
The number of observations file interests me, for two reasons. First, it is not adjusted, what you see is what you get. Second, it is possible to look at the grid on the map, to see where the observations were made.
John focussed on the ******* fields. The cell with most of these is 13,25 or 32.5 S, 117.5 W.
This cell is smack bang in the middle of the pacific close to Easter Island. it has had 137 *******
plus 1324 observations going back to 1850 cf the English channel which has had 376
Ken,
Jim and his CO2 levels. Yep, Jim was wrong - it's nearly 30 years later and CO2 levels are nowhere near to doubled. Also, current sea level rise is refusing to play nicely as it's still extremely unalarming despite rising CO2 levels. 3mm/year (at most) - pathetically small. I think that West Side Highway will be just fine.
I've already seen the page you link to about Stephen Schneider and the full quote. As commenters at the bottom of that page say, the full quote doesn't change Schneider's unethical stance one bit.
Nice try Ken, keep playing.
Ken, you are Wirthless.
=======
clarification !!!
32.5 S, 117.5 W
This cell is smack bang in the middle of the Pacific, close to Easter Island. It has had 137 months containing *******
plus 1324 months containing observations going back to 1850, cf. the English Channel, which has had 376 months containing observations.
I need to double check this, because it seems to be barmy.
EternalOptimist: Just for the hell of it, if NH and SH had been switched then 32.5N, 117.5W is just off the coast of San Diego. If the Westing was swapped to an Easting as well, it's in the heart of China. (According to Google Maps)
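The effect of the row order on a cell's location is easy to check. A minimal Python sketch, assuming a 5-degree grid with 1-based indices and columns starting at 180W and running east (which matches the 13,25 = 32.5 S, 117.5 W reading quoted earlier in the thread); the function name is made up for the example:

```python
CELL = 5.0  # grid spacing in degrees


def cell_centre(col, row, north_first=True):
    """Centre (lat, lon) of a 1-based (col, row) cell on a 5-degree grid.

    Negative latitude is south, negative longitude is west.
    """
    lon = -180.0 + (col - 0.5) * CELL
    lat = 90.0 - (row - 0.5) * CELL
    if not north_first:
        lat = -lat
    return lat, lon


print(cell_centre(13, 25))                     # rows counted from 90N → (-32.5, -117.5)
print(cell_centre(13, 25, north_first=False))  # rows counted from 90S → (32.5, -117.5)
```

So the same file position is either mid-Pacific near Easter Island or just off San Diego, depending on which way the rows run.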
@Harry, I'll check out the old reverso later
in the meantime - how about this
row 36 - the Antarctic
1124 months with obs, 71,008 obs in total.
Ooh, yes, aTTP: a wonderfully balanced point of view:
While “climatesight” did correct her (him?) somewhat, he (she?) later writes: Does climatesight not like the concept of scepticism in science? If so, how do you consider her (him?) as a serious defender of science?
Charles presents the best comment:
A policy we should all try and adhere to. But which would he choose, if there is a conflict? The evidence suggests the “being effective” choice.
David,
Given that it is patently obvious that responding to a question about what would happen if CO2 doubled in 40 years is not the same as predicting that it will, I have no suitable response to your comment.
Is Gavin honest? Time tells, will tell, and has told. As for Ken Rice's moving hand, well!
=============