Reader John McLean emails with details of some surprising finds he has made in the Hadley Centre's sea-surface temperature record, HadSST. John is wondering whether others might like to take a look and confirm what he is seeing. Here's what he has found:
1 - Files HadSST3-nh.dat and HadSST3-sh.dat are the wrong way around.
About 35% of the way down the web page https://crudata.uea.ac.uk/cru/data/temperature/ there's a section for HadSST3. Click on the 'NH' label and you go to https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-nh.dat, which has 'nh' in the file name. But, based on the complete gridded dataset, that data file is for the Southern Hemisphere, not the Northern. The two sets are swapped. The links to the named files are correct but the content of those files is wrong, likely due to errors in the program that created these summary files from the SST3 gridded data.
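One way for a reader to check the swap independently would be to compute hemispheric averages from the gridded file and compare them with the two summary files. Below is a minimal Python sketch. It assumes each monthly grid has already been parsed into a list of rows ordered 90N to 90S, that missing cells carry a sentinel value (the -99.99 default is an assumption), and that a simple cosine-of-latitude weighting is adequate. The function name hemisphere_means is hypothetical, and the method used to build the official summary files may well differ.

```python
import math

def hemisphere_means(grid, missing=-99.99):
    """Area-weighted mean for each hemisphere of one monthly grid.

    grid: list of rows ordered 90N -> 90S (assumed), each a list of values.
    missing: sentinel for empty cells (assumed value, check the file header).
    """
    n_rows = len(grid)
    band = 180.0 / n_rows                      # height of each latitude band
    sums = {"NH": 0.0, "SH": 0.0}
    weights = {"NH": 0.0, "SH": 0.0}
    for i, row in enumerate(grid):
        lat_centre = 90.0 - band * (i + 0.5)   # centre latitude of this row
        w = math.cos(math.radians(lat_centre)) # weight roughly tracks cell area
        hemi = "NH" if lat_centre > 0 else "SH"
        for value in row:
            if value != missing:
                sums[hemi] += w * value
                weights[hemi] += w
    # None signals a hemisphere with no data at all
    return {h: (sums[h] / weights[h] if weights[h] else None) for h in sums}
```

If the "NH" value computed this way consistently tracks the series in HadSST3-sh.dat rather than HadSST3-nh.dat, that would support the finding above.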
2 - The ASCII file containing observation counts per grid cell has records in the wrong order.
On the above page click on the "HadSST3" link and go to the Hadley Centre page, then from there to the "download page" (http://hadobs.metoffice.com/hadsst3/data/download.html) and you'll see mention of an ASCII file of observation counts for each grid cell.
The data in that file is in the wrong sequence. The HadSST3 gridded data has records for each month in sequence from 90N to 90S but the observation count data runs from 90S to 90N.
(I found this when I discovered lots of SST data with no corresponding observation count and then lots of grid cells with observation counts but no SST data. I've created a crude map of cells that contained SST data in January 2000 and it displays with the NH at the top, so it's not the SST data file that's wrong; it's the counts. When I flipped the data in each month into 90N to 90S order the SST data always had corresponding observation counts and there were no cells with an observation count but no data.)
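The check described in that paragraph can be sketched in a few lines of Python. This is an illustration only. It assumes one month of SST values and one month of counts have each been parsed into a list of rows, that missing SST cells use a sentinel (-99.99 here is an assumption), and that a count of zero means no observations. The names mismatches and flip_latitudes are hypothetical, and overflowed count fields would need handling before this step.

```python
def mismatches(sst_rows, count_rows, missing=-99.99):
    """Return (cells with SST but no count, cells with a count but no SST)."""
    sst_only = count_only = 0
    for sst_row, cnt_row in zip(sst_rows, count_rows):
        for sst, cnt in zip(sst_row, cnt_row):
            has_sst = sst != missing
            has_cnt = cnt > 0
            if has_sst and not has_cnt:
                sst_only += 1
            elif has_cnt and not has_sst:
                count_only += 1
    return sst_only, count_only

def flip_latitudes(rows):
    """Reverse the row order of one month, e.g. 90S->90N into 90N->90S."""
    return rows[::-1]

# Tiny made-up example: SST ordered 90N->90S, counts stored 90S->90N.
M = -99.99
sst = [[0.5, M], [M, M], [M, M], [M, -0.2]]
counts = [[0, 7], [0, 0], [0, 0], [3, 0]]
print(mismatches(sst, counts))                  # (2, 2)
print(mismatches(sst, flip_latitudes(counts)))  # (0, 0)
```

Mismatch totals that drop to zero for every month once the counts are flipped would be consistent with the count file being stored in the opposite latitude order.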
3 - The ASCII observation count file contains unreadable fields because they overflowed.
Since about 2002 it's not unusual to find cells for which the observation count is '*******', meaning that the count is greater than 9999.00. There's no way for the user to know what value should be in that field.
I suspect the problem comes about because the file was written by a Fortran program; that language fills a field with asterisks when the value doesn't fit. Using a real number (i.e. with decimal places) makes no sense because one can't make half an observation or 0.19 of an observation. I don't know why the fields can't be a 7-digit integer, but they're not. (Could it be to cater for the language R?)
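To illustrate the overflow behaviour, here is a small Python imitation of Fortran fixed-width output. It assumes the counts are written with a descriptor equivalent to F7.2, which is inferred only from the seven asterisks and the two decimal places and is not confirmed by any Met Office documentation. The helper name fortran_f is hypothetical.

```python
def fortran_f(value, width=7, decimals=2):
    """Mimic Fortran Fw.d output: asterisks when the value will not fit."""
    text = f"{value:{width}.{decimals}f}"
    # Python lets the text grow past the width; Fortran fills with '*' instead
    return text if len(text) <= width else "*" * width

print(fortran_f(12.0))      # '  12.00'
print(fortran_f(9999.0))    # '9999.00'
print(fortran_f(10000.0))   # '*******'
```

Under that assumption, any count of 10000 or more is lost, whereas a seven-character integer field could hold values up to 9999999.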
(I'm a bit suspicious about counts in excess of 9999.00 because that's an average of over 14 observations per hour, or roughly one every 4.25 minutes or less! Probably 75% of the cells with these observation counts are along the western or eastern US coast, but the other 25% aren't. Is it a cluster of Argo buoys?)
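The rate quoted there is easy to reproduce. This short Python sketch takes 10000 observations in a single grid cell over one month as the illustrative threshold; the function name observation_rate is hypothetical.

```python
def observation_rate(count, days_in_month):
    """Return (observations per hour, minutes between observations)."""
    hours = days_in_month * 24
    per_hour = count / hours
    minutes_between = 60.0 / per_hour
    return per_hour, minutes_between

for days in (28, 30, 31):
    per_hour, gap = observation_rate(10000, days)
    print(days, round(per_hour, 1), round(gap, 2))
```

The exact figure depends on the length of the month: a 30-day month gives a little under 14 observations per hour, and a 28-day month a little under 15.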
He adds:
I see from my notes that, from 2002 onwards, the instances of '*******' in a field were as follows:
2002: 2,
2003: 1,
2004: 5,
2005: 14,
2006: 17,
2007: 103,
2008: 143,
2009: 178,
2010: 177,
2011: 111,
2012: 127,
2013: 153,
2014: 147,
2015: 136
(or at least that's how the files downloaded after the January 2016 update had things).
This list might help anyone confirm the existence of these overflowed fields.
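Anyone wanting to reproduce that tally could count the asterisk-filled fields year by year. The Python sketch below rests on several assumptions that should be checked against the actual file: that each month starts with a header line whose first two whitespace-separated tokens are the month and the year, that 36 data rows follow, and that each row consists of seven-character fields. The function name overflow_counts_by_year is hypothetical.

```python
from collections import Counter

OVERFLOW = "*" * 7  # a seven-character field filled with asterisks

def overflow_counts_by_year(lines, rows_per_month=36, width=7):
    """Tally overflowed fields per year from an iterable of text lines."""
    totals = Counter()
    lines = iter(lines)
    for header in lines:
        if not header.strip():
            continue                            # skip blank lines
        year = int(header.split()[1])           # ASSUMPTION: "month year ..."
        for _ in range(rows_per_month):
            row = next(lines, "").rstrip("\n")
            # Slice by position: full-width fields may have no spaces between them
            fields = [row[i:i + width] for i in range(0, len(row), width)]
            totals[year] += fields.count(OVERFLOW)
    return dict(totals)
```

Slicing by column position rather than splitting on whitespace matters here, because a value such as 9999.00 fills its whole field and would run into a neighbouring field of asterisks.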
The HadSST3 observation count files won't be used by many people; maybe I'm even the first, if no-one else has hit these problems. I found them because I was investigating the temperature and coverage impact in each month of grid cells with few observations.
I think a fair question is whether the Hadley Centre publishes other flawed data, on SST or anything else, because it looks like there's no in-house verification that the software does what it's supposed to do.