Autocorrelation in the Sahel
This is a guest post by Doug Keenan.
In August, the journal Nature Climate Change published a piece by a researcher at the Earth Institute of Columbia University. The researcher, Alessandra Giannini, is an expert on precipitation in the Sahel, and her piece was on that topic.
Giannini’s piece notes that Sahel precipitation has been slightly increasing during the past few decades, but then warns as follows.
… a gap in research: a complete understanding of the influence of [greenhouse gases], direct and indirect, on the climate of the Sahel. This is needed more urgently…. While precipitation may have recovered in the seasonal total amount, it has done so through more intense, but less frequent precipitation events. This state of affairs requires more attention be paid to the climate of the Sahel, to ensure that negotiations around adaptation, such as those taking place in the run-up to the Conference of the Parties of the UN Framework Convention on Climate Change that will be held in Paris at the end of this year, are based on the best science available….
Even though precipitation is increasing, and even though intense rainfall is often beneficial (because it penetrates deep into the soil), the piece has a strong alarmist tone. As that might suggest, Giannini is an environmental activist: indeed, she has prominently participated in demonstrations for “climate change action”.
Giannini’s piece was the subject of a post at Bishop Hill: “Duelling models”. The post noted some apparent problems with Giannini’s piece, in particular with the statistical analyses. After seeing the post, I wrote to Giannini, to discuss the statistical analyses. Giannini and I then had a brief e-mail exchange. A copy of the exchange is below.
_____________________
My first message to Giannini was the following.
In your paper “Hydrology: Climate change comes to the Sahel” (just published in Nature), Figure 1 displays Sahel precipitation anomalies for 1979–2008. The figure’s caption gives some correlations and the significance levels of those correlations. How were the correlations calculated? In particular, what orders of autocorrelation were used?
Giannini kindly replied the same day, as follows.
The correlations are plain, vanilla, Pearson correlations -- something like this:
<http://glossary.ametsoc.org/wiki/Correlation>
No estimation of autocorrelation was taken into account.
Just to clarify, the time series are anomalies of the July-September average rainfall computed with respect to the 1979-2008 climatology.
July to September is the core of the rainy season in the Sahel. Rainy seasons are separated by a prolonged dry season during which it rarely if ever rains.
Here is an example of a daily precipitation time series, from Niamey, Niger - the last 365 days on record:
<http://www.cpc.ncep.noaa.gov/products/global_monitoring/precipitation/sn61052_1yr.gif>
The observed values I used in the calculation can be downloaded from here:
The modeled values can be obtained through the CMIP5 archive:
or I should say, you will be able to obtain them once the data portal is back online…
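As an aside for readers who want to see the arithmetic: the plain "vanilla" calculation that Giannini describes, a Pearson correlation with a naive significance test that assumes the years are independent, looks something like the following sketch in Python. The data values below are made up for illustration; they are not the actual Sahel series.

import numpy as np
from scipy import stats

# Illustrative stand-ins for the two 30-year series (1979-2008);
# these are NOT the actual observed or modelled Sahel values.
rng = np.random.default_rng(0)
obs = rng.normal(size=30)              # "observed" anomalies
mod = 0.5 * obs + rng.normal(size=30)  # "modelled" anomalies

# Plain Pearson correlation, with the naive p-value that assumes
# every year is independent (i.e. zero autocorrelation).
r, p = stats.pearsonr(obs, mod)
print("r = %.2f, naive p-value = %.3f" % (r, p))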
I responded as follows.
Your message is much appreciated. The correlations that you calculated seem fine. The issue is with the significance levels (or confidence intervals).
From the observed values, it is clear that if a given year is especially wet (or dry), then that increases the chance that the next year is above-averagely wet (or dry). In other words, the observed series is autocorrelated. The calculations of significance that you did assumed that autocorrelation was 0. Thus, the calculations are inaccurate.
For some background on this, see
http://www.ltrr.arizona.edu/~dmeko/notes_9.pdf
I got the observed values easily, from the link that you kindly included. Would you be willing to send me the 30 multi-model ensemble mean values?
Relatedly, I was curious—what is the reason that the data stops in 2008?
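The quantity at issue here, the lag-1 autocorrelation, is easy to estimate from a series. A sketch, again with made-up data standing in for the real record:

import numpy as np

def lag1_autocorrelation(x):
    # Sample lag-1 autocorrelation: the correlation of the series
    # with itself shifted by one time step.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)

rng = np.random.default_rng(1)
series = rng.normal(size=30)   # stand-in for the 30-year record
print("lag-1 autocorrelation = %.2f" % lag1_autocorrelation(series))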
Giannini kindly replied the next day.
I understand. Usually we worry about autocorrelation when a time series exhibits persistence, which would mean that this year's above average precipitation increases the chances that next year's will also be above average, whereas you are pointing to a negative auto-correlation.
Statistically, the correlation with 1-year lag as I computed it comes to 0.24 in the observed time series I shared, which by the same token used in the article is not significant with the same level of confidence.
Maybe we need to compare notes…
In the past, the time series of Sahel rainfall has shown more persistence year-to-year, which had to do with persistence in the oceanic forcing of Sahel rainfall - decades of persistently above average followed by decades of persistently below average [drought] conditions. But that seems to be no more, hence what you correctly observed as swings year-to-year.
The next thing to do would be to come up with a physical hypothesis as to why the swings…
Extracting the multi-model ensemble-mean time series requires a lot more work. It is non-trivial, often under appreciated, grunt work for any climate scientist who wishes to analyze the CMIP/IPCC simulations, hence my reason for pointing you to the CMIP5 archive…
The analysis stops in 2008 because the specific model simulations used were run over 1979-2008.
My response was the following.
I am happy to receive your comments. ….
For the observed time series, the lag-1 autocorrelation is not known. Suppose we assume that the data is generated by a first-order autoregressive Gaussian process (this assumption is actually an unjustifiable simplification). Then the autocorrelation coefficient has a maximum likelihood estimate of 0.24 (the value cited in your message), a 68% confidence interval of [0.04, 0.41], and a 95% confidence interval of [-0.14, 0.55]. The confidence intervals are fairly wide mainly because the time series is short.
Given the above, when considering how much autocorrelation could plausibly affect the conclusions, a reasonable conservative value is the upper limit of the 68% confidence interval: 0.41. That value would seem to be large enough to affect the conclusions, depending on the autocorrelation of the modelled series.
About the underlying physical mechanisms, I am largely ignorant. I was only considering the statistical analysis. My suspicion is that statistical analysis will not be very useful here, though, and physical simulations are required.
About the 30 multi-model ensemble mean values, I realize that obtaining them involves work. The web site for the data seems to be down for the next month, though; also, I assume that you have the values, as you did calculations with them(!). Would you be willing to send them?
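The confidence intervals quoted above come from a maximum-likelihood fit of an AR(1) model. A simpler textbook approximation, the Fisher z-transform, gives intervals of very nearly the same width for a series of 30 values, and is easy to check:

import numpy as np
from scipy import stats

def correlation_ci(r, n, conf=0.95):
    # Approximate confidence interval for a correlation coefficient
    # via the Fisher z-transform. This is a common rough method, not
    # the maximum-likelihood AR(1) calculation described above.
    z = np.arctanh(r)              # Fisher transform
    se = 1.0 / np.sqrt(n - 3)      # approximate standard error
    half = stats.norm.ppf(0.5 + conf / 2) * se
    return np.tanh(z - half), np.tanh(z + half)

# Lag-1 estimate of 0.24 from a 30-value series:
print(correlation_ci(0.24, 30, conf=0.68))  # roughly ( 0.05, 0.41)
print(correlation_ci(0.24, 30, conf=0.95))  # roughly (-0.13, 0.55)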
Giannini did not reply to that.
Ten days after sending the preceding message, I sent the one below.
I did not receive a reply to my last message; you should have at least sent me the 30 data values. In any case, regarding the claim in your piece that the “multi-model ensemble mean correlation with observations is … significant at the 5% level”, it seems clear that the claim is false—once the autocorrelations are considered.
Your piece additionally claims that the “correlation of the ensemble mean of 5 simulations with GFDL-CM3 … is 0.58, significant at the 1% level”. The data for GFDL-CM3 is also unavailable to me. Your piece, however, includes a figure that displays the data: I obtained fairly-precise data values via reading off the figure.
The maximum likelihood estimate for the correlation that I got was 0.58, i.e. the same as you got. That correlation is so high that it is significant even after the autocorrelations are considered. The correlation, however, is still not actually significant—due to a second error in your piece’s calculations.
To understand the second error, suppose that we compared the observed time series with, say, 50 other random time series. Then the observed series might well significantly correlate with a few of the random series, just by chance. Generally, whenever we compare a single series to multiple series, we need to consider that some of the comparisons might appear to be significant just due to chance. For an amusing explanation of the issue, not specific to time series, see the “Significant” cartoon at xkcd:
http://xkcd.com/882/
The multiple-comparisons issue is discussed in the time-series course notes that I linked to earlier:
http://www.ltrr.arizona.edu/~dmeko/notes_9.pdf
The notes recommend that, when doing multiple comparisons, the significance levels be obtained via the "Bonferroni adjustment". There is also a more general discussion of the issue in Wikipedia:
https://en.wikipedia.org/wiki/Multiple_comparisons_problem
That article lists the Bonferroni adjustment as one of several methods to address the issue.
When you compared the observed series to the different modeled series, the multiple-comparisons issue arises. There were 62 different models used, if I have understood the CMIP5 documentation correctly:
http://cmip-pcmdi.llnl.gov/cmip5/docs/CMIP5_modeling_groups.pdf
Some adjustment needs to be made for that. Whatever method is used to make the adjustment, it seems clear that after both making the adjustment and considering the autocorrelations, the claim—about the correlation between observations and GFDL-CM3 being significant—is invalid.
Your piece further claims that GFDL-CM3 is a better model than HadGEM2 for Sahel precipitation. That claim is very dubious for the same reason: the higher correlation with GFDL-CM3 might be just due to essentially random noise.
I did not receive a reply to that.
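Both corrections at issue in that message are standard. The course notes linked to earlier give an adjustment for autocorrelation that replaces the sample size with a smaller "effective" sample size; the Bonferroni adjustment simply divides the significance level by the number of comparisons. The sketch below combines the two. The effective-sample-size formula is one common version, not necessarily the exact method recommended in the notes, and the input numbers are purely illustrative, not taken from the paper.

import numpy as np
from scipy import stats

def effective_n(n, r1_a, r1_b):
    # Effective sample size for a correlation between two series with
    # lag-1 autocorrelations r1_a and r1_b (one common adjustment).
    return n * (1 - r1_a * r1_b) / (1 + r1_a * r1_b)

def corr_test(r, n, alpha=0.05, n_comparisons=1):
    # Two-sided t-test of a correlation, with an optional Bonferroni
    # adjustment for multiple comparisons.
    alpha = alpha / n_comparisons            # Bonferroni adjustment
    t = r * np.sqrt((n - 2) / (1 - r * r))   # usual t statistic
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return p, p < alpha

# Illustrative numbers: r = 0.58 over 30 years, lag-1 autocorrelation
# of 0.41 assumed for both series, 62 models compared.
n_eff = effective_n(30, 0.41, 0.41)
print(corr_test(0.58, n_eff, alpha=0.01, n_comparisons=62))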
There is one issue that merits elaboration, perhaps. Suppose that we have 30 measurements from some time series, a1, a2, a3, …, a30, and 30 measurements from some other time series, b1, b2, b3, …, b30. Suppose further that we want to know the correlation of the two series. There are computer programs available that, given the two 30-element inputs, will output what is called the “correlation” of the two series. The correlation that is output by such programs is sometimes misinterpreted.
Each of the two series is usually part of a potentially much longer series: a1, a2, a3, …, an, and b1, b2, b3, …, bn, for some large n. In other words, each of the 30-element inputs is actually just a sample from a much longer series. What we are really interested in, in general, is the correlation of the longer series.
There is no way to know, for certain, the correlation of the longer series. What we can do, however, is estimate the correlation of the longer series, from the 30-element samples. Typically, when making the estimate, we should determine what the most likely value for the correlation is, as well as, say, the 95%-confidence interval and the 68%-confidence interval (or similar likelihood intervals). What some computer programs output is just the most likely value for the correlation of the longer series—and this is sometimes misinterpreted as the actual correlation of the longer series, when it is really only an estimate.
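The point is easy to demonstrate by simulation: draw many 30-element samples from a pair of series with a known correlation, and see how widely the sample correlations scatter around the true value. A toy sketch (not an analysis of the Sahel data):

import numpy as np

rng = np.random.default_rng(2)
true_corr = 0.4   # correlation of the (hypothetical) longer series
n_sample = 30     # length of each observed sample
n_trials = 10000

estimates = []
for _ in range(n_trials):
    a = rng.normal(size=n_sample)
    b = true_corr * a + np.sqrt(1 - true_corr ** 2) * rng.normal(size=n_sample)
    estimates.append(np.corrcoef(a, b)[0, 1])

lo, hi = np.percentile(estimates, [2.5, 97.5])
print("true correlation: %.2f" % true_corr)
print("middle 95%% of 30-element estimates: [%.2f, %.2f]" % (lo, hi))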
Reference
Giannini A. (2015), “Climate change comes to the Sahel”, Nature Climate Change, 5: 720–721; doi:10.1038/nclimate2739.
Reader Comments (17)
Well, yes. Responding to hard questions and requests for data is grunt work, too. And doesn't help with the next grant application. Academics really do like to publish, and move on. Especially in climate.
Neither does this "piece" appear to take into account the increased water use efficiency of plants at higher atmospheric [CO2].
E.g. New Phytologist (2013) 197: 544–554, "Elevated CO2 increases tree-level intrinsic water use efficiency: insights from carbon and oxygen isotope analyses in tree rings across three forest FACE sites": http://www.wsl.ch/info/mitarbeitende/cherubin/download/2013_Battipaglia_et_al_nph.pdf
In other words they are trying to spin a "good news" story into alarmist gloom and doom.
I would only bother hurting my brain with correlation coeffs and significance measures if the mark 1 eyeball decides that there is something going on. Just looking at the graph in the paper fails my eyeball test (which tends to be very unimpressed by linear trends over tiny numbers of years), so any stats derived are quite probably describing random noise.
So how is she paid?
If she's paid per warmish, alarmish report, that's what we'll get.
If she's paid for doing science and engaging with research, the only soul in the world that ever approached her with anything substantial regarding what she is doing scientifically is certainly going to get an answer.
Oh, Columbia University!
She's paid per alarmish report and PC behaviour.
She wants more research into Sahel climate change due to greenhouse gases.
Nobody has detected any difference in the rest of the world; why should the Sahel be any different?
She should be grateful for the benefits of increased precipitation, I expect the locals are. Is she likely to get funding that might report positive feedbacks?
Thanks for sharing your insight, Doug.
Why does she start in 1979?
The Great Sahel Drought began around 1968 and lasted into the 1980s.
Lamb and others attributed the cause to global cooling,
https://notalotofpeopleknowthat.wordpress.com/2014/03/23/a-colder-climate-in-the-1970s-brought-widespread-drought/
Of course, if she had started in 1968, her conclusions would have been rather less alarmist!!
Why should she send you any data? You are only trying to find faults!
Precipitation event?
Is the word "rain" too short and clear?
Capitalism must die! Rain in the Sahel changes nothing. That is her message.
Thanks for pursuing these points, Doug. I would be shocked if the rainfall in the Sahel had an autocorrelation function substantially different to the Nile catchment, which of course was used by Hurst as the exemplar climate series exhibiting long term persistence (thanks to the 700-year unbroken instrumental record maintained through medieval times).
That a climate scientist would not be aware of this boggles the mind.
"Precipitation event? Is the word "rain" too short and clear?" --rabbit
Incontrovertibly. But I prefer 'atmospheric condensation phenomenon,' myself.
Off Topic, but not so much that it should not be posted here:
Seeking a reasonable way forward, Tom Fuller at "The Lukewarmer's Way" blog has written a new book,
http://www.amazon.com/dp/B015913P38/ref=cm_sw_su_dp
Unsurprisingly, it is named "The Lukewarmer's Way". Tom has been unflinching in pointing out flaws and fallacies promoted by the climate consensus. This new book reviews and documents what he believes is a reasonable way forward.
I don't agree with all he has concluded in this brief book, but I respect him for sticking to facts, for seeking reasonable solutions, and for putting the "crisis" into an honest perspective.
The Earth Institute is of course headed by Jeffrey Sachs, catastrophic economist and climate adviser to the Pope. Output would have to be alarmist.
http://unsdsn.org/wp-content/uploads/2014/02/130316-Development-and-Climate-Finance.pdf
"Financing for development and climate change post-2015"
Background paper for the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda
Prepared by the Secretariat of the Sustainable Development Solutions Network
Jeffrey D. Sachs, Director, The Earth Institute, Columbia University
Director, The Sustainable Development Solutions Network
Special Advisor to Secretary-General Ban Ki-Moon on the Millennium Development Goals
Puff piece in Vanity Fair: http://www.vanityfair.com/news/2007/07/sachs200707
"One of Sachs's biggest supporters is the financier and philanthropist George Soros, who recently donated $50 million to the Millennium Villages Project. (The project is a partnership among the U.N., Columbia, and Sachs's own nonprofit organization, Millennium Promise.) According to Soros, whose foundation gives away between $350 million and $400 million a year, investing in Sachs offered an attractive "risk-reward ratio."
Soros is on the external advisory board, as are the recently sacked but still-in-post head of TERI, Rajendra Pachauri, and Dr. Tim Palmer, Head of the Probability and Seasonal Forecasting Division, European Centre for Medium-Range Weather Forecasts, UK.
Marmite Man: "The Not-So-Great Professor: Jeffrey Sachs' Incredible Failure to Eradicate Poverty in Africa"
http://www.psmag.com/books-and-culture/smart-guy-jeffrey-sachs-nina-munk-idealist-poverty-failure-africa-65348
@Spence_uk
Thanks for bringing up the Nile series, which is what I cut my teeth on regarding the Hurst exponent, and was the first thing that came into my mind when the auto-correlation discussion above began.
Ignorance of this for a climate scientist seems to be all of a piece with their general way of working, which is to ignore all experience of the past under the label: "Not invented here." (pun intended).
How did the Oxford "establishment" (a pile of stinking t*rds, imho) "manage" their obvious MASSIVE scientific c*ckup with respect to carbon dating?
Let's make sure this gets documented 100% watertight, with all emails and communication safely backed up.
Like with Galileo, they can uphold their cosy lifestyles, that's for sure, but they will NOT uphold their REPUTATION on this one.
Goodies for the long term.
Those percentages kill me. The only thing worse is expectation values covering up a temperature graph so you can't even see it. I go with Ernest Rutherford: "If your experiment needs statistics, you ought to have done a better experiment." Just design your experiment right and you will not need statistics. You will also increase your readership.