
Jeff Masters on Mann and PCA

Jeff Masters, the meteorologist who blogs at Weather Underground, has written the standard-issue five-star review of Mann's Hockey Stick and the Climate Wars.

I thought I'd highlight something Masters wrote about the infamous short-centred principal components analysis used in Mann's paper.

[Mann] takes the reader on a 5-page college-level discussion of the main technique used, Principal Component Analysis (PCA), and shows how his famed "hockey stick" graph came about. It's one of the best descriptions I've seen on how PCA works (though it will be too technical for some.)

Mann, rather hilariously, describes the short-centring technique he applied in MBH98 as "the modern centering convention", and Masters seems to have swallowed this story whole.

Let us defer here to Ian Jolliffe, an expert in principal components analysis. Here he is discussing a talk he had given, which was claimed to endorse short-centring (which he refers to as decentred PCA).

[T]here is a strong implication that I have endorsed ‘decentred PCA’. This is ‘just plain wrong’...

[My talk] certainly does not endorse decentred PCA. Indeed I had not understood what MBH had done until a few months ago. Furthermore, the talk is distinctly cool about anything other than the usual column-centred version of PCA. It gives situations where uncentred or doubly-centred versions might conceivably be of use, but especially for uncentred analyses, these are fairly restricted special cases. It is said that for all these different centrings ‘it’s less clear what we are optimising and how to interpret the results’.

I can’t claim to have read more than a tiny fraction of the vast amount written on the controversy surrounding decentred PCA (life is too short), but from what I’ve seen, this quote is entirely appropriate for that technique. There are an awful lot of red herrings, and a fair amount of bluster, out there in the discussion I’ve seen, but my main concern is that I don’t know how to interpret the results when such a strange centring is used? Does anyone? What are you optimising? A peculiar mixture of means and variances? An argument I’ve seen is that the standard PCA and decentred PCA are simply different ways of describing/decomposing the data, so decentring is OK. But equally, if both are OK, why be perverse and choose the technique whose results are hard to interpret? Of course, given that the data appear to be non-stationary, it’s arguable whether you should be using any type of PCA.
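Jolliffe's "peculiar mixture of means and variances" can be made concrete. The pure-Python sketch below uses hypothetical toy data (the 4×2 matrix and all helper names are illustrative assumptions, not anything taken from MBH98) and applies the centrings he mentions. For a short-centred column, the sum of squares that PCA works from equals the ordinary column-centred sum of squares plus n times the squared gap between the full-period mean and the calibration-period mean. Series whose recent mean departs from their long-term mean therefore get extra weight.

```python
from statistics import fmean


def column_means(X, rows=None):
    """Mean of each column of X (a list of rows), optionally over a subset of rows."""
    rows = list(range(len(X))) if rows is None else list(rows)
    return [fmean(X[i][j] for i in rows) for j in range(len(X[0]))]


def subtract(X, means):
    return [[x - m for x, m in zip(row, means)] for row in X]


def column_centred(X):
    # Conventional PCA: subtract each column's full-period mean.
    return subtract(X, column_means(X))


def short_centred(X, calib):
    # "Decentred" PCA: subtract each column's mean over the last `calib` rows only.
    n = len(X)
    return subtract(X, column_means(X, range(n - calib, n)))


def double_centred(X):
    # Remove row means and column means, then add back the grand mean.
    col = column_means(X)
    grand = fmean(col)
    return [[x - fmean(row) - c + grand for x, c in zip(row, col)] for row in X]

# An uncentred analysis would simply use X as-is.


def sum_sq(X, j):
    """Sum of squares of column j: the diagonal entry of X^T X that PCA decomposes."""
    return sum(row[j] ** 2 for row in X)


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [6.0, 3.0]]  # 4 "years" x 2 "proxies"
n, calib = len(X), 2
full, cal = column_means(X), column_means(X, range(n - calib, n))
Xc, Xs = column_centred(X), short_centred(X, calib)
for j in range(2):
    # Left: short-centred sum of squares; right: ordinary one plus the mean-shift term.
    print(j, sum_sq(Xs, j), sum_sq(Xc, j) + n * (full[j] - cal[j]) ** 2)
# prints "0 23.0 23.0" then "1 9.0 9.0"
```

The identity holds in general because the column-centred deviations sum to zero, so the cross term vanishes when the shifted centre is expanded.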

So the world's leading expert on principal components analysis says that the results of Mann's "modern centering convention" are uninterpretable. They are meaningless. Dr Masters says, nevertheless, that Mann has given one of the best explanations of the technique he has ever seen.

Perhaps Dr Masters needs to find himself a friendly statistician.



Reader Comments (49)

"It's one of the best descriptions I've seen on how PCA works"

And the others are..?

May 18, 2012 at 11:46 AM | Unregistered CommenterJames P

I find the description given in The Hockey Stick Illusion first rate, in that it lets you see how crap it is.

May 18, 2012 at 11:48 AM | Unregistered CommenterTheBigYinJames

Ah, good times. I was reading Tamino's defense of the Hockey stick when Ian Jolliffe chimed in, tardive and reticent, but loud as thunder cross the Bay. Is the thread still available?

May 18, 2012 at 11:57 AM | Unregistered Commenterkim

I think this Masters guy is trying harder to look detailed in his "review", so he performs a better cargo-cult imitation of one than the other, more mumbling and embarrassed reviews. However, instead of anything that could be taken as a critical assessment, all he really does is obligingly go through all of Mann's checkpoints and give us a potted version of the book. So we get

* Mann, innocently blinking, accidentally stumbled into a change of career to paleoclimatology and was shocked by the criticism
* Sceptics are all industry funded.
* The NAS approved of the hockey stick
* Climategate emails were taken out of context
* Mann invented the term "Atlantic Multidecadal Oscillation"

When he does stray into assessment territory it is annoying. I got a bit annoyed by this:

the best descriptions I've seen on how PCA works (though it will be too technical for some.)

Well, this guy must be so used to crap PCA descriptions that he doesn't know better. I think this is bull, since I know Stephen J Gould's book The Mismeasure of Man has a far better one. I learned of the Gould book because Mann implies that criticism of his short-centering technique is reminiscent of the mistake of reifying intelligence in IQ testing described in that book. Mann's explanation of PCA is basically a very muddled self-justification of short centering that didn't make sense to me. As far as I could see it didn't even attempt a high level of technicality and looked very much like a toy description, so I don't see how it could be "too technical for some." Maybe too crap for some?

As I read the book, I was impressed by Dr. Mann's tremendous passion for science and knowledge that comes through. He loves figuring out how things work, and stands in fierce opposition to shoddy science and anti-science political attacks.

He must mean all the times Mann bleats that every criticism of his work is "anti-science", and every time he does something scientific in the book it is just self-evidently great and impressive. Another passive "As I read the book": it somehow seeped into his consciousness "as he read it" without really being articulated, so we must all just pick up how great Mann is ;)

A new way to not review a book, in a suck-up, self-serving way.

May 18, 2012 at 12:11 PM | Registered CommenterThe Leopard In The Basement

It's invasion of the body snatchers again

May 18, 2012 at 12:11 PM | Unregistered CommenterPaul


The comment is here, but the original has gone from Tamino's Open Mind

May 18, 2012 at 12:15 PM | Unregistered CommenterMorph

Much gracious, Morph.

May 18, 2012 at 12:28 PM | Unregistered Commenterkim

The comment is here, but the original has gone from Tamino's Open Mind

May 18, 2012 at 12:15 PM | Morph

You mean his mind has closed again. :)

May 18, 2012 at 12:40 PM | Unregistered Commenterstephen richards

May 18, 2012 at 12:44 PM | Unregistered Commenterclivere

Ian Jolliffe's comments were deep in an open thread, shortly after Tamino's sequence of posts on PCA where he claimed Jolliffe in support.

May 18, 2012 at 12:44 PM | Unregistered CommenterAndyL

The thread is gone from Open (sic) Mind. Not only that, "Jolliffe" isn't anywhere on his site (unless he's specifically blocking that term on his site).

It is available via the wayback machine... gotta love that site!

May 18, 2012 at 1:09 PM | Unregistered Commenterdean_1230

You may find Jolliffe's comment here. Search for "September 8, 2008 at 9:36 am".

[Earlier version with direct link to comment didn't seem to work]

May 18, 2012 at 1:15 PM | Registered CommenterHaroldW

I'm not sure that the above link is any better... Odd: the direct link worked when clicked from the Wayback Machine page, but didn't seem to work from the post. Anyway, it is findable from the Wayback Machine.

Later in the post the question is posed to Tamino:
"Further, do you agree with the remainder of his [Joliffe's] statement “it is crazy …that a group of influential climate scientists have doggedly defended a piece of dubious statistics”

With the answer:
"[Response: No I do *not* agree that "IPCC and Gore" should not have given such prominence to the hockey stick. Your question is itself dishonest; it's the denialist camp which has focused too much attention on the hockey stick, painting it as a crucial centerpiece of climate science, which it is not.]"

Very amusing.

May 18, 2012 at 1:42 PM | Registered CommenterHaroldW

Kim and Morph: The wayback machine captured the thread at Tamino's:

May 18, 2012 at 1:55 PM | Unregistered CommenterBob Tisdale

It looks like Tamino moved site in March 2010 and everything before then has gone

May 18, 2012 at 2:07 PM | Unregistered CommenterAndyL

Funny that something so unimportant has so many people trying so hard to defend it.

May 18, 2012 at 2:37 PM | Unregistered CommenterStuck-record

The use of 'original' statistics should have sent a great big flag up from the very beginning.
But the 'stick' is no longer a piece of poor research but an icon of 'the cause' and an IPCC cornerstone; it therefore has to be defended unto death by the AGW faithful, no matter how crap it actually is. And for Mann it's EVERYTHING: if that goes, he goes, and he knows it.

May 18, 2012 at 2:49 PM | Unregistered CommenterKnR

Stuck-record: the team is in a quandary. Following the discovery in 1997 that CO2 followed T by 800 years +/-, the hockey stick was essential 'to prove' that CO2 climate sensitivity could be calibrated from modern warming.

The recent Shakun paper in the Nature polemic was an attempt to con the MSM that CO2 rise was before average T rise, but it also proved Time Travel exists because the Southern hemisphere still warms 800 years before CO2 rises!

May 18, 2012 at 3:18 PM | Unregistered Commentermydogsgotnonose

Generally speaking politics is showbiz for ugly people and science is showbiz for Clever Kids

Clever kids who always did their homework and got picked last for the football team

I used to think that scientists were driven by their constant quest for funding
There's ego in there also

Will this Michael Mann ever be as famous as the other Hollywood Michael Mann, who directed Manhunter, Miami Vice and Heat?

May 18, 2012 at 3:45 PM | Unregistered CommenterJamspid

So Briffa's life work is an unimportant part of Mann's life work which in turn is not so important after all. I must have forgotten how to laugh.

ps Bish: you should have highlighted this bit by Jolliffe too: it’s arguable whether you should be using any type of PCA

pps how would the HS have looked if non-decentered PCA had been used?

May 18, 2012 at 3:52 PM | Registered Commenteromnologos

Will Michael Mann ever be as famous as that other mad TV scientist, Sheldon Cooper?
They had Mr Spock, Leonard Nimoy, on Big Bang last week and Stephen Hawking on there this week

Nothing wrong with being driven by ego if you want to get things done
Just say so if you can't be accurate
Just don't exaggerate

May 18, 2012 at 4:06 PM | Unregistered CommenterJamspid

Yeah, Maurizio, note the irony. Defended bitterly as colours, yet they're just pigments. This is dissonance, and it makes me laugh sonically.

May 18, 2012 at 4:45 PM | Unregistered Commenterkim

I wish Jeff Masters could get over his AGW obsession.

Weather Underground is my favorite site for weather info, with selectable choices for local weather stations and customizable radar displays. It hurts my conscience each time I use the site because of Masters' AGW advocacy!

May 18, 2012 at 5:23 PM | Unregistered CommenterD Johnson

"Will this Micheal Mann everl be as famous as the other Hollywood Micheal Mann who directed Manhunter Miami Vice and Heat[?]-- Jamspid

Oh, I certainly hope so.

May 18, 2012 at 5:52 PM | Unregistered Commenterjorgekafkazar

I notice he criticises Anthony Watts for, 'Some of the 1 star reviews are no doubt there because “Watt’s Up With That,” one of the most prominent climate science confusion sites, put up a post calling on readers to attack Mann’s book and to attack positive reviews.'

And to justify this, he links to Watts' site. Oh no he doesn't, he links to Joe Romm, who accuses Watts of telling people to attack the book. When you actually go to Watts Up With That, you find Watts wrote:

"Mann’s book currently has 15 reviews on Amazon, all five-star, many by his warmist friends. I hope some climate realists eventually review the book as well."

And then updated with,

"While I realize that many people don’t want to buy this book, please don’t pull a Peter Gleick and do reviews apparently in absentia. (I can’t emphasize this enough – don’t post a review if you have not read it.)"

May 18, 2012 at 7:48 PM | Unregistered CommenterMrPotarto

James P wrote:

""It's one of the best descriptions I've seen on how PCA works"

And the others are..?"

There is a quite detailed article on PCA at Wikipedia, as well as descriptions in many textbooks. The method isn't actually very difficult to understand if you have a basic knowledge of matrix algebra.
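In the matrix-algebra view, the principal components are the eigenvectors of the data's covariance matrix, ordered by eigenvalue. The minimal pure-Python sketch below works on hypothetical toy data with illustrative helper names; a real analysis would call a linear-algebra library. It finds the leading component by power iteration, which is one simple way to get the top eigenvector.

```python
import math
from statistics import fmean


def covariance_matrix(X):
    """Sample covariance matrix of X (rows = observations, columns = variables)."""
    n, p = len(X), len(X[0])
    means = [fmean(row[j] for row in X) for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]  # column-centred data
    return [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)] for a in range(p)]


def leading_eigenpair(C, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix, by power iteration.

    Returns the (unnormalized) starting vector unchanged if C is all zeros.
    """
    p = len(C)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary start, assumed not orthogonal to PC1
    for _ in range(iters):
        w = [sum(C[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T C v gives the eigenvalue for the converged unit vector.
    lam = sum(v[i] * sum(C[i][k] * v[k] for k in range(p)) for i in range(p))
    return lam, v


X = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
lam, pc1 = leading_eigenpair(covariance_matrix(X))
print(round(lam, 6), [round(x, 4) for x in pc1])  # prints 6.0 [0.7071, 0.7071]
```

Here the covariance matrix is [[10/3, 8/3], [8/3, 10/3]], whose eigenvalues are 6 and 2/3, so the first component points along (1, 1)/√2.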

PCA is quite central to a number of the methods used in climate science. For instance, it is central to the "optimal fingerprint" methods used in AGW/ climate change detection and attribution, and in the Bayesian/ optimal fingerprint based methods of estimating climate sensitivity that the IPCC gave prominence to in AR4.

May 18, 2012 at 9:21 PM | Unregistered CommenterNic Lewis

Of course, given that the data appear to be non-stationary, it’s arguable whether you should be using any type of PCA.

This is the key point (my bold) that got me into the entire climate debate to begin with. I do this for a living (extracting "signals" from "noise", PCA being one of many related algorithms) and immediately wondered why someone would think that what amounts to an eigenvalue analysis has any relevance or skill when applied to tree-ring data. In fact, I'm not even sure this point is "arguable" in reference to tree-ring data, as Jolliffe states. There are other issues, such as time variance in the combination of "inputs" (linearity must be maintained), though such things could easily show up as non-stationarity (though not necessarily). I could go on...

I had a paper once that had a really cool "stationarity" calculation that I need to dredge up. It would be interesting to apply to data that are apparently (almost obviously) non-stationary. Comm/radar data generally tend to be stationary (mostly) - both signal and noise, so I don't keep such techniques on my mind. Heck, with tree-rings, there isn't even a clear distinction between "signal" and "noise," at least, not an a priori distinction.


May 18, 2012 at 11:07 PM | Unregistered CommenterMark T

if you have a basic knowledge of matrix algebra.

Yeah, EVD, SVD, etc., are all methods of calculating the principal components, which are generally described rather well in any decent linear algebra text. Hyvärinen, Karhunen, and Oja describe it in their book* using SVD - in a manner that is quite easy to understand, as an introductory chapter to their ICA algorithms.


* Independent Component Analysis, Wiley 2001.
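The link between the EVD and SVD routes mentioned above is short enough to write out. This is standard linear algebra, not anything specific to the cited book, and it assumes X is the n×p column-centred data matrix with n ≥ p and a thin SVD:

```latex
\begin{aligned}
X &= U\Sigma V^{\top}, \qquad U^{\top}U = V^{\top}V = I_p, \quad \Sigma = \operatorname{diag}(\sigma_1 \ge \dots \ge \sigma_p \ge 0) \\
C &= \frac{1}{n-1}X^{\top}X = \frac{1}{n-1}V\Sigma U^{\top}U\Sigma V^{\top} = V\,\frac{\Sigma^{2}}{n-1}\,V^{\top} \\
XV &= U\Sigma
\end{aligned}
```

So the right singular vectors of X are the eigenvectors of the covariance matrix C, the eigenvalues are λ_k = σ_k²/(n−1), and the component scores are XV = UΣ. Any change to how X is centred changes X itself, and with it every quantity here.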

May 18, 2012 at 11:14 PM | Unregistered CommenterMark T

Ian Jolliffe's comments were deep in an open thread, shortly after Tamino's sequence of posts on PCA where he claimed Jolliffe in support.

Proving, to me at least, what I had already suspected: Grant has a clue about how to implement the math he does, and the basics of what it can/may reveal, but not enough of one to truly understand how to apply it.


May 18, 2012 at 11:28 PM | Unregistered CommenterMark T

Mark T - have you ever gone to Climate Audit and read the technical posts? You might enjoy the discussions there.

May 18, 2012 at 11:35 PM | Unregistered Commenterdiogenes

From memory, I think Mark T is an old-time CA contributor...

May 18, 2012 at 11:41 PM | Unregistered Commenterwoodentop

What I wonder about these tree rings is whether width is the only thing that can be measured.
I mean, the objections that warm-dry vs cold-wet conditions could yield the same widths are obvious.

Do the dead-wood discs only leave you with the same cellulose, or are there other measurable variables in the material? In the end, plants make more proteins than humans do.

May 19, 2012 at 12:35 AM | Unregistered Commenterptw

steve mcintyre got himself a new russian friend i

May 19, 2012 at 12:37 AM | Unregistered Commenterptw

Yes, diogenes. My first post was a few months after CA opened its doors (so to speak). CA spurred my initial interest in component analysis and, not surprisingly, my interest in what would eventually become a PhD dissertation. I used Modified Gram-Schmidt orthogonalization for the PCA portion, btw...
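Mark T doesn't say how Modified Gram-Schmidt (MGS) fit into his PCA step. What follows is only a generic pure-Python sketch of the orthogonalization itself, written as a QR factorization with a hypothetical function name. It is not a reconstruction of his method. In component-analysis work an orthogonalization like this can be used, for example, to keep successive component estimates mutually orthogonal.

```python
import math


def mgs_qr(columns):
    """Modified Gram-Schmidt QR factorization.

    `columns` is a list of m-vectors (the columns of A), assumed linearly independent.
    Returns (Q, R): Q is a list of orthonormal columns and R is upper triangular,
    so that column j of A equals sum_i R[i][j] * Q[i].
    """
    V = [list(c) for c in columns]  # working copies that get orthogonalized in place
    n = len(V)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = math.sqrt(sum(x * x for x in V[i]))
        q = [x / R[i][i] for x in V[i]]
        Q.append(q)
        # The "modified" step: immediately strip the q component from every remaining,
        # already partially orthogonalized column. This is numerically more stable
        # than classical Gram-Schmidt, which projects against the original columns.
        for j in range(i + 1, n):
            R[i][j] = sum(a * b for a, b in zip(q, V[j]))
            V[j] = [a - R[i][j] * b for a, b in zip(V[j], q)]
    return Q, R


A = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # three columns of a 3x3 matrix
Q, R = mgs_qr(A)
```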


May 19, 2012 at 2:15 AM | Unregistered CommenterMark T

Ptw: density, too.


May 19, 2012 at 2:16 AM | Unregistered CommenterMark T

I think it should be noted that Ian Jolliffe is (or was at the time of his comment) a strong supporter of the AGW hypothesis, and included in his comments words to the effect that "this Mann graph is invalid, but there are lots more other data series which show a hockey stick..."

May 19, 2012 at 2:43 AM | Unregistered Commenterdodgy geezer

"and included in his comments words to the effect that"

Ya gotta wonder why he felt it necessary to throw that comment in? Casts doubt on his credibility, IMO. Whether or not "other graphs" show a hockey stick is grossly immaterial to the veracity of the Mann graph. He was afraid nobody would take him seriously if he didn't throw that caveat in. A shame, actually, that it has gotten so far that "experts" need to show their support in order to be believed.


May 19, 2012 at 3:43 AM | Unregistered CommenterMark T

From the Ecclesiastical Uncle, an old retired bureaucrat in a field only remotely related to climate, with minimal qualifications and only half a mind.

It is a wonder to me that the potted autobiography Mann apparently included in his book seems to have included no attempt to ‘spin’ statistical experience out of his early life. Surely, if done with his usual gusto and neglect of actuality, it could have provided a platform to lend later dismissals of criticisms of his statistical manipulations more credence, at least in his own mind.

And is all that stuff about temperature variations in the Atlantic true, or significant?

May 19, 2012 at 6:16 AM | Unregistered CommenterEcclesiastical Uncle

The basic problem is that statisticians are too isolated. They need to consult more with climatologists to see what conclusions are needed.

May 19, 2012 at 7:58 AM | Unregistered CommenterPunksta

"Ya gotta wonder why he felt it necessary to throw that comment in? Casts doubt on his credibility, IMO. Whether or not "other graphs" show a hockey stick is grossly immaterial to the veracity of the Mann graph..."

What it read like to me is a protestation that he was NOT a 'denier', and that he still firmly believed in the 'right things'. It was a bit like a historian finding evidence that Hitler had not actually signed an important order calling for Jews to be rounded up - he needed to correct the point, but he was scared that the 'correction' would be used to demolish an established historical 'truth' that he believed in.

You could see that he wished that he did not have to put his head above the parapet and make the correction. I am not surprised that he did not enter the fray earlier - I'll bet he has had some sharp words to say to Tamino for dragging him into the controversy....

May 19, 2012 at 8:35 AM | Unregistered Commenterdodgy geezer

Indeed, dodgy geezer, a problem a lot of "skeptics in name only" have. Their attempts to make peace with their masters while still noting the problems they are aware of are, to me, cowardly.

Jolliffe is wrong, btw; there aren't really many graphs that show a hockey stick unless you count all the ones related to Mann's work, all suffering from the same problems.


May 19, 2012 at 3:00 PM | Unregistered CommenterMark T

Mark T-
I think you're far too harsh on Jolliffe's comment. Here's what he wrote:
"I am by no means a climate change denier. My strong impressive [sic] is that the evidence rests on much much more than the hockey stick." Which is the position of most people new to the climate change discussion, to be honest.

It's a version of the Crichton/Gell-Mann "amnesia effect." Jolliffe found out the Hockey Stick was not well-founded, but assumes that the other evidence for dangerous AGW is of a more typical robustness. One has to spend some time looking at papers to realize how tenuous some of the claims are.

May 19, 2012 at 5:33 PM | Registered CommenterHaroldW

Maurizio brought up this sentence from Jolliffe's comment: "it’s arguable whether you should be using any type of PCA." In a subsequent comment, Jolliffe described this as "something of a throwaway remark." He writes that "In its purest form PCA is relevant to (multivariate) data that are independent and identically distributed" and that the proxy data do not satisfy those criteria. However, he believes that PCA may well give reasonable results anyway, as long as one is careful not to read too much into them. [My one-sentence summary of a paragraph which should be read.]

May 19, 2012 at 5:55 PM | Registered CommenterHaroldW

Mark T, what is meant by 'stationary data'? I've looked at 'stationary processes' on Wikipedia and see the difference between white noise and a cymbal (its examples), but I don't understand how it relates to tree rings. Does the same problem exist with other proxies (ice cores, corals,...)?

If PCA is not the right technique, what is the correct method by which one can consolidate hundreds or thousands of datasets; has this been used with proxy data? Is it even possible to draw incontrovertible conclusions from such datasets or is there always room for disagreement?

May 19, 2012 at 6:13 PM | Unregistered CommenterBitBucket

my UPDATE: I did not realize when I wrote the comment below that there is a current active thread going "Rand Simberg reviews the Yamal story" which is getting some very interesting input from a variety of people including Rob Wilson. So I will let my questions stand as posed here but refer anyone thinking about these issues to the BH thread "Rand Simberg reviews the Yamal story"

To take a step back for a moment (I know this will point to vast issues that cannot be settled here or easily), why should one think this whole quest for a PCA type method is not based upon a circular argument?

Mannian statistics assumes that certain kinds of trees are exquisitely reliable thermometers across years, decades, and centuries -- then one goes looking for a "statistical method" that will enable you to cull just a select small sub-sample of trees which give the "desired" (or correct??) readings?

It seems like an absurdly circular method unless one can know to a high order of reliability that (1) the trees involved are such perfect thermometers, and (2) the method used to cull out only the "reliable" ones is fully justified. The PCA debate seems to throw a lot of doubt on (2) and I have yet to see any plausible indication of a defense of (1).
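Point (2) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any actual reconstruction. It generates red-noise (AR(1)) "proxies" that contain no temperature signal at all, keeps only those that correlate with a rising calibration-period target, and averages the survivors. Every kept series has a positive calibration-period slope by construction, so the composite does too. The earlier portion, where the selected series share nothing, tends to average out toward zero. All parameters (phi=0.9, r_min=0.3, the linear target, the series counts) are hypothetical.

```python
import random
from statistics import fmean


def ar1(n, phi, rng):
    """Red noise: x[t] = phi * x[t-1] + white Gaussian noise. No climate signal at all."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def slope(y):
    """Ordinary least-squares slope of y against the time index 0, 1, 2, ..."""
    t = range(len(y))
    mt, my = fmean(t), fmean(y)
    return sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum((ti - mt) ** 2 for ti in t)


def screened_composite(n_series=500, length=600, calib=100, phi=0.9, r_min=0.3, seed=1):
    rng = random.Random(seed)
    target = list(range(calib))  # hypothetical steadily rising "instrumental" record
    kept = [s for s in (ar1(length, phi, rng) for _ in range(n_series))
            if pearson(s[-calib:], target) > r_min]
    if not kept:
        raise ValueError("no series passed the screen; lower r_min")
    # Average the surviving series year by year.
    return [fmean(year) for year in zip(*kept)], len(kept)


composite, n_kept = screened_composite()
print(n_kept, "of 500 noise series kept")
print("calibration slope:", round(slope(composite[-100:]), 4))
print("pre-calibration slope:", round(slope(composite[:-100]), 4))
```

Because the OLS slope is linear in the data, the composite's calibration slope is just the average of the kept series' slopes, each of which the screen forces to be positive.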

p.s. As I emphasize from time to time I am no scientist, and am limited to looking for who I think argues reasonably or unreasonably, what information and analyses seem most intelligent and credible for purposes of public policy decisions, etc.

May 19, 2012 at 11:16 PM | Registered CommenterSkiphil

So does nobody know what stationary data is? Or the correct way of analysing proxy data, if not PCA?

May 20, 2012 at 11:50 PM | Unregistered CommenterBitBucket


Sorry it took so long to reply but there were enough new topics that I thought this one was done... I will post a better explanation in a bit.


May 23, 2012 at 3:43 AM | Unregistered CommenterMark T

Ok, back at my desk...

In general, when signal processing folks speak of "stationarity" they are referring to what is known as wide-sense stationarity (WSS). Yeah, it sounds a bit circular. What that means is that the statistics, specifically the mean and the autocorrelation (and hence the variance), do not change over time.

For the mean, this means that whatever function is driving the mean is constant over time. Pretty simple.

For the variance, the definition involves the autocorrelation function: E{x(t1)*x(t2)}. What is this? Well, the mean of a process is E{x(t)}, where E{*} represents the expected value. Therefore, E{x(t1)*x(t2)} is the expected value of the "signal" at time t1 multiplied by the "signal" at time t2. If a signal (or function, or data, or whatever) is WSS, then the actual values of t1 and t2 do not matter, only the difference, t1 - t2. Note that if you take t1 = t2 (and do not scale by sigma_t1 and sigma_t2 as on the Wikipedia page), you get the variance of the signal itself, assuming the signal has zero mean (otherwise you get the variance plus the squared mean). This means that the variance is constant over the duration of a WSS signal.
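A crude way to eyeball that definition is to split a record into windows and compare the sample mean and the autocovariances at a few lags. For a WSS series these should agree across windows up to sampling noise. The pure-Python sketch below uses hypothetical function names and toy series, and it is an informal check only, not a formal stationarity test. It also shows why the mean matters: a steady trend has identical autocovariances in both halves yet fails WSS because its mean moves.

```python
from statistics import fmean


def autocov(x, lag):
    """Sample autocovariance: average of (x[t] - m) * (x[t + lag] - m), m the sample mean.

    At lag 0 this is the (biased) sample variance; for a WSS series it should depend
    only on the lag, not on where in the record the window sits.
    """
    m = fmean(x)
    n = len(x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n


def window_stats(x, n_windows, max_lag=2):
    """Mean and autocovariances (lags 0..max_lag) in consecutive, equal-length windows."""
    size = len(x) // n_windows
    stats = []
    for w in range(n_windows):
        seg = x[w * size:(w + 1) * size]
        stats.append((fmean(seg), [autocov(seg, k) for k in range(max_lag + 1)]))
    return stats


periodic = [1.0, -1.0] * 50              # same statistics everywhere
trend = [float(t) for t in range(100)]   # mean drifts, so not stationary
for name, x in (("periodic", periodic), ("trend", trend)):
    print(name, [(round(m, 2), [round(c, 2) for c in acv]) for m, acv in window_stats(x, 2)])
```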

The reason this is necessary is that non-stationary data may result in different principal components over time. The tree-ring data is arguably non-stationary, and only a few even have the supposed hockey stick in the first place.
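A two-variable toy case shows how non-stationarity can change the principal components over time. The hypothetical record below has a correlation structure that flips halfway through, so the first principal axis of the early half is perpendicular to that of the late half. For two columns the leading eigenvector of the covariance matrix has a closed form, which keeps this pure-Python sketch short (helper names are illustrative).

```python
import math
from statistics import fmean


def pc1_angle(rows):
    """Angle in degrees of the first principal axis of two-column data.

    For a 2x2 covariance [[sxx, sxy], [sxy, syy]] the leading eigenvector points
    at 0.5 * atan2(2 * sxy, sxx - syy).
    """
    xs, ys = [r[0] for r in rows], [r[1] for r in rows]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


# Toy record whose correlation structure flips halfway through (hypothetical data).
early = [(-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 2.0)]
late = [(-2.0, 2.0), (-1.0, 1.0), (1.0, -1.0), (2.0, -2.0)]
print(round(pc1_angle(early), 6), round(pc1_angle(late), 6))  # prints 45.0 -45.0
```

Pooled over the whole record, the two halves cancel (sxy = 0 and sxx = syy), so no single leading axis exists, and a PCA of the full record would describe neither half.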

Does the same problem exist with other proxies (ice cores, corals,...)?

Sure, though I do not have enough detailed knowledge (off-hand) to know how bad of a problem it is. A bigger problem, IMO, is the theory behind why a data set is a proxy for temperature. For tree rings there isn't even a valid hypothesis, and what little study has been done indicates a much higher reliance on water than on anything else (botanists seem to know this).

If PCA is not the right technique, what is the correct method by which one can consolidate hundreds or thousands of datasets; has this been used with proxy data?

Hard to say. There are non-linear processing methods which are not in my standard bag of tricks, and I believe they rely on some additional information that may or may not exist. Online methods for calculating components are rather easy to implement (where "online" means a running calculation, something like a moving average). Unfortunately, doing so reduces the number of points in any given window, which in turn reduces the processing gain from averaging out noise (whatever "noise" in such signals is).
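One well-known running estimator of the first principal component is Oja's rule, named here only as an example, since the comment doesn't say which online method is meant. The pure-Python sketch below uses hypothetical zero-mean data and a hypothetical step size. With a fixed step size the estimate gradually discounts older samples, which is the trade-off described above: faster tracking, but fewer effective points behind each estimate.

```python
import math
import random


def oja_pc1(stream, p, eta=0.002, seed=0):
    """Online estimate of the first principal component with Oja's rule.

    Each sample x (assumed zero-mean) updates w by eta * y * (x - y * w), where y = w.x.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(a * a for a in w))
    w = [a / norm for a in w]  # random unit starting vector
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Hypothetical zero-mean 2-D data: std 3 along (1, 1)/sqrt(2), std 0.5 along (1, -1)/sqrt(2).
rng = random.Random(42)
s = 1 / math.sqrt(2)
data = []
for _ in range(30000):
    a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)
    data.append([s * (a + b), s * (a - b)])
w = oja_pc1(data, 2)
print([round(c, 2) for c in w])  # close to +/-[0.71, 0.71]
```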

Is it even possible to draw incontrovertible conclusions from such datasets or is there always room for disagreement?

Even harder to say. Incontrovertible is generally an unlikely goal, even when you have all of the information you need and perfectly stationary statistics with fixed linear systems. Infinite signal-to-noise ratio is the stuff of fantasy. :)

As noted by Jolliffe, none of this means that all of the results are necessarily incorrect, just that they cannot be relied upon. In radar processing (noise-riding thresholding) you have the probability of detection vs. probability of false alarm problem. The two trade off against each other: at a given SNR, you cannot raise the probability of detection without also raising the probability of false alarm. If you want to guarantee that you will almost always detect a valid target, you set your threshold lower as your SNR decreases. Of course, this means that your false alarm rate will rise accordingly, potentially to the point that you cannot tell whether a detection is valid or a false alarm. The same goes for these PCA results. In fact, the problem is worse because no valid definition of "signal" and "noise" exists, so not only are the results unreliable, but you can't tell how unreliable they are! A sticky wicket indeed.
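The detection trade-off is easy to see in the simplest textbook case. The pure-Python sketch below assumes a simplified Gaussian model: one real-valued sample per decision, and a hypothetical amplitude and set of thresholds. A real radar envelope detector would use Rayleigh/Rician statistics instead. Lowering the threshold raises the probability of detection (Pd) but also raises the probability of false alarm (Pfa).

```python
import math


def q(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pfa_pd(threshold, amplitude, sigma=1.0):
    """False-alarm and detection probabilities for a single real-valued sample.

    Simplified model (an assumption): a noise-only sample ~ N(0, sigma^2), a
    target-present sample ~ N(amplitude, sigma^2), and a detection is declared
    when the sample exceeds the threshold.
    """
    return q(threshold / sigma), q((threshold - amplitude) / sigma)


for t in (1.0, 2.0, 3.0):
    pfa, pd = pfa_pd(t, amplitude=3.0)
    print(f"threshold={t}: Pfa={pfa:.4f}  Pd={pd:.4f}")
```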


May 23, 2012 at 4:58 AM | Unregistered CommenterMark T

Hi Mark, I just checked back and found your reply! Sorry for not responding sooner :-(

I can see that for radar processing you send out a ping and get back noise + ping. Then you correlate the original ping against the received signal and bob's your uncle. Non-stationarity might be changes in the noise background, perhaps due to environmental changes, but even these would in the long term probably turn out to be stationary. For tree ring data, it seems to me that the non-stationarity is actually the signal. So the data is non-stationary by definition. Or did I misunderstand?
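The radar case described here is essentially a matched filter: correlate the known transmitted pulse against the received samples and take the lag of the correlation peak as the echo delay. The pure-Python sketch below assumes a hypothetical 5-chip Barker-code pulse, a single echo, no Doppler shift, and additive Gaussian noise. All of these choices are illustrative only.

```python
import random


def cross_correlate(received, ping):
    """Correlate the transmitted ping against every alignment in the received samples."""
    n, m = len(received), len(ping)
    return [sum(received[k + i] * ping[i] for i in range(m)) for k in range(n - m + 1)]


def estimate_delay(received, ping):
    """Delay (in samples) at which the correlation peaks: a basic matched filter."""
    c = cross_correlate(received, ping)
    return max(range(len(c)), key=c.__getitem__)


ping = [1.0, 1.0, 1.0, -1.0, 1.0]  # 5-chip Barker code (an assumed pulse shape)
delay, length = 7, 30
rng = random.Random(0)
received = [rng.gauss(0.0, 0.1) for _ in range(length)]  # hypothetical noise floor
for i, chip in enumerate(ping):
    received[delay + i] += chip  # the echo arrives `delay` samples later
print(estimate_delay(received, ping))  # expect 7
```

The Barker code is used because its autocorrelation sidelobes never exceed 1 in magnitude against a peak of 5, so the true delay stands out clearly at this noise level.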

This doesn't really help me to understand how one should consolidate thousands of datasets, but it is interesting :-) Thanks for your detailed description.

Jun 16, 2012 at 8:49 PM | Unregistered CommenterBitBucket
