SkS quietly withdraws allegation
Last week I ribbed Dana Nuccitelli and Gavin Schmidt over the former's comparing the mean of the Aldrin paper to the mode of Lewis's. Here's the quote:
One significant issue in Lewis' paper (in his abstract, in fact) is that in trying to show that his result is not an outlier, he claims that Aldrin et al. (2012) arrived at the same most likely [i.e. the mode] climate sensitivity estimate of 1.6°C, calling his result "identical to those from Aldrin et al. (2012)." However, this is simply a misrepresentation of their paper.
The authors of Aldrin et al. report a climate sensitivity value of 2.0°C [per the paper, the mean] under certain assumptions that they caution are not directly comparable to climate model-based estimates. When Aldrin et al. include a term for the influences of indirect aerosols and clouds, which they consider to be a more appropriate comparison to estimates such as the IPCC's model-based estimate of ~3°C, they report a sensitivity that increases up to 3.3°C. Their reported value is thus in good agreement with the full body of evidence as detailed in the IPCC report.
I was somewhat taken aback when Nuccitelli subsequently denied having done this:
Me: @dana1981 And you can't really duck the fact that you compared mean to mode. @ClimateOfGavin @wattsupwiththat
Nuccitelli: @aDissentient You have a strange definition of the word "fact", but that's not news.
Me: @dana1981 You are denying comparing mean to mode?
Nuccitelli: @aDissentient Sure. While we're at it, I'm also denying that the moon is made of cheese.
In the comments, Tom Curtis remonstrated about Nuccitelli's accusing Lewis of misrepresenting the match between his PDF and Aldrin's:
Dana correctly describes Lewis as claiming that the mode (most likely climate sensitivity) of his result is identical to the mode of Aldrin et al, but then incorrectly calls that claim a simple misrepresentation. It is not a misrepresentation. The modes of the two studies are identical to the first decimal point.
Now it has all changed. Look at the Skeptical Science page again (bold emphasis added):
One significant issue in Lewis' paper (in his abstract, in fact) is that in trying to show that his result is not an outlier, he claims that Aldrin et al. (2012) arrived at the same most likely climate sensitivity estimate of 1.6°C, calling his result "identical to those from Aldrin et al. (2012)." However, this is not an accurate of their paper.
The authors of Aldrin et al. report a mean climate sensitivity value of 2.0°C under certain assumptions that they caution are not directly comparable to climate model-based estimates. When Aldrin et al. include a term for the influences of indirect aerosols and clouds, which they consider to be a more appropriate comparison to estimates such as the IPCC's model-based estimate of ~3°C, they report a sensitivity that increases up to 3.3°C. Their reported value is thus in good agreement with the full body of evidence as detailed in the IPCC report.
This seems to be a result for Tom Curtis. However, he then goes on to make a very strange point:
[Lewis's claim] is...misleading in that it is an apples and oranges comparison. Given that other studies report the mean, in comparing with other studies the mean should be reported, or it should be made absolutely clear that not only are you reporting the mode, but that the authors you are reporting on reported the mean.
The idea that comparing mode to mode is "apples to oranges" is pretty strange. To say it is "misleading" is again absolutely extraordinary when one notes that the IPCC doesn't consider means either - it reports medians and modes. It is only natural to do so when considering skewed distributions, since the mean is strongly influenced by outliers.
The other reason for using the mode is that it is largely unaffected by choice of prior, so by using it one can better understand what the Lewis paper means, namely that the Lewis and Aldrin approaches give the same best estimate of climate sensitivity, but the adoption of the objective Bayesian approach gives a more constrained estimate.
Reader Comments (53)
When in a hole stop digging ;)
Will they ever learn, I hope not !!!!
Well done for plugging away at these matters. Sensitivity has a central role in the IPCC framework and argument. Although use of mode rather than mean may seem a small detail it isn't. As we focus in on such things it's getting harder to paint sceptics as ignorant bigots - largely because of Nic's excellent work.
Speaking of SkS, John Cook in the Conversation has today provided a graph of a hockey stick extraordinaire. Whilst lamenting the increasing public apathy in Climate Science, he gives us proof of the astonishing increase in the number of Climate Science papers.
I commented, probably incorrectly, that I wondered if it had anything to do with the increase in funding.
GrantB,
I guess his last point is kind of correct, of course all the evidence is going to pile up to support their point of view...but only because anything and everything that contradicts their religious beliefs is suppressed.
Regards
Mailman
It is amazing to witness the intellectual heights of a climate science discussion. As weather, and its statistical construct over a period of time: climate, are driven by (partly) known physical processes of which CO2 only constitutes a tiny amount and hence influence, we better put our energy into understanding such processes than throwing sensitivity values to impress each other.
Anyone ask Dr. Nuccitelli how that cheese tastes?
This conversation reminded me of Anscombe's Quartet which reminds you to always look graphically at the data before assigning value to analysis.
John Cook's claim of '97% consensus' is fraudulent.
I've examined what Tom Curtis says at other junctures. Just as here, he makes careless, unsubstantiated statements. These are mixed in with more well-founded ones.
This is primarily due to belief along the lines of 'Well, dana1981 may be wrong in this one point, but he is right in the larger scheme'.
Richard Drake +1
The warmista rewriting history - never!
Being Green means never having to say you're sorry, and never admitting you are wrong.
The Truro lass appears to have dyslexia. Or is it an incredibly brilliant ploy?
Indeed comparing mean values makes no sense in this context. If you don't like the mode (like me, actually), then compare medians. The median is both robust and meaningful.
Cees de Valk -
While I would agree that the median is generally a better guide to central tendency than mode, in this case I would differ. The mode is far less sensitive to the choice of prior than median. The mean, as you say, is not an appropriate metric in this context.
eeny meany miny mode
Dana can't man up and admit his mistake ...just like Mann.
Will he sport a Van Dyke next?
Can someone explain the terms "median" and "mode" for me, and say why they're useful. (I understand "mean", the arithmetic average). For example, what is the mode and what the median of the following numbers whose mean is 27?
18, 24, 24, 25, 29, 33, 36.
Simon,
Your example is comprised of only 7 discrete values and is not a good comparison with the continuous PDF which is the subject of the discussion. Notwithstanding this the median of your numbers is 25 (because it is the middle value) and the mode would be 24 (the most frequently occurring).
Note also that for example making the first value much smaller or the last value much larger would change the mean but would not affect either the mode or the median.
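For anyone who wants to check these figures, here is a minimal Python sketch using only the standard library. The variable names are illustrative and the stretched value of 360 is an arbitrary choice, not something from the thread.

```python
import statistics

values = [18, 24, 24, 25, 29, 33, 36]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequently occurring value
print(mean, median, mode)

# Make the last value much larger (360 is an arbitrary illustration):
# the mean moves, the median and mode stay where they were.
stretched = values[:-1] + [360]
stretched_mean = statistics.mean(stretched)
stretched_median = statistics.median(stretched)
stretched_mode = statistics.mode(stretched)
print(stretched_mean, stretched_median, stretched_mode)
```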
Since Skeptical Science several times mentions another [second], [Norwegian] study matching Nic Lewis's study, would this not make Nic Lewis's study an example of second study syndrome rather than single study syndrome?
" Taken from Climate Sensitivity Single Study Syndrome, Nic Lewis Edition"
angech, no it wouldn't. This is clearly the one and only "Climate Sensitivity Single Study Syndrome, Nic Lewis Edition"
A perfect example of intellectual dishonesty. See Curry's blog on the topic.
@thinking scientist
Thanks. I was rather hoping for an explanation in words of what sort of an average the terms "mode" and "median" suggest. If my journey time to work takes 30 minutes typically, but sometimes as little as 25 minutes and occasionally as much as 60 I can see that I might not base my day to day expectation on the mean. But would I choose the mode or the median? Maybe in my journey time example they might give identical results if measured in minutes but if I timed my journeys to the second I'd expect few identical readings and the idea of the most frequently occurring value would be inappropriate, making the concept of the mode misleading.
So what is it that makes the mode or the median relevant in a particular application? What are the essential characteristics of what is being studied that guide the statistician?
Simon Abingdon,
The median is the journey time you would expect on a "typical" day, and works extremely well with things like journey times timed to the second. If you make a large number of journeys then half of them will be shorter than the median time, and half will be longer. (For small numbers of journeys this gets messed up slightly by journeys that take exactly the median time, but for large numbers of journey times recorded to silly precision this doesn't matter, and even for small numbers of journeys the median is perfectly well defined and still has the "typical" interpretation).
The mode is "the most probable" journey time (the time which is more probable than any other) and only really makes sense if either you have a large number of journey times which you have binned (say to the nearest minute or nearest five minutes) or if you have replaced the discrete data by some smoothly varying model.
Modes are messy in general, and really only make sense if the distribution is well behaved (smooth, single peaked, etc.); personally I prefer medians for most purposes.
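The point about binning can be sketched in Python. The journey times below are invented for illustration, and the five-minute bin width is just one possible choice; a different width could give a different modal bin.

```python
from collections import Counter
import statistics

# Hypothetical journey times in seconds (invented for illustration).
times_s = [1795, 1812, 1840, 1853, 1861, 1874, 1902, 1930, 2105, 3590]

# Timed to the second, no reading repeats, so the raw data has no useful mode.
most_repeats = max(Counter(times_s).values())

def modal_bin(times, width_s):
    """Return the start (in seconds) of the most populated bin of the given width."""
    bins = Counter((t // width_s) * width_s for t in times)
    start, _count = bins.most_common(1)[0]
    return start

modal_start = modal_bin(times_s, 300)   # five-minute bins
median_s = statistics.median(times_s)   # needs no binning at all

print(most_repeats, modal_start, median_s)
```

The median comes straight from the raw readings, whereas the mode only appears once a bin width has been chosen.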
"However, this is not an accurate of their paper"
An accurate what? I assume that's not the Bish's typo...
Nuccitelli has clearly taken lessons from the master of deception.
Michael E. Mann.
James P, a single click of the SkS link provided could have confirmed that this isn't a Bish error. Just too lazy?
For shub: John Cook is fraudulant!
@James P
"However, this is not an accurate of their paper"
Presumably they meant to alter "this is a misrepresentation ..." to "this is not an accurate representation ...", but their proofreading skills appear to be the equal of their science skills.
Most people find themselves needing to say "I made a mistake" from time to time. But it does not seem to be in the vocabulary of these guys.
___________________________________________________________________________________________
Basil Fawlty: "What am I going to say to them?"
Mrs Fawlty: "Just say 'I am sorry, I made a mistake' ."
~~~~~~~~~
Basil Fawlty: "I am sorry, my wife made a mistake."
Stan, I can't decide if John Cook is more scary or creepy.
Simon,
I share your confusion. With discrete values, the mean, mode and median are clear. With a continuous distribution, the mean and median are still clear, but the mode becomes murky in my mind. I think in this context, the meaning of the mode is merely the peak of the pdf.
However, using the phrase "most likely" strikes me as very odd since any specific value of a continuous distribution occurs with probability zero, which does not strike me as very likely at all! Again, I think it is all just a way of identifying the peak of the density.
James
Jonathan, thank you for your explanation.
I'm now pondering "If you make a large number of journeys then half of them will be shorter than the median time, and half will be longer". Maybe this is just the definition of median, but on the other hand the journey time I expect on a "typical" day can be slightly lessened if circumstances turn out to be particularly favourable (not very often) or considerably worsened (which in comparison seems quite often) if things go wrong. Perhaps it's just that my expected "typical" journey time is too optimistic from the outset.
But I do find it interesting, having started with the definition "The median is the journey time you would expect on a 'typical' day" that you then draw that 50/50 inference which I am pondering. With luck it will soon dawn on me why this must be so.
Simon, what's the average wage of all the people in the bar of your local pub? Probably close to the national average wage, whether mean, median, or mode.
And what happens to that average if Bill Gates drops in for a drink?
The median and mode remain about the same - the mean goes off the scale.
Simon abingdon: I suggest you make a note of all your journey times over a year, put them into say 2 minute bins and then enter the number in each bin and plot them in Excel (you aren't Phil Jones coming here on the sly for a bit of knowledge are you?). If you mark the mode, mean and median, I'm sure it will then all become clear to you and you will understand the worth of each metric. It will be useful because you will know all sorts of useful facts, such as how long you will need to allow so that you have a 50% or 95% or 99% probability of arriving at work on time (assuming you have a large enough sample and circumstances don't change).
"But I do find it interesting, having started with the definition "The median is the journey time you would expect on a 'typical' day" that you then draw that 50/50 inference which I am pondering. With luck it will soon dawn on me why this must be so."
The Median literally is the number in the middle of your list of values. It is a very good estimate for your 'typical day'. I would say the mode is the best value for your typical day, but as pointed out you have to define the list of values properly before it makes any sense. If you recorded your journey time to the microsecond then every single time would be unique and you wouldn't have a mode.
Simon Abingdon,
By a "typical day" I just mean something like take all the days you had over a year, order them from best to worst, and then a typical day is the one in the middle. Typical here simply means that half the days are better and half the days are worse: it's not a good day or a bad day but a typical day. Up to a few technical details this is precisely the definition of the median.
You might, however, choose a different definition, which I will call an "ordinary" day, which is the sort of day that happens more often than any other sort of day. That's the mode.
Of course in general a typical day need not be an ordinary day: you seem to be suggesting that on an ordinary day nothing special happens to you, but when something special happens it is more likely to go wrong than go right. In such a case one might expect the typical day (median) to be worse than the ordinary day (mode).
As you also point out, when something goes wrong it can go very badly wrong, while there are limits on how well things can go (your journey time home is probably lower bounded by speed limits, or at least the maximum speed of your car, or at very least the speed of light, but the upper bound is essentially indefinite if you get stuck in a snowdrift for example). This is a very common situation, and in such cases the average day (mean) is worse than the typical day (median).
Which definition is most useful depends very much on what you are trying to achieve by summarising your distribution of travel experiences. No one number can capture the whole range of life. On the whole I prefer the typical day (median) or the average day (mean), but it's very much horses for courses. One popular approach is to quote the median and mean, with the ratio between these providing a handy summary of how "skewed" your life is.
Jonathan "only really makes sense if either you have a large number of journey times which you have binned".
Ah, a technical term. At first I thought you were talking about the wpb.
Speaking as a geologist who analyzes grain sizes. We sometimes run into situations where we have two grain sizes that show up more than others. For example: a grain size analysis shows that we have a large amount of samples that occur at 2mm and also at 0.5mm, we would refer to that as a sample with a bi-modal distribution. Thinkingscientist is right. The mode is the value that shows up the most times in a given population of values, be they commute times, grain sizes, whatever. Let's say in this case the modal value is 32 min. The median is simply the value of the sample at the mid-point of your sample population. Let's say you kept track of your commute times for 31 days. The median is the value of the commute time on the 16th day. Let's say: 35 min. Now, if during that 31 day period, your fastest commute time was 25 min and your longest was 60 min. The mean value would be 42.5 min. To answer your question about commuting times I would use the modal value of 32 min because that time occurs the most often in your commute. The reason median and mode are used is that they are unaffected by extreme values of the population, as steveta points out.
Jonathan, please disregard my frivolous 5:24 which I sent before I'd read your follow-up 5:16 which I now much appreciate.
Your answer to my journey time question, making the difference between median and mode equivalent to the subtle difference between a typical day and an ordinary day is I think a very persuasive and informative analogy. Thank you again .
Simon and the others here: Check any elementary statistics text book. The median is the middle number of a set of ordered numbers (smallest to largest). See here: http://gwydir.demon.co.uk/jo/numbers/pictogram/box.htm
I did a webcite query to see what I'd get. Here is the earliest version of that page on April 18, at 05:42: http://www.webcitation.org/query?id=1366321331471189&date=%400&fromform=1
That's interesting because I read that page AFTER that time and date and it didn't look like that (it had the mode and mean comparison). Even the "However, this is not an accurate of their paper" error is correct in that earliest version.
I don't know the ins and outs of webcite or Wayback or whatever but that seems a little strange.
Gilbert
A quick clarification, re your commute times example. If we record commute times for a month and arrange the times in order, the median would be the mid-point of the series.
The median itself may occur on any given day of the month, including the mid-month day (the 16th) (!) ;)
Re steveta
Hockey Sticks again.
Well, although I've messed around with various aspects of probability theory, I've never had reason to use the mode, so take what I say with appropriate sanity warnings.
My understanding is that the mode, m, is the value at which the probability density function has its maximum value (assuming this is a unique value). So, as you say, the probability of getting precisely this value is zero (assuming that the probability density function contains no delta functions).
The random variable is more likely to fall in the range m ± D than any other range of extent 2D (where D is a small value).
So I can see that it has the attraction of answering the question "What value are we most likely to get?" with the answer "it's more likely to be close to m than close to any other value".
I can see that some listeners could then mistakenly think "Ah, they said it was going to be pretty close to m".
[corrections of any misconceptions on my part welcomed]
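A small numerical check of this reading of the mode, sketched in Python. The lognormal density and its parameters are my own assumption, chosen only as a convenient right-skewed example; it is not the distribution from either paper under discussion.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution, written with math.erf."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def window_probability(centre, half_width, mu, sigma):
    """Probability that the variable falls in centre +/- half_width."""
    upper = lognormal_cdf(centre + half_width, mu, sigma)
    lower = lognormal_cdf(centre - half_width, mu, sigma)
    return upper - lower

mu, sigma = 0.0, 0.5                  # hypothetical parameters
mode = math.exp(mu - sigma ** 2)      # peak of the density
median = math.exp(mu)                 # half the probability on each side
mean = math.exp(mu + sigma ** 2 / 2)  # pulled upward by the long right tail

D = 0.05                              # small half-width of the window
p_mode = window_probability(mode, D, mu, sigma)
p_median = window_probability(median, D, mu, sigma)
p_mean = window_probability(mean, D, mu, sigma)

print(mode, median, mean)
print(p_mode, p_median, p_mean)
```

For this skewed example the three summaries come out in the order mode, median, mean, and a narrow window around the mode catches more probability than an equally narrow window around either of the other two.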
@ shub: I stand corrected. I stated the problem wrong. I forgot that the series was an "ordered series". The median actually does not have a "particular" day of the month associated with it. It is simply the mid-point of the series; as you so correctly state.
Suppose you have a set of voters with preferences arranged along a single axis, and two political parties occupying the two ends, such that anyone to the left of a certain point prefers party A and anyone to the right of that point prefers party B. Who wins the election?
If the split is to the right of the median voter, then more than half the voters favour party A, as does the median voter. If the split is to the left of the median voter, then more than half the voters favour party B, including the median voter. Therefore whichever party the median voter favours will win the election. That's why politicians fight over the middle ground, and why their policies are almost identical.
The median voter may be thought of as a sort of absolute dictator - they control every decision in a two-party system, and they determine the policies of both parties. The median-man-on-the-street (or woman) is a very important person... :-)
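The argument can be checked with a toy example in Python. The voter positions and split points are invented, and the sketch assumes an odd number of voters with nobody sitting exactly on the split.

```python
import statistics

def election_winner(positions, split):
    """Party A gets every voter left of the split, party B gets the rest."""
    votes_a = sum(1 for p in positions if p < split)
    votes_b = len(positions) - votes_a
    return "A" if votes_a > votes_b else "B"

def median_voter_choice(positions, split):
    """The party preferred by the voter in the middle of the ordered list."""
    return "A" if statistics.median(positions) < split else "B"

# Hypothetical voter positions on a single left-right axis.
voters = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]

results = []
for split in (0.1, 0.3, 0.5, 0.8):
    outcome = (election_winner(voters, split), median_voter_choice(voters, split))
    results.append(outcome)
    print(split, outcome)
```

Wherever the split falls, the winner matches the median voter's preference.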
People interested in the concepts Nullius is outlining might want to take a look at Hotelling's law and the Median voter theorem.
NiV:
"The median voter may be thought of as a sort of absolute dictator - they control every decision in a two-party system, and they determine the policies of both parties. The median-man-on-the-street (or woman) is a very important person... :-)"
That reminds me of an Alan Ramsey (journo/columnist for the Sydney Morning Herald) article on 'swinging voters' from way back. I couldn't find the whole piece on the web but I could locate this bit:
"the minimum number of thoughts repeated the maximum number of times"
How true. And don't mention uncertainties..
Here's a real nail in the coffin of AGW and sea level rise
http://people.duke.edu/~ns2002/pdf/10.1007_s00382-013-1771-3.pdf
(H/T to Tallbloke)
Maybe a real climate (pun intended) psientist can explain why this paper is flawed/doesn't matter?
"the minimum number of thoughts repeated the maximum number of times"
sounds like a variation of Goebbels' notorious maxim:
“If you tell a lie big enough and keep repeating it, people will eventually come to believe it. The lie can be maintained only for such time as the State can shield the people from the political, economic and/or military consequences of the lie. It thus becomes vitally important for the State to use all of its powers to repress dissent, for the truth is the mortal enemy of the lie, and thus by extension, the truth is the greatest enemy of the State.”
For "state" substitute "advocates of climate change"