"Oh, people can come up with statistics to prove anything. 14% of people know that."
- Homer Simpson in Homer the Vigilante
We are constantly bombarded by statistics. Annual reports, news, medicine, beauty products, cereals: each of them has its own bit of statistics to tell its story. It is therefore very important that we know the ways statistics can be used to deceive us.
This post reviews "How to Lie with Statistics" by Darrell Huff, which I just finished reading, and also touches on Peabody Energy (BTU) and its annual reports.
On the surface, the subject of statistics looks easy. The idea is best described by an example. Say that you have a bag of beans, some white and some red. How do you find out how many of each there are? The precise, correct way is to count them!
This is easier said than done. Instead, you may decide to mix the bag thoroughly and then pick a handful of beans. You count the number of red and white beans in this small sample. Say there are 30 red beans for every 20 white ones. You then conclude that 60% of the beans in the bag are red.
Simple enough? Apparently so, but let us dig a bit deeper.
For this result to be of any use, two major assumptions must hold:
1. The beans were mixed properly.
2. Enough of them were chosen.
Why? Let me play devil's advocate to show why these conditions need to hold. If the bag was not mixed properly, you might pick a "non-representative" sample. In an extreme case, you might pick from a portion that is all white and conclude that the bag is 100% white. Similarly, if you pick just one bean, it will be either red or white, so again you get 100% of one color. In both of these cases you would be wrong.
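Both assumptions can be seen in a small simulation. This is a minimal Python sketch under made-up assumptions: a hypothetical bag of 600 red and 400 white beans, with all the white beans lying on top when the bag is not mixed. Drawing a random sample stands in for mixing the bag thoroughly before grabbing a handful.

```python
import random

def red_fraction(beans):
    """Fraction of red beans in a handful."""
    return beans.count("red") / len(beans)

# Hypothetical bag: 600 red and 400 white beans, so the true answer is 60% red.
# Assumption: the white beans were poured in last, so an unmixed bag has them on top.
bag = ["white"] * 400 + ["red"] * 600

rng = random.Random(1)  # fixed seed so the run is repeatable

# Mixed bag: a random sample models "mix thoroughly, then grab a handful".
for n in (1, 5, 50, 500):
    print(f"mixed, {n:3d} beans: {red_fraction(rng.sample(bag, n)):.0%}")

# Unmixed bag: grabbing 50 beans off the top gives a non-representative sample.
print(f"unmixed, 50 beans: {red_fraction(bag[:50]):.0%}")
```

A single bean always reports 0% or 100% red, larger random handfuls settle near 60%, and the handful from the unmixed bag reports 0% red.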
These are not easy hurdles to overcome. They are easy enough to clear when counting beans in a bag, but quite difficult for most interesting real-life problems. A few problems that can arise when surveying people and then presenting the results are:
Self-selection bias - Only people who are willing to respond end up in the sample, and in many cases this alone is enough to produce a skewed data set. If a magazine polls its readers by asking them to fill in a coupon and mail it back, the people with something nice to report are the most likely to respond.
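A rough illustration of how self-selection skews a mail-in poll. Every number in this Python sketch is an assumption: 1,000 hypothetical readers spread evenly over satisfaction ratings 1 to 5, and invented response rates that rise with satisfaction.

```python
# Hypothetical readership: 200 readers at each satisfaction level 1..5,
# so the true average satisfaction is exactly 3.0.
readers = {rating: 200 for rating in range(1, 6)}

# Made-up chance that a reader at each level bothers to mail the coupon back.
response_rate = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.30, 5: 0.50}

def average(counts):
    """Weighted average rating from a {rating: number_of_people} mapping."""
    total = sum(counts.values())
    return sum(rating * n for rating, n in counts.items()) / total

# Expected number of coupons returned at each rating.
respondents = {r: round(n * response_rate[r]) for r, n in readers.items()}

print("all readers:", average(readers))
print("respondents:", average(respondents))
```

Nobody lied on a coupon, yet the returned coupons average 4.15 against a true average of 3.0, purely because of who chose to answer.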
Acquiescence bias - People tend to agree, or to say what they think the surveyor wants to hear. Closely related is framing bias: the way a question is worded has a very strong effect on the answers, and respondents often simply pick the default answer offered.
Social desirability bias - People under- or over-report depending on what they think is socially desirable. For example, respondents might over-report their salary or under-report the time they waste doing nothing on the internet.
Mean vs Median - "Average" can have three different meanings: mean, median or mode. The mean is the sum of the data points divided by the number of data points; the median is the cut-off with half the data points above it and half below; and the mode is the most frequently occurring value. For some data, such as height, all three lie quite close to each other; for others, such as income, they do not. As a rule of thumb, if the data has a symmetric, bell-shaped distribution then all three give similar information, but when the distribution is skewed they can tell very different stories. Whenever a report uses "average," it is important to ask "what kind?"
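Python's standard statistics module makes the difference easy to see. The incomes and heights below are hypothetical values chosen for illustration, with one very high earner skewing the income data.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes in thousands of dollars; one very high earner skews the data.
incomes = [30, 35, 35, 40, 45, 50, 55, 60, 70, 1000]
# Hypothetical heights in centimeters; roughly symmetric around the middle.
heights = [160, 165, 170, 170, 170, 175, 175, 180]

for name, data in (("income", incomes), ("height", heights)):
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

With these numbers the income mean is 142 while the median is 47.5 and the mode is 35, so a report quoting the "average income" could pick whichever figure suits it. The three height figures all sit around 170.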
Sometimes the conclusions drawn from the data are totally out of whack. A very good example is the claim that "most accidents happen within 20 miles of home" and hence that "driving near your home is more dangerous." The data might be correct, but the conclusion does not follow. People have more accidents within 20 miles of home because that is where we do most of our driving. This is an example of counting one thing and reporting it as another. News reporters are among the worst offenders. It is not clear whether they do it out of incompetence or to deceive and grab your attention with eye-catching headlines. Huff argues that when all the "mistakes" are made in only one direction, you can't claim incompetence every time.
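The gap between counting accidents and measuring risk can be sketched with invented numbers. Assume, purely hypothetically, that 90% of all miles are driven within 20 miles of home and that accidents happen at the same rate per mile everywhere.

```python
# Made-up totals for a hypothetical population of drivers.
miles = {"within 20 miles": 9_000_000, "farther away": 1_000_000}
accidents = {"within 20 miles": 90, "farther away": 10}

total_accidents = sum(accidents.values())

def share_of_accidents(zone):
    """Fraction of all accidents that happened in this zone (what the headline counts)."""
    return accidents[zone] / total_accidents

def per_million_miles(zone):
    """Accidents per million miles driven in this zone (the actual risk of driving there)."""
    return accidents[zone] / miles[zone] * 1_000_000

for zone in miles:
    print(f"{zone}: {share_of_accidents(zone):.0%} of accidents, "
          f"{per_million_miles(zone):.1f} per million miles")
```

Under these assumptions 90% of accidents happen near home, yet the risk per mile is identical in both zones.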
He quotes a New York Times article:
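"Plumbers, plasterers, carpenters, painters and other affiliates with the Indianapolis Buildings Trade Union were given a 5 percent increase in wages. That gave back to the men one-fourth of the 20 percent cut they took last winter."
The correct figure is one-fifth. The 5 percent raise is applied to the already reduced wage (80 percent of the original), so it restores only 4 percent of the original wage, which is one-fifth of the 20 percent that was cut. Many people make the mistake of adding and subtracting percentages as if they were plain numbers, ignoring that each percentage is taken of a different base.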
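The arithmetic can be checked with exact fractions. In this Python sketch the starting wage of 100 is a hypothetical value; the share recovered comes out the same for any positive wage.

```python
from fractions import Fraction

# Hypothetical starting wage; the ratio below is the same for any positive value.
wage = Fraction(100)

after_cut = wage * (1 - Fraction(20, 100))        # 20 percent cut
after_raise = after_cut * (1 + Fraction(5, 100))  # 5 percent raise on the reduced wage

lost = wage - after_cut              # amount taken away by the cut
recovered = after_raise - after_cut  # amount given back by the raise

print("share recovered:", recovered / lost)
print("naive answer:   ", Fraction(5, 100) / Fraction(20, 100))  # percentages treated as plain numbers
```

The first line prints 1/5, while the naive calculation that treats the two percentages as if they shared a base prints the newspaper's 1/4.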
The conclusion is that statistics are as much art as they are science, very much like a financial statement. There are many things one can do within the confines of propriety.
Annual reports are filled with figures on how the company has grown and how its business has improved. I was recently reading the 2011 and 2012 annual reports of Peabody Energy (BTU). Peabody is the largest private-sector coal company in the world and is trading near its 52-week low. 2011 was a record year for Peabody: revenue, operating profit, EBITDA, income and diluted EPS were all at their highest levels yet. In that annual report, all these graphs appeared in the first few pages (pages 4 and 5), in attractive colors and with apt photos. The next year, 2012, was not so good. The coal market took a dive, and the company had to write down around $980 million on its Australian coal operations. Furthermore, the company had acquired McArthur Coal in 2011 for $5 billion and financed nearly all of it with debt. This leveraged the balance sheet, and both depreciation/amortization and interest expense became substantially higher. End result: Peabody made a loss of $575 million in 2012. You would not find that in the first few pages of the annual report, though. There are only two graphs, on page 2: revenue and EBITDA. Revenue was higher than in 2011, and EBITDA was only a tiny bit smaller. As EBITDA does not include depreciation/amortization, write-offs or interest expense, Peabody could happily report $1.8 billion of it.
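How a healthy EBITDA can sit next to a net loss is just arithmetic. Below is a deliberately simplified Python sketch with invented figures that are not Peabody's: EBITDA is modeled as revenue minus cash operating costs, the write-down is treated as excluded from it (as described in the paragraph on Peabody), and real income statements have many more line items.

```python
from dataclasses import dataclass

@dataclass
class IncomeStatement:
    """Simplified, hypothetical income statement (figures in millions of dollars)."""
    revenue: float
    cash_operating_costs: float
    depreciation_amortization: float
    write_down: float
    interest: float
    taxes: float

    def ebitda(self) -> float:
        # Earnings before interest, taxes, depreciation and amortization.
        # In this simplified model the one-off write-down is excluded as well.
        return self.revenue - self.cash_operating_costs

    def net_income(self) -> float:
        # Everything EBITDA leaves out is subtracted to reach the bottom line.
        return (self.ebitda() - self.depreciation_amortization - self.write_down
                - self.interest - self.taxes)

# Invented numbers for illustration only.
year = IncomeStatement(revenue=5_000.0, cash_operating_costs=4_000.0,
                       depreciation_amortization=400.0, write_down=500.0,
                       interest=250.0, taxes=0.0)

print(f"EBITDA:     {year.ebitda():,.0f}")
print(f"Net income: {year.net_income():,.0f}")
```

The same year shows an EBITDA of 1,000 and a net loss of 150; which of the two makes it onto the glossy first pages is a presentation choice.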
I am not claiming that this was done intentionally to deceive. Given the debt load, EBITDA may well be the right figure to highlight, to show that the company can meet its debt covenants. Nevertheless, the stark difference between the two annual reports is hard to miss.
I strongly recommend Huff's book. It is simple and at the same time covers many useful tricks for lying. I knew a few and was happy to discover a few more.