Saturday, 6 April 2013

MAT: Simpson's Paradox


So a student came into class the other day and asked if we could talk about Simpson's Paradox.


It happened to be the very day I was going to talk about it anyway.


Strictly speaking, it's not even in the curriculum.

Get out!

I'm serious!

Me too. Get out of here, you're not allowed to divert from curriculum.


Simpson's Paradox, or the Simpson-Yule effect, effectively means that, by aggregating everything together, you might get the OPPOSITE result from what you see in each individual group. In other words, A has a better record than B at poker, A has a better record than B at chess, A has a better record than B at checkers, so the overall winner? B, naturally. (Since he's not being hassled by the paparazzi like A is.)

But no, seriously, it's explained better on the Less Wrong Blog at that site, and by James Grime in this YouTube video - which I've shown in class. It's gaining a fair bit of awareness out there these days... I'm not even the first person to blog about it this year. That last site has a few nice images with the explanations too.
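For a quick feel for the reversal, here's a rough sketch in Python. The win/loss records are entirely made up by me for illustration (they're not from any of the sources above), and the names `records`, `rate` and `overall_rate` are just my own labels. The trick is that A and B play very different numbers of games in each category.

```python
from fractions import Fraction

# Hypothetical (wins, games played) records - invented for illustration only.
records = {
    "A": {"poker": (9, 10), "chess": (30, 100)},
    "B": {"poker": (80, 100), "chess": (2, 10)},
}

def rate(wins, games):
    """Win rate for a single game type, kept exact as a fraction."""
    return Fraction(wins, games)

def overall_rate(player):
    """Win rate after pooling every game the player played."""
    wins = sum(w for w, _ in records[player].values())
    games = sum(g for _, g in records[player].values())
    return Fraction(wins, games)

# Compare the two players game by game, then on the pooled totals.
for game in ("poker", "chess"):
    a = rate(*records["A"][game])
    b = rate(*records["B"][game])
    print(f"{game}: A {float(a):.0%} vs B {float(b):.0%}")

print(f"overall: A {float(overall_rate('A')):.0%} "
      f"vs B {float(overall_rate('B')):.0%}")
```

With these invented numbers, A is ahead 90% to 80% at poker and 30% to 20% at chess, yet once everything is pooled B comes out ahead, roughly 75% to 35%. A played mostly chess (where everyone does badly) while B played mostly poker.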

Your takeaway? Summaries are always obscuring the truth, if not outright lying to you. It's better to have the individual data - then you can create an overall aggregate if you choose. Yet in our busy lives, we tend to take percentages at par, ignoring that they're useless without knowing what made them. This news article from 2009 references the problem as well, with lots of practical examples, including one in sports.


I forget exactly when I stumbled upon the scenario - probably in the maths book I own by Martin Gardner - but I have since grabbed examples from a statistics text to use in a class handout for one day, when I teach Data Management (statistics).


He'd been reading up on the Paradox after having a disagreement with his employer. The employer had taken the individual sales averages (for each employee), and averaged them together to create one for the store. Apparently, store sales were slipping.

The student rightly pointed out that, if you added up the individual sales for the units, you would get a different answer than by averaging all the averages. This actually had to be DONE before the employer would believe the values would be different.

Don't take my word for it - make up an example for yourself. What if one guy always makes his sales, and another never does?

It's not strictly an example of the Paradox, since nothing is reversing direction, but it is an example of how averages are completely meaningless without a sense of the totals involved. It's also what led the student to investigate and find the Paradox, and explains why we have this thing called the weighted mean!
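If you'd rather not make up your own numbers, here's a minimal sketch of the "one guy always makes his sales, another never does" scenario. I'm assuming each employee's "average" is a simple sales-per-customer rate, and the figures and names (`staff`, `mean_of_rates`, `pooled_rate`, `weighted_mean`) are hypothetical - I don't know what the actual store data looked like.

```python
# Hypothetical (sales made, customers served) per employee - invented numbers.
staff = {
    "always_sells": (2, 2),    # 100% rate, but only two customers
    "never_sells": (0, 98),    # 0% rate, across 98 customers
}

# Each employee's individual rate.
rates = {name: sold / served for name, (sold, served) in staff.items()}

# What the employer did: average the averages, ignoring how many
# customers sit behind each one.
mean_of_rates = sum(rates.values()) / len(rates)

# What the student did: add up the raw totals first, then divide.
total_sold = sum(sold for sold, _ in staff.values())
total_served = sum(served for _, served in staff.values())
pooled_rate = total_sold / total_served

# The weighted mean: weight each rate by its customer count.
weighted_mean = sum(rates[name] * staff[name][1] for name in staff) / total_served

print(f"average of averages: {mean_of_rates:.0%}")
print(f"pooled store rate:   {pooled_rate:.0%}")
print(f"weighted mean:       {weighted_mean:.0%}")
```

Here the average of the averages says 50%, while the store as a whole only sold to 2% of its customers. Weighting each rate by its customer count brings you back to the pooled figure.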

Oh, and actually - it IS in the curriculum. Kind of. We're supposed to learn about the variability inherent in data. So there.

You win on a technicality then... but next time...
