You can observe a trend in two different groups of data, but when you combine those groups, the trend disappears or even reverses

It’s a statistical phenomenon called Simpson’s Paradox

It’s a paradox because the eventual statistical result goes against your intuition

Think about it this way :

Say you and your friend Penny decide to solve a bunch of math puzzles, once on Friday night and again on Saturday night - although why in the world you would choose to solve math puzzles on a Friday night is beyond me

That Bourbon isn’t gonna drink itself !

Fine, let’s just proceed with this example

On Friday, you solved 15/20 puzzles correctly, and she solved 4/5 correctly. You got 75% correct and she got 80% correct

On Saturday, you solved 2/5 puzzles correctly, and she solved 12/20 correctly. You got 40% correct and she got 60% correct

She obviously outscored you on both days and is better than you at math. But - why don’t we combine both days and see what happens ?

Penny : (4 + 12) / (5 + 20) = 16/25 = 64%

You : (15 + 2) / (20 + 5) = 17/25 = **68%**

And now, you actually have a better average overall, when compared to Penny
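If you want to check the arithmetic yourself, here is a minimal Python sketch of the puzzle example. The names `results`, `rate` and `pooled_rate` are just made up for this illustration; the numbers are the ones from the example above.

```python
from fractions import Fraction

# (correct, attempted) per day, taken from the puzzle example
results = {
    "You":   {"Friday": (15, 20), "Saturday": (2, 5)},
    "Penny": {"Friday": (4, 5),   "Saturday": (12, 20)},
}

def rate(correct, attempted):
    """Share of puzzles solved correctly, as an exact fraction."""
    return Fraction(correct, attempted)

def pooled_rate(days):
    """Pool the days: total correct divided by total attempted."""
    total_correct = sum(c for c, _ in days.values())
    total_attempted = sum(n for _, n in days.values())
    return Fraction(total_correct, total_attempted)

for name, days in results.items():
    daily = ", ".join(
        f"{day} {float(rate(*score)):.0%}" for day, score in days.items()
    )
    print(f"{name}: {daily}, combined {float(pooled_rate(days)):.0%}")
```

Penny comes out ahead on each day, yet your combined figure is higher. Note that the combined figure divides total correct by total attempted; it is not the average of the two daily percentages.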

Sometimes, in statistics, when you combine two groups of data, the trend you observe in each separate group disappears or flips, which is why we call this a paradox

**Why does this happen ?**

This happens because of unequal sample sizes. Your final result (68% for you versus 64% for Penny) is driven mostly by the day on which each of you solved 20 puzzles, not the day with only five

On your 20-puzzle day, you got 15/20 correct, while Penny got only 12/20 correct on hers. Those big days carry most of the weight in the combined average across both days

On the day you got 75% correct (15/20), Penny beat you with 80%, but she only attempted five puzzles. Comparing single-day percentages without considering how many puzzles each of you attempted misrepresents the picture
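One way to see the weighting, as a quick sketch: the combined percentage is a weighted average of the two daily percentages, where each day's weight is its share of the 25 puzzles attempted.

$$
\text{You: } \tfrac{20}{25}(75\%) + \tfrac{5}{25}(40\%) = 60\% + 8\% = 68\%
$$

$$
\text{Penny: } \tfrac{5}{25}(80\%) + \tfrac{20}{25}(60\%) = 16\% + 48\% = 64\%
$$

Your good day gets a weight of 20/25, while Penny's good day only gets 5/25.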

It’s the curse of using averages, or rather, simple thoughtless averages to make decisions. We observe Simpson’s paradox in sports, medicine, biostatistics and machine learning applications all the time

**A Real World Example in Baseball**

One of the most famous examples of this is the batting average comparison between Derek Jeter and David Justice. In each season from 1995 through 1997, Justice had a better batting average than Jeter

But when you combine all three years, Jeter had a slightly better overall batting average (.300 for Jeter versus .298 for Justice)

This is because of the unequal number of at bats. Jeter had 238 more at bats than Justice over the three years (1284 for Jeter versus 1046 for Justice), and if we don’t account for that, we would end up shortchanging Jeter
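Here is a rough Python sketch of the same comparison. One caveat: the per-season hits and at bats below are not from this post. They are the commonly cited figures for this example, so treat them as an assumption and check them against official records before relying on them. They do add up to the at bat totals quoted above. The names `seasons`, `batting_average` and `combined_average` are made up for illustration.

```python
# (hits, at bats) per season. ASSUMPTION: these per-season numbers are the
# commonly cited ones for this example, not figures given in the post.
seasons = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582), 1997: (190, 654)},
    "Justice": {1995: (104, 411), 1996: (45, 140),  1997: (163, 495)},
}

def batting_average(hits, at_bats):
    """Hits divided by at bats for a single season."""
    return hits / at_bats

def combined_average(by_year):
    """Total hits divided by total at bats across all seasons."""
    hits = sum(h for h, _ in by_year.values())
    at_bats = sum(ab for _, ab in by_year.values())
    return hits / at_bats

for year in (1995, 1996, 1997):
    jeter = batting_average(*seasons["Jeter"][year])
    justice = batting_average(*seasons["Justice"][year])
    print(f"{year}: Jeter {jeter:.3f}, Justice {justice:.3f}")

for player, by_year in seasons.items():
    print(f"{player} combined: {combined_average(by_year):.3f}")
```

With these numbers, Jeter's weakest season also has the fewest at bats, so it barely moves his combined average.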

And yes - those Yankees pinstripes are legitimately gorgeous !

Awesome example and explanation. Yeah, sometimes people misinterpret the same paradox as gender disparity in college admissions. Excellent example, Deepak. Thanks.