May 13, 2007

Do I Underrate How Surprising or Groundbreaking This Is?

Benford's Law

"In many data series a surprising number of entries begin with the number 1, and the number 2 is also more common than a random distribution might suggest." --Tyler Cowen

You might not have thought of this (I hadn't) but in hindsight it makes total sense. Pick a random integer and see what happens with the digits of the numbers between 1 and that number. For example, counting up, in the tens digit you see a 1 ten times (10 through 19) before you see any other digit, then a 2 ten times before [etc.].

The interesting question for any given set of data is whether the law is applicable. It probably is, but think about conditions (upper or lower bounds, whether the data in question is better thought of as a sequence of digits than as an n-digit number) that would make it inapplicable.

I hope everyone realizes that the second parenthetical reason above is why lottery numbers decidedly do not follow Benford's law.

Even if there were a lottery setup to which Benford's law sort of applied -- e.g., if a lotto drew just one ping-pong ball from balls numbered sequentially from 1 to some three-digit number, so that the first digit was at least as likely to be 1 as any other digit -- this would tell you absolutely nothing about whether 101 was more likely to appear than 102, 102 than 103, etc.

While we're here: Take a lottery that involves drawing six balls (without replacement) from balls numbered 1 to 99. Assume the balls have been properly shuffled at random. THINK QUICK: Which outcome is more likely, 1-2-3-4-5-6 or 48-40-1-79-72-93? (If you get this wrong I will sigh sadly and die a little inside.)

This is all very interesting (I claim) and very easy to understand if explained correctly, but it doesn't get nearly enough coverage. I'm embarrassed to admit that until the Marginal Revolution entry I'd never heard of Benford's Law, nor had I intuited the same result.

Posted by Matt Bruce at May 13, 2007 01:46 PM

What Other People Say

Maybe it's the mathie in me, but I got in on this one years ago. I dunno -- I certainly couldn't come up with the distribution on my own, and it's not as intuitive as I'd like it to be, but I'm tryin', Ringo, I'm tryin'.

Want some more fun w/ your THINK QUICK concept? Assuming lotteries are all run the same way, not only is 1-2-3-4-5-6 as likely as any other combination in a given lottery, but theoretically (if not empirically) it's THE MOST COMMON COMBINATION in lottery history, since it's an option in EVERY LOTTERY.

Although that kinda assumes that there's a lottery that only has six balls and pulls six balls, otherwise there are other combinations w/ equal omnipresence. But this works well in Mathland...
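
The equal-likelihood point can be checked with a quick count. This is only a sketch: it assumes the 6-from-99 lottery described in the post, treats a draw as an unordered set of balls, and uses a made-up helper name (combination_probability). Under those assumptions every specific set of six balls, 1-2-3-4-5-6 included, has the same probability.

```python
from math import comb


def combination_probability(num_balls: int, num_drawn: int) -> float:
    """Probability of one specific unordered set of balls (hypothetical helper).

    Assumes a fair draw without replacement, so every set is equally likely.
    """
    return 1 / comb(num_balls, num_drawn)


# Number of distinct six-ball sets from balls numbered 1 to 99.
total_sets = comb(99, 6)

# The same probability applies to {1,2,3,4,5,6} and to {1,40,48,72,79,93}.
p_any_single_set = combination_probability(99, 6)

print(total_sets)
print(p_any_single_set)
```

If the order of the draw mattered, the count of outcomes would be larger, but the two sequences would still be equally likely.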

Posted by: ZD at May 13, 2007 04:29 PM

I don't follow your explanation of the intuition.

The first time I heard about this was when I was in college and a physics prof remarked that if you look at the numbers in a table of physical constants, no matter what units are used, you'll find that a lot more than 1/9 of them have initial digit 1. Since I was used to looking at logarithmic data plots, where you see that numbers beginning with the digit 1 take up fraction log10(2) of the area, I could intuit right away that this should be the same fraction of physical constants, even though I'd never thought of it before.
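
For anyone who wants the numbers behind that log-plot picture, here is a rough sketch. The comment above only gives the fraction for a leading 1; the general formula used here, log10(1 + 1/d) for leading digit d, is the usual statement of Benford's Law, and the function name is invented for illustration.

```python
from math import log10


def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    # On a log10 axis, numbers starting with `digit` span from digit to digit + 1,
    # which is a width of log10(digit + 1) - log10(digit) = log10(1 + 1/digit).
    return log10(1 + 1 / digit)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The nine shares add up to 1, and the share for a leading 1 is log10(2), a little over 30 percent, which is well above 1/9.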

So I've always thought of Benford's Law as the best example of a quiz bowl question that players might be able to figure out on the spot, and learn something interesting, even if they didn't actually know it before, or didn't know that they knew it. I wrote it into the final of the Beaver Bonspiel in 1995. Maybe it would have worked better as a bonus than as a tossup. But I think the very worst way to ask about this concept is to state it and ask for its "name", as somebody else did a few years later.

Posted by: west coast dork at May 14, 2007 08:51 PM

This isn't related to your comment, but on reading over that old packet: I remember "Trespassers W". Nifty tossup.

The "intuitive" understanding I developed results from noticing a pattern like this:

Distribution of the leading digits of the first 9 integers:
1 -> 1
2 -> 1
3 -> 1
[...]
9 -> 1

Distribution of the leading digits of the first 19 integers:
1 -> 11
2 -> 1
3 -> 1
[...]

Distribution of the leading digits of the first 29 integers:
1 -> 11
2 -> 11
3 -> 1
4 -> 1
[...]
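
The tallies above can be reproduced with a few lines of code. This is a minimal sketch, assuming Python; the function name leading_digit_counts is made up for illustration.

```python
from collections import Counter


def leading_digit_counts(n: int) -> Counter:
    """Tally the leading digits of the integers 1 through n (hypothetical helper)."""
    return Counter(int(str(k)[0]) for k in range(1, n + 1))


for n in (9, 19, 29):
    counts = leading_digit_counts(n)
    # List the counts for leading digits 1 through 9, in order.
    print(n, [counts[d] for d in range(1, 10)])
```

Trying other upper bounds (say 99, 199, 999) shows how much the tallies depend on where the sequence stops.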

My intuition relates to sequential data rather than physical constants of indefinite (not necessarily bounded) range. On the other hand, a finite set of physical constants will have a maximum and a minimum, where the latter is likely to be negative or close to zero.

Posted by: me at May 15, 2007 04:51 PM