This relates to the distribution of LEADING DIGITS in a collection of numbers.
Basically, it turns out the digit "1" is the leading digit nearly a third of the time (about 30%), under the right circumstances.
It's all explained very well by Steve Mould here in the Numberphile video.
Of course for even more detail, there's always Wikipedia.
Taking things a bit further, I decided to analyse Numberphile's YouTube figures to see if they obeyed Benford's Law.
(Well actually it was done for me by a chap named Daniel who is far better with spreadsheets!)
Here's the resulting "extra" film, also posted on Numberphile:
And here are the Numberphile viewing figures, charting the videos' durations (in seconds), views and number of comments.
Now, just as a reminder, below is a perfect Benford distribution - the curve is quite obvious!
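For anyone who wants the actual numbers behind that curve, Benford's Law says the share of values with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch of that formula (the function name `benford_share` is just my own label for illustration):

```python
import math

def benford_share(d):
    """Expected share of values whose leading digit is d (1-9) under Benford's Law."""
    return math.log10(1 + 1 / d)

# Build the full "perfect" distribution for digits 1 to 9
benford = {d: benford_share(d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, f"{p:.1%}")
```

Digit 1 comes out at about 30.1%, falling away to about 4.6% for digit 9.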
And to make things more visible, here are the Numberphile video durations (in seconds) distributed by leading digit:
Not very Benford-esque are they? More on this later.
Here are the viewing counts (again, remember this only relates to the distribution of leading digits):
Closer. The 1s are certainly dominant.
Now here are the viewer comments:
Not quite right, is it?
But perhaps the sample size is too small, with just over 100 Numberphile videos.
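If you'd like to try the same tally on your own numbers without a spreadsheet, it could be sketched in Python roughly like this. The function name `leading_digit_shares` and the sample figures are made up for illustration. They are not the real Numberphile stats, and this is not necessarily how Daniel did it.

```python
from collections import Counter

def leading_digit_shares(values):
    """Return {digit: share} for leading digits 1-9 of the positive whole numbers in values."""
    # Take the first character of each positive number as its leading digit
    digits = [int(str(v)[0]) for v in values if v > 0]
    total = len(digits)
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts[d] / total for d in range(1, 10)}

# Made-up view counts, purely illustrative (NOT real channel data)
sample_views = [1203, 18450, 2210, 134, 97, 1560, 3020, 11, 450, 1999]
shares = leading_digit_shares(sample_views)

for d, share in shares.items():
    print(d, f"{share:.0%}")
```

With only ten values the shares jump around a lot, which is the same sample-size worry as above.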
So next I sent Daniel the stats for ALL MY CHANNELS:
And again, you'll see below that the video durations do not give a Benford curve:
But take a look at the graph we get from the leading digits on view counts:
Much better. And the same applies to the number of comments on videos:
I guess the big question is, why does it not apply to video durations?
I'd love to hear people's explanations.
Here's the one Daniel put forward, which makes sense to me.
Essentially, the durations are "planned" (by me). Videos that are too long may be unpalatable to viewers.
For example, few films will be longer than 10 minutes - a psychological barrier because of the double digit.
(In fact, for quite some time I was not able to post videos longer than 10 minutes... and of all my videos just 5% exceed 10 minutes)
So for videos shorter than 10 minutes, and using seconds as our unit of measurement, the durations (and leading digits) can be grouped like this:
Leading Digit 1 - Applies to films of duration 1 second, 10-19 secs and 1:40-3:19 (111 contributing values, 18.5%)
Leading Digit 2 - 2 seconds, 20-29 secs and 3:20-4:59 (111 values, 18.5%)
Leading Digit 3 - 3 seconds, 30-39 secs and 5:00-6:39 (111 values, 18.5%)
Leading Digit 4 - 4 seconds, 40-49 secs and 6:40-8:19 (111 values, 18.5%)
Leading Digit 5 - 5 seconds, 50-59 secs and 8:20-9:59 (111 values, 18.5%)
Leading Digit 6 - 6 seconds and 1:00-1:09 (11 values, 1.84%)
Leading Digit 7 - 7 seconds and 1:10-1:19 (11 values, 1.84%)
Leading Digit 8 - 8 seconds and 1:20-1:29 (11 values, 1.84%)
Leading Digit 9 - 9 seconds and 1:30-1:39 (11 values, 1.84%)
So you can see most of the films (when only including those under 10 minutes) will fall into the 1-5 groupings.
And the duration graphs from my videos back this up, with a stronger distribution in the 1-5 slots, and then dropping off in the 6-9 slots.
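The groupings listed above can be double-checked by brute force. This is a quick Python sketch, assuming durations are whole seconds from 1 to 599 (everything under 10 minutes) and simply counting how many of those possible values start with each digit. It says nothing about how long real videos actually are, only how the possible values split up.

```python
from collections import Counter

MAX_SECONDS = 599  # 9:59, the longest duration under 10 minutes

# Count the leading digit of every possible duration from 1 to 599 seconds
counts = Counter(int(str(s)[0]) for s in range(1, MAX_SECONDS + 1))

for d in range(1, 10):
    print(d, counts[d], f"{counts[d] / MAX_SECONDS:.2%}")
```

Digits 1 to 5 each get 111 of the 599 values (about 18.5%), while digits 6 to 9 get just 11 each (about 1.84%), matching the list.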