Theo.(Math, Philosophy, Entrepreneurship, Green Tech, Machine Learning, Statistics): A Cool Fact about Uniform Distributions

A Cool Fact about Uniform Distributions

On Monday I received the problem of the week from Doubleclick Technical Group guru, Andrew. The problem was as follows:

If I have a hat with all the numbers from 0 to 1 in it, and I start randomly choosing numbers from it...how many times will the number I draw be the maximum value for the numbers I've already drawn when I draw a total of n numbers from the hat? It should be noted that the question is asking: on average how many times will the maximum value change.

I know the answer is obvious to most people, but I had to do a little bit of work to get it. In the end I discovered a cool fact...that Uniform distributions and Beta distributions are related!

First what we really have here is a process that selects a random variable (r.v) from a Uniform(0,1) distribution. A uniform(0,1) distribution has mean .5 and variance .25. Uniform distributions can be used in many cool ways, the least of which is simulating a coin flip.

A depiction of the probability density of a Uniform distribution

Secondly, when we talk about the maximum of a set of r.v, max{X1, X2,....,Xn}, we can also refer to the last element in the sorted, or ordered, set {X(1), X(2), .... , X(n)<----this would be the maximum}. In this case the sort goes in ascending order.

Here is where it gets interesting! (ok interesting if your into this sort of thing, and I am...so....) When we sort the random variables produced from a Uniform distribution, the ordered set takes on the appearance of a Beta distribution! Furthermore, each element in the set has it's own specific Beta distribution....i.e. the k-th element in the set {X(1), ... , X(k), ... , X(n)} has the distribution Beta(k, k+1-n)! Wow!

These are various depictions of Beta probability density functions...neat-o!

Now, we know that the Beta(k, k+1-n) distribution has a mean value of k/(k+k+1-n). This is quite useful, because in our problem above we are asking the question: how many times does the maximum change, or rather, what is the sum of the probabilities for each draw to be a maximum. We can conclude, with some thinking, that on average after n draws the maximum of a set of uniformly drawn r.v.'s is 1 - the mean value of the n-th Uniform r.v. from the ordered set. Thus the Probability that our nth draw is the new maximum is Pr(Xn = max) = 1 - mean(Beta(n, n+1-n)) = 1 - n/(n+1)! It should be noted that I am interchanging the actual value of the n-th draw with it's probability because for the r.v. to be a max it only has to fall within the previous max's value and 1....making the probability of the draw being the max and the mean value of the Beta(n, n+n+1-n) the same thing.

So... if X0, X1, ... , Xn-1 are our draws...we have the following:

Pr(X0 = max) = 1 -0 (first draw must be the maximum)

Pr(X1 = max) = 1- 1/2 (1/(1+1))

Pr(X2 = max) = 1- 2/3 = 1/3

Pr(X3 = max) = 1- 3/4 = 1/4

Pr(Xn-1 = max) = 1-n-1/(n)

And by summing up these probabilities, we get the expected or average amount of times that the number drawn is the maximum...

sum(1-j/(j+1)) [for j=0 to n-1] !

So for n=10,000,000 we expect to obtain about 14 new maximums along the way!

And you thought Math was useless :-) Next week if you're lucky I will discuss how most natural processes can be described by a Normal distribution...including the stock market!

4 comments:

Kent said...: I just want to declare that I got to hear all of this live from the mathematically (and Flash) inclined author himself before this was ever posted.

That's a good thing, because I got lost pretty quick trying to read this. It looks amazingly impressive, though.

On the bright side, I realized that the subtitle for your blog (Math, Philosophy, Entrepreneurship, and Green Technology) could be made into an acronym: "MPEG".

So, my question is whether or not you tried to make a reference to a video and audio encoding standard on purpose, or was that just by chance? Can you plot the probability of this occurring for all blog subtitles?; May 16, 2007 at 2:30 PM
Theo V. said...: :-) MPEG...doh! you cracked my code!

Let's see the probability of getting any meaningful acronym from a blog subtible is... well how about I make another post about it... :-); May 18, 2007 at 9:35 AM
Anonymous said...: Doh! the variance of a uniform distribution on (a,b) is (b-a)^2 / 12. so the variance of uniform(0,1) is 1/12.

or in R: var(runif(1000))

good stuff!; May 18, 2007 at 9:52 AM
Anonymous said...: I'd like to see the derivation of the beta distribution of the maximum of N independent, uniform random variables. Does anyone know where I can find it?

Why? Well, I am looking at the cumulative probability of poker hands which are ~U(0,1) so the winning hand from N players is the maximum of N uniform distributions - so it would have this beta distribution if the players were completely independent. I'm wondering whether I can generalize to cover the correlation caused by dealing from the same pack (in stud poker) or by sharing cards (in texas hold'em).

If you assume the hands are independent, the mean of a Beta distribution is N/N+1 - 2/3 for 2 players (which is a high pair), 3/4 for 3 players (a medium 2-pair), 0.8 for 4 players (a high 2-pair) (in texas hold'em). But the correlation will bring down these expected values.; June 2, 2009 at 10:28 AM