Gaussian random number generator using the Box-Muller method
17 Aug 2021 - tsp
Last update 17 Aug 2021
5 mins
This blog post is a short summary of how one can use the Box-Muller transform
to generate Gaussian distributed random numbers from uniformly distributed
random numbers in the range $(0, 1)$. Such numbers are usually available in an
easy way on most computers - hardware generators that exploit Schottky noise or
quantum mechanical effects as well as pseudo random number generators usually
output uniformly distributed numbers. This is sufficient for many purposes
(including cryptography), but sometimes it's necessary to have numbers that
follow a different distribution - for example when performing Monte Carlo
simulations. In this case one can simply transform the uniform random numbers
into normally distributed values by using the Box-Muller transformation.
The transform itself is pretty simple. In its most basic form one has
to choose two statistically independent random numbers $a$ and $b$ from the
unit interval $(0,1)$. Then one can directly calculate two independent
standard normal distributed random quantities:
[
R_0 = \sqrt{-2 \ln(a)} \cos(2 \pi b) \\
R_1 = \sqrt{-2 \ln(a)} \sin(2 \pi b)
]
As one can see, the transform uses the two random variables to define the
polar coordinates of the random number ($R^2 = -2 \ln(a)$ and $\theta = 2 \pi b$).
The idea rests on some basic concepts:
- If one assumes that the $X$ and $Y$ coordinates in a Euclidean coordinate system
are distributed according to the standard normal distribution, then the
pair $(X,Y)$ is distributed according to the bivariate normal distribution.
[
X \sim \mathfrak{No}(0,1), \quad p_X(x) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^2} \\
Y \sim \mathfrak{No}(0,1), \quad p_Y(y) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} y^2}
]
- The squared norm $X^2 + Y^2$ of a bivariate standard normal distribution follows
a $\chi^2$ distribution with two degrees of freedom, which is the same as the
exponential distribution with $\lambda = \frac{1}{2}$. This yields the required
exponential term to produce the normal distribution.
[
f(x; \lambda) = \lambda e^{-\lambda x} \quad \forall x \geq 0 \\
f(x; \lambda) = 0 \quad \forall x < 0
]
Different mean or standard deviation
As one can see, generating Gaussian distributed random numbers is rather easy (in
case one can afford the square root, logarithm and trigonometric operations) - but
what if one wants a different mean $\mu$ or standard deviation $\sigma$? Well,
that's simple too. One can recall the standard score / z-score from statistics:
[
z = \frac{x - \mu}{\sigma}
]
This is usually used to transform data points from arbitrary Gaussians into
a space with $\mu = 0$ and $\sigma = 1$. This can of course be inverted to
transform a standardized $z$ value back into a space with arbitrary $(\mu, \sigma)$,
which is sufficient to produce random numbers for any $x \sim \mathfrak{No}(\mu,\sigma)$:
[
x = z \sigma + \mu
]
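In code this is a simple linear transformation on top of the standard normal generator. A minimal sketch, assuming the randGaussian() function from the example below (the name randGaussianMuSigma is made up for illustration):

/* Scale a standard normal sample to an arbitrary mean and standard
   deviation by inverting the z score: x = z * sigma + mu */
double randGaussianMuSigma(double mu, double sigma) {
    return randGaussian() * sigma + mu;
}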
Proof
Assume we have two uniformly distributed random numbers $U$ and $V$. The
quantities $X$ and $Y$ are defined as above:
[
X = \sqrt{-2 \ln(U)} \cos(2 \pi V) \\
Y = \sqrt{-2 \ln(U)} \sin(2 \pi V)
]
First let's look at the term $\sqrt{-2 \ln(U)}$:
[
R = \sqrt{-2 \ln(U)} \\
P(R \leq r) = P(-2 \ln(U) \leq r^2) \\
= P(\ln(U) \geq -\frac{r^2}{2}) \\
= P(U \geq e^{-\frac{r^2}{2}}) \\
= 1 - P(U < e^{-\frac{r^2}{2}})
]
Using the fact that $U$ is uniformly distributed on $(0,1)$ one can evaluate this
probability; differentiating with respect to $r$ then yields the density:
[
P(R \leq r) = 1 - \int_{0}^{e^{-\frac{r^2}{2}}} 1 \, dt = 1 - e^{-\frac{r^2}{2}} \\
p_{R}(t) = t e^{-\frac{t^2}{2}}
]
Now the second part $\theta = 2 \pi V$ can be treated the same way:
[
\theta = 2 \pi V \\
P(\theta \leq r) = P(2 \pi V \leq r) \\
= P(V \leq \frac{r}{2 \pi}) \\
\to P(\theta \leq r) = \int_0^{\frac{r}{2\pi}} 1 \, dt = \frac{r}{2\pi} \\
\to p_{\theta}(t) = \frac{1}{2\pi}
]
Since $U$ and $V$ are independent, the quantities $R$ and $\theta$ are
independent as well, and thus the product of their densities forms the
joint density function:
[
p_{R,\theta}(t_1, t_2) = p_R(t_1) p_{\theta}(t_2) = \frac{1}{2\pi} t_1 e^{-\frac{t_1^2}{2}}
]
Changing variables from polar coordinates $(t_1, t_2)$ to Cartesian coordinates
$x = t_1 \cos(t_2)$, $y = t_1 \sin(t_2)$ divides the density by the Jacobian
determinant $t_1$, which cancels the leading factor:
[
p_{X,Y}(x, y) = \frac{1}{2\pi} e^{-\frac{x^2 + y^2}{2}} = \left( \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \right) \left( \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} \right)
]
This factors into the product of two standard normal densities, so $X$ and $Y$
are indeed independent and standard normal distributed.
Example
How can one implement a simple generator that outputs values distributed
as $X \sim \mathfrak{No}(0,1)$ in ANSI C? Simply use the formula above
and the standard C rand() function.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double randGaussian() {
    /* Draw two independent uniform random numbers. Shifting numerator
       and denominator by one maps rand() into (0, 1] so that log(a)
       is never evaluated at zero. */
    double a = ((double)(rand()) + 1.0)/((double)RAND_MAX + 1.0);
    double b = ((double)(rand()))/((double)RAND_MAX);

    /* Box-Muller transform; R0 and R1 would be two independent
       standard normal values, R1 is simply discarded here. */
    double R0 = sqrt(-2.0 * log(a)) * cos(2.0 * M_PI * b);
    /*
    double R1 = sqrt(-2.0 * log(a)) * sin(2.0 * M_PI * b);
    */
    return R0;
}
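Since each pair $(a, b)$ yields two independent normal values, one can also cache the sine component and return it on the next call instead of discarding it. The following is merely a sketch of this optimization (the name randGaussianCached and the static caching scheme are illustrative, not part of the original code) - note that the static state makes it non-reentrant:

double randGaussianCached() {
    static int haveCached = 0;
    static double cached;
    double a, b, r;

    if (haveCached) {
        /* Return the sine component saved during the previous call */
        haveCached = 0;
        return cached;
    }

    a = ((double)(rand()) + 1.0)/((double)RAND_MAX + 1.0);
    b = ((double)(rand()))/((double)RAND_MAX);
    r = sqrt(-2.0 * log(a));

    /* Cache the sine component for the next call, return the cosine one */
    cached = r * sin(2.0 * M_PI * b);
    haveCached = 1;
    return r * cos(2.0 * M_PI * b);
}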
Keep in mind that rand() already is a pseudo random number generator, so it has
to be correctly seeded using srand() from a real random source or it will
reproduce the same sequence all over again (which might be interesting for
debugging purposes anyways).
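To produce the data file referenced below one can simply dump a large number of samples into a text file. A minimal sketch, assuming the randGaussian() function from above (the sample count of one million and the file name boxmuller.dat are arbitrary choices matching the gnuplot script):

#include <time.h>    /* additionally required for time() */

int main(void) {
    FILE* f;
    long i;

    /* Seed the PRNG from the current time (a weak but common source) */
    srand((unsigned int)time(NULL));

    f = fopen("boxmuller.dat", "w");
    if (f == NULL) {
        return 1;
    }

    /* Write one sample per line so gnuplot can bin column 1 */
    for (i = 0; i < 1000000; i++) {
        fprintf(f, "%f\n", randGaussian());
    }

    fclose(f);
    return 0;
}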
To get a feeling for how the random numbers are distributed one can generate a
huge number of them and use gnuplot to plot a histogram using a script like the
following:
gnuplot> binwidth=0.05
gnuplot> bin(x,width) = width * floor(x/width)
gnuplot> plot "boxmuller.dat" using (bin($1, binwidth)):(1.0) smooth freq with boxes title "Values"
Starting with 100 values it doesn't look like a typical Gaussian curve:
(Figure: histogram of 100 samples)
A typical histogram might look like the following for 10,000 values:
(Figure: histogram of 10,000 samples)
For 1,000,000 values the curve looks even more familiar:
(Figure: histogram of 1,000,000 samples)
This article is tagged: Math, ANSI C, Statistics