Gaussian random number generator using the Box-Muller method

17 Aug 2021 - tsp
Last update 17 Aug 2021
Reading time 5 mins

This blog post is a short summary of how one can use the Box-Muller transform to generate Gaussian distributed random numbers out of uniformly distributed random numbers in the range $(0, 1)$. Such uniform random numbers are easily available on most computers - hardware generators that exploit Schottky noise or quantum mechanical effects as well as pseudo random number generators usually output uniformly distributed values. This is exactly what one needs for many purposes (including cryptography), but sometimes numbers following a different distribution are required - for example when performing Monte-Carlo simulations. In this case one can simply transform the uniformly distributed random numbers into normally distributed values using the Box-Muller transformation.

How the transform works

Basically the transform is pretty simple. In its most basic form one chooses two statistically independent random numbers $a$ and $b$ from the unit interval $(0,1)$. Then one can directly calculate two independent standard normally distributed random quantities:

[ R_0 = \sqrt{-2 \ln(a)} \cos(2 \pi b) \\ R_1 = \sqrt{-2 \ln(a)} \sin(2 \pi b) ]

As one can see this arises from the idea that the two random numbers are used to define the polar coordinates of a random point in the plane ($R^2 = -2 \ln(a)$ and $\theta = 2 \pi b$).

The idea arises from some basic concepts. The two target variables $X$ and $Y$ should be independent and standard normally distributed:

[ X \sim \mathfrak{No}(0,1): p_X(x) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^2} \\ Y \sim \mathfrak{No}(0,1): p_Y(y) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} y^2} ]

Their joint density $\frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}$ is rotationally symmetric, so in polar coordinates the angle $\theta$ is uniform on $(0, 2\pi)$ and the squared radius $R^2 = X^2 + Y^2$ follows an exponential distribution with $\lambda = \frac{1}{2}$:

[ f(x ; \lambda) = \lambda e^{-\lambda x} \; \forall x \geq 0 \\ f(x ; \lambda) = 0 \; \forall x < 0 ]
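Inverting the distribution function of this exponential distribution (inverse transform sampling) shows where the $-2 \ln(a)$ term in the formulas above comes from - note that when $U$ is uniform on $(0,1)$, $1 - U$ is uniform as well:

[ F(x) = 1 - e^{-\frac{x}{2}} \\ u = F(x) \to x = -2 \ln(1 - u) \\ \to R^2 = -2 \ln(a) ]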

Different mean or standard deviation

As one can see generating Gaussian distributed random numbers is rather easy (in case one can afford the square root, logarithm and trigonometric operations) - but what if one wants a different mean $\mu$ or standard deviation $\sigma$? That’s simple too. One can recall the standard score / z score from statistics:

[ z = \frac{x - \mu}{\sigma} ]

This is usually used to transform data points from arbitrary Gaussian distributions into a space with $\mu = 0$ and $\sigma = 1$. This can of course be inverted to transform a standardized $z$ value back into a space with arbitrary $(\mu,\sigma)$, which is sufficient to produce random numbers for any $x \sim \mathfrak{No}(\mu,\sigma)$:

[ x = z \sigma + \mu ]
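As a minimal sketch in C - assuming the randGaussian() function from the example section below; the name randGaussianScaled is chosen here just for illustration:

double randGaussian(); /* standard normal generator, see the example section */

/*
	Invert the z score to obtain samples with arbitrary
	mean mu and standard deviation sigma
*/
double randGaussianScaled(double mu, double sigma) {
	return randGaussian() * sigma + mu;
}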

Proof

Assume we have two uniformly distributed random numbers $U$ and $V$ on $(0,1)$. The quantities $X$ and $Y$ are defined as above:

[ X = \sqrt{-2 \ln(U)} \cos(2 \pi V) \\ Y = \sqrt{-2 \ln(U)} \sin(2 \pi V) ]

First let’s look at the radial term $\sqrt{-2 \ln(U)}$:

[ R = \sqrt{-2 \ln(U)} \\ P(R \leq r) = P(-2 \ln(U) \leq r^2) \\ = P(\ln(U) \geq -\frac{r^2}{2}) \\ = P(U \geq e^{-\frac{r^2}{2}}) \\ = 1 - P(U < e^{-\frac{r^2}{2}}) ]

Using the fact that $U$ is uniformly distributed and differentiating the resulting distribution function with respect to $r$ to obtain the density:

[ P(R \leq r) = 1 - \int_{0}^{e^{-\frac{r^2}{2}}} 1 \, dt = 1 - e^{-\frac{r^2}{2}} \\ \to p_{r}(t) = t \, e^{-\frac{t^2}{2}} ]

Now the angular part $\theta = 2 \pi V$ can be treated the same way:

[ \theta = 2 \pi V \\ P(\theta \leq r) = P(2 \pi V \leq r) \\ = P(V \leq \frac{r}{2 \pi}) \\ \to P(\theta \leq r) = \int_{0}^{\frac{r}{2\pi}} 1 \, dt = \frac{r}{2\pi} \\ \to p_{\theta}(t) = \frac{1}{2\pi} ]

Since $U$ and $V$ are statistically independent, $R$ and $\theta$ are independent as well and the product of their densities forms the joint density function:

[ p_{r;\theta}(t_1, t_2) = p_{r}(t_1) \, p_{\theta}(t_2) = \frac{1}{2\pi} \, t_1 \, e^{-\frac{t_1^2}{2}} ]
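Transforming back to Cartesian coordinates via $x = t_1 \cos(t_2)$ and $y = t_1 \sin(t_2)$ the Jacobian of the polar transformation contributes exactly the factor $t_1$, so the joint density in $(x,y)$ becomes

[ p_{x;y}(x, y) = \frac{1}{2\pi} e^{-\frac{x^2 + y^2}{2}} = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} ]

which is the product of two independent standard normal densities - so $X$ and $Y$ are indeed independent and distributed as $\mathfrak{No}(0,1)$.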

Example

How can one implement a simple generator that outputs values distributed as $X \sim \mathfrak{No}(0,1)$ in ANSI C? Simply use the formulas above and the standard C rand() function.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
	/* M_PI is provided by POSIX but not guaranteed by ANSI C */
	#define M_PI 3.14159265358979323846
#endif

double randGaussian() {
	/* Shift a into (0,1] so that log() never receives 0 */
	double a = ((double)(rand()) + 1.0)/((double)(RAND_MAX) + 1.0);
	double b = ((double)(rand()))/((double)RAND_MAX);

	double R0 = sqrt(-2.0 * log(a)) * cos(2 * M_PI * b);
	/*
		The transform yields a second independent value for free
		that could be cached and returned on the next call:

		double R1 = sqrt(-2.0 * log(a)) * sin(2 * M_PI * b);
	*/

	return R0;
}

Keep in mind that rand() is itself only a pseudo random number generator, so it has to be seeded correctly using srand() from a real random source - otherwise it will reproduce the same sequence over and over again (which might be interesting for debugging purposes anyways). The driver sketch below shows one way to do this.

To get a feeling for how the random numbers are distributed one can generate a huge number of them, write them to a data file and plot a histogram with gnuplot using a script like the following:

gnuplot> binwidth=0.05
gnuplot> bin(x,width) = width * floor(x/width)
gnuplot> plot "boxmuller.dat" using(bin($1, binwidth)):(1.0) smooth freq with boxes title "Values"
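The following sketch of a driver program seeds the generator and writes the samples to the data file - the sample count of 1000000 and the file name boxmuller.dat are just chosen here to match the gnuplot script above:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

double randGaussian(); /* as defined above */

int main() {
	FILE* f;
	long i;

	/*
		Seed the PRNG once; time(NULL) is a common though
		non-cryptographic choice of seed source
	*/
	srand((unsigned int)time(NULL));

	f = fopen("boxmuller.dat", "w");
	if(f == NULL) {
		return 1;
	}

	for(i = 0; i < 1000000; i++) {
		fprintf(f, "%f\n", randGaussian());
	}

	fclose(f);
	return 0;
}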

Starting with 100 values it doesn’t look like a typical Gaussian curve:

100 values via Box-Muller

A typical histogram might look like the following for 10,000 values:

10,000 values via Box-Muller

For 1,000,000 values the shape looks even more familiar:

1,000,000 values via Box-Muller

This article is tagged: Math, ANSI C, Statistics

