In this chapter, we will briefly study random number generators available in the C++ standard library. Random number generation is a vast and interesting topic in its own right and we are only going to scratch the surface. We are using random number generation as a real world example of the functional programming techniques we discussed in the previous chapter.
First, let’s review how random number generation worked in C. The 15/rand1
program, shown below, uses C functions to generate random integers and
distributes them in the range [-10, 10]:
int main() {
// Random number generator using C functions
srandom(time(nullptr)); // current time as seed
auto engine = &random; // random() in the C library as engine
auto distro = [] (auto some_engine) {
// get an unsigned random integer from engine,
// and "distribute" it in [-10,10]
return some_engine() % 21 - 10;
};
map<int,int> m;
for (int i = 0; i < 1'000'000; ++i) {
int x = distro(engine);
++m[x];
}
for (const auto& [x, frequency] : m) {
cout << setw(4) << x << '|' << frequency << '\n';
}
}
The C function random() returns an integer in the closed range
[0, 2^31 - 1]. It is a pseudo-random number generator, which means that the
numbers that it generates look random, but it is actually a deterministic
sequence of numbers computed by a formula. Given the same initial value, also
known as the seed, random() produces the same exact sequence of numbers. Here,
we follow the common practice of seeding the generator used by random() by
calling srandom() with the current time. time(nullptr) returns the number
seconds since the UNIX Epoch, January 1st, 1970, 00:00 UTC.
The distro functor takes a some_engine functor as an argument, invokes that
engine to produce a random integer, and then maps the integer (roughly)
uniformly into the range [-10, 10] by modding it by 21 and subtracting 10. We
declare engine as a function pointer to the C random() function.
We invoke distro(engine) a million times and construct a frequency map out of
the generated random integers. We then print out the frequency table. Here is a
sample run:
-10|47481
-9|47631
-8|48119
-7|47467
-6|47579
-5|47739
-4|47370
-3|47888
-2|47531
-1|47511
0|47532
1|47672
2|47403
3|47613
4|47773
5|47446
6|47532
7|47426
8|47709
9|47621
10|47957
We’ve written the random number generation using C’s random() in a somewhat
roundabout way – as a composition of a distribution functor and an engine
functor. It’s meant to be a prelude to how the C++ random number generator
library is organized.
Here’s the same program rewritten to use C++ random number generator library:
int main() {
// Random number generator in C++
random_device rd;
random_device::result_type seed;
seed = (rd.entropy() > 0) ? rd() : time(nullptr);
default_random_engine engine { seed };
uniform_int_distribution<> distro { -10, 10 };
map<int,int> m;
for (int i = 0; i < 1'000'000; ++i) {
int x = distro(engine);
++m[x];
}
for (const auto& [x, frequency] : m) {
cout << setw(4) << x << '|' << frequency << '\n';
}
}
We first generate a seed using std::random_device, which is an
implementation-dependent random number generator. If available, it uses a
non-deterministic source to generate random numbers. Otherwise, it generates a
deterministic sequence. In Linux, for example, environmental noise like
keyboard/mouse events or network activity is used for non-determinism.
std::random_device::entropy() determines the degree of non-determinism in the
implementation. If it’s deterministic, it will return zero. We’ll fall back to
seeding with time(nullptr) in that case. Since std::random_device could be
non-deterministic and expensive to compute, the common practice is to use it
only to generate a seed and then switch to a pseudo-random number generator.
As we alluded to in the previous section, the C++ library organizes random number generation into engines and distributions. Different engines implement different psuedo-random number generation algorithms. Different distributions take these engine functors as arguments and map them into normal distribution, uniform distribution, and so on.
Here, we instantiate a default random engine, which is usually the simplest
implementation that the library offers, and a uniform distribution.
std::uniform_int_distribution is templated with an integer type (e.g.,
short, int, long, etc.) and defaulted to int. We omit the type parameter
to use the default. The rest of the program is unchanged from the previous
section.
The following program, 14/rand2, uses the Mersenne Twister engine and a normal
distribution:
int main() {
random_device rd;
random_device::result_type seed;
seed = (rd.entropy() > 0) ? rd() : time(nullptr);
mt19937 engine { seed };
normal_distribution<> distro { 0.0, 3.5 };
map<int,int> m;
int maximum = 0;
for (int i = 0; i < 1'000'000; ++i) {
int x = lround( distro(engine) );
int frequency = ++m[x];
maximum = max(maximum, frequency);
}
int bar_scale = maximum / 50;
for (const auto& [x, frequency] : m) {
cout << setw(4) << x << '|'
<< setw(7) << frequency << '|'
<< string(frequency / bar_scale, '*') << '\n';
}
}
The program also outputs a graph of the resulting normal distribution. We
constructed an instance of std::normal_distribution<> with a mean of 0.0 and
standard deviation of 3.5. The type parameter defaults to double. A sample
run is shown below:
-16| 3|
-15| 15|
-14| 43|
-13| 115|
-12| 349|
-11| 893|
-10| 1999|
-9| 4305|*
-8| 8460|***
-7| 15511|******
-6| 26242|***********
-5| 41097|******************
-4| 59481|**************************
-3| 79300|**********************************
-2| 96586|******************************************
-1| 108721|***********************************************
0| 114142|**************************************************
1| 109138|***********************************************
2| 96977|******************************************
3| 78202|**********************************
4| 59115|*************************
5| 41140|******************
6| 26334|***********
7| 15535|******
8| 8536|***
9| 4348|*
10| 2022|
11| 881|
12| 335|
13| 119|
14| 36|
15| 16|
16| 3|
17| 1|
The std::mt19937 engine functor is a predefined specialization of the
std::mersenne_twister_engine class template defined as follows:
std::mersenne_twister_engine<std::uint_fast32_t,
32, 624, 397, 31,
0x9908b0df, 11,
0xffffffff, 7,
0x9d2c5680, 15,
0xefc60000, 18, 1812433253>
The specialization defines various type and value parameters for the underlying
Mersenne Twister algorithm.
The algorithm gets its name from the fact that its period length is a Mersenne
prime, a prime number that is a power of two minus one. mt19937 is the most
commonly used version of the algorithm, which has the Mersenne prime
2^19937 - 1 as its period length. The period length of a psuedo-random number
generator refers to the length of the sequence it produces before the sequence
repeats itself. The period length 2^19937 - 1 is incredibly large. For
comparison, the Linux manual pages report that the period length of C’s
random() function is approximately 16 * (2^31 - 1), which is already
considered very large. mt19937 is known to be fast, but this comes with a
heavy state storage requirement that we’ll revisit later.
We call distro(engine) a million times again. In this case, the distro
functor invokes the engine functor to generate a random integer, which is then
converted into a floating point number normally distributed around a mean of 0.0
with standard deviation 3.5. We round the resulting floating point number to the
nearest integer using std::lround().
As we build the frequency map, we track the maximum frequency in the map so that we can scale the resulting graph to have its peak be 50 characters long.
Let’s take a quick detour to point out a common pitfall in mixing integers and floating point numbers.
Suppose we forgot to use lround() and wrote the following line instead:
int x = distro(engine);
The code still compiles cleanly with no warnings. It produces the following result:
-17| 1|
-16| 3|
-15| 4|
-14| 33|
-13| 80|
-12| 191|
-11| 541|
-10| 1280|
-9| 2936|
-8| 6199|*
-7| 11617|**
-6| 20161|****
-5| 33110|*******
-4| 49841|***********
-3| 69589|***************
-2| 88048|*******************
-1| 103925|***********************
0| 224758|**************************************************
1| 103651|***********************
2| 87562|*******************
3| 69345|***************
4| 50162|***********
5| 33466|*******
6| 20717|****
7| 11650|**
8| 6102|*
9| 2881|
10| 1299|
11| 534|
12| 216|
13| 71|
14| 21|
15| 5|
17| 1|
Can you see what’s wrong with the resulting graph? Its peak is twice as long as its neighbors; that is, the frequency of 0 is 224758 while the frequencies of 1 and -1 are only half as much!
Using brace-initialization, as shown below, instead of the C-style
initialization using = yields a compilation warning that clues us into what’s
happening:
int x { distro(engine) };
With brace-initialization, the compiler now warns us that we are performing a
narrowing conversion from the double returned by distro(engine) to
int. In other words, the floating point component of the double is discarded
when assigning to the int. Anything in the range (-1, 1) will be truncated
to 0, which is why our peak is twice as large as it should be.
The C-style traditional initialization syntax has always allowed implicit narrowing conversions. Brace-initialization syntax is new in C++11 so the language designers took the opportunity to enforce stricter rules to help prevent data loss bugs, like the one we encountered. The behavior of C-style initialization is left unchanged in C++ for the sake of backward compatibility.
By the same token, the compiler warns us of a narrowing conversion from long
to int had we written the following code instead:
int x { lround(distro(engine)) };
lround() returns a long which may not necessarily fit inside of an int. We
consciously ignored this issue because we know that our distribution will not
return a number that can’t fit inside of an int.
We should always prefer to use brace-initialization to catch bugs like this. We used the C-style initialization to illustrate this point. A conscientious C++ programmer would have written the declaration as follows:
int x { static_cast<int>(lround(distro(engine))) };
Here, we use static_cast<int> to explicitly declare our intent to cast the
long to an int.
In attempt to make the code more readable, we define generator as the
composition of distro and engine inside the for-loop as follows:
for (int i = 0; i < 1'000'000; ++i) {
auto generator = bind(distro, engine);
int x = lround(generator());
int frequency = ++m[x];
maximum = max(maximum, frequency);
}
Not only does the program curiously take much longer to run, but it also produces the wrong output. The output of one run is shown below:
-6|1000000|**************************************************
In fact, every run incorrectly generates the same number a million times. What is going on?
Recall that pseudo-random number generators are, no matter how simple or complicated the underlying algorithms may be, ultimately formulas to produce the next number in the sequence given the current state of the generator. In other words, a pseudo-random number generator is a stateful functor.
The arguments to bind() are copied if they’re l-values and moved if they’re
r-values. This means that when we create generator using bind(), a copy of
the distro functor is composed with a copy of the engine functor. Since we
call bind() inside the loop, we are creating a brand new engine functor
with the same seed at every iteration. This explains why the same number gets
generated a million times. We’ll address why it took so long to run shortly.
An obvious solution is to move the creation of generator out of the loop. But
let’s take this opportunity to explore other solutions.
std::ref()One solution would be to pass engine by reference to bind(), but that seems
impossible given that bind() always creates copies of its arguments. A
workaround is to wrap engine with std::ref().
for (int i = 0; i < 1'000'000; ++i) {
auto generator = bind(distro, std::ref(engine));
int x = lround(generator());
int frequency = ++m[x];
maximum = max(maximum, frequency);
}
The actual implementation of std::ref() and how std::bind() uses it is
somewhat complicated, but we present the following pseudocode that contains the
essence of the implementation:
template <typename T>
class reference_wrapper {
public:
reference_wrapper(T& v) : v_ptr(&v) {}
...
operator T&() const { return *v_ptr; }
private:
T* v_ptr;
};
template <typename T>
reference_wrapper<T> ref(T& v) {
return reference_wrapper<T>(v);
}
std::ref() wraps an object of type T in a std::reference_wrapper<T>, which
holds a pointer to the object. It provides operator T&() so that it can act
like a reference to the underlying object. As you can see in the implementation,
the reference_wrapper is just a pointer so it can be freely copied around.
distro and engine in a lambdaWe now take this opportunity to study lambdas in a bit more detail. Let’s study
each of the following five lambda expressions that are meant to replace the
previous bind() expression inside of the for-loop:
auto generator =
[distro, engine] () { return distro(engine); }; // Lambda (1)
[distro, engine] () mutable { return distro(engine); }; // Lambda (2)
[=] () mutable { return distro(engine); }; // Lambda (3)
[&distro, &engine] () { return distro(engine); }; // Lambda (4)
[&] () { return distro(engine); }; // Lambda (5)
int x = lround( generator() );
Lambda (1) is supposed to be the equivalent of bind(distro, engine), so we
expect it to also erroneously generate the same number repeatedly, but it does
not even compile. The reason is that, by default, the operator() for a lambda
is marked const. That means that distro and engine inside Lambda (1) are
const objects, but the operator() for distro and engine are non-const
because they mutate the objects.
Lambda (2) fixes Lambda (1) by marking the lambda expression as mutable, which
makes its operator() non-const. Lambda (2) is semantically equivalent to
bind(distro, engine) and produce the same errnoneous result.
Lambda (3) is equivalent to Lambda (2). It demonstrates a shorthand syntax indicating that all captures are by value.
Lambda (4) and (5) capture distro and engine by reference, therefore
producing the correct result. Adding the mutable keyword to these lambdas
makes no difference. Without mutable, operator() is marked const, which
means all member variables are const inside of it. But whether a reference
member is const or not doesn’t matter because a reference type variable cannot
be changed anyway once it has been initialized. Note that we’re talking about
whether the reference itself is const or not. We are not talking about the
difference between T& and const T&. The keyword const in const T& says
that you cannot change the object that it refers to. These semantics are
analogous to difference between T* const p and const T* q. The former means
that p cannot be reassigned, but the object it points to can be mutated. The
latter means q can be reassigned, but the object it points to cannot be
mutated.
Let’s go back to our incorrect implementation that constructs a new copy of the
engine functor at every iteration of the loop. When we run it, it not only
produces the wrong result, but it also takes a very long time to run. It took
over five seconds on our system, whereas the correct version ran almost
instantaneously.
Running the following program, 14/rand3, will reveal what’s going on:
int main() {
random_device rd;
random_device::result_type seed;
seed = (rd.entropy() > 0) ? rd() : time(nullptr);
mt19937 engine { seed };
// default_random_engine engine { seed };
cout << "typeid(engine).name():\n" << typeid(engine).name() << '\n';
cout << "\nsizeof(engine): " << sizeof(engine) << '\n';
}
The output of the program is shown below:
typeid(engine).name():
St23mersenne_twister_engineImLm32ELm624ELm397ELm31ELm2567483615ELm11ELm4294967295ELm7ELm2636928640ELm15ELm4022730752ELm18ELm1812433253EE
sizeof(engine): 5000
The typeid() operator returns the std::type_info object associated with the
type of the given expression. We previously encountered this object when we
studied how std::dynamic_cast recovers type information to determine whether a
cast is valid or not. Here, we print the name() of the type, which returns the
implementation-defined name of the type. We also print out the size of the
object.
The output confirms that mt19937 is indeed a specialization of the
mersenne_twister_engine class template and we can even see the numerous value
parameters embedded in the mangled name. The output also reveals that the size
of the engine is 5000 bytes! It took so long to run because we were copying this
large object a million times. Some functors are very stateful and we should be
careful copying them around.
If we switch mt19937 with default_random_engine, we get the following
output:
typeid(engine).name():
St26linear_congruential_engineImLm16807ELm0ELm2147483647EE
sizeof(engine): 8
It appears that default_random_engine is a type alias for a specialization of
the std::linear_congruential_engine class template, which is defined as follows:
template<
class UIntType,
UIntType a,
UIntType c,
UIntType m
> class linear_congruential_engine;
From the mangled name, we can deduce that a == 16807, c == 0, and
m == 2147483647. The linear congruential random number generator is the
simplest algorithm that just keeps applying the following formula:
x = (a * x + c) % m;
Since a, c, and m are compile-time template value parameters, the only
state that the engine needs to keep is the current value of x, hence the
object size of 8 bytes.
Last updated: 2025-11-11