 # Simple Random Numbers

by Matthias Riegel

Many problems can be solved by using random numbers:

• Negotiate channel use in network protocols 1
• Generate unique identifiers on independent machines (e.g. device ids, type 4 UUIDs, ...)
• Initialization and events in Simulations and Games
• Secrets and Nonces for Cryptographic software

but deterministic machines are really bad at producing non deterministic (random) results.

In this article, I will describe some basic algorithms, how they need to be initialized and how to use them to produce numbers that are "random" enough for non security applications.

In a second article (Random 2) I will give a short overview of how cryptographically secure random number systems work.

The text below mostly ignores hardware random number generators. If you have one that produces numbers fast enough and you trust that it was implemented without bugs (or backdoors) and that it will never break, there is no need for any of the algorithms or entropy gathering described below.

The given examples are python based pseudo code, because I think it's the most readable, not because it's suitable for embedded systems.

## When is a Number Random? https://xkcd.com/221/

Wether true randomness exists is a rather philosophical question. Depending on the application, there are different aspects of randomness that matter, for example that:

• Numbers are distributed evenly along a given range
• Numbers generated on different machines do not (easily) collide
• Consecutive numbers are independent
• The next generated number can not be influenced
• The next generated number can not be predicted

The last aspect is a superset of the above since every bit of knowledge about the distribution or dependencies among the numbers or ways to influence the next, helps to predict the next generated number.

There are tests that detect if numbers in a series are somehow related to each other. They usually combine several numbers to objects and then analyze whether large sets of these objects have expected statistical attributes. The commonly cited test suite is TestU01 2

## Some Algorithms

An algorithm can not be random, but it can generate a sequence of numbers which may seem random to an outsider, depending on their knowledge about the internal state. This set of algorithms is therefore called Pseudo Random Number Generator (PRNG). There is a huge amount of different PRNGs. I will describe a few ones, that I found easy to understand or are widely used.

### A Simple Algorithm: The Linear Congruential Generator

The Linear Congruential Generator (LCG) is very simple to implement:

``````def lcg():
state = (state * a + b) mod m
return state
``````

Before calling `lcg` the first time, a value has to be assigned to `state`. This value is called `seed` since it is the base for all following numbers. If the constants `a`, `b` and `m` are not chosen correctly, the numbers can repeat themselves after a very short time, but if they follow some rules3, all values between `0` and `m-1` will be reached, giving a period length of `m`. The values used by glibc for `a`, `b` and `m` are: 1103515245, 12345 and 0x800000004 with a state of 32 bit.

For applications like games (on the level of Tetris) or fuzzy tests (start 10 tasks, let each one repeatedly sleep a random time and see if the scheduler does its job), the LCG is good enough. Since it is also extremely fast, it is used as the default random number function in many libraries (glibc, java, Delphi, ...). The problem with the LCG is, that it is has various weaknesses and fails almost all randomness tests. One weakness, that can easily be demonstrated, is that "randomly" chosen points are not evenly distributed, but lie on relatively few hyperplanes (on lines when choosing 2D points, on planes when choosing 3D points, ...).

The following python code generates 2D points using an LCG with an intentionally small state of one byte, `a` and `b` are chosen to give all number from 0 to 255 once before repeating.

``````a = 5
b = 3
m = 256
state = 1
def lcg():
global state
state = (state * a + b) % m
return state

for i in range(m//2):
x = lcg()
y = lcg()
print ("%d\t%d" % (x, y))
``````

The last line does format-string expansion. The first lines of output are:

``````    8   43
218 69
92  207
14  73
112 51
2   13
68  87
182 145
216 59
42  213
``````

Plotting only these 10 points, they look somewhat random, but with 50 or all the points, it is apparent, that the points are not spread evenly: With this configuration, the LCG should not even be used to implement Minesweeper.

For comparison, choosing the same number of points with one of the algorithms below gives a properly random looking distribution: ### Better Algorithms: Mersenne Twister and KISS

The Mersenne Twister is the default PRNG for many modern languages including Python, Ruby and Matlab. The code is a little longer than the LCG's and can be found as on wikipedia5. In the common MT19937 variant, it has a period of `2^19937-1` and passes most randomness tests but fails the one for linear independence. It has a large state of 624 four byte words, each of which is used as one random number. On every 624th invocation, the whole set is regenerated. This makes the large MT unsuitable for embedded use.

MT's predecessor, the TT800, works after the same principles with only 25 four byte words of state but it fails at various randomness tests. A version of the MT with smaller state, TinyMT6, claims to be quite good with only 16 bytes of state. The last two do not seem to be widely used or thoroughly analyzed.

The KISS algorithm simply adds the results of three weak PRNGS: LCG, xorshift and MWC.

``````def uint32(i):
return i & 0xFFFFFFFF

def kiss():
# LCG:
x = uint32( 69069 * x + 12345 )

# Xorshift
y ^= uint32(y << 13)
y ^= uint32(y >> 17)
y ^= uint32(y << 5)

# Multiply-with-carry
t = 698769069 * z + c;
c = uint32(t >> 32)
z = uint32(t)

# Combining all 3
return uint32(x + y + z)
``````

If the 4 state variables (`x`, `y`,`z` and `c`) are seeded with non zero values, the number stream generated with `kiss` passes all tests of the TestU01 suite and has a period of approximately `2**119`7. The implementation is easy and it executes quickly.

### Good Algorithms Using Cryptographic Primitives

Cryptographic primitives like hash functions or ciphers can be used as PRNGs. They are most likely not worth the effort on an embedded system, if they are not used in a cryptographic context, but they are extremely well analyzed, and produce excellent random number sequences.

Cryptographic hash8 functions have the property, that each output bit depends on every input bit and that the input can not be reconstructed from the output (because the output seems random). They thus are an obvious choice for PRNGs. Using the 256 bit version of SHA-2 gives:

``````def hash_feedback():
state = sha256(state)
return state
``````

The size of `state` is in this case 256 bit. The length of the period is well below `2**256` but still extremely large.

In a different mode of operation, the hash function is used on a counter:

``````def hash_counter():
state = state + 1
return sha256(state)
``````

In this case, the seed is the initial value of the counter. The size of `state` can be chosen arbitrarily large. The length of the period in this variant is exactly the number of possible values of `state`.

Cryptographic ciphers turn a value into another one that looks random. They can therefore also be used to produce random numbers:

``````def cipher_feedback():
state = aes256(key, state)
return state
``````

The state can be chosen arbitrarily long (in multiples of the ciphers block size). The initial values of `state` and `key` both make up the seed.

Ciphers can also be used in counter mode:

``````def cipher_counter():
state = state + 1;
return aes256(key, state)
``````

The size of the state can then only be as large as one block.

Whether ciphers or hashes are used, and in which mode has little impact on the statistical quality of the random number sequence, but has a huge impact on the predictability. This will be covered in the next article.

## How PRNGs are Initialized

To initialize PRNGs, we still need a seed value that is "random". But only a few bytes.

If your device has some kind of non-volatile memory, the current state of the PRNG should be stored there. In the device's life cycle there will only be a need for one seed value which can be generated externally. Without such storage, the seed has to be generated after every reset.

For simple applications, the current time is a good enough seed, but if the embedded system does not have a real time clock, the only available time is that since reset. The seed is usually chosen during application startup so the time will fall into a small range and all devices will a generate one of a few different sequences of random numbers after every rest.

Other sources of seed material can be:

• Communication Protocols (Ethernet, Serial, I²C, ...)
• Microphones
• Camera
• Remainder of Timers
• Timings of Interrupts
• HWRNG
• Data unique to each CPU like an ID or calibration data for ADC and Clock

To collect random data from several sources, the "randomness" should be spread across all bits of the seed by passing it through a hash function.

## Using PRNGs

Few applications need exactly the uniformly distributed numbers that a PRNG produces. Common operations include limiting them in a certain range, generating floats or generating numbers with a certain distribution9.

### Generating Numbers in a Range

When scaling the value into the desired range, it is easy to produce biased results. The simplest way is using modulo to make them fit in the interval `[0..range-1]`:

``````def rand_range_naive(range):
x = prng()
return x % range
``````

But this will return small numbers more often than larger ones. If the PRNG returns 1 byte numbers (`0..255`) and range is `100`, the numbers `0..55` will each have a 50% higher chance than the numbers `56..99`. `0..99` are not modified, `100..199` are mapped onto `0..99` but `200..255` are mapped onto `0..55`.

Assuming that the PRNG's result spaces has some power of two as size, the above function produces fair results for ranges that are also powers of two. With a range of `128`, the values `0..127` would be valid and the values `128..255` would be mapped onto `0..127` so that each value from zero to 127 has the same chance.

An algorithm that returns each number with the same probability for arbitrary ranges scales the random number to the next higher power of two range and then discards invalid numbers:

``````def rand_range(range):
i = 1
while i < range:
i = i * 2
while True:
x = prng()
x = x % i
if x < range:
return x
``````

For the worst possible range (one more than a power of 2), the algorithm's average runtime is no more than `rand_range_naive`, but has no upper bound. On real time applications zou should use ranges that are a power of two or live with slightly biased results.

### Generating random Floats

Floating point values between zero and one are simply generated with:

``````def rand_float():
return prng() / MAX_PRNG_VALUE
``````

## Conclusion

For simple applications, the LCG, seeded with the current time, is probably good enough. For more advanced applications like generating unique identifiers or simulations, the KISS algorithm can be used but must be carefully seeded. Mistakes at using the random values can easily be avoided by looking at the way, other libraries do it.

What entropy sources does your Embedded System have? What do you need random numbers for?

1. On collision, wait a random time, then try again: https://en.wikipedia.org/wiki/Exponential_backoff ↩︎

2. This topic is covered thoroughly in The Art of Computer Programming, Volume 2, by Donald Knuth ↩︎

3. More about hash functions in a future article on this blog ↩︎

4. For a good read about distributions, I recommend the lower half of Python's random module: https://hg.python.org/cpython/file/3.5/Lib/random.py ↩︎

Go back

## Arrange an expert meeting

We offer a free and non-committal interview with one of our experts. We can get to know each other, answer your open questions and also discuss the first requirements of your project.