Simple Random Numbers in Embedded Systems

by Matthias Riegel (comments: 0)

Many problems can be solved by using random numbers:

  • Negotiate channel use in network protocols 1
  • Generate unique identifiers on independent machines (e.g. device ids, type 4 UUIDs, ...)
  • Initialization and events in Simulations and Games
  • Secrets and Nonces for Cryptographic software

but deterministic machines are really bad at producing non deterministic (random) results.

In this article, I will describe some basic algorithms, how they need to be initialized and how to use them to produce numbers that are "random" enough for non security applications.

In a second article (Random 2) I will give a short overview of how cryptographically secure random number systems work.

The text below mostly ignores hardware random number generators. If you have one that produces numbers fast enough and you trust that it was implemented without bugs (or backdoors) and that it will never break, there is no need for any of the algorithms or entropy gathering described below.

The given examples are python based pseudo code, because I think it's the most readable, not because it's suitable for embedded systems.

When is a Number Random?

Random Number
https://xkcd.com/221

Wether true randomness exists is a rather philosophical question. Depending on the application, there are different aspects of randomness that matter, for example that:

  • Numbers are distributed evenly along a given range
  • Numbers generated on different machines do not (easily) collide
  • Consecutive numbers are independent
  • The next generated number can not be influenced
  • The next generated number can not be predicted

The last aspect is a superset of the above since every bit of knowledge about the distribution or dependencies among the numbers or ways to influence the next, helps to predict the next generated number.

There are tests that detect if numbers in a series are somehow related to each other. They usually combine several numbers to objects and then analyze whether large sets of these objects have expected statistical attributes. The commonly cited test suite is TestU01 2

Some Algorithms

An algorithm can not be random, but it can generate a sequence of numbers which may seem random to an outsider, depending on their knowledge about the internal state. This set of algorithms is therefore called Pseudo Random Number Generator (PRNG). There is a huge amount of different PRNGs. I will describe a few ones, that I found easy to understand or are widely used.

A Simple Algorithm: The Linear Congruential Generator

The Linear Congruential Generator (LCG) is very simple to implement:

def lcg():
    state = (state * a + b) mod m
    return state

Before calling lcg the first time, a value has to be assigned to state. This value is called seed since it is the base for all following numbers. If the constants a, b and m are not chosen correctly, the numbers can repeat themselves after a very short time, but if they follow some rules3, all values between 0 and m-1 will be reached, giving a period length of m. The values used by glibc for a, b and m are: 1103515245, 12345 and 0x800000004 with a state of 32 bit.

For applications like games (on the level of Tetris) or fuzzy tests (start 10 tasks, let each one repeatedly sleep a random time and see if the scheduler does its job), the LCG is good enough. Since it is also extremely fast, it is used as the default random number function in many libraries (glibc, java, Delphi, ...). The problem with the LCG is, that it is has various weaknesses and fails almost all randomness tests. One weakness, that can easily be demonstrated, is that "randomly" chosen points are not evenly distributed, but lie on relatively few hyperplanes (on lines when choosing 2D points, on planes when choosing 3D points, ...).

The following python code generates 2D points using an LCG with an intentionally small state of one byte, a and b are chosen to give all number from 0 to 255 once before repeating.

a = 5
b = 3
m = 256
state = 1
def lcg():
    global state
    state = (state * a + b) % m
    return state

for i in range(m//2):
    x = lcg()
    y = lcg()
    print ("%d\t%d" % (x, y))

The last line does format-string expansion. The first lines of output are:

    8   43
    218 69
    92  207
    14  73
    112 51
    2   13
    68  87
    182 145
    216 59
    42  213

Plotting only these 10 points, they look somewhat random, but with 50 or all the points, it is apparent, that the points are not spread evenly: Hyperplanes with the LCG With this configuration, the LCG should not even be used to implement Minesweeper.

For comparison, choosing the same number of points with one of the algorithms below gives a properly random looking distribution:

No Hyperplanes with proper random numbers

Better Algorithms: Mersenne Twister and KISS

The Mersenne Twister is the default PRNG for many modern languages including Python, Ruby and Matlab. The code is a little longer than the LCG's and can be found as on wikipedia5. In the common MT19937 variant, it has a period of 2^19937-1 and passes most randomness tests but fails the one for linear independence. It has a large state of 624 four byte words, each of which is used as one random number. On every 624th invocation, the whole set is regenerated. This makes the large MT unsuitable for embedded use.

MT's predecessor, the TT800, works after the same principles with only 25 four byte words of state but it fails at various randomness tests. A version of the MT with smaller state, TinyMT6, claims to be quite good with only 16 bytes of state. The last two do not seem to be widely used or thoroughly analyzed.

The KISS algorithm simply adds the results of three weak PRNGS: LCG, xorshift and MWC.

def uint32(i):
    return i & 0xFFFFFFFF

def kiss():
    # LCG:
    x = uint32( 69069 * x + 12345 )

    # Xorshift
    y ^= uint32(y << 13)
    y ^= uint32(y >> 17)
    y ^= uint32(y << 5)

    # Multiply-with-carry
    t = 698769069 * z + c;
    c = uint32(t >> 32)
    z = uint32(t)

    # Combining all 3
    return uint32(x + y + z)

If the 4 state variables (x, y,z and c) are seeded with non zero values, the number stream generated with kiss passes all tests of the TestU01 suite and has a period of approximately 2**1197. The implementation is easy and it executes quickly.

Good Algorithms Using Cryptographic Primitives

Cryptographic primitives like hash functions or ciphers can be used as PRNGs. They are most likely not worth the effort on an embedded system, if they are not used in a cryptographic context, but they are extremely well analyzed, and produce excellent random number sequences.

Cryptographic hash8 functions have the property, that each output bit depends on every input bit and that the input can not be reconstructed from the output (because the output seems random). They thus are an obvious choice for PRNGs. Using the 256 bit version of SHA-2 gives:

def hash_feedback():
    state = sha256(state)
    return state

The size of state is in this case 256 bit. The length of the period is well below 2**256 but still extremely large.

In a different mode of operation, the hash function is used on a counter:

def hash_counter():
    state = state + 1
    return sha256(state)

In this case, the seed is the initial value of the counter. The size of state can be chosen arbitrarily large. The length of the period in this variant is exactly the number of possible values of state.

Cryptographic ciphers turn a value into another one that looks random. They can therefore also be used to produce random numbers:

def cipher_feedback():
    state = aes256(key, state)
    return state

The state can be chosen arbitrarily long (in multiples of the ciphers block size). The initial values of state and key both make up the seed.

Ciphers can also be used in counter mode:

def cipher_counter():
    state = state + 1;
    return aes256(key, state)

The size of the state can then only be as large as one block.

Whether ciphers or hashes are used, and in which mode has little impact on the statistical quality of the random number sequence, but has a huge impact on the predictability. This will be covered in the next article.

How PRNGs are Initialized

To initialize PRNGs, we still need a seed value that is "random". But only a few bytes.

If your device has some kind of non-volatile memory, the current state of the PRNG should be stored there. In the device's life cycle there will only be a need for one seed value which can be generated externally. Without such storage, the seed has to be generated after every reset.

For simple applications, the current time is a good enough seed, but if the embedded system does not have a real time clock, the only available time is that since reset. The seed is usually chosen during application startup so the time will fall into a small range and all devices will a generate one of a few different sequences of random numbers after every rest.

Other sources of seed material can be:

  • Communication Protocols (Ethernet, Serial, I²C, ...)
  • ADC Jitter
  • Microphones
  • Camera
  • Remainder of Timers
  • Timings of Interrupts
  • HWRNG
  • Data unique to each CPU like an ID or calibration data for ADC and Clock

To collect random data from several sources, the "randomness" should be spread across all bits of the seed by passing it through a hash function.

Using PRNGs

Few applications need exactly the uniformly distributed numbers that a PRNG produces. Common operations include limiting them in a certain range, generating floats or generating numbers with a certain distribution9.

Generating Numbers in a Range

When scaling the value into the desired range, it is easy to produce biased results. The simplest way is using modulo to make them fit in the interval [0..range-1]:

def rand_range_naive(range):
    x = prng()
    return x % range

But this will return small numbers more often than larger ones. If the PRNG returns 1 byte numbers (0..255) and range is 100, the numbers 0..55 will each have a 50% higher chance than the numbers 56..99. 0..99 are not modified, 100..199 are mapped onto 0..99 but 200..255 are mapped onto 0..55.

Assuming that the PRNG's result spaces has some power of two as size, the above function produces fair results for ranges that are also powers of two. With a range of 128, the values 0..127 would be valid and the values 128..255 would be mapped onto 0..127 so that each value from zero to 127 has the same chance.

An algorithm that returns each number with the same probability for arbitrary ranges scales the random number to the next higher power of two range and then discards invalid numbers:

def rand_range(range):
    i = 1
    while i < range:
        i = i * 2
    while True:
        x = prng()
        x = x % i
        if x < range:
            return x

For the worst possible range (one more than a power of 2), the algorithm's average runtime is no more than rand_range_naive, but has no upper bound. On real time applications zou should use ranges that are a power of two or live with slightly biased results.

Generating random Floats

Floating point values between zero and one are simply generated with:

def rand_float():
    return prng() / MAX_PRNG_VALUE

Conclusion

For simple applications, the LCG, seeded with the current time, is probably good enough. For more advanced applications like generating unique identifiers or simulations, the KISS algorithm can be used but must be carefully seeded. Mistakes at using the random values can easily be avoided by looking at the way, other libraries do it.

What entropy sources does your Embedded System have? What do you need random numbers for?


  1. On collision, wait a random time, then try again: https://en.wikipedia.org/wiki/Exponential_backoff ↩︎

  2. http://www.iro.umontreal.ca/~simardr/testu01/tu01.html ↩︎

  3. This topic is covered thoroughly in The Art of Computer Programming, Volume 2, by Donald Knuth ↩︎

  4. https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/random_r.c ↩︎

  5. https://en.wikipedia.org/wiki/Mersenne_Twister#Python_implementation ↩︎

  6. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html ↩︎

  7. https://eprint.iacr.org/2011/007.pdf ↩︎

  8. More about hash functions in a future article on this blog ↩︎

  9. For a good read about distributions, I recommend the lower half of Python's random module: https://hg.python.org/cpython/file/3.5/Lib/random.py ↩︎

Go back

Add a comment

Update Notification

For an automatic notification on new blog articles, just register your EMail address.

Subscription

We are the Blogger:

Andrea Dorn

After my study of industrial engineering I worked at an engineering service provider. As team leader and sales representative, I was responsible for customers from aviation and mechanical engineering. I am part of the Embedded Office team since 2010. Here I am responsible for the Sales and Marketing activities. I love being outside for hiking, riding or walking no matter the weather.

Fridolin Kolb

I have more than 20 years experience in developing safety critical software as developer and project manager in medical, aerospace and automotive industries. I am always keen on finding a solution for any problem. The statement “This won’t never work”, will never work for me. In my spare time You can find me playing the traverse flute in our local music association, spending time with my family, or in a session as member of our local council and member of the local church council. So obviously I am lacking the ability to say “No” to any challenge ;-).

Michael Hillmann

I have been working for 20 years in safety critical software development. Discussing and solving challenges with customers and colleagues excites me again and again. In my spare time I can be found while hiking, spending time with my family, having a sauna with friends - or simply reading a good book.

Wolfgang Engelhard

I’m a functional safety engineer with over 10 years of experience in software development. I’m most concerned with creating accurate documentation for safety critical software, but lately found joy in destruction of software (meaning I’m testing). Spare time activities range from biking to mentoring a local robotics group of young kids.

Matthias Riegel

Since finishing my master in computer science (focus on Embedded Systems and IT-Security), I’ve been working at Embedded Office. Before that, I worked with databases, and learned many unusual languages (like lisp, clojure, smalltalk, io, prolog, …). In my spare time I’m often on my bike, at the lathe or watching my bees.