Cryptographically Secure Random Numbers
by Matthias Riegel (comments: 0)
In the previous article I described basic algorithms and how to use them. In this article, I give an overview on how pseudo random number generators (PRNG) are made cryptographically secure (CSPRNG), using the algorithms from the previous article as example.
As with all cryptographic systems, you should not implement them yourself but rely on well tested functions written by cryptography experts. Using them correctly is difficult enough but becomes easier if you have an idea of the internal workings.
The text below mostly ignores hardware random number generators. If you have one that produces numbers fast enough and you trust that it was implemented without bugs (or back doors) and that it will never break, there is no need for any of the algorithms or entropy gathering described below.
The CS in a CSPRNG
A CSPRNG is a system, built around a PRNG, that is suitable for generating number for cryptographic uses. To make a PRNG cryptographically secure, the following properties must be ensured:
- no detectable patterns in the output (must pass all randomness tests)
- correct initial seeding
- The PRNG's state is not disclosed
- Even if the state is disclosed, it must not be possible to reconstruct previously generated numbers
- If the state is disclosed, the CSPRNG must recover from that state disclosure by "reseeding"
- independent of weaknesses in the underlying PRNG
In the best case, your embedded system is operating inside a safe, not communicating with the outside world and surrounded by people that have the same goal as your system. In one of the worst cases, it is lying on the table of someone highly motivated to change the way it works (think ankle monitors1), or just someone very curious, with a lot of computation power (rented from amazon).
What an attacker is probably able to do with your system:
Ask for the next random number. He could for example by trying to establish a new secure connection (DH-Key Exchange2) or request a new web session id, in both cases your system will generate a random number and send it to the attacker.
Compromise some or all of entropy sources:
- Control or record all the network traffic (esp. WLAN)
- Destroy some components
- Manipulate others
- Measure the operation of the rest Remaining, uncompromised entropy sources might operate quite slowly.
Calculate many many hashes per second. A python script on my desktop computer does around a million
If you use the random number generated by function
hash_feedback, and an attacker
knows that number (for example because you told him in a Diffie Hellman Key Exchange),
he will be able to compute all future random numbers, because he knows the
internal state of the function.
In contrast, functions
cipher_counter do not expose the internal state,
cipher_feedback exposes only a part of it, the
key stays secret.
If the attacker knows the internal state of function
hash_counter (because you did
not seed well), he can not only calculate all future random numbers (there is
no cure for that except proper reseeding), but he can also calculate all
numbers that were generated before the attacker compromised that state.
If the attacker knows the content of both
he can backtrack, as well as with the
cipher_feedback by decrypting
the previous states.
The only function from my previous article, which is immune to this attack is
hash_feedback because hashing is not reversible.
Backtracking is effectively prevented for the
cipher_ functions by renewing
key every couple of invocations:
def cipher_ctr_rekey() state = state + 1; if state & 7 == 0: key = aes256(key, state) state = state + 1; return aes256(key, state)
This way, the old key is lost every 7 invocations and backtracking is not possible beyond this point.
The straight forward way to add entropy (make the internal state more random) to the CSPRNG would be to include gathered random material into the state immediately. When using the timing of interrupts for example:
def on_interrupt(): add_entropy(time()) def add_entropy(x): state = sha256(state + x)
This way, the internal state will be made more random, if the interrupt timing was at least partially random (from the attacker's point of view) but the state does never loose entropy, even if the event is controlled by the attacker.
The point of reseeding however, is to recover from a compromised state. In this
add_entropy function is not optimal. If the event contained
bits of entropy (bits that the attacker can not estimate), the new state has
2**n possible values. By requesting the next random number, the new
internal state can be recovered by trial and error in
If you add
m events with
n bit of entropy each, the state can be recovered in
m * 2**n operations. If, instead, you collect the
m events and use them
all together to reseed, the attacker will have to perform
(If you like examples with numbers: 10 Events with 5 bits of entropy each
can either take 320 (=
10 * (2**5)) operations to recover from or they
can be combined to take an attacker 1 125 899 906 842 624 (=
operations to recover from).
Since in one second, my machine easily does
2**20 operations, and the entire
Bitcoin network does
2**61 (earning less than $20)3 it might be
reasonable to accumulate at least 100 bits of entropy before reseeding.
The problem is, that you don't know how much entropy an event contains. Saying a network packet is worth 20 bit, a mouse interrupt is worth 3bit, etc. may either be pretty close or completely wrong if the attacker provides the network packets and the mouse moves.
The reseeding problem is solved
by the "Fortuna"4 algorithm. It distributes gathered entropy evenly among
several buckets and uses these buckets to reseed a generator which is similar
cipher_ctr_key (and safe against backtracking and state disclosure).
The trick is, that the second bucket is used only half as
often as the first, the third only half as often as the second and so on.
While many reseeds are below the attackers limits, eventually one bucket will
contain enough entropy to overwhelm the attacker's
computation power and you start to produce secure random numbers again. There
are algorithms that use your laboriously gathered entropy more efficiently, but
Fortuna just made a lot of sense to me when I read the paper so I'm describing
Cryptographically secure, even with a weak PRNG
If the used PRNG has weaknesses, secure random numbers can still be generated if
enough entropy is gathered for each number. The Linux
/dev/random device has an
internal counter of estimated random bytes in the pool and may block when requesting
the next number until more entropy is gathered. The
dev/urandom PRNG does not block
if the entropy estimator is low and should therefor (stated in the man pages) not be
used for cryptography.
Although not accurate, the approach of estimating entropy makes sense assuming that every cryptographic algorithm will be found to have a weakness sooner or later and there is no need to quickly recover from state disclosure since an attacker that can influence the system enough to compromise the random pool will also able to do much worse to the system.
The opposite (Fortuna relies heavily on the security of a block cipher) also makes sense since you do not need high quality random numbers once the cryptographic functions we use are not safe anymore.
While generating numbers that look kind of random is pretty easy, generating ones that are safe to use in cryptography is more complicated. I described some of the problems, but there are more.
On the matter of the security of cryptographic algorithms that Fortuna relies on versus entropy estimation that the Linux kernel uses I personally trust both. For an embedded system I would use a CSPRNG like Fortuna that does not block when the entropy counter is low, or use a HWRNG to feed the entropy pool.
In which situations do your embedded systems use cryptography, how do you generate keys?
Diffie Hellman is a method to establish a common secret over an insecure channel where both parties choose and submit a random number. ↩︎
The Bitcoin network is a large network of computers("miners") that repeatedly compute
sha256(sha256(block + i))with varying
i. (They do that to earn a reward for finding the right
ifor the block which also confirms transactions that are stored in the block which enables a dezentralized payment system.) Numbers from https://blockchain.info/en/stats (March 2016) ↩︎