Generating better random numbers

Random numbers are used for many things on a computer. The most important of these is supplying the cryptographic algorithms used in programs such as SSH, GPG, and SSL. The strength of your encryption depends very strongly on the "randomness" of the numbers provided by your random number generator (RNG).

Linux has a built-in RNG that is sufficient for many simple applications, but for applications requiring higher security, something more is needed. There are several ways to improve upon the numbers generated by the default Linux RNG, and we will discuss several of them here.

Background
Here we will briefly cover some necessary background in cryptography. This is a huge field, and could not possibly be covered in its entirety here. There are some great resources at the bottom of this section if you wish to read more (you should).

Randomness & Entropy
A good place to start discussing random number generators is to define what it means for a number to be random. This is not as simple as it seems, and "randomness" is something that few people really understand (although many think they do).

The technical definitions of randomness get into some pretty hairy math (much of which I don't claim to understand), and really grasping them requires quite a bit of study.

There are a few basic things that most people agree on, however:


 * Random numbers are unpredictable -- that is, in a series of truly random numbers, there is no way to know what the next number will be. There is no discernible pattern.


 * Random numbers are not compressible, i.e. there is no shorter way to describe a number than the number itself.


 * If you have a set of numbers from which you select numbers at "random", then each number is equally likely to be selected.

Entropy is technically defined as the amount of disorder or uncertainty in a system. Entropy is often mistakenly said to be the amount of "information" present in a message. However, it is actually exactly the opposite! Knowledge (information) is not uncertainty. You have to lose entropy (uncertainty) to gain information (certainty). Grokking this will make understanding the mechanism behind the Linux RNG much easier.

Think of it this way: you basically have a "pool" (a file) of numbers derived from all sorts of events that are unpredictable (more accurately -- not too predictable), such as keystroke timings, fluctuations in clock frequencies, static from your sound card, and more. You have no idea what these numbers are -- you are most likely unaware of how fast your hard drive's read head is moving, and you probably aren't keeping track of the time that elapses between each key you press on the keyboard.

This pool of numbers is an "entropy pool" -- i.e. a pool of information that you are uncertain about. You pull numbers out of it, which depletes the entropy and converts it into information (because now that you can see a number, you are no longer uncertain about it). Thus you drain entropy to gain information -- and the information gained is unpredictable until you have extracted it and looked at it. That is, the information you have is random.
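To make the amount of uncertainty concrete: the Shannon entropy of a stream of symbols can be estimated from their frequencies as the sum of -p*log2(p) over each symbol. A quick shell sketch (the sample string is purely illustrative):

```shell
# Estimate Shannon entropy (bits/char) of a sample string:
# split into characters, count frequencies, sum -p*log2(p).
echo -n "aabbbbcc" | fold -w1 | sort | uniq -c |
  awk '{count[NR]=$1; n+=$1}
       END {for (i in count) {p=count[i]/n; H -= p*log(p)/log(2)}
            printf "%.3f bits/char\n", H}'
```

For this string (half 'b', a quarter each 'a' and 'c') the answer is exactly 1.5 bits per character; a stream of perfectly random bytes would score the maximum of 8.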

Further Reading:


 * Information Theory Primer
 * Information Is NOT Uncertainty
 * Entropy & Information Theory

Truly Random vs. Pseudorandom
Computers are very deterministic, meaning that they are very predictable -- given the current state of your computer, it is very easy to determine what its future state will be. This is great most of the time, since we all like our computers behaving the same way each time we use them. But it makes generating "random" numbers very tricky: for a number to be random, it can't be predictable -- so how do you use the predictable state of your computer to create an unpredictable number?

Most of the time, when you are supposedly getting "random" numbers on your computer, you are actually getting pseudorandom numbers instead: numbers computed by a deterministic algorithm from an initial "seed" value.
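A pseudorandom generator is just a function iterated from a seed, so anyone who learns the seed can reproduce the entire sequence. A minimal sketch (the constants are those of a classic linear congruential generator, shown only to illustrate determinism -- never use an LCG for cryptography):

```shell
# A tiny linear congruential generator (LCG): next = (a*seed + c) mod m.
# Identical seeds always yield identical "random" sequences.
seed=42
next_rand() { seed=$(( (1103515245 * seed + 12345) % 2147483648 )); echo "$seed"; }
next_rand   # always prints 1250496027 when seed starts at 42
next_rand
next_rand
```

Re-run the script and you get the very same three numbers -- exactly the predictability problem described above.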

Acceptable Entropy Sources
(list ~ amount of entropy that each can provide, limitations of each)


 * Keystroke timings
 * Caveat: keystroke timings tend to be somewhat predictable, since finger placement and typing rhythm constrain them (diagram needed)


 * Mouse Movements
 * Mouse coordinates will give less randomness than you might expect. The problem is that with networked applications like web browsers, the traffic between the client and server effectively publishes the locations and sequence of the client's mouse events. Thus, the only private noise that's left is the position of the cursor _within_ the button that gets pressed. Since screen buttons vary in size from a few hundred to a few thousand pixels, you're really getting only 8 to 12 bits of variability per event, instead of the 20 or so that you might expect.


 * Interrupts
 * Disk seek times
 * Radioactive Decay
 * Quantum Noise
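The 8-to-12-bit estimate for mouse events above is just log2 of the number of positions an attacker cannot see. You can check the arithmetic with awk:

```shell
# Bits of uncertainty = log2(number of equally likely positions).
awk 'BEGIN { printf "%.1f bits for 256 px, %.1f bits for 4096 px\n",
             log(256)/log(2), log(4096)/log(2) }'
```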

Bad Entropy Sources

 * Time (distinguish '"date" time' from variations in clock frequencies)
 * There are only 31,556,926 seconds per year... a VERY small number of possible values for an attacker to test


 * Values of keystrokes as opposed to keystroke timings
 * The data generated might contain information about what the user is typing that would be useful to an attacker.
 * English text contains very little entropy (~1.3 bits/character)


 * Public sources like random.org, etc
 * This violates the unknowable and unpredictable part of the definition of randomness. If an attacker finds out that you get your numbers from a public source, then your "random" numbers are completely useless.


 * Selections from natural language texts (no Project Gutenberg...)
 * English text = low entropy...
 * See public sources above


 * Network Devices
 * Can be easily influenced by attacker
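The weakness of timestamps is easy to quantify: even a full year of one-second timestamps carries only about 25 bits of uncertainty, trivial to brute-force:

```shell
# log2(31,556,926 seconds/year) ~= 24.9 bits -- far too few for a key.
awk 'BEGIN { printf "%.1f bits\n", log(31556926)/log(2) }'
```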

Understanding The Linux RNG

 * Analysis Of the Linux RNG
 * Entropy sources used
 * random.c
 * http://www.cs.berkeley.edu/~daw/rnd/linux-rand
 * /dev/random & /dev/urandom
 * Primary / Secondary pools
 * Pool size / entropy_avail
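The interfaces listed above are easy to poke at from a shell (paths are those of the classic Linux RNG; the entropy accounting details vary by kernel version):

```shell
# Read 16 bytes of randomness from the non-blocking pool...
head -c 16 /dev/urandom | od -An -tx1

# ...and see how much entropy the kernel thinks it currently has (in bits).
cat /proc/sys/kernel/random/entropy_avail
```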

Configuring the Kernel

 * Selecting the right hardware RNG (if applicable)
 * char devices
 * crypto--->via padlock
 * crypto algorithms

Installing & Configuring the necessary software
There are several tools that can be used to gather entropy from hardware devices. Many of these are especially useful if you do not have a hardware RNG (see the Hardware RNGs section below).

entropy broker
Entropy Broker collects entropy data from one or more sources, measures the amount of information in it, mixes it into pools, and then lets clients retrieve data from those pools. This is especially useful for virtual machines, where the usual methods don't work. http://www.vanheusden.com/entropybroker/

rng-tools
The rng-tools program is part of the "gkernel" package, found at http://sourceforge.net/projects/gkernel/

Nothing in this package will in itself generate random data, but if you have a hardware RNG (discussed below), it provides a program called rngd which will pull random bits from the RNG into the /dev/random pool.
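Assuming your kernel exposes the hardware RNG as /dev/hwrng and that the portage package name below is current, a sketch of the setup:

```shell
# Install rng-tools (package name may differ in your portage tree).
emerge sys-apps/rng-tools

# Feed bits from the hardware RNG into the kernel's /dev/random pool.
rngd --rng-device=/dev/hwrng --random-device=/dev/random

# Start it at boot (OpenRC).
rc-update add rngd default
```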

timer_entropyd

 * No extra hardware required
 * This package feeds the kernel entropy pool using the drift between your system's timers.

To install:
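A sketch, assuming the package is carried in portage under this name:

```shell
emerge app-crypt/timer_entropyd
rc-update add timer_entropyd default   # start at boot (OpenRC)
```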

audio-entropyd

 * Sound card support must be built into kernel or module must be loaded
 * Does not depend on sound being played -- uses static from device

To install the audio entropy daemon:
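A sketch, assuming the package is carried in portage under this name:

```shell
emerge app-crypt/audio-entropyd
rc-update add audio-entropyd default   # start at boot (OpenRC)
```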

video-entropyd

 * Needs v4l enabled in kernel
 * Need to have a video device hooked up and pointed at a random source

To install:
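A sketch, assuming the package is carried in portage under this name:

```shell
emerge app-crypt/video_entropyd
```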

(needs expansion)

Clock Randomness Gathering Daemon

 * clrngd Project Homepage (Freshmeat)

Gathers entropy from tiny variations between high-frequency hardware clocks... To install:
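Since clrngd has no ebuild that I know of, a source build is the likely route (the version number in the filename is hypothetical; use whatever the project page currently offers):

```shell
# Build clrngd from its release tarball (version hypothetical).
tar xzf clrngd-1.0.tar.gz
cd clrngd-1.0
make
make install   # as root
clrngd         # start the daemon
```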

HAVEGE

 * HArdware Volatile Entropy Gathering and Expansion

There is currently no ebuild for this.

ekeyd

 * This requires special hardware, but it is one of the cheapest hardware solutions available.

To install:
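A sketch, assuming the package is carried in portage under this name:

```shell
emerge app-crypt/ekeyd
rc-update add ekeyd default   # start at boot (OpenRC)
```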

Hardware RNGs
For anyone who is interested in doing anything serious, software entropy gathering alone will not be sufficient: it is far too slow, and the numbers are not of the same quality as those from a hardware RNG.


 * Hardware Random Number Generators
 * Random Noise Sources

There are a few ways that you might come into possession of such a device.

Built-In To Your Motherboard
The lucky amongst you will have a RNG built into your motherboard:


 * List of such motherboards and documentation for them
 * Intel i8xx variants
 * VIA mobos since ~2003 (get list of common varieties)
 * Document for VIA HW RNG is here (Under CPU): http://linux.via.com.tw/support/downloadFiles.action

Purchasing One
Those of you who are wealthy can spend hundreds of dollars on a commercial hardware RNG (there are hundreds of them out there -- just look on Google).

However, this is not the recommended approach for several reasons:


 * First of all, it's a waste of money. With a bit of effort, you can build one yourself for a fraction of the cost. Commercial units can run into the hundreds of dollars, whereas you can build your own for less than $50. Plus you get bragging rights -- you get to tell your friends that you have an RNG that uses radioactive decay to generate random numbers. Isn't that more exciting than telling them that you bought it?


 * Mass manufacture means that if there is a flaw, it is likely present in all units. If someone discovers a weakness in it, you no longer have "random" numbers -- you have predictable numbers, like everyone else who bought it.


 * Third, you are trusting that someone you don't know has not designed some type of "backdoor" or predictable pattern into the device. Security does not involve trusting people.

Build It Yourself
You can build a hardware RNG for a few bucks from parts bought at an electronics store (or from salvaged parts if you are REALLY clever). You need a basic knowledge of electronics to do this, but it is nothing you can't learn with a few days' worth of reading and a bit of practice with a soldering iron:

Instructions & Schematics For Homemade RNGs

 * LavaRND
 * Hardware Random Number Generator
 * Hardware Random Bit Generator
 * Smoke Detector RNG
 * Yet Another RNG Schematic

NIST Tests
http://csrc.nist.gov/groups/ST/toolkit/rng/index.html

The National Institute of Standards and Technology (NIST) publishes a battery of tests that it considers the standard for testing randomness.

rngtest
For a simple test, you can use "rngtest", which is part of the rng-tools package. This will run the NIST FIPS-140-2 tests for randomness.

Installing it is as simple as:
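Since rngtest ships as part of rng-tools, and assuming the current portage package name, that means:

```shell
emerge sys-apps/rng-tools

# Run the FIPS 140-2 battery on 10 blocks read from the kernel RNG.
cat /dev/urandom | rngtest -c 10
```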

This is a very easy-to-use, quick test to ensure that everything is working properly.

Dieharder
This is one of the most comprehensive RNG test suites available. It incorporates all of the NIST tests, all of the diehard tests (a very respected RNG test-suite written by George Marsaglia), and several unique tests written by the author, Robert G. Brown (the "rgb" tests). This tool is a must have.

Originally there was no ebuild for this in portage, so you had to build it from source yourself (don't worry, it's not that hard). That is no longer the case -- see Method 3: Ebuild install below. The source-based methods are kept here for reference.

But first, you need the GNU Scientific Library (gsl) installed. This is simple:
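Assuming the usual portage name for the GNU Scientific Library:

```shell
emerge sci-libs/gsl
```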

Now you need to download the latest version of dieharder from here: http://www.phy.duke.edu/~rgb/General/dieharder.php

You can get it in either .tar.gz or .rpm format. Whatever suits your fancy. We'll cover both methods.

Method 1: .rpm install
To do it this way you need rpm installed. If you don't have it installed already:
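Assuming the usual portage name for the rpm tool:

```shell
emerge app-arch/rpm
```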

Now all you have to do is
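With the version named elsewhere in this section (adjust the filename to whatever you actually downloaded):

```shell
rpm -ivh dieharder-2.24.7.rpm
```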

Method 3: Ebuild install
There is now an ebuild for installing dieharder (dieharder-2.24.7.ebuild and dieharder-2.24.7-build.patch are currently the latest files).

Please read HOWTO_Installing_3rd_Party_Ebuilds for more details.

Place the ebuild in an overlay. I use app-crypt as category.

Run as root (if dieharder-2.24.7 still is the newest version)
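A sketch of the sequence, with the overlay path and keyword file locations assumed (yours may differ; see the HOWTO linked above):

```shell
# Unmask the package (it is masked by keyword).
echo "app-crypt/dieharder ~x86" >> /etc/portage/package.keywords

# Generate the digest for the 3rd-party ebuild in your overlay.
ebuild /usr/local/portage/app-crypt/dieharder/dieharder-2.24.7.ebuild digest

# Install that exact version.
emerge -av =app-crypt/dieharder-2.24.7
```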

The package is "masked by keyword" and you will also have to calculate the digest. Read HOWTO_Installing_3rd_Party_Ebuilds for instructions.

How to use dieharder
dieharder is very well documented. The man pages and this page will tell you all you need to know to get started.
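A typical invocation against the kernel RNG looks like this (the generator numbers are from recent dieharder versions; run `dieharder -g -1` to list the generators your build knows about):

```shell
# Pipe raw bytes from the kernel RNG into dieharder:
# -a runs all tests, -g 200 selects the raw-stdin input generator.
cat /dev/urandom | dieharder -a -g 200
```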

When interpreting the results, the most important thing to remember is that a "p-value" is the probability that a perfect RNG would have produced a result at least as extreme as the one observed. For a good generator, the p-values from repeated tests should be spread roughly uniformly between 0 and 1, so a single low value such as p = 0.0543 is not by itself evidence of failure -- by chance, about one test in twenty will score that low. What should worry you is p-values that are consistently very close to 0, or (just as suspicious) consistently very close to 1, across many runs: that indicates your generator is producing detectably non-random output.

GNU Scientific Library
http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Generation.html

The GNU Scientific Library is required for the dieharder tests mentioned above, but it is also useful in and of itself: it contains several different high-quality PRNG functions for you to use.

If you are curious, "info gsl" contains a wealth of information on how to use it.