Cryptography is the classic example of an application requiring a good source of really random numbers. These numbers can be used as session keys, initialization vectors, seeds for RSA prime number generation, and so on.
The security of a cryptographic algorithm usually depends on the futility of guessing the random numbers chosen by the computer. The key concept is entropy, a measure of the uncertainty contained in a set of values. For example, a user asked to type some random characters is much more likely to type asdf than 9m]g; the entropy is thus not as high as it would be if all strings were equally likely. Thus, even though we type in seven-bit ASCII, we can't generate 35 random bits from just five keystrokes; a common rule of thumb is that the entropy per keystroke is in the range of 1.0 to 1.5 bits [Schneier, p. 190]. So to generate 35 truly random bits we need to use at least 35 keystrokes.
We can use this rule to build a random number generator that combines the entropy inherent in various computer components. They'll be deterministic, but unpredictable from the outside - people who don't have physical access to the computer haven't a prayer of guessing the seed.
We have to be very careful to acquire enough entropy per bit. Netscape learned this the hard way when Ian Goldberg and David Wagner broke the SSL (Secure Socket Layer) implementation in Netscape Navigator 1.2 because it only used the PID (process ID), PPID (parent process ID), and the current time as the seed for the random number generator [Netscape].
Collecting entropy is a task best left to the operating system kernel, since that's where all the entropy-producing events occur. Theodore Ts'o implemented such a generator for the Linux kernel (see the sidebar on the next page). Here's how the /dev/random device can be used to seed Perl's pseudorandom number generator:
#!/usr/bin/perl open(RAND, "/dev/random") or die "No /dev/random?\n"; read(RAND,$seed,4); close(RAND); print "Seed = ", unpack("H*", $seed) , "\n"; srand(unpack("L", $seed)); foreach (1..10) { print rand(), "\n"; }
If your operating system doesn't support such a convenient device, you can use the Math::TrulyRandom module (C code by Matt Blaze and Don Mitchell, Perl wrapper by Systemics Ltd., available from the CPAN) which uses interrupt timing discrepancies as its source of entropy.
Many applications need random numbers, but not necessarily numbers unpredictable from the "inside." Quite the contrary, they often require a source of random numbers that generates the same sequence whenever you run the program. This repeatability is essential for debugging some programs - if your numbers aren't predictable, you can't replicate bugs.
Pseudorandom number generators (PRNGs) are deterministic algorithms that produce such streams of repeatable (and thus predictable) numbers. A good PRNG a) behaves in all important respects like a random variable, and b) is fast. What a) means and how one can test for it is the subject of endless scientific discussions.
Jon wrote about linear congruential generator (LCGs), one of which was proposed by the ANSI C committee. LCGs are defined with a simple recursion formula that calculates the new random number from the last one:
nj+1 = anj + b (mod c)
It's not that bad. The ANSI C generator uses parameters designed for a simple and fast implementation on 32-bit computers (a = 1103515245, b = 12345, c = 231). With such a c the modulus operation translates to just ignoring the overflow. As usual, an advantage is balanced by a disadvantage in another. In this case our price is an unfortunate regularity in the lower bits of the numbers. Returning these numbers is dangerous, since the user might just extract bit 0 (calculating, say, nj mod 2) to arrive at a sequence of bits that aren't random at all. With this generator, they'd be 01010101010101.
To avoid such pitfalls, Digital Unix shifts nj by 16 bits to the right before returning. The constant RAND_MAX is thus set to 215, and consequently Perl's Config module sets $Config{randbits} to 15. This doesn't imply that the period length of rand() is 215, but only that you shouldn't use more than the first 15 bits from any random number. The period length of this generator is the same as that of the ANSI C generator, namely 231.
As LCGs go, the ANSI C generator is...okay. Consecutive numbers are largely uncorrelated, as shown below.
For this plot I used the first 5000 pairs of the form (nj, nj+1), scaled to [0,1].
It's not good, either. This looks fine and dandy; no patterns are visible. But what happened to Jon's image? In his article he wrote about strange things happening when he used the ANSI C generator to add white noise to a 512x512 image. That calls for 262144 random numbers. The first and second random numbers affect pixels next to one another, of course, but there's a twist: the first and 513rd numbers also affect adjacent pixels since the 513rd pixel is directly below the first. We can again use a plot of pairs to visualize correlations, but this time we use (n16384 j, n16384(j+1)) to get the figure below.
Oops. No wonder patterns appeared in the image. We stumbled on an intrinsic feature of every LCG: they exhibit long-range correlations of the worst kind. If you take subsequences from an LCG or otherwise combine numbers from different parts of the period, expect nasty surprises. Images are especially prone to these dangers because of their two-dimensional nature.
A large set of alternatives to the LCG has been published; for a good survey on the current menagerie of PRNG see [Nieder] and [L'Ecuyer]. One can prove that certain generators, such as the EICG (Explicit Inversive Congruential Generator) are not prone to such deficiencies.
A close relative to the Linear Feedback Shift Register generators (see Jon's article) is the Twisted Generalized Feedback Shift Register (tGFSR). They are extremely fast and can achieve high period lengths.
On the TPJ Web site, you'll find a Perl implementation of the TT800 generator proposed by [Matsumoto]. With a period length of 2800, it has a long enough cycle for even the most demanding applications, and it has performed very well in all empirical tests done by the pLab group at Salzburg University.
The Perl implementation is almost a 1:1 translation of the original C code. I only had to work around the sign-preserving nature of the right shift operator (>>) in Perl. Furthermore, I added support for multiple independent streams of pseudorandom numbers.
Although the Perl implementation of TT800 is pretty fast, the speed freaks among us might want to make it even faster by implementing it as an XS module.
XS modules are Perl extensions written in C that can be loaded dynamically at runtime. They can be used to make low level system features available to Perl (as seen in the POSIX and Fcntl modules), or to link in external C libraries such as Tk, GDBM, and NDBM. Like PDL, described in TPJ #5, we use XS primarily for speed.
I will give here a short report on what I had to do for this simple case; full documentation on this topic can be found in the perlxstut and perlxs online documentation.
XS Overview. The glue between Perl and C is provided by the XS language. XS lets you create .xs files containing XSUBs, stub subroutines that you use to glue together a function between Perl and C. You can embed C code directly in the .xs file, but typically the XSUB calls functions declared in previously existing .c files.
Using the interface definitions and code fragments in the .xs file, the xsubpp program bundled with Perl generates C code that does all the bridging for you. The whole process of writing an extension module is closely guided by various programs in your Perl distribution.
One of these programs is h2xs, which helps generate a framework for the XS module. In our case, invoking h2xs -A -n Math::TT800 creates six files: TT800.pm, TT800.xs, Changes, MANIFEST, Makefile.PL, and test.pl.
Types and the typemap. The next step is deciding how to map the data types between Perl and C. In this case we can equate the C struct tt800_state with the Math::TT800 object in Perl. On the C side we create a tt800.h include file containing the type declarations.
Next we create the typemap file containing the mapping instructions. We could map TT800 to the predefined Perl type T_PTROBJ; that would automatically bless all C TT800 structs to a Perl object of the same name. But since we want the Perl object to be named Math::TT800 we simply adapt the definition of T_PTROBJ and create the typemap file shown at right.
(On my system, the predefined type mappings are in /usr/lib/perl5/ExtUtils/typemap.) We place the C implementation of the main generation function of TT800 in the file tt800.c.
This completes the C part of the extension module. We now write the XS functions implementing the Perl module by augmenting the skeleton TT800.xs generated by h2xs. The result is the XS file.
The new() is of particular interest. First it allocates memory for the C structure tt800_state (and therefore for the the Math::TT800 object as well) in RETVAL, which will eventually be returned by the XSUB. Then we initialize the generator by copying the default state with memcpy(). If there were additional parameters (indicated by the items variable predefined by XS), we extract the integer value (SvIV) from the ith parameter (ST(i)) and store it in the state vector RETVAL. We don't need to manually bless the return value here; that's taken care of by the typemap file.
In order to compile the extension module we need to adjust Makefile.PL.
We're now pretty much done. The only thing left to do is to polish TT800.pm (available on the TPJ web site) by modifying the @EXPORT array, adding documentation, updating the list of files in MANIFEST, and including some testing code in test.pl. Then we can build the new extension module as follows:
% perl Makefile.PL % make
and test it with make test.
The Makefile supports two other interesting targets: make dist generates an archive of this extension suitable for uploading to CPAN, and make install adds the module to your computer's Perl library.
From the user's point of view, there is no difference between a module written in pure Perl and an XS module. Using the newly installed Math::TT800 pseudorandom number generator is thus simple:
#!/usr/bin/perl use Math::TT800; $tt = new Math::TT800; for($i=0; $i<100; $i++) { printf "%.8f\n", $tt-<next(); }
Writing such Perl extensions may look like black magic at first, but once you begin to understand the basic principles, the implementation is quite simple.
Peter Hellekalek leads the pLab group at Salzburg University, and supervised the thesis on which this article is based. The research was supported by the Austrian Science Foundation (FWF), project P11143-MAT.
L'Ecuyer, P. Random Number Generation. In Handbook on Simulation, Jerry Banks, ed. Wiley, New York, 1997.
Matsumoto, M. and Y. Kurita. Twisted GFSR Generators II. ACM Trans. Model. Comput. Simul., 4:254-266, 1994.
Netscape Communications Corporation. Potential Vulnerability in Netscape Products. http://www.netscape.com/newsref/std/random_seed_security.html.
Niederreiter, H. New developments in uniform pseudorandom number and vector generation. In H. Niederreiter and P.J.-S. Shiue, eds., Monte Carlo and Quasi - Monte Carlo Methods in Scientific Computing, vol. 106 of Lecture Notes in Statistics. Springer-Verlag, New York, 1995. Schneier, B. Applied Cryptography. John Wiley & Sons, Inc, first edition, 1993.
_ _END_ _