Fun with prime numbers
April 23, 2010
While messing around in Haskell, I came across a rather delightful little nugget of code.
It started out with helping a friend debug code for a primality testing routine. I then started to wonder how I would define such a function in Haskell. It’s a reasonably simple exercise in thinking pure thoughts in Haskell, and if you want to give it a try I suggest trying now, before reading on.
Still here? So, we want to define a function isPrime, having a type of Integer -> Bool. We’ll use the straightforward primality testing algorithm for now: a number N is prime iff no prime number up to sqrt(N) divides it.
So, how would we implement this in Haskell? First, let’s make an assumption. Let’s assume that we have at our disposal an ordered list of prime numbers, from 2 up to infinity and beyond.
primes :: [Integer]
As an aside for my readers who are not familiar with Haskell, I should point out that Haskell is a so-called lazy language. This means, among other things, that you can define infinite lists without having your program hang for ever. While the list is technically infinite, the lazy nature of the language means that its elements will not be computed until they are actually needed.
Now, you might call me out here for begging the question: if I already have a list of all the primes, then I have no need to test numbers using the algorithm above. However, let’s disregard that for a second and see where it takes us.
Given the list of all primes, we need only the primes from 2 to sqrt(N):
candidateFactors :: [Integer]
candidateFactors n = takeWhile (\x -> x*x <= n) primes
This reads reasonably well as English: take elements from the primes list, until an element causes the lambda function to fail. Next, we need to check whether any of these cleanly divides N. Fortunately, the Haskell Prelude has just what we need, and we can without further ado define isPrime:
isPrime :: Integer -> Bool
isPrime n = all (\x -> n `rem` x /= 0) (candidateFactors n)
Again, this should be fairly straightforward to read: N is prime iff, for all candidate factors X, the integer division of N by X is not clean (that is, X is not a factor of N).
This definition of primality is very clean and concise, and not very far removed from the mathematical definition. This is fairly common in Haskell.
However, we must now go back and figure out what to do with our initial bluff: this nice primality predicate assumes the existence of a list of prime numbers from which to draw candidate factors. But this list doesn’t exist, and unless we’re prepared to enter all primes from 2 to infinity by hand, we need a way of programmatically defining that list.
Let’s now reverse our assumption. Assume that we have an isPrime function, with an implementation based on magic, which can tell us whether or not a number is prime without using the primes list. Given isPrime, it is of course trivial to define primes:
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
This should be read in two parts, centered around the colon: first, on the right, we take the infinite list of all positive odd integers, and filter it using isPrime to retain only the prime numbers. Then, we tack 2, the only even prime, onto the front of that list, and voilà! One infinite list of all the prime numbers.
We are currently high in the spheres of academic code: nice, but quite useless, because we assume too big a part of the solution. This is where it gets fun: let’s remove the two assumptions, smush the two code snippets together, and see what remains:
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
isPrime :: Integer -> Bool
isPrime n = all (\x -> n `rem` x /= 0) candidates
where candidates = takeWhile (\x -> x*x <= n) primes
Here’s the kicker: the above code compiles, and correctly produces an infinite list of all the prime numbers.
Why? isPrime relies only on primes up to sqrt(N) to decide if N is prime. Examine the base case of determining the primality of 3 given a list containing only 2, and you’ll find that it works correctly. After that, the length of the primes list grows faster than the length of the prefix that isPrime needs to give a primality verdict for the following number (compare f(x) = x and f(x) = sqrt(x)).
I find this definition of prime numbers absolutely delightful. It has a wonderful zen-like quality to it, and yet successfully gets away with defining the list of prime numbers in terms of itself. Even better, it turns out that this implementation is quite efficient as well! The Haskell compiler is capable of turning this roundabout definition into a fast prime number calculator.
After posting this to #haskell to share this delightful definition (which, of course, most of them had already seen - the joys of being new to a language!), I was directed to a research paper by Melissa O’Neill, in which she goes over various Sieve of Eratosthenes implementations, and shows how this beautiful definition can be made even more efficient. Another paper on my stack of books, papers, manuals and novels to read…
How easily does your language of choice let you define a reasonably efficient Sieve of Eratosthenes?