RSA, a short recap
In a public key scheme, and for the sake of simplicity, assume a public scheme based on encryption-decryption (as opposed to e.g. DSA, the Digital Signature Algorithm, where the digital signature generated by the secret key is verified to satisfy a mathematic equation using the corresponding public key), you have two mathematical functions, called keys, the secret key S and the public key P, and one is the inverse of the other, e.g. on a message m, P(S(m)) = S(P(m)) = m. RSA is the only widely used public key encryption scheme, constructed as follows: Generate two primes p and q of a fixed bit length, say 512 bits using a sufficiently random input. Denote the product by n, also known as the modulus. Now chose an appropriate so-called public exponent e. It must be prime to p - 1 and q - 1 (and there is no point in choosing it larger than φ(n) := (p - 1)(q - 1), as always, mφ(n) = 1 mod n - see below). There are computation advantages in choosing e small, such as 3, 17 or 65537 = 216 + 1, also known as the 4th Fermat prime. By running an extended version of Euclid's algorithm, which normal outputs the smallest common divisor of two numbers a and b, you can relatively easily find a number d such that for some number a, e∙d -a∙φ(n) = 1 modn. In the following, we just write ab rather than a∙b.
Due to the so called Fermat's Little Theorem - proved a lovely August morning around 1640 by the famous and legendary French judge and amateur mathematician Pierre Fermat - if you take any number m less than n, you always have that mφ(n) = 1 modn. Hence maφ(n) = 1 modn as well obviously, for any a, so maφ(n)+1 = m = med modn. Now define S as S(m) = md mod n and P(m) = me mod n, and it follows that P(S(m)) = S(P(m)) = m.
The amazing thing is that even though P is the inverse function of S, it is practically infeasible to construct P from S without knowledge of the corresponding prime divisors of the corresponding modulus in spite of all the mathematics that has been developed during the last 3000 years and simultaneously deploying the massive computing strength available these days. That is, provided the primes p and q comprising the modulus have been generated in a sufficiently random fashion!
When I have been lecturing on public key cryptography over the years, I have often been asked if there are enough primes for public key cryptography. The answer is that indeed there is. This is a mathematical fact, known as the Prime Number Theorem, which states that given a number n, there are approximately n/ln(n) primes less than n. Here ln(n) is the natural logarithm of n, e.g. ln(n) is defined by eln(n) = n, where e is approximately 2.71828. We will spare you for the definition of e, though! So as ln(n) is very small compared to n, there are plenty of large primes, and the message for this article is that there are so many that if you use a new truly random input of say 128 bits for each RSA prime generation using a sufficiently strong algorithm, you will never in the future history of mankind see any prime being generated twice.
As a warm up to the next section, what would happen if Alice and Bob accidently - or on purpose perhaps - used a common prime r in their RSA keys? So to find out, assume Alice use the modulus a = pr, and B the modulus b = qr, where p, q and r are all primes. Well, you quickly find out by running the Greatest Common Divisor (GCD) algorithm based on Euclid's algorithm on a and b. Indeed, this outputs a = b if p = q, otherwise the common prime r. You may then divide r into a and b respectively and whence recover p and q, respectively. From this, using the public exponent of Alice you can find her secret exponent and similar with Bob. So this is truly and utterly disastrous.
Greatest Common Divisor algorithm on two numbers a and b: Take the largest, say a. Now divide this by b to get the remainder c using Euclid's algorithm. Thus a = xb + c, where c < b. Now do the same with the pair b and c, and so on. Eventually, you end up with the greatest common divisor of a and b.
"Ron was wrong, Whit is right"
This is the intriguing title of a very interesting article [Letal] that was circulated in the spring of 2012 and appears in the Proceedings of Crypto'12 (Springer Lecture Notes Series in Computer Science) under the more anonymous title "Public Keys". Ron is Ron Rivest and Whit is Whit Diffie, two of the true pioneers of public key cryptography. Whit is the co-inventor of public key cryptography with Martin Hellman, and Ron is the R in RSA (together with Adi Shamir and Leonard Adleman). What the title is referring to is that a lot of RSA keys out there are NOT generated in a sufficiently random fashion, on the contrary. The title presumably reflects hat this comes as somewhat of a surprise to Ron Rivest, but not to Whit Diffie.
What the authors of [Letal] did was that they initially in 2009 or before collected about 4.7 million distinct 1024-bit RSA moduli which were openly available on the web, and ran GCD on all of these pair wise. It turned out that 12720 had a common prime divisor!!
Even worse, according to [Letal], "of 6.6 million distinct X.509 certificates and PGP keys [.] containing RSA moduli, 0.27 million (4%) share their RSA modulus, often involving unrelated parties. Of 6.4 million distinct RSA moduli, 71052 (1.1%) occur more than once, some of them thousands of times."
The authors conclude that this can likely be attributed to the fact that a lot of solutions are more or less copied and used several if not many times over.
But one thing is to use a weak seed for key generation; another is to actually get the math behind RSA wrong. The authors of [Letal] report on eight instances where the public exponent turns out to be ....1 - in which case of course the corresponding secret exponent d is 1 as well, and in two instants the public exponent chosen is actually even, in which case P(S(m)) = m will only hold with probability less than 25%, so obviously, the solution has not even been remotely tested.
An interesting and intriguing question is what the status is for all the RSA keys which cannot easily be collected from the web?
So what is a good strategy then?
First of all, you obviously need someone with a minimum of mathematical competence to build and test, or you might as well forget about using public key cryptography. (I often receive e-mails with a content that is not particularly interesting or important, which has been PGP signed. I never bother to verify the signature and call this security pollution. If this is all you use public keys for, don't).
Secondly, you must ensure yourself that all the algorithms in the crypto library perform correctly, preferably according to some appropriate and official standard.
Thirdly, you must ensure that this standard also addresses the issue of the key generation appropriately and properly.
Next, and this is absolutely crucial in order to prevent the disasters revealed in [Letal] and discussed above, you must ensure that the random seed used as an input for the key generation algorithm is chosen using a random function with an entropy of at least 128 bits AND is just not lifted from a previous solution, e.g. by an iteration as described above which generates new pseudorandom bits from a seed. In other words the random seed must be truly random and unique.
Last, but not least, you need to ensure that the package you are about to use has been properly tested for its functionality.
It is possible to generate very strong and practically unbreakable cryptographic keys, but it requires a thorough understanding of the many requirements and pitfalls, and one of the major messages of [L] is that this kind of skill is sadly missing far too often.
Any key generation needs a sufficiently random input, preferably from a random function with an entropy of preferably at least 128 bits, and it is just as important that this random input has not just been lifted from a previous solution.
However, once you have one such seed which is truly random and unique, it is easy using standard algorithms to generate other keys for use in the same package.
[Letal] Lenstra, Arjen K., James P. Hughes, James P., Augier, Maxime, Bos, Joppe W., Kleinjung, Thorsten, and Wachter, Christophe. "Ron was wrong, Whit is right", to appear in Proc. of Crypto'12, Springer Lecture Note Series in Computer Science.