## Sunday, May 1, 2016

### The birth of statistics and its first application – in cryptanalysis

The field of statistics is often said to have been born in 1662 out of the collaboration between John Graunt and William Petty to develop human statistical and census methods. However, the earliest writing on statistics is actually found in a 9th-century book called "Manuscript on Deciphering Cryptographic Messages," written by Al-Kindi.
Al-Kindi (801–873 AD) was, among other things, an Arab mathematician during the Abbasid Caliphate. One of his greatest works in the field of statistics was the world’s first description and use of statistics – applied to cryptanalysis – which he discussed in the aforementioned book. In his discussion of the quantitative techniques in cryptanalysis, Al-Kindi explained how to use letter frequency statistics to solve a given cryptogram:
“One way to solve an encrypted message, if we know its [original] language, is to ﬁnd a [different clear] text of the same language long enough to ﬁll one sheet or so and then we count [the occurrences of] each letter of it. We call the most frequently occurring letter the “ﬁrst”, the next most occurring the “second”, the following most occurring the “third” and so on, until we ﬁnish all different letters in the clear text [sample]. Then we look at the cryptogram we want to solve and we also classify its symbols. We ﬁnd the most occurring symbol and change it to the form of the “ﬁrst” letter [of the clear text sample], the next most common symbol is changed to the form of the “second” letter, and the following most common symbol is changed to the form of the “third” letter and so on, until we account for all symbols of the cryptogram we want to solve… But if the cryptogram is too short, equivalence does not apply, letter ranks are not correct and [consequently] a second trick should be used to recover letters…”
There are many other contributions Al-Kindi has made to statistics. He discusses the principles of cryptanalysis, where he explains four cryptanalysis methods for normal texts and an additional method for decrypting poetry. He also discusses cipher types, Arabic phonetics, Arabic syntax, and he conducted a study on Arabic letter frequency, the first such study ever done. But rather than expand on those, I would like to present an astonishing example of frequency analysis. I found this example on the website of the Department of Mathematics at Aarhus University in Aarhus, Denmark:
PIRIJSGYTDQRKVIPITKGTHMJUQRPDLSQQCKWSCMSSCIKPDBDJKNRPYLSKGKBYMJT SCIMPMARKJSPDAUSDKJQSKDSMPIKGTIPMJTHKPIIXSIJQDVISCMJLPIVDKUQGYSC KUBCSSCIWKPTRDLCIPDJIUPKLIMJGMJBUMBIQRKHIQNPKHSCIMPMADRWKPTQDNPS CIJDJSCRIJSUPYMPMAQRDIJSDQSMGFDJTDDQSCIMUSCKPKNSCIKGTIQSFJKWJAKK FKJRPYLSKGKBYMJSITMSDJBMJYKSCIPAYHKPISCMJSCPIICUJTPITYIMPQSCDQLM LIPCDBCGDBCSQSCIQLIRDNDRRKJSPDAUSDKJQKNQKHIMPMARPYLSKGKBDQSQAMQI TKJSCIJIWGYTDQRKVIPITTKRUHIJSQSCMSDJRGUTIAKKFQKNMGFDJTDDAJMTGMJM JTDAJMTTUPMDCDHNMRSKPQAIBDJTSCIIHIPBIJRIMJTMTVMJRIHIJSKNMPMARPYL SKGKBYMPITDQRUQQITSCITDQRKVIPDIQPILKPSITDJSCDQLMLIPLUQCSCINPKJSD IPQKNSCICDQSKPYKNRPYLSKGKBYAMRFAYMAKUSNDVICUJTPITYIMPQ
We know that the language of this message is English. First by analyzing the frequency of letters on a page of English text we find that “e” is the most common letter, followed by “t.” In this message above, “I” is the most common letter followed by “S.” So I = e and S = t.
PeReJtGYTDQRKVePeTKGTHMJUQRPDLtQQCKWtCMttCeKPDBDJKNRPYLtKGKBYMJT tCeMPMARKJtPDAUtDKJQtKDtMPeKGTePMJTHKPeeXteJQDVetCMJLPeVDKUQGYtC KUBCttCeWKPTRDLCePDJeUPKLeMJGMJBUMBeQRKHeQNPKHtCeMPMADRWKPTQDNPt CeJDJtCReJtUPYMPMAQRDeJtDQtMGFDJTDDQtCeMUtCKPKNtCeKGTeQtFJKWJAKK FKJRPYLtKGKBYMJteTMtDJBMJYKtCePAYHKPetCMJtCPeeCUJTPeTYeMPQtCDQLM LePCDBCGDBCtQtCeQLeRDNDRRKJtPDAUtDKJQKNQKHeMPMARPYLtKGKBDQtQAMQe TKJtCeJeWGYTDQRKVePeTTKRUHeJtQtCMtDJRGUTeAKKFQKNMGFDJTDDAJMTGMJM JTDAJMTTUPMDCDHNMRtKPQAeBDJTtCeeHePBeJReMJTMTVMJReHeJtKNMPMARPYL tKGKBYMPeTDQRUQQeTtCeTDQRKVePDeQPeLKPteTDJtCDQLMLePLUQCtCeNPKJtD ePQKNtCeCDQtKPYKNRPYLtKGKBYAMRFAYMAKUtNDVeCUJTPeTYeMPQ
Substituting into the message, we find many instances popping up of “tCe.” That looks a lot like “the,” so C = h.
PeReJtGYTDQRKVePeTKGTHMJUQRPDLtQQhKWthMttheKPDBDJKNRPYLtKGKBYMJT theMPMARKJtPDAUtDKJQtKDtMPeKGTePMJTHKPeeXteJQDVethMJLPeVDKUQGYth KUBhttheWKPTRDLhePDJeUPKLeMJGMJBUMBeQRKHeQNPKHtheMPMADRWKPTQDNPt heJDJthReJtUPYMPMAQRDeJtDQtMGFDJTDDQtheMUthKPKNtheKGTeQtFJKWJAKK FKJRPYLtKGKBYMJteTMtDJBMJYKthePAYHKPethMJthPeehUJTPeTYeMPQthDQLM LePhDBhGDBhtQtheQLeRDNDRRKJtPDAUtDKJQKNQKHeMPMARPYLtKGKBDQtQAMQe TKJtheJeWGYTDQRKVePeTTKRUHeJtQthMtDJRGUTeAKKFQKNMGFDJTDDAJMTGMJM JTDAJMTTUPMDhDHNMRtKPQAeBDJTtheeHePBeJReMJTMTVMJReHeJtKNMPMARPYL tKGKBYMPeTDQRUQQeTtheTDQRKVePDeQPeLKPteTDJthDQLMLePLUQhtheNPKJtD ePQKNthehDQtKPYKNRPYLtKGKBYAMRFAYMAKUtNDVehUJTPeTYeMPQ
Now, “H” appears only once in the entire message here. We find that on the page of English text, c, u, m, w, f, g, y, p, b also only appear once. So H may be any one of those letters. To find out which one it is, we isolate the group of letters that surround the only “H” in the message: theeHe. We narrow the possibilities for what “H” can be down to m, w, g, y based on the fact that they are the only letters that could possibly make a meaningful word in that position (emerge, ewe, egest, eye). We think “emerge” is the most likely word, so we go with H = m. We continue doing this for each letter until we decrypt the message:

References:
Kryptologi De arabiske bidrag, af Marc Skov Madsen, Department of Mathematics, Aarhus University, Aarhus, Denmark