How DNA can be used to generate random numbers

One of the simplest ways to generate random numbers is to roll a die. But what if we need much larger volumes of random numbers at much higher speeds, for example for the secure transmission of information? In that case, a die would be a very inefficient option. Luckily, faster and more efficient methods of random number generation have been developed over time, relying mainly on physical processes or software algorithms.

Looking at nature, chemistry is one of the most abundant statistical processes: the outcome of every chemical reaction follows a probability distribution. So why not use chemistry and convert the intrinsic randomness of chemical reactions into random numbers? The idea seems simple, but randomness can only be extracted from chemistry if we can identify it within the chemical process. That means looking at single molecules, which is rarely possible. There is, however, one field of chemistry in which reading out each individual molecule is both feasible and fairly cheap: the chemistry of DNA. If we synthetically write (synthesize) DNA by adding a mixture of all four nucleotides (A, C, T and G) at each position, so that the nucleotide incorporated is left to chance, we can subsequently read out (sequence) the DNA using next-generation sequencing technologies and identify the individual nucleotides in each strand.

During this project, in which we collaborated closely with Reinhard Heckel from the Technical University of Munich, we encountered many fascinating results. We found, for example, that the data obtained from DNA synthesis contained a specific bias: some nucleotides appeared more often in the DNA sequences than others. This bias, however, could easily be removed using a very simple data processing algorithm called the Von Neumann algorithm. It reads the input bits in pairs and discards a pair if its two bits are equal (input “00” = no output, input “11” = no output). If the two bits differ, it outputs the first bit of the pair (input “01” = output “0”, input “10” = output “1”). This processing step removed any human or technological bias from our synthesized data, and the resulting output was random, as assessed with the publicly available statistical test suite from the National Institute of Standards and Technology (NIST).
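
To make the Von Neumann step concrete, here is a minimal Python sketch of the pairing rule described above. It is only an illustration, not the code used in the project: the function name `von_neumann_extract` and the example input are made up, and the sketch assumes the nucleotides have already been converted to a sequence of 0s and 1s (that conversion is not shown here). The method removes bias under the assumption that the input bits are independent and share the same bias.

```python
from typing import Iterable, List


def von_neumann_extract(bits: Iterable[int]) -> List[int]:
    """Debias a bit sequence with the Von Neumann pairing rule.

    Bits are read in non-overlapping pairs:
      00 or 11 -> no output
      01       -> output 0 (the first bit of the pair)
      10       -> output 1 (the first bit of the pair)
    A trailing bit without a partner is dropped.
    """
    it = iter(bits)
    output = []
    # zip(it, it) pulls two items at a time from the same iterator,
    # which yields consecutive non-overlapping pairs.
    for first, second in zip(it, it):
        if first != second:
            output.append(first)
    return output


# Hypothetical input, for illustration only.
example = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
print(von_neumann_extract(example))  # [0, 1, 1, 0]
```

Note that the price of removing the bias is throughput: every pair of equal bits is thrown away, so the output is always much shorter than the input.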

It is not certain whether DNA synthesis could ever become a state-of-the-art random number generator, as the procedure is still quite laborious and costly. However, as DNA reading and writing become cheaper and faster, there may be a chance for this technology to become commercially competitive one day. Overall, it was very exciting to show that, with a few very powerful tools and interdisciplinary work, the applications of DNA reach far beyond its role as the carrier of genetic information in our bodies.

Linda Meiser

PhD student, ETH Zurich