Data corruption in storage media is ubiquitous in today’s digital world, from failing hard drives in data centres to data loss in personal computers. While the initial reaction to a corrupted hard drive usually involves panic and doubts about one’s backup strategy, repair and subsequent data recovery are sometimes possible for hard drives with physical or logical damage. In the last few years, interest in storing digital data in DNA has steadily increased, with a wide range of parties – from academic research groups to manufacturers of hard drives – now envisioning a future in which much of our archival data is stored in DNA. Of course, data stored in DNA also experiences data corruption, so we set out to develop an enzymatic repair service for DNA hard drives based on nature’s mechanisms for genome repair.
DNA’s information density, on the scale of exabytes per gram, and its role as nature’s storage medium of choice support the vision of DNA-based archival storage, but its stability is not infinite. As with hard drives, time, temperature, and humidity are the critical parameters affecting the durability of DNA data storage media. DNA’s major failure mode is the loss of sequences caused by hydrolysis, its main decay mechanism. The resulting breakage of the phosphate backbone of a given sequence, referred to as nicking, prevents amplification by the polymerase chain reaction (PCR) and thus renders the data saved in this sequence unreadable.
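To make the link between backbone breaks and unreadable sequences concrete, here is a minimal sketch that assumes breaks occur independently at a constant first-order rate per nucleotide. Under that assumption, a strand stays amplifiable only if none of its positions has been cleaved. The function name `intact_fraction` and the rate used below are hypothetical and purely illustrative; they are not values measured in our experiments.

```python
import math


def intact_fraction(rate_per_nt_per_day: float, length_nt: int, days: float) -> float:
    """Fraction of strands with no backbone break (hypothetical helper).

    Assumes each nucleotide position is cleaved independently at a constant
    first-order rate, so a strand of `length_nt` positions survives intact
    with probability exp(-rate * length * time).
    """
    if rate_per_nt_per_day < 0 or length_nt < 0 or days < 0:
        raise ValueError("rate, length and time must be non-negative")
    return math.exp(-rate_per_nt_per_day * length_nt * days)


if __name__ == "__main__":
    # Illustrative, made-up rate: longer strands lose amplifiable copies faster.
    for length in (100, 150, 300):
        print(length, round(intact_fraction(1e-4, length, 35), 3))
```

The exponential form follows directly from the independence assumption. It also shows why longer sequences are more vulnerable: every additional nucleotide is one more place where a single break makes the whole strand unamplifiable.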
Our goal was to present a simple method to rescue the data from a file stored in DNA, essentially the DNA-based equivalent of a hard drive recovery service. The result: a mix of three enzymes – borrowed from nature’s repair mechanisms for genomic DNA – capable of reversing the hydrolytic damage and restoring the unreadable sequences. To validate the approach, we performed experiments using heavily decayed DNA – aged at 30°C for more than a month – representing the worst-case scenario. Sequencing showed that the repair enzymes successfully reversed part of the DNA decay, increasing the amount of amplifiable, full-length DNA and enabling error-free data recovery.
Our “enzymatic repair service” has two implications for DNA data storage: it extends the storage horizon for DNA data storage applications, and it reduces the minimum number of sequence copies required for durable storage. Considering the timescales for archival storage in DNA, our repair process would allow data recovery from archival media left in storage for several hundred years longer than intended. This is hard to imagine with today’s hard drives, but there is still a wide gap between today’s research on DNA data storage and the scale of today’s conventional storage systems. Nonetheless, enzymatic repair for DNA data storage brings us one step closer to realizing the potential for DNA-based archival storage.
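Why repair lowers the required copy number can be seen in a simple sketch. Assume each physical copy of a sequence is independently readable with probability `p_intact`, and that we want at least one readable copy with some target probability. The helper `min_copies` and the numbers below are hypothetical and illustrative only; this is not the calculation used in our study.

```python
def min_copies(p_intact: float, target: float = 0.999) -> int:
    """Smallest number of copies n such that P(at least one copy readable) >= target.

    Hypothetical helper. Assumes copies fail independently, so the chance that
    all n copies are broken is (1 - p_intact) ** n.
    """
    if not 0.0 < p_intact <= 1.0:
        raise ValueError("p_intact must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    # Add copies until the probability that at least one survives reaches the target.
    while 1.0 - (1.0 - p_intact) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative, made-up fractions: raising the readable fraction
    # (e.g. by repairing nicked strands) lowers the copies needed.
    print(min_copies(0.2), min_copies(0.5))
```

A simple loop is used instead of a closed-form logarithm so that floating-point rounding near an exact boundary cannot produce an off-by-one answer.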