From 80 kilobyte floppy disks, developed in 1971, to 1 terabyte portable hard drives, introduced in 2007, data storage capacity has grown exponentially over the past decades. Now, researchers at the University of Washington, in collaboration with Microsoft, have come up with a new technique for storing data that could further increase digital storage capacities (1). Their technique uses DNA, which can store up to 1 exabyte per mm³, about ten million times more data than current methods of storage (2). Their findings were presented at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems.
The scientists were able to store and retrieve four image files in their initial tests. Their storage system requires both a DNA synthesizer, which transforms digital data into DNA sequences, and a DNA sequencer, which retrieves and converts the information into a readable form. While digital files are stored using binary code, consisting of zeroes and ones, DNA uses four separate units, or nucleotides: adenine, guanine, cytosine, and thymine (1). The researchers developed an algorithm that converts binary data into DNA sequences. Single-stranded DNA molecules can then be synthesized through controlled chemical reactions. However, DNA can only be synthesized in segments of about 200 nucleotides, or about 100 bits of information. Thus, in practice, data must be stored as a collection of short DNA strands, each containing a fragment of the total information (2).
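To make the conversion step concrete, here is a minimal Python sketch of one way binary data could be mapped onto nucleotides and then cut into short strands. The two-bits-per-nucleotide table and the function names are assumptions chosen for illustration; this is not the researchers' algorithm, and it leaves out the addressing and error-correction information a real system would need to add to each strand.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide.
# This is an illustrative assumption, not the encoding used in the study.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_fragments(strand: str, max_len: int = 200) -> list[str]:
    """Cut a long sequence into pieces no longer than max_len nucleotides."""
    return [strand[i:i + max_len] for i in range(0, len(strand), max_len)]


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

With this toy mapping, 100 bytes of data become 400 nucleotides, which `split_into_fragments` would store as two 200-nucleotide strands.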
Scientists also assign a unique key to each data set, which could be a video, text document, or other digital file. The key corresponds to a specific primer, a short DNA sequence that marks where DNA replication begins, allowing scientists to retrieve each file separately using the correct primer. After adding the primer, the DNA must be amplified through polymerase chain reaction (PCR), a technique that artificially induces DNA replication. Sequencing machines then read each fragment and use overlapping segments to assemble a complete file (2). Take, for example, the phrases “four score and seven years ago our,” “ago our fathers brought forth on this,” and “on this continent a new nation.” Using the overlapping sections, the correct sequence would be “Four score and seven years ago our fathers brought forth on this continent a new nation…” DNA sequencing uses the same principles, and simply occurs on a much larger scale using overlapping nucleotide sequences rather than words.
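The phrase example can be worked through in a few lines of Python. The sketch below repeatedly merges the two fragments that share the longest overlap, which is a simple greedy approach picked for illustration; the function names are hypothetical, and actual sequencing software is considerably more sophisticated.

```python
def overlap(left: list[str], right: list[str]) -> int:
    """Return the length of the longest suffix of left that is a prefix of right."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def assemble(fragments: list[str]) -> str:
    """Greedily merge word fragments by their largest overlaps."""
    if not fragments:
        return ""
    pieces = [fragment.split() for fragment in fragments]
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        # Find the ordered pair of pieces with the largest overlap.
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j and overlap(a, b) > best_size:
                    best_size, best_i, best_j = overlap(a, b), i, j
        if best_size == 0:
            raise ValueError("remaining fragments do not overlap")
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return " ".join(pieces[0])


if __name__ == "__main__":
    shuffled = [
        "ago our fathers brought forth on this",
        "on this continent a new nation",
        "four score and seven years ago our",
    ]
    print(assemble(shuffled))
    # four score and seven years ago our fathers brought forth on this continent a new nation
```

The fragments are deliberately supplied out of order: the overlaps alone are enough to recover the original sentence, just as overlapping nucleotide sequences are used to order DNA fragments.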
Besides its massive storage capacity, DNA can preserve data for centuries, making it far more stable than current devices, which last for only a few decades before metal components wear out. DNA storage has significant applications for high-tech companies running short of digital storage, and scientists hope to develop the technology further to reduce cost and increase efficiency so the technique can be used commercially (1).
1. University of Washington. (2016, April 7). Scientists store digital images in DNA, and retrieves them perfectly. ScienceDaily. Retrieved April 8, 2016 from www.sciencedaily.com/releases/2016/04/160407121455.htm
2. Bornholt, J., Lopez, R., Carmean, D. M., Ceze, L., Seelig, G., & Strauss, K. (2016). A DNA-Based Archival Storage System. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems – ASPLOS ’16 (pp. 637–649). New York, New York, USA: ACM Press. http://doi.org/10.1145/2872362.2872397