DNA Data Storage: Nature Does It Better!


“Nature does it better” is an aphorism that scientists and engineers have confronted in its various embodiments for years. From the invention of Velcro as inspired by the grabbing mechanism of burrs to the use of aquaporin proteins to desalinate seawater, we have long been finding solutions to a wide variety of problems tucked away in biological artifacts and phenomena. Now, the biotech community is recognizing that nature does one more thing better than human innovation has thus far been able to: data storage.


Today, the predominant method of storing digital data takes the form of binary code, strings of 1s and 0s expressing values that can be interpreted by our computers.  Cells, on the other hand, store their data in strings of As, Ts, Gs, and Cs, which represent the quaternary code of DNA.  These two systems are similar enough that, with a little bit of ciphering—for example, making 00, 01, 10, and 11 translate to A, T, G, and C respectively—scientists are now accessing an entirely novel physical storage medium unlike any that we’ve ever harnessed before.
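To make the ciphering idea concrete, here is a minimal Python sketch of the two-bit mapping used as an example above (00, 01, 10, 11 to A, T, G, C). The function names `encode` and `decode` are hypothetical, chosen for illustration only, and real DNA storage schemes are considerably more elaborate than this.

```python
# Illustrative only: the two-bit cipher given as an example in the text.
# The names below are assumptions, not part of any published scheme.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of As, Ts, Gs, and Cs, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the cipher: read bases back into bits, then into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # TAGATGGT
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so under this simple mapping a strand is four times as long in letters as the data is in bytes.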


Although the encoding of digital data in the form of DNA polymers has only been seriously undertaken in very recent years, the idea is not a new one. Mikhail S. Neiman, a Soviet scientist and radio officer, is credited with first proposing this radically different approach to data storage in 1964, just eleven years after the structure of DNA was cracked. Even then, Neiman contended that DNA was more durable and reliable than the data storage methods of his day (usually magnetic tape), and cited the molecule’s critical role in heredity as evidence of those qualities.


Fast forward half a century and scientists are now affirming Neiman’s belief that DNA is, in many ways, a superior information storage medium compared to magnetic tape and hard drives. It has a high data density, is stable for thousands of years when kept in dark, cold, dry conditions, and represents something of a universal language for all life on earth.


These characteristics have sparked some wild imaginings of a future where storing data in DNA is the norm. One much-discussed vision is that of a DNA archive vault containing the entirety of human knowledge, hidden away in a dark, cold, dry place for post-apocalyptic generations to uncover. Others predict that the extraordinary storage capacity of DNA will enable an Orwellian existence: a world full of cameras that are continuously recording and permanently storing an account of the population’s every move. Still others envision a future where DNA fragments encoding messages will be injected or engineered into people’s cells, with the human body serving as a conduit for transmitting information (think Craig Venter’s synthetic bacterium and its genetic watermarks).


While these possibilities may sound fantastical, they are technically feasible and inching closer to reality all the time. It’s now been shown that a piece of DNA the length of an average human gene can encode an incredible 1-2 megabytes of data. Teams around the world have developed multiple DNA-binary ciphering systems varying in their complexity, and are beginning to explore encryption methods and built-in redundancy as features of their coded data.  As of today, scientists have translated everything from photographs to music videos to the complete works of Shakespeare into nature’s incredible blueprint polymer. 


There’s just one thing standing in the way of storing our precious digital data in As, Ts, Gs, and Cs: the difficulty of reading and, most of all, writing them.


In 2016 the European Bioinformatics Institute estimated the cost of sequence-reading one megabyte worth of DNA data to be $220, a steep figure, but a small one next to the estimated $12,400 cost of synthesizing the same amount of DNA. That price tag lands DNA data storage squarely in the frustrating camp of the technically feasible but financially impractical.
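To put those two figures side by side, the short Python sketch below scales the 2016 per-megabyte estimates quoted above to larger amounts of data. It assumes, purely for illustration, that cost grows linearly with size; the function name `storage_cost` is hypothetical.

```python
# Per-megabyte cost estimates quoted in the text (2016 figures, US dollars).
READ_COST_PER_MB = 220.0       # sequencing (reading)
WRITE_COST_PER_MB = 12_400.0   # synthesis (writing)


def storage_cost(megabytes: float) -> dict:
    """Estimate write, read, and total cost, assuming simple linear scaling."""
    write = WRITE_COST_PER_MB * megabytes
    read = READ_COST_PER_MB * megabytes
    return {"write": write, "read": read, "total": write + read}


# How many times more expensive writing is than reading, per megabyte.
WRITE_TO_READ_RATIO = WRITE_COST_PER_MB / READ_COST_PER_MB

if __name__ == "__main__":
    print(f"Writing costs about {WRITE_TO_READ_RATIO:.0f}x as much as reading")
    for size_mb in (1, 1_000):
        cost = storage_cost(size_mb)
        print(f"{size_mb} MB: write ${cost['write']:,.0f}, read ${cost['read']:,.0f}")
```

On these numbers, writing is roughly 56 times as expensive as reading, which is why synthesis rather than sequencing is the main bottleneck.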


Does this mean that DNA data storage is a lost cause? Probably not, and the solution may lie in once again returning to that steadfast aphorism of “nature does it better.” That is to say, if we’re going to store information the way nature does, we might need a little more of nature’s help encoding and decoding it.

After all, the long-standing preferred approach to sequencing (the Sanger method) is enzyme-driven, as are many of the next-generation methods that have led sequencing technology to progress at a speed exceeding Moore’s law between 2004 and 2010. On the other hand, the predominant method of making DNA is still chemical synthesis, in one form or another. And although automation of chemical synthesis has brought the technology a long way, compared to sequencing, DNA synthesis remains stuck in the dark ages.


But perhaps what DNA synthesis needs is to be a little more inspired—by nature, that is. That wish may soon be granted, as there’s at least one research team working on developing a novel enzyme-based method of synthesizing DNA. It’s an approach that, if successful, could send DNA synthesis down the path of Moore’s law, like its sequencing counterpart before it.

The question is not if, but when DNA will become a practical storage medium—and for that, it’s too early to tell.  But for now we can simply dream about the boundless possibilities of a future where one of nature’s most brilliant innovations reinvents and re-enables many of our own.



By Christine Stevenson