DNA Data Storage: The Storage Medium of the Future

By: James Green
Tag(s): Data
Published: Nov 13, 2020
World first: A Netflix series successfully stored in DNA


The convergence of technology with biological sciences is not a new phenomenon; there have been numerous examples in recent history where technology has played a pivotal role in driving fundamental breakthroughs in biological science.

Three notable advancements come to mind:

Neuralink

The brainchild of Tesla CEO Elon Musk, the Neuralink implant is a biomedical device that is inserted into the human skull and can communicate with the brain.

"The initial goal of our technology will be to help people with paralysis to regain independence through the control of computers and mobile devices."

If you’ve not seen it, the live demonstration with a Neuralink device implanted in a pig (they assure viewers how seriously they take the welfare of these animals) is a fascinating, if slightly unnerving, watch.

Magnetic Resonance Imaging (MRI)

Developed in the late 1970s, MRI technology revolutionised internal imagery of the human body. In recent years, advancements in image processing software have pushed MRI to new limits, including the ability to image lung function (previously not possible with MRI), produce higher resolution images, and reduce scan times.

Modelling protein folding

DeepMind, the AI company owned by Google, became famous as the first company to develop AI capable of beating human champions at the ancient board game Go.

Now a DeepMind project named AlphaFold is developing groundbreaking AI to tackle the complex problem of modelling protein folding.

"Scientists have long been interested in determining the structures of proteins because a protein’s form is thought to dictate its function. Once a protein’s shape is understood, its role within the cell can be guessed at, and scientists can develop drugs that work with the protein’s unique shape."

By understanding protein folding, researchers can understand complex biological systems including cells, tissues, and organisms.

Now it seems, it’s payback time

Investors, futurists and technologists, take note: a new development is under way that, I think, marks a fundamental shift in this partnership:

"For the first time in history, biological science is helping technology itself to advance."

How? In a somewhat unexpected area: data storage.

What is DNA data storage?

DNA data storage refers to the process of storing data and files — including audio, video and text files — on synthesized strands of DNA (deoxyribonucleic acid).

How does DNA data storage work?

The data is converted from traditional binary digits (1s and 0s) to the letters A, C, G, and T. These letters represent the four nucleotide bases found in DNA:

  • Adenine
  • Cytosine
  • Guanine
  • Thymine

The letters can then be decoded back to binary digits to recover the data from the DNA molecule.
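As a minimal sketch of the conversion described above, the following assumes the simplest possible mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). The mapping and the function names `encode` and `decode` are illustrative assumptions, not the scheme used by any particular research group; practical systems add error correction and avoid troublesome sequences such as long runs of the same base.

```python
# Illustrative mapping only: two bits per base (an assumption for this sketch).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a string of A, C, G and T (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a string of A, C, G and T back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With this mapping every byte becomes four bases, so the two-byte message "Hi" becomes an eight-letter strand that decodes back to the original bytes.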

Biological data storage using DNA

Catchily referred to as the “DNA-of-Things” (DoT), this rapidly growing field of technology uses sequences of DNA to persist data. DNA has a potential storage density orders of magnitude larger than current mainstream storage media, such as Solid State Drives (SSDs).

Netflix in your pocket

One Megabyte (1MB) is a volume of data you should be used to seeing in your everyday files on your computer.

One Petabyte (1PB) is one billion Megabytes.

To put this into some context, it’s estimated that the entire Netflix catalogue is currently about 90 Petabytes.

Now, imagine if you could carry that Netflix catalogue around with you, say installed locally on your smartphone. In theory, DoT could make that a reality.

As this Harvard paper points out, the theoretical maximum data storage of DNA is 455 Exabytes per gram.

One Exabyte is a thousand Petabytes. So, theoretically:

1 gram of DNA storage could store 5,055 Netflixes.
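The arithmetic behind that figure can be checked with a short sketch. It uses only the numbers quoted above (455 Exabytes per gram and a 90 Petabyte catalogue); the variable names are my own.

```python
# Figures quoted in the article.
DNA_CAPACITY_EB_PER_GRAM = 455   # theoretical maximum, exabytes per gram
NETFLIX_CATALOGUE_PB = 90        # estimated catalogue size, petabytes
PB_PER_EB = 1_000                # one exabyte is a thousand petabytes

capacity_pb_per_gram = DNA_CAPACITY_EB_PER_GRAM * PB_PER_EB
catalogues_per_gram = capacity_pb_per_gram // NETFLIX_CATALOGUE_PB

print(f"{capacity_pb_per_gram:,} PB per gram")                       # 455,000 PB per gram
print(f"{catalogues_per_gram:,} whole Netflix catalogues per gram")  # 5,055 whole Netflix catalogues per gram
```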


Storing a Netflix episode in DNA

It might not be the entire Netflix back catalogue, but this tweet from August this year marked a key milestone in the evolution of DNA storage:

That’s right, an episode from the Netflix series “Biohackers” was stored in, and recalled from, a sequence of DNA strands.

Why biological storage is the better option

In short, it comes down to two factors: storage density and durability.

In this official ETH press release, Emily M. Leproust, Ph.D. (CEO and co-founder of Twist Bioscience), explains:

“DNA is an incredible molecule that, by its very nature, provides ultra high density storage for thousands of years. In fact, the DNA contained within all cells in a human body could store all the movies created to date in the 21st century three billion times over. That, indeed, illustrates the magic of bringing biology and technology together to create synthetic (inert) DNA.”

An insight into the future (well, from 2002 to now)

To further help the reader understand what I mean by density, let me recount a memory from a lecture during my Computer Science degree, back in 2002. It went something like this:

“Data storage of today, typically a physical hard drive, uses zeros and ones stored on magnetic discs within the drive. One area of important research is biological storage; that is, using biological material to persist data.

Researchers estimate that 1 kg of this biological storage medium is enough to store all information ever written, spoken, or otherwise recorded by humankind.”

Storage prediction, or storage fiction?

How close to reality was my university lecturer?

Well, we have already explored the theoretical maximum DNA storage capacity: 455 Exabytes per gram.

1 kg of DNA could therefore store 455,000 Exabytes, equivalent to:

455 Zettabytes per kg

To put that number into perspective, Seagate Technology, one of the world's largest storage manufacturers, predicts:

“The global datasphere will grow from 45 zettabytes in 2019 to 175 by 2025.”

Wow, perhaps my university lecturer was right after all.
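To make the comparison concrete, here is a small sketch using only the figures quoted above (455 Exabytes per gram and a 175 Zettabyte datasphere in 2025). It is a theoretical upper bound, not an engineering estimate, and the variable names are my own.

```python
# Figures quoted in the article.
DNA_CAPACITY_EB_PER_GRAM = 455   # theoretical maximum, exabytes per gram
DATASPHERE_2025_ZB = 175         # predicted global datasphere, zettabytes
EB_PER_ZB = 1_000                # one zettabyte is a thousand exabytes
GRAMS_PER_KG = 1_000

capacity_zb_per_kg = DNA_CAPACITY_EB_PER_GRAM * GRAMS_PER_KG / EB_PER_ZB
grams_needed = DATASPHERE_2025_ZB * EB_PER_ZB / DNA_CAPACITY_EB_PER_GRAM

print(f"{capacity_zb_per_kg:.0f} ZB per kg")                 # 455 ZB per kg
print(f"{grams_needed:.0f} g for the predicted datasphere")  # 385 g for the predicted datasphere
```

On these numbers, the entire predicted 2025 datasphere would fit in well under half a kilogram of DNA.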

The durability of DNA Storage

Along with density, DNA's second differentiator as a storage medium is its durability. I must admit, I was quite surprised to learn this, as I imagined DNA to be, I suppose, “fragile”. But if you think about it, scientists have been able to recover readable DNA from ancient remains dating back hundreds of thousands of years. I can’t imagine the same being true for an SSD buried in the sea bed.

Another thing to note is that modern SSDs have a limited lifespan of 10 years or less; in fact, a six-year study of Google data centers by the University of Toronto found that SSDs have a lifespan that is actually even shorter than this.

Investors in DNA Storage

It probably comes as no surprise, but Microsoft is heavily backing this emerging technology. Whilst their technology appears to be slightly behind ETH (based on the volume of data successfully stored and recalled), they are continuing to invest in its development as this press release shows.

Microsoft press release, March 21, 2019

"Researchers from Microsoft and the University of Washington have demonstrated the first fully automated system to store and retrieve data in manufactured DNA — a key step in moving the technology out of the research lab and into commercial datacenters.

DNA can store digital information in a space that is orders of magnitude smaller than datacenters use today. It’s one promising solution for storing the exploding amount of data the world generates each day, from business records and cute animal videos to medical scans and images from outer space."

In this press release, they also try to quantify the potential of this storage medium:

"[DNA storage]..could fit all the information currently stored in a warehouse-sized datacenter into a space roughly the size of a few board game dice."

What’s next for DoT?

Research continues at pace. This research paper from North Carolina State University, in June 2020, already outlines a

“..fundamentally new approach to DNA data storage systems, giving users the ability to read or modify data files without destroying them and making the systems easier to scale up for practical use.”

There are, however, a few hurdles researchers still need to overcome. One is the cost of read/write operations: about $3500 per MB to write data, and $1000 to read it. Another is that these read/write operations are slower than on conventional media (sourced from DNA as a digital information storage device: hope or hype?).
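To get a feel for what those prices mean in practice, here is a rough sketch. It assumes that the $1000 read cost is also charged per MB and that costs scale linearly with file size; both are my assumptions rather than figures from the cited paper, and `dna_storage_cost` is a hypothetical helper.

```python
# Costs quoted in the article (US dollars).
WRITE_COST_USD_PER_MB = 3500
READ_COST_USD_PER_MB = 1000  # assumption: the read figure is also per MB


def dna_storage_cost(size_mb: float, reads: int = 1) -> float:
    """Estimate the cost of writing a file once and reading it `reads` times.

    Assumes cost scales linearly with size, which is a simplification.
    """
    return size_mb * (WRITE_COST_USD_PER_MB + reads * READ_COST_USD_PER_MB)


# A hypothetical 5 MB photo, written once and read back once.
print(f"${dna_storage_cost(5):,.0f}")  # $22,500
```

At these prices a single smartphone photo would cost tens of thousands of dollars to archive and retrieve, which is why cost remains such a significant hurdle.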

How long until this technology becomes mainstream is yet to be seen. But at the rate that our data production is growing, it had better be sooner rather than later.
