Scientists from the Data Science Institute at Columbia University and the New York Genome Center (NYGC) published research this week detailing a data storage technique that uses DNA molecules to store digital information.
Deoxyribonucleic acid, better known by its abbreviation DNA, is the molecule at the heart of all known life. In nature, DNA stores the information that defines an organism and its characteristics, using four nucleotides: adenine (A), guanine (G), cytosine (C) and thymine (T).
In essence, DNA works much like your hard drive, but instead of binary ones and zeros, it uses a base-4 (quaternary) code, with one of four letters at each position, to store information about a living organism's genes.
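To make the quaternary idea concrete, here is a minimal Python sketch showing that each nucleotide can carry two bits, so one byte becomes four bases. The bit-pair-to-letter table (00→A, 01→C, 10→G, 11→T) and the function names `bytes_to_dna` and `dna_to_bytes` are illustrative assumptions, not the researchers' code.

```python
# Assumed mapping of 2-bit pairs to nucleotides (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides (2 bits per nucleotide)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna: every four nucleotides become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))
```

With this table, `b"Hi"` (bits `01001000 01101001`) encodes to `CAGACGGC`, and `dna_to_bytes` turns it back into `b"Hi"`.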
Researchers embedded six files inside DNA molecules
Previous research has shown that DNA can be built from scratch using a technique called DNA synthesis, in which scientists assemble whatever nucleotide sequence they want.
Previous research also showed that DNA could be used to store binary information. What the Columbia scientists did was refine the technique for converting digital data into molecular sequences, making fuller use of DNA's storage capacity.
During their experiment, researchers said they successfully stored six files inside DNA molecules:
- a full computer operating system (KolibriOS)
- a computer virus
- a $50 Amazon gift card
- a Pioneer plaque
- an 1895 French film, “Arrival of a Train at La Ciotat”
- a 1948 study by information theorist Claude Shannon
Researchers converted binary code to DNA sequences
According to their research paper, published last week in the journal Science, the researchers first compressed these six files into a single archive.
They then split the binary data into segments and used an erasure-correcting algorithm known as a fountain code [1, 2] to package random combinations of those segments into “droplets,” mapping the bits of each droplet to the four DNA nucleotides.
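The droplet idea can be sketched as follows, assuming the data has already been split into equal-length segments. Each droplet stores a seed plus the XOR of a few randomly chosen segments; because the choice is generated from the seed, a decoder can later repeat it. The degree rule here (one to three segments, picked uniformly) is a simplification; practical fountain codes such as Luby transform codes draw the degree from a tuned distribution. The names `droplet_indices` and `make_droplet` are hypothetical.

```python
import random


def droplet_indices(seed: int, num_segments: int) -> list[int]:
    """Pick which segments a droplet combines, deterministically from its seed."""
    rng = random.Random(seed)
    # Simplified degree rule (1 to 3 segments); real fountain codes use a
    # tuned distribution such as the robust soliton distribution.
    degree = rng.randint(1, min(3, num_segments))
    return sorted(rng.sample(range(num_segments), degree))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """Return (seed, payload), where payload is the XOR of the chosen segments."""
    payload = bytearray(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        for j, byte in enumerate(segments[i]):
            payload[j] ^= byte
    return seed, bytes(payload)


data = b"ARRIVAL OF A TRAIN AT LA CIOTAT!"  # 32 bytes of sample data
segments = [data[i:i + 8] for i in range(0, len(data), 8)]  # four 8-byte segments
droplets = [make_droplet(segments, seed) for seed in range(10)]
```

An encoder can keep producing droplets from new seeds for as long as needed, which is what makes the code a "fountain".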
The fountain code was configured to discard droplets whose nucleotide sequences contained letter combinations known to cause errors, such as long runs of a single base, and to add a barcode to each droplet. The barcode was later used to reassemble the binary data in the correct order.
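A sketch of the screening and barcoding step, under stated assumptions: the droplet's seed is written as a 4-byte barcode in front of its payload, bits map to bases as 00→A, 01→C, 10→G, 11→T, and a candidate strand is rejected if it has a run of more than three identical bases or a GC content outside 45 to 55%. These thresholds and the helper names are illustrative choices, not figures from the article.

```python
from typing import Optional

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping


def to_dna(data: bytes) -> str:
    """Map every 2 bits to one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def passes_screen(seq: str, max_run: int = 3,
                  gc_min: float = 0.45, gc_max: float = 0.55) -> bool:
    """Reject strands with long single-base runs or unbalanced GC content."""
    if not seq:
        return False
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return longest <= max_run and gc_min <= gc_fraction <= gc_max


def droplet_to_oligo(seed: int, payload: bytes) -> Optional[str]:
    """Prefix the payload with its seed as a barcode; drop it if screening fails."""
    barcode = to_dna(seed.to_bytes(4, "big"))  # assumed 4-byte seed
    oligo = barcode + to_dna(payload)
    return oligo if passes_screen(oligo) else None
```

For example, seed 0 produces a barcode of sixteen `A`s, so `droplet_to_oligo(0, ...)` returns `None`; an encoder working this way would simply move on to the next seed.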
Researchers create their own custom data-storing molecules
In the end, the six archived files were converted into 72,000 DNA strands, each 200 nucleotides long. At this point, the data was still digital, stored in a text file.
This text file was then sent to a DNA synthesis laboratory in San Francisco, which turned the sequences into physical DNA molecules and shipped them back to the Columbia scientists in a small vial.
The research team then sequenced the DNA molecules and used custom software written in Python (available on GitHub) to decode the reads and reassemble the data on their computer. The team also released a video showing one of the researchers booting the operating system retrieved from the DNA molecules and then playing Minesweeper.
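Reassembly can be sketched with a peeling decoder, the textbook way to decode fountain codes: any droplet that covers exactly one still-unknown segment reveals that segment, which is then XORed out of every other droplet. This sketch assumes error-free droplets and a hypothetical seed-to-segment rule shared with a toy encoder; it is not the team's GitHub code, which also has to cope with sequencing errors.

```python
import random
from typing import Optional


def droplet_indices(seed: int, num_segments: int) -> list[int]:
    """Regenerate which segments a droplet combined; must match the encoder."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, num_segments))  # simplified degree rule
    return sorted(rng.sample(range(num_segments), degree))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """Toy encoder: XOR together the segments selected by the seed."""
    payload = bytes(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        payload = xor(payload, segments[i])
    return seed, payload


def peel_decode(droplets: list[tuple[int, bytes]],
                num_segments: int) -> Optional[list[bytes]]:
    """Recover all segments, or return None if the droplets are insufficient."""
    # Each equation: [set of unknown segment indices, XOR of those segments]
    equations = [[set(droplet_indices(seed, num_segments)), payload]
                 for seed, payload in droplets]
    recovered: dict[int, bytes] = {}
    changed = True
    while changed and len(recovered) < num_segments:
        changed = False
        for eq in equations:
            indices, payload = eq
            # XOR out every segment that is already known.
            for i in [i for i in indices if i in recovered]:
                payload = xor(payload, recovered[i])
                indices.discard(i)
            eq[1] = payload
            # A droplet with exactly one unknown segment reveals that segment.
            if len(indices) == 1:
                recovered[indices.pop()] = payload
                changed = True
    if len(recovered) < num_segments:
        return None
    return [recovered[i] for i in range(num_segments)]


segments = [b"FOUNTAIN", b"CODE+DNA", b"DROPLETS", b"PEELING!"]
droplets = [encode(segments, seed) for seed in range(40)]
survivors = droplets[::2]  # simulate losing half of the strands
print(peel_decode(survivors, len(segments)))
```

Because any sufficiently large subset of droplets can work, losing some strands does not necessarily lose data; with too few droplets the function returns `None` instead of guessing.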
The actual data retrieved from the DNA molecules is also available for download. “For obvious reasons, we removed the Amazon gift card. To reveal the card, please decode the data!” researchers said, launching a challenge for fellow scientists.
One gram of DNA can store 215 petabytes of data
For now, this storage method is still in its infancy. Its biggest drawbacks are time and money.
It takes about two weeks to synthesize the DNA, and it costs $7,000 to write the 2 MB of data into DNA, plus another $2,000 to read it back.
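Put per megabyte, the quoted figures work out as in this small calculation. It assumes the $7,000 and $2,000 apply to exactly 2 MB; the stored archive was only approximately that size, so treat the results as rough.

```python
SYNTHESIS_COST_USD = 7_000   # writing the data into DNA (figure from the article)
SEQUENCING_COST_USD = 2_000  # reading it back (figure from the article)
DATA_MB = 2                  # approximate size of the stored archive

write_per_mb = SYNTHESIS_COST_USD / DATA_MB
read_per_mb = SEQUENCING_COST_USD / DATA_MB
total_per_mb = write_per_mb + read_per_mb

print(f"write ${write_per_mb:,.0f}/MB, read ${read_per_mb:,.0f}/MB, "
      f"round trip ${total_per_mb:,.0f}/MB")
```

That is roughly $3,500 per megabyte to write and $1,000 per megabyte to read, or about $4,500 per megabyte for a full round trip.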
Despite this, the research team is very optimistic, largely because of DNA's density: a single gram of DNA can store a whopping 215 petabytes of data. That's roughly the capacity of a medium-sized data center, packed into one gram of material.
Furthermore, DNA lasts far longer than current storage media, which tend to lose information after a few decades. If kept frozen, DNA carrying digital data could survive for hundreds of thousands of years, just like the DNA of ancient humans recovered from underground burials or caves.
Researchers achieve 89% of DNA’s data storage capacity
The researchers also said their method comes close to DNA's full data storage capacity. Although four bases could in principle encode 2 bits each, other researchers had estimated that the practical maximum is about 1.8 bits per nucleotide once error-prone sequences and other biochemical constraints are taken into account.
Previous work achieved about 1.0 bit per nucleotide, while the Columbia team reached 1.6 bits, or about 89% of that 1.8-bit limit.
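The capacity figures can be checked with a few lines of arithmetic. With four possible bases, a nucleotide can carry at most log2(4) = 2 bits; the 1.8-bit practical limit and the 1.0 and 1.6 figures are taken from the article, and the variable names are illustrative.

```python
import math

bits_per_base_ceiling = math.log2(4)  # 2.0 bits: four bases, no constraints
practical_limit = 1.8                 # bits per nucleotide, as cited in the article
earlier_work = 1.0
columbia = 1.6

share_of_limit = columbia / practical_limit
gain_over_earlier = columbia / earlier_work - 1

print(f"ceiling {bits_per_base_ceiling:.1f} bits/nt")
print(f"{share_of_limit:.0%} of the practical limit")
print(f"{gain_over_earlier:.0%} denser than earlier work")
```

Since 1.6 / 1.8 ≈ 0.89, the method sits at about 89% of the practical limit, and it packs about 60% more data per nucleotide than the earlier 1.0-bit result.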
With further research, the scientists hope to bring their method closer to DNA's storage limit and to reduce the cost of writing and reading data from DNA. More details are available via the project's website, and the research team has also released a video summarizing their recent work.