Molecular data storage is an idea that somehow seems very far away while also existing right in front of our eyes. All living things encode information using chemicals, achieving incredible sophistication, information density, low cost, longevity, and energy efficiency. In theory, if we could encode our digital data in molecular form, we could tap into some of this potential. But how? And to add to the challenge, can we do it without DNA?
In the paper “Multicomponent Molecular Memory” we demonstrate one possible approach to storing large amounts of data in molecules. Instead of putting information into sequences of DNA bases or amino acids, we encoded data into mixtures of small and diverse synthetic molecules. We write the data by deciding which combinations of molecules to include in each mixture. If we use enough different molecules, these data mixtures have an interesting property: many bits of data can be combined at one location, because the molecules representing each bit can be separated later by their chemical properties.
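To make the presence/absence idea concrete, here is a minimal Python sketch. It assumes a library of n distinct compounds in which each mixture stores one bit per compound: 1 if the compound is added, 0 if it is left out. The functions `encode` and `decode` are hypothetical names for illustration, not code from the paper, and the sketch ignores chemistry entirely; a real readout would also have to cope with missed or spurious detections.

```python
"""Minimal sketch of presence/absence encoding in molecular mixtures.

Assumption (hypothetical): compounds are labelled 0..n_molecules-1, and each
mixture holds one bit per compound (1 = compound added, 0 = left out).
"""
from typing import List, Set


def encode(data: bytes, n_molecules: int) -> List[Set[int]]:
    """Split the bitstream of `data` into chunks of n_molecules bits and
    return, for each mixture, the set of compound indices to include."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    mixtures = []
    for start in range(0, len(bits), n_molecules):
        chunk = bits[start:start + n_molecules]
        mixtures.append({idx for idx, bit in enumerate(chunk) if bit})
    return mixtures


def decode(mixtures: List[Set[int]], n_molecules: int, n_bytes: int) -> bytes:
    """Invert encode(): rebuild the bitstream from the detected compounds."""
    bits = []
    for present in mixtures:
        bits.extend(1 if idx in present else 0 for idx in range(n_molecules))
    bits = bits[:n_bytes * 8]  # drop the zero padding in the last mixture
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    message = b"Picasso"
    mixtures = encode(message, n_molecules=32)
    print(len(mixtures))  # 56 bits fit into 2 mixtures of 32 compounds
    assert decode(mixtures, 32, len(message)) == message
```

With 32 compounds, the 56 bits of `b"Picasso"` fit into two mixtures, and every compound added to the library adds one more bit to each mixture.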
The molecules for these datasets are synthesized with Ugi reactions, which merge four chemical components into one product. By mixing a few types of each component, we can create thousands of unique molecules from only a handful of starting materials. We then write data by using a robot to spray tiny droplets of these Ugi molecules into hundreds of unique mixtures. Later, we read the data using a mass spectrometer, which separates the molecules in a mixture by their masses. To boost the accuracy, we pair our readout with some common machine learning tools to decode the mass spectra. Using this system we wrote and read digital images of several pieces of art, including a Cubist drawing by Picasso.
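The combinatorial arithmetic is easy to sketch. With four components, the number of distinct products is the product of the number of variants chosen for each one. Because the Ugi condensation releases one molecule of water, each product's monoisotopic mass is the sum of its four building blocks minus H2O (about 18.0106 Da). The Python sketch below uses hypothetical building-block labels and placeholder masses. It treats a spectrum as a list of already-deconvolved neutral masses, ignoring adducts and isotope peaks, and it uses a simple tolerance match as a stand-in for the machine learning decoding mentioned above.

```python
"""Sketch: size and masses of a combinatorial Ugi library, plus a naive
mass-spectrum readout.

Assumptions (hypothetical, for illustration only): the component labels and
masses are placeholders, spectra are lists of neutral masses, and detection
is a plain tolerance match rather than a trained classifier.
"""
from itertools import product
from math import prod
from typing import Dict, List, Set, Tuple

WATER = 18.010565  # monoisotopic mass of H2O, lost in the Ugi condensation

# Hypothetical building blocks: {role: {label: placeholder mass in Da}}
COMPONENTS: Dict[str, Dict[str, float]] = {
    "amine":      {"am1": 93.058, "am2": 129.079},
    "aldehyde":   {"al1": 106.042, "al2": 150.068},
    "acid":       {"ac1": 60.021, "ac2": 122.037, "ac3": 136.052},
    "isocyanide": {"ic1": 83.073, "ic2": 133.089},
}


def library_size(components: Dict[str, Dict[str, float]]) -> int:
    """Number of distinct products = product of the choices per component."""
    return prod(len(options) for options in components.values())


def enumerate_products(
    components: Dict[str, Dict[str, float]]
) -> Dict[Tuple[str, ...], float]:
    """Map each four-component combination to its product mass (sum - H2O)."""
    roles = list(components)
    products = {}
    for combo in product(*(components[r].items() for r in roles)):
        labels = tuple(label for label, _ in combo)
        products[labels] = round(sum(mass for _, mass in combo) - WATER, 4)
    return products


def detect(
    spectrum: List[float],
    products: Dict[Tuple[str, ...], float],
    tol: float = 0.005,
) -> Set[Tuple[str, ...]]:
    """Report which library members have a peak within +/- tol Da."""
    return {
        combo for combo, mass in products.items()
        if any(abs(peak - mass) <= tol for peak in spectrum)
    }


if __name__ == "__main__":
    products = enumerate_products(COMPONENTS)
    print(library_size(COMPONENTS))  # 24 = 2 * 2 * 3 * 2
    written = {("am1", "al1", "ac1", "ic1"), ("am2", "al2", "ac3", "ic2")}
    spectrum = [products[c] for c in written] + [500.0]  # 500.0: a stray peak
    print(detect(spectrum, products) == written)  # True
```

The placeholder masses were picked so that no two of the 24 products fall within the 0.005 Da tolerance of each other. In a real library, different combinations can share a molecular formula or sit too close in mass to resolve, so one would presumably check the enumerated masses for such overlaps before assigning molecules to bits.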
In theory, there are more unique small molecules than there are stars in the universe. If we can continue to find ways to use more of these molecular structures in our datasets, we can represent more information without adding any mass.
We are a group of engineers, chemists, and theorists at Brown University who have been spending time thinking about the intersection between information and chemistry, and the tools, data, and experiments that bring them together. This is the sort of project that has a little bit of everything, and it wouldn’t have happened without a great collaborative atmosphere. It started in 2017 with brainstorming meetings that brought together Brenda Rubenstein (theoretical chemistry), Jacob Rosenstein (engineering), Chris Rose (information theory), Sherief Reda (computer engineering), Peter Weber (analytical chemistry), Eunsuk Kim (synthetic chemistry), Jason Sello (pharmaceutical chemistry), and Joe Geiser (analytical chemistry). Chris Arcadia and Eamonn Kennedy brought the workflows together and made the experimental details and analysis a reality; Joe Geiser made some fantastic mass spectrometry look easy; Amanda Dombroski, Kady Oakley, and Shui-Ling Chen worked on the Ugi libraries; Leonard Sprague helped with early experiments; and Mustafa Ozmen helped with data analysis. We owe big thanks to Edel Minogue and Brown OVPR for kicking off the first discussions; to Anne Fischer and DARPA for supporting the project; to Jill Pipher for helping to get the project off the ground; and to the entire School of Engineering and Department of Chemistry at Brown.
Read more here:
“Multicomponent Molecular Memory,” Nature Communications (2020). doi: 10.1038/s41467-020-14455-1.
You might also be interested in some of our other recent work on related topics:
Kennedy, E., Arcadia, C.E., Geiser, J., Weber, P.M., Rose, C., Rubenstein, B.M. and Rosenstein, J.K., 2019. Encoding information in synthetic metabolomes. PLoS ONE, 14(7). https://doi.org/10.1371/journal.pone.0217364
Arcadia, C.E., Tann, H., Dombroski, A., Ferguson, K., Chen, S.L., Kim, E., Rose, C., Rubenstein, B.M., Reda, S. and Rosenstein, J.K., 2018, November. Parallelized linear classification with volumetric chemical perceptrons. In 2018 IEEE International Conference on Rebooting Computing (ICRC) (pp. 1-9). https://arxiv.org/abs/1810.05214
Rosenstein, J.K., Rose, C., Reda, S., Weber, P.M., Kim, E., Sello, J., Geiser, J., Kennedy, E., Arcadia, C., Dombroski, A., Oakley, K., Chen, S.L., Tann, H., and Rubenstein, B.M., 2019. Principles of Information Storage in Small-Molecule Mixtures. arXiv preprint arXiv:1905.02187. https://arxiv.org/abs/1905.02187