What Does The Future of Data Look Like?

Haihui Joy Jiang
Jul 16, 2020

The long-term management of data has only become more complex over time, as has the question of how data can or should be used. Data infrastructure has also had to transition between formats, with the switch to digital requiring the extensive transcription of written records. COVID-19 has accelerated the development of digital data storage, with industries forced to reconsider existing archives to better allow for remote work.

The amount of data being generated has grown exponentially, a trend often compared to Moore’s Law, which describes the steady doubling of transistors on a chip. That growth has continually influenced the potential uses of data. The mass proliferation and collection of data is commonly known as big data. The challenge with big data is sifting through it to find valuable information: so much is collected that the majority is considered junk. Worse still, it can be difficult to filter the good from the bad. This has led to a need for data scientists to develop the means to interpret increasingly massive data sets and turn them into business, legal, or scientific insights.

This abundance of data has brought additional challenges in the form of security and privacy concerns. As records in industries such as education, healthcare, finance, and business go digital, they become increasingly vulnerable to cyberattack. Attacks such as the 2017 Equifax breach, which exposed the private records of well over a hundred million people, can lead to further theft and tarnish the reputation of the company in question. Breaches, as well as the collection and sale of data by corporations, call into question how private a citizen can really stay in the age of big data.

As a result, the viability of digital archives has been called into question. Even the overall shift from physical to digital storage has resulted in the loss of untold amounts of data in the transition. Additionally, the mutable nature of digital storage allows documents and information to be altered in ways that most people cannot detect. This can undermine research or investigative efforts and result in the spread of disinformation to the general public. There are, of course, best practices when it comes to archiving, including the notation of any edits to materials. However, not everything can be managed by a professional archivist. Managing data can feel like a lawless frontier, where digital records are too extensive to manage effectively.
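One common way to make a silent alteration detectable is to record a cryptographic fingerprint of a document when it is archived and compare it later. The following is a minimal sketch of that idea, assuming a SHA-256 checksum is acceptable for the archive; the function names and the sample record are hypothetical and are not drawn from any particular archival system.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 checksum of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()


def is_unaltered(document: bytes, recorded: str) -> bool:
    """Check a document against the checksum recorded when it was archived."""
    return fingerprint(document) == recorded


# Hypothetical record, used only for illustration.
original = b"Meeting minutes, 16 July 2020: budget approved."
recorded = fingerprint(original)  # stored alongside the archived copy

# A one-word edit that would be easy to miss by eye.
tampered = b"Meeting minutes, 16 July 2020: budget rejected."

print(is_unaltered(original, recorded))
print(is_unaltered(tampered, recorded))
```

A check like this only helps if the recorded checksum is itself kept somewhere the person making the edit cannot change.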

In the future, data storage may even move away from digital infrastructure, which can be hacked, while still avoiding the space requirements of traditional archives. Our lab at Harvard has released findings on a technique that we see as a potential supplement to existing data storage methods. Our method involves encoding information on molecules, storing data in a manner that is resilient and space-efficient. Our approach of using small molecules (e.g. oligopeptides) is intended to be cheaper and less labor-intensive than DNA data storage. The system that the team designed is capable of using other types of molecules as well, as long as they can be manipulated into distinguishable bits. This method is expected to become more efficient with faster technology. Techniques such as these could help secure and archive sensitive data in a form that can’t be hacked and requires little upkeep on the part of archivists.
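To give a feel for what "distinguishable bits" could mean, here is a toy sketch in which each byte is stored as a mixture of molecules, and the presence or absence of each molecule stands for a 1 or a 0. The eight-molecule library, the placeholder names, and the presence/absence scheme are all assumptions made for illustration; this is not a description of the lab's actual protocol or chemistry.

```python
# Hypothetical library of eight distinguishable molecules (placeholder names).
LIBRARY = ["M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7"]


def encode_byte(value: int) -> set[str]:
    """Pick the molecules to place in one mixture: molecule i is present if bit i is 1."""
    return {LIBRARY[i] for i in range(8) if (value >> i) & 1}


def decode_byte(mixture: set[str]) -> int:
    """Rebuild the byte from which molecules were detected in the mixture."""
    return sum(1 << i for i, name in enumerate(LIBRARY) if name in mixture)


def encode_text(text: str) -> list[set[str]]:
    """Encode text as one mixture per UTF-8 byte."""
    return [encode_byte(b) for b in text.encode("utf-8")]


def decode_text(mixtures: list[set[str]]) -> str:
    """Decode a sequence of mixtures back into text."""
    return bytes(decode_byte(m) for m in mixtures).decode("utf-8")


mixtures = encode_text("data")
print(decode_text(mixtures))
```

In this toy model a larger library of molecules would let each mixture carry more bits, which is one way to picture how the approach could scale.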

Originally published at https://haihuijoyjiang.co.

Haihui Joy Jiang is a Postdoctoral Fellow at Harvard University. For more, be sure to visit haihuijoyjiang.co online for the latest insights and updates!