Artificial Intelligence and Machine Learning were two of 2017’s hottest technological buzzwords. It is not difficult to understand why: the potential benefits of these technologies are exciting and profound. But Artificial Intelligence and Machine Learning both rely on other foundational technologies in order to achieve the results that they promise. Consequently, innovation within the realm of AI is constrained by the limitations of those technologies.
Access to high-quality, usable data is one factor that has significant implications for the development of AI. Even as AI is enjoying its moment in the spotlight, innovation within the realm of Big Data is becoming more crucial than ever for the continued development of AI technologies.
Data Integrity In The Third Wave of AI
The history of AI development can be divided into three distinct waves. First-wave AI was characterized by optimization and “knowledge engineering” programs, which took real-world problems and found efficient solutions. Second-wave AI was characterized by Machine Learning programs that automated pattern recognition based on statistical probabilities. We have now entered the third wave of AI: hypothesis generation programs, or “contextual normalisation.” Third-wave AI programs have the ability to examine huge data sets, identify statistical patterns, and create algorithms that explain the existence of the patterns.
In recent years, AI programs have taken significant leaps in their ability to analyse patterns in complex datasets and to generate novel insights, even insights that escape human analysts. When IBM Watson took down a human competitor on Jeopardy!, it did so with advanced natural language processing and a remarkable breadth of general knowledge.
Pharmaceutical companies such as Johnson & Johnson and Merck & Co. have begun to invest in similar third-wave AI technologies in order to gain an advantage over their rivals. The adoption of such technologies by pharmaceutical companies has led to significant discoveries, such as the link between Raynaud’s disease and fish oil.
It All Depends On The Data
It doesn’t matter how advanced Artificial Intelligence and Machine Learning algorithms become if they cannot access the data necessary to conduct analysis and generate insights.
Life science datasets are notoriously insufficient and difficult to work with, due to the remarkable depth, density, and diversity of biological data. Consequently, biological research has relied heavily on manually curated datasets that must be created and cleaned to test manually conceived hypotheses. The work involved in this highly manual process has driven up research costs and the costs of biomedical products like vaccines and biotechnology. The time-consuming nature of this process has meant that by the time conclusions are published in academic journals, they may already be obsolete.
By creating and analysing biological data sets in this slow, inefficient, and error-prone way, researchers have inadvertently created a huge problem of publication bias and inaccuracy in medical science data.
Biased and flawed data sets were a problem for first- and second-wave AI programs, but third-wave AI software suffers most significantly from these limitations. Consider, for example, the issue of abbreviations and acronyms in medical terminology. One acronym often has several meanings: “Ca,” depending on its context, can mean either “cancer” or “calcium.” Third-wave AI programs rely on complex contextual information in order to perform, and messy, manually curated data sets reduce the effectiveness of these programs.
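To make the “Ca” problem concrete, here is a deliberately simple Python sketch of context-based disambiguation. It is a toy illustration only, not a description of how any third-wave AI product works: the cue-word lists and the function name `disambiguate_ca` are hypothetical, and a real system would learn such cues from data rather than hard-code them.

```python
import re

# Hypothetical cue words for each sense of "Ca" (assumed for illustration;
# a production system would learn these from annotated clinical text).
CONTEXT_CUES = {
    "cancer": {"tumor", "biopsy", "metastasis", "oncology", "chemotherapy"},
    "calcium": {"serum", "mg/dl", "bone", "supplement", "electrolyte"},
}


def disambiguate_ca(sentence):
    """Guess the sense of "Ca" from surrounding words.

    Returns "cancer" or "calcium" when one sense has strictly more
    matching cue words, and None when there is no evidence or a tie.
    """
    # Lowercase and split into rough tokens; "/" is kept so "mg/dl" survives.
    tokens = set(re.findall(r"[a-z/]+", sentence.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in CONTEXT_CUES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(disambiguate_ca("Biopsy confirmed breast Ca with metastasis"))
    print(disambiguate_ca("Serum Ca was 9.2 mg/dL"))
    print(disambiguate_ca("Ca levels unknown"))
```

Even this crude approach shows why messy data hurts: if the surrounding text is missing, abbreviated inconsistently, or stripped of context during manual curation, the program has nothing to base its decision on and must return no answer at all.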
A Change In Data
2009’s HITECH Act ushered in the era of ubiquitous electronic medical record systems. As a result, rich datasets now exist that contain real-time, comprehensive biological information. These new datasets are joining with data from biological patents, clinical trials, legislative bodies, academic theses, and other sources within the innovation ecosystem to create complex pools of biological data.
This wealth of unstructured data has, until recently, only been useful to computing programs after a great degree of human effort to clean and organize the data. But now, Artificial Intelligence is sufficiently advanced to parse and analyse heterogeneous data using advanced algorithms that combine Machine Learning, natural language processing, and advanced text analytics. We’ve gone from a world of outdated, incomplete, and inaccessible data to a new paradigm in which AI can structure previously unstructured data for real-time analysis and contextual normalisation.
Third-wave Artificial Intelligence gives us clean, centralized data that reflects the complexity of biological systems. By parsing this data, we can achieve deep insights into the current biomedical landscape.