Artificial Intelligence helped with the Coronavirus (Covid-19) vaccine. My thoughts, research, and approach to accelerating the search for a Coronavirus vaccine using artificial intelligence.
Artificial intelligence has had its share of debates about how it can be used for good and for bad. These discussions have produced a lot of fear and a lot of hope. What we need right now is a ray of hope in these dark times when we are fighting a pandemic. It is time that AI lives up to its hype and helps the scientific community in the search for a vaccine. I write this piece with a lot of responsibility and would like to share my research and thoughts on how artificial intelligence has helped with the Coronavirus (Covid-19) vaccine.
I am very fortunate to have an interest in AI and the opportunity to work with AI. I think AI is playing three strategic roles in helping scientists fight Coronavirus:
- Running complex algorithms that parse diverse datasets and identify components of a vaccine by understanding the viral protein structure of COVID-19.
- Helping medical researchers parse through tons of relevant research papers at an unprecedented pace.
- Identifying compounds using AI and cloud computing to prevent the Spike protein from binding to the ACE2 receptor on human cells.
Many companies and institutes have created AI tools, data sets, and research results and shared them freely with the global scientific community.
According to the National Institute of Allergy and Infectious Diseases, there are three main types of vaccines:
- Whole-Pathogen Vaccines
- Subunit Vaccines
- Nucleic Acid Vaccines
The types of vaccines the scientific community is most interested in are subunit vaccines and nucleic acid vaccines. Subunit vaccines present only selected components of the pathogen, such as a protein, to the immune system. Nucleic acid vaccines deliver genetic material that encodes such a component into human cells, which then produce it and stimulate an immune response. The latter is the type of vaccine targeting the virus that began trials this week in the United States. AI is useful in accelerating the development of both subunit and nucleic acid vaccines.
Proteins are an essential part of viruses and are made up of a sequence of amino acids that determines their unique 3D structure. Once scientists understand the structure of a protein, they can develop drugs that work with that unique structure. However, it would be impossible to examine all possible shapes of a protein before finding its unique 3D structure. AI can expedite this process dramatically and help us identify compounds that can target the unique protein structure.
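To get a feel for why checking every possible shape is hopeless, here is a rough back-of-the-envelope calculation in the spirit of Levinthal's paradox. The three numbers in it (3 possible states per amino acid, a chain of 100 amino acids, and 10^13 shapes checked per second) are assumptions picked purely for illustration, not measured values.

```python
# Back-of-the-envelope estimate of a protein's conformational search space.
# All three constants are illustrative assumptions, not measurements.
STATES_PER_RESIDUE = 3       # assumed coarse number of states per amino acid
CHAIN_LENGTH = 100           # assumed length of a small protein
SAMPLES_PER_SECOND = 1e13    # assumed (very generous) checking rate

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformations(states, length):
    """Number of distinct shapes if every residue picks a state independently."""
    return states ** length


def years_to_enumerate(states, length, rate):
    """Years needed to visit every shape exactly once at the given rate."""
    return conformations(states, length) / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    total = conformations(STATES_PER_RESIDUE, CHAIN_LENGTH)
    years = years_to_enumerate(STATES_PER_RESIDUE, CHAIN_LENGTH, SAMPLES_PER_SECOND)
    print(f"possible shapes: {total:.2e}")
    print(f"years to check them all: {years:.2e}")
```

Even with these generous assumptions the search would take far longer than the age of the universe, which is why prediction methods that avoid brute-force enumeration matter.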
There has been extensive work on this using AI by many organizations and institutions. In January, Google DeepMind published details of AlphaFold, a cutting-edge system that predicts the 3D structure of a protein based on its genetic sequence. In early March, the system was put to the test on COVID-19. DeepMind released protein structure predictions of several under-studied proteins associated with SARS-CoV-2, the virus that causes COVID-19, to help the research community better understand the virus.
The University of Texas at Austin and the National Institutes of Health used cryogenic electron microscopy (cryo-EM), a popular structural biology technique, to create the first 3D atomic-scale map of the spike protein, the part of the virus that attaches to and infects human cells.
Figure: A 3D atomic-scale map, or molecular structure, of the 2019-nCoV spike protein. The protein takes on two different shapes, called conformations: one before it infects a host cell, and another during infection. This structure represents the protein before it infects a cell, called the prefusion conformation. Credit: Jason McLellan/Univ. of Texas at Austin
The University of Washington’s Institute for Protein Design also used computer models to develop 3D atomic-scale models of the SARS-CoV-2 spike protein that closely match those discovered in the UT Austin lab.
It is crucial to stay on top of the scientific research on COVID-19, but keeping up with the results and collating them on one common platform takes a lot of effort. Such a platform can really help the scientific community by sharing critical pieces of information, saving researchers a lot of time and effort. Labs report their work via published articles and, increasingly, via preprint services like bioRxiv and medRxiv.
As new research keeps getting published on a daily, sometimes hourly, basis in this critical time, it becomes increasingly difficult for scientists to stay on top of the research, connect the dots, and uncover insights.
In this pursuit, the Allen Institute for AI has partnered with several research organizations to produce the COVID-19 Open Research Dataset (CORD-19), a unique resource of more than forty thousand scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses. It is updated daily as new research is published. This freely available data set is machine-readable, so researchers can create and apply natural-language processing algorithms, and hopefully accelerate the discovery of a vaccine.
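As a small illustration of what "applying natural-language processing" to a collection of papers can look like, here is a minimal sketch that ranks abstracts against a query using TF-IDF weights and cosine similarity. The three abstracts are invented for the example and are not taken from CORD-19, and the function names are my own; a real pipeline would load the actual dataset and would likely use a dedicated library.

```python
import math
import re
from collections import Counter

# Hypothetical one-line "abstracts", invented for illustration only.
ABSTRACTS = [
    "Structure of the spike protein and its binding to the ACE2 receptor",
    "Hospital robots for disinfecting rooms and delivering supplies",
    "Epidemiological model of infection spread across cities",
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_index(docs):
    """Return one TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=3):
    """Return (document index, score) pairs for the best-matching documents."""
    vectors, idf = tfidf_index(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, round(score, 3)) for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    for index, score in search("spike protein ACE2 binding", ABSTRACTS):
        print(score, ABSTRACTS[index])
```

Documents that share no terms with the query get a score of zero and are dropped, so only the abstract about the spike protein is returned for the sample query.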
Beyond research, AI has played a vital role in the response to the COVID-19 outbreak.
- AI startup BlueDot detected a cluster of unusual pneumonia cases in Wuhan in late December and accurately predicted where the virus might spread.
- Robots have been reducing human interaction by disinfecting hospital rooms, moving food and supplies, and delivering telehealth consultations. AI is being used to track and map the spread of infection in real-time, diagnose infections, predict mortality risk, and more.
- AI- and FLIR-enabled helmets have been used to monitor temperatures at large scale and to isolate and transfer people who show symptoms of COVID-19.
- AI-enabled models predict the rate of spread of this infection and help us flatten the curve with better targeting.
- AI-enabled models predict mutations of the virus, which can help our scientists adapt the vaccine to counter those mutations.
- Robots help healthcare workers limit their exposure to patients.
- Doctors are using AI to triage COVID-19 patients.
- AI is being used to fight misinformation about COVID-19.
There is a slight problem, though: modern AI methods require large amounts of labeled data to be effective, and that data isn’t currently available. Even when data is available, human judgment is essential to carefully analyze AI’s pattern recognition. This is something we need to keep in mind while using AI as a tool on such large-scale, unvetted data.
Here is my humble attempt to accelerate the search for the COVID-19 vaccine using AI.
While we have some issues with the way real-time data is being collated, we have some solid input based on the 3D structure of the protein that forms the outer layer of this virus. My approach is to identify a series of compounds used in drugs that are already available on the market.
My approach towards the problem from a technical standpoint encompasses the following steps.
- Identify possible structures of the virus's outer layer.
- Identify possible targets within the diverse set of structures of the spike protein.
- Go through the list of available vaccines and medicines and parse them to identify chemical compounds.
- Identify compounds that prevent the spike protein from binding to the ACE2 receptors of human cells.
The following algorithms have been used during the various stages of the research.
- Cluster analysis
- K-means clustering
- Decision tree/forest
- Statistical classification
My approach was to identify and apply rules of deduction so that we can focus on the compounds that are probable candidates. This keeps the number of positives limited and manageable. The first step in this process was to run a cluster analysis on the compounds and identify positives. The next step was to build a strong decision tree / decision forest. Once the criteria were established, I ran the compounds through the decision forest. I was able to tweak the decision forest based on the results, which helped in narrowing down the compounds further. Once the positives were identified, I wanted to find the compounds nearest to them. This approach helped me identify 34 compounds using AI that may prevent the spike protein from binding to the ACE2 receptors in the human body. Using these compounds may help stop the replication of the COVID-19 virus within the human body.
The objective of cluster analysis is to assign observations to groups (“clusters”) so that observations within each group are similar to one another with respect to variables or attributes of interest, and the groups themselves stand apart from one another. In other words, the objective is to divide the observations into homogeneous and distinct groups.
In contrast to the classification problem where each observation is known to belong to one of a number of groups and the objective is to predict the group to which a new observation belongs, cluster analysis seeks to discover the number and composition of the groups.
This helps identify the large, distinct groups in the research data and the buckets of data that can then be filtered further through K-means clustering.
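To make the idea of "discovering the number and composition of the groups" concrete, here is a minimal sketch of one simple form of cluster analysis: single-linkage grouping with a distance cut-off. It is not the code used in my research. The two-dimensional points stand in for made-up compound descriptors, and the threshold of 1.0 is an arbitrary assumption.

```python
import math
from itertools import combinations

# Made-up 2D descriptors for six hypothetical compounds (illustration only).
POINTS = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (9.0, 1.0)]


def single_linkage_clusters(points, threshold):
    """Group point indices so that any two points within `threshold`
    of each other, directly or through a chain, share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that lie close enough together.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    groups = single_linkage_clusters(POINTS, threshold=1.0)
    print(f"{len(groups)} groups found: {groups}")
```

Note that the number of groups is not given in advance; it falls out of the data and the chosen cut-off, which is the contrast with classification described above.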
Once the cluster analysis is done, we have to choose k for a finer classification of each cluster subset. The K-means algorithm identifies k centroids and then allocates every data point to the nearest centroid, while keeping the distances between the points and their centroid as small as possible.
The ‘means’ in K-means refers to averaging the data points in a cluster; that is, finding the centroid.
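The two alternating steps of K-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines. This is a minimal illustration of the standard algorithm, not the code from my research. The sample points are invented, and the `initial` parameter is something I added so the example gives a repeatable result.

```python
import math
import random

# Made-up 2D descriptors forming two obvious groups (illustration only).
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]


def kmeans(points, k, initial=None, iterations=100, seed=0):
    """Return (centroids, assignments) after running K-means.

    `initial` optionally fixes the starting centroids; otherwise k points
    are picked at random using `seed`.
    """
    centroids = list(initial) if initial else random.Random(seed).sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the centroids are already stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    centroids, assignments = kmeans(POINTS, k=2, initial=[(1.0, 1.0), (8.0, 8.0)])
    print("centroids:", centroids)
    print("assignments:", assignments)
```

K-means can settle on different groupings depending on where the centroids start, so in practice it is usually run several times with different starting points.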
Once we have the relevant clustering, based on the cluster analysis and K-means clustering, we need to come up with a definitive decision tree to weed out the false positives and the negative data clusters. I had to keep evolving the decision tree as new parameters were discovered in the data set.
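For readers who have not built one, here is a minimal sketch of how a decision tree learns its rules: at each node it picks the feature and threshold that best separate the labels, measured here by Gini impurity. This is a single small tree on invented data, not the tree or forest from my research; a decision forest would train many such trees on resampled data and combine their votes. The feature values and the "keep"/"reject" labels are hypothetical.

```python
from collections import Counter

# Made-up descriptor rows and labels for four hypothetical compounds.
ROWS = [(0.2, 1.0), (0.3, 1.2), (0.8, 1.1), (0.9, 0.9)]
LABELS = ["reject", "reject", "keep", "keep"]


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers impurity the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree of (feature, threshold, left, right) tuples with label leaves."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: the majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS)
    print("tree:", tree)
    print("new compound ->", predict(tree, (0.85, 1.0)))
```

Adding a newly discovered parameter, as described above, simply means adding another column to the rows and rebuilding the tree.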
Once the chemical compound data had been parsed through the decision tree, a solid group of compounds was identified that may help prevent the spike protein from binding to the ACE2 receptors. I then ran K-NN (k-nearest neighbours) to identify the compounds closest to those on my list, in order to increase the number of candidates for experimentation.
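The nearest-neighbour step can be sketched as follows: for every compound already flagged as a positive, look up the k closest compounds in the remaining pool. This is a minimal illustration under my own assumptions, not the code from my research. The compound names, the descriptor values, and the use of plain Euclidean distance are all hypothetical.

```python
import math

# Hypothetical descriptors: one flagged positive and a pool of other compounds.
POSITIVES = {"hit_1": (0.88, 1.0)}
POOL = {
    "compound_A": (0.9, 1.0),
    "compound_B": (0.2, 1.1),
    "compound_C": (0.85, 0.95),
    "compound_D": (0.5, 2.0),
}


def nearest_neighbours(query, candidates, k=2):
    """Return the names of the k candidates closest to `query` (Euclidean)."""
    ranked = sorted(candidates.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


def expand_hits(positives, pool, k=2):
    """Collect, without duplicates, the k nearest pool compounds of every positive."""
    found = []
    for descriptor in positives.values():
        for name in nearest_neighbours(descriptor, pool, k):
            if name not in found:
                found.append(name)
    return found


if __name__ == "__main__":
    print(expand_hits(POSITIVES, POOL, k=2))
```

How close "close" is depends entirely on which descriptors are used and how they are scaled, so the choice of features matters more than the search itself.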
Conclusion – Artificial Intelligence helped with the Coronavirus (Covid-19) vaccine.
Based on my approach and process, I was able to identify 34 compounds with the help of AI that may prevent the spike protein from binding to the ACE2 receptors in the human body. Using these compounds may help stop the replication of the COVID-19 virus within the human body.
I have only identified the compounds with the help of AI, and this is just one part of a probable solution. Whether this will work or not is a different story. It will require extensive testing in the labs by scientists. If successful, the advantage would be that these are already vetted and tested compounds and can be fast-tracked.
I am trying to be socially responsible and am not publishing my research until it is peer-reviewed and vetted by scientists who are working on identifying a vaccine. Once peer-reviewed, my research will be an open-source project for everyone to learn from and contribute to in this fight.
Until then, please stay home, stay safe, wear a mask, and trust our scientists to do their job.