2020 saw the industrialisation of artificial intelligence (AI). Years of fine-tuning machine learning (ML) models, the greater availability of highly specific datasets and the astonishing performance improvements of AI-dedicated hardware have allowed AI to move well beyond the R&D stage and become integrated into core workflows. While still in its infancy, AI disruption has already been felt acutely by companies across multiple sectors.

It was 25 years ago that Deep Blue, a computer built by IBM, beat Garry Kasparov at chess. Since then, technology has advanced to the point where massive deep-learning models can be trained, and natural language processing (NLP) has become a key battleground. Its profound implications include the ability to hold an AI-driven conversation, answer questions and understand documents. To illustrate this, in February last year Microsoft demonstrated that, after months of training, its transformer-based Turing Natural Language Generation (T-NLG) model had surpassed human performance. With 17 billion parameters, it was more than double the size of the previously most sophisticated NLP model.

OpenAI went a step further in June 2020 with GPT-3, an upgraded transformer-based model featuring 175 billion parameters. GPT-3 represents a revolutionary change in its ability to mimic the way humans pick up new language tasks from only a few examples. Previous NLP models, including Microsoft's T-NLG, require pre-training on large amounts of data and then fine-tuning for each specific task. GPT-3, if successfully implemented, is likely to drastically expand the range of applications, as it is designed to be task-agnostic and to work with limited data sets. Google pushed the boundary further still with its Switch Transformer model, which features 1.57 trillion parameters. The breakthrough of this new type of large AI model that can generalise the training process marks the beginning of the next generation of AI.
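
To make the distinction concrete, the sketch below shows what this few-shot approach looks like in practice: the task is demonstrated inside the prompt itself with a handful of examples, and no retraining or fine-tuning takes place. The `query_model` function is a hypothetical placeholder for a call to a hosted large language model, not any specific vendor's API, and the translation examples are purely illustrative.

```python
# A minimal sketch of few-shot prompting: the task is specified inside the
# prompt with a handful of demonstrations, so no task-specific fine-tuning
# or gradient updates are needed.
# `query_model` is a hypothetical placeholder, not a specific vendor API.

FEW_SHOT_PROMPT = """Translate English to French.

English: cheese
French: fromage

English: sea otter
French: loutre de mer

English: {word}
French:"""


def build_prompt(word: str) -> str:
    """Append the new input after the demonstrations."""
    return FEW_SHOT_PROMPT.format(word=word)


def query_model(prompt: str) -> str:
    """Hypothetical call to a hosted large language model."""
    raise NotImplementedError("Stand-in for a real model endpoint")


if __name__ == "__main__":
    print(build_prompt("plush giraffe"))
```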

The technology giants have been incorporating these new models into their own products thanks to their data advantage. Bing, the search service offered by Microsoft, is now powered by a T-NLG model (currently for US users only), with accuracy said to have improved by up to 125%. Although Microsoft also offers pre-trained deep-learning models to the public for free (hoping to generate revenue through cloud services, ie compute and data storage), the lack of high-quality domain data has been the gating factor for wider adoption. As a result, synthetic data has become a real area of focus recently. This type of data is generated algorithmically so as to preserve the statistical properties of, and the relationships between variables in, the original data sets. Even allowing for the complexities of generating it, synthetic data has a significant cost advantage over real data sets generated by actual events. It can also simulate conditions which are rare, expensive or dangerous to recreate in real life, and it addresses privacy concerns around sensitive data, such as in healthcare-related machine learning. These advantages have driven the rapid adoption of synthetic data in training processes.
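
As a minimal illustration of the idea, the sketch below samples synthetic rows from a multivariate normal distribution fitted to a real data set, so that the synthetic data preserve the means and correlations between variables without reproducing any individual record. Production synthetic-data tools use far richer generative models; this is only a toy example, and the "real" data here is itself randomly generated purely for the demonstration.

```python
import numpy as np


def generate_synthetic(real_data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Sample synthetic rows from a multivariate normal fitted to the real data.

    The synthetic rows preserve each column's mean and the covariance between
    columns (the linear relationships between variables) without reproducing
    any individual real record.
    """
    rng = np.random.default_rng(seed)
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Toy "real" data set: 1,000 records with 5 correlated features,
# randomly generated here purely for the demonstration.
rng = np.random.default_rng(42)
real = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
synthetic = generate_synthetic(real, n_samples=5000)

# The synthetic set mirrors the real correlation structure.
print(np.round(np.corrcoef(real, rowvar=False), 2))
print(np.round(np.corrcoef(synthetic, rowvar=False), 2))
```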

The rapid industrialisation of AI has been particularly pronounced within the healthcare industry. Drug discovery, a $1.25trn industry, is the largest expenditure in drug development. On average, a new drug requires $2bn of R&D and 12 years to develop, yet the failure rate stands at 90%. AI has become the most powerful technology to improve experimental accuracy and perform molecular simulation to accelerate the process of drug discovery. AI-infused drug discovery is projected to become a $40bn market by 2027.

DeepMind, a unit of Alphabet, built an AI-based system, AlphaFold, that solved the protein-folding problem, a 50-year-old grand challenge in biology raised by Christian Anfinsen in his acceptance speech for the Nobel Prize in Chemistry in 1972. This revolutionary system, equipped with the most advanced ML techniques and decades of prior research on genomic datasets, computationally predicts the 3D structure of a protein based solely on its 1D amino acid sequence; it is this structure that largely determines a protein's function. Extensive research has been carried out to determine protein structures experimentally, but it has relied on years of trial and error and expensive specialised equipment because of the astronomically high number of possible configurations of a typical protein. Cyrus Levinthal famously posed what became known as Levinthal's paradox in 1969: proteins fold spontaneously within milliseconds in nature, yet enumerating all possible configurations of a typical protein, estimated at 10³⁰⁰, would take longer than the age of the known universe. AI-enabled capabilities built on such recent developments can fundamentally change drug discovery.
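
To put Levinthal's enumeration argument in perspective, the back-of-the-envelope calculation below combines the 10³⁰⁰ configuration estimate cited above with an enumeration rate that is our own, deliberately generous, illustrative assumption.

```python
import math

CONFIG_EXPONENT = 300        # ~10^300 possible configurations (estimate cited above)
SAMPLES_PER_SECOND = 1e15    # assumed, generously fast, enumeration rate
AGE_OF_UNIVERSE_S = 4.4e17   # ~13.8 billion years expressed in seconds

seconds_exponent = CONFIG_EXPONENT - math.log10(SAMPLES_PER_SECOND)
universe_ages_exponent = seconds_exponent - math.log10(AGE_OF_UNIVERSE_S)

print(f"Brute-force enumeration: ~10^{seconds_exponent:.0f} seconds,")
print(f"or ~10^{universe_ages_exponent:.0f} times the age of the universe.")
```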

The virtual screening (docking) technique has also been dramatically improved by embedding deep learning, and can now process billions of molecular structures rapidly and accurately. This deep-docking approach achieves up to a 100-fold reduction in data and a 6,000-fold enrichment of candidate drug molecules. It was used to accelerate the virtual screening of a 1.3 billion-compound library to help identify 1,000 quality candidate compounds to inhibit the SARS-CoV-2 (COVID-19) main protease. The process was completed in one week, compared with three years using previous programs.
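
The sketch below illustrates the iterative idea behind deep docking under our own simplifying assumptions: dock only a small random sample of the library, train a fast surrogate model on those scores, then discard the compounds the surrogate predicts will dock poorly before the next round. The `dock` and `featurise` callables and the choice of regressor are illustrative placeholders rather than the published pipeline.

```python
# A sketch of the iterative "deep docking" idea, not the published pipeline.

import random
from typing import Callable, List

from sklearn.neural_network import MLPRegressor


def deep_docking(
    library: List[str],                        # molecule identifiers, e.g. SMILES strings
    dock: Callable[[str], float],              # expensive physics-based docking score (lower = better)
    featurise: Callable[[str], List[float]],   # molecular fingerprint / descriptor vector
    rounds: int = 3,
    sample_size: int = 1000,
    keep_fraction: float = 0.1,
) -> List[str]:
    candidates = list(library)
    for _ in range(rounds):
        # 1. Dock only a small random sample of the remaining candidates.
        sample = random.sample(candidates, min(sample_size, len(candidates)))
        scores = [dock(m) for m in sample]

        # 2. Train a cheap surrogate model to predict docking scores from fingerprints.
        surrogate = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
        surrogate.fit([featurise(m) for m in sample], scores)

        # 3. Keep only the fraction of compounds predicted to bind best.
        predicted = surrogate.predict([featurise(m) for m in candidates])
        ranked = sorted(zip(predicted, candidates))
        candidates = [m for _, m in ranked[: max(1, int(len(ranked) * keep_fraction))]]

    return candidates  # a vastly reduced shortlist for exhaustive docking
```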

All these developments further demonstrate the potential of AI to fundamentally change the hit-or-miss business model of drug developers, bringing hope of accelerated development of drugs for rare diseases that have not been adequately addressed because of high development costs and low returns.

AI funding in the private market declined for the first time in 2020, to $25bn, because of the pandemic and economic uncertainty, despite a modest recovery in activity during H2. Only 30% of the funding went into late-stage companies, as AI innovation is still mostly early stage. The number of transactions in the space remained elevated, with a total of 89 exit events (a new record), including several high-profile IPOs.

We are excited about the growing ability to capture such opportunities in public markets as more companies emerge targeting AI usage in non-traditional markets. However, we also continue to believe that the semiconductor and semiconductor equipment industries are an attractive, levered way to play AI proliferation. The amount of compute power used by machine learning doubles every 3.4 months. We believe semiconductor companies are likely to capture 40-50% of the total value from this technology stack, representing the best incremental opportunity for the industry in decades; by 2025, AI-related demand is expected to account for 20% of total semiconductor demand. We also expect Alphabet, Microsoft and Alibaba, which have a clear lead in ML framework development, to derive meaningful revenue from offering AI capabilities as a service, monetised via cloud computing.
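
For context, a quick calculation shows what a 3.4-month doubling period implies on an annual basis; the figure echoes OpenAI's widely cited "AI and Compute" analysis, which the statistic above appears to reference.

```python
# Implied annual growth if ML compute doubles every 3.4 months.
doubling_period_months = 3.4
annual_multiple = 2 ** (12 / doubling_period_months)
print(f"~{annual_multiple:.0f}x more compute consumed per year")  # roughly 11x
```

Compute consumption growing roughly elevenfold a year underpins the demand argument for semiconductors set out above.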