ProGen: Ushering in a New Era in Protein Design

Taida Nando for Distilled Post

Researchers at Salesforce have developed an artificial intelligence program that can design new proteins without waiting on natural evolution, and that may make waves in the medical and industrial sectors.


What is ProGen?

ProGen is a deep learning language model for proteins. Language models of this kind were originally developed to generate English-language text, and they have since shown promise in various biotechnological applications, including protein design and engineering. Inspired by that success, the scientists at Salesforce found that they could use the same kind of model to design proteins from scratch, and they compared the AI-created proteins with those produced by nature over the course of evolution.

The AI program employs next-token prediction to assemble artificial proteins one amino acid at a time. To create the model, scientists fed the amino acid sequences of 280 million different proteins of all kinds into a machine-learning model that digested the information for a couple of weeks. After fine-tuning the model by priming it with 56,000 sequences from five lysozyme (antimicrobial enzyme) families, the scientists selected five artificial proteins from the first batch of 100 to test, comparing their activity to that of hen egg white lysozyme, an enzyme found in the whites of chicken eggs.
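To make the idea of next-token prediction concrete, here is a minimal sketch in Python. It is not ProGen's code or architecture: it swaps the large neural network for a simple table that counts which amino acid tends to follow which, and the training sequences, function names, and start/end markers are all made up for illustration. The generation loop, however, has the same shape: predict a distribution over the next amino acid, sample one, append it, and repeat.

```python
import random
from collections import Counter, defaultdict

# Hypothetical markers for the start and end of a sequence.
START, END = "^", "$"

def train_bigram(sequences):
    """Count, for each token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the counts observed after `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}

def generate(counts, rng, max_len=50):
    """Sample one amino acid at a time until END or max_len is reached."""
    out, prev = [], START
    while len(out) < max_len:
        probs = next_token_probs(counts, prev)
        tokens = list(probs)
        nxt = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Made-up toy sequences in one-letter amino acid code, not real lysozymes.
toy_sequences = ["MKALIV", "MKVLAA", "MRALIV"]
model = train_bigram(toy_sequences)

# In the toy data, M is followed by K twice and by R once.
print(next_token_probs(model, "M"))
print(generate(model, random.Random(0)))
```

A real protein language model conditions each prediction on the whole sequence so far rather than only the previous amino acid, which is what lets it pick up longer-range patterns.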

During the screening comparison, the scientists made a remarkable discovery. Simply by studying raw sequence data for weeks, the AI program had also learned how the enzymes should be shaped. According to the scientists, 73% of the proteins created by the AI program could function, as opposed to 59% of natural proteins.



Salesforce’s evolutionary goal

A language model that can write proteins opens up a potentially vast range of opportunities.

For scientists, the path to gaining fluency in the language of proteins has been a difficult one. Unlike humans, AI programs can easily learn the language of proteins, bypassing the natural process of evolution. Through the use of an AI program that speaks the protein language, it would be possible to generate new functional proteins, advancing science and therapeutics, and creating efficient climate-resilient systems.

Artificial proteins and the climate crisis

The production of artificial enzymes and proteins could also play a key role in addressing climate change concerns. Many industries generate production waste that is hazardous to the environment. Because enzymes are fully biodegradable, mass production of artificially constructed enzymes could meet the needs of a modern society seeking more sustainable solutions. Applying these artificial enzymes in the biotech sector could facilitate the production of plastic-eating proteins that improve the global waste management system.


The use of artificial proteins in the biotech sector may also reduce global emissions by replacing animal proteins with like-for-like alternatives.


Artificial proteins and drug discovery

For those in the medical field, artificial protein production offers an array of possibilities.

With a program like ProGen that can generate functional proteins, alongside tools that accurately predict protein structures, drug discovery and vaccine development may greatly benefit in the future. An accurate prediction of protein structure will enable scientists to design drugs that target the specific proteins that cause infectious diseases.

If scientists and researchers can harness the full potential of enzymes, they may be able to make fuller use of lysozyme. This antibacterial enzyme is naturally produced by the human body and is also found in many food products. If the protein is re-engineered with artificial intelligence programs such as ProGen, it may prove useful in HIV treatment. The enzyme has been found to have activity against HIV, similar to RNase A and urinary RNase U, which selectively degrade HIV-associated RNA (infectious genetic material).

Alongside Salesforce’s work, an AI system called AlphaFold has been developed to predict protein structures. Knowing the structure of a target protein allows scientists to design more effective drugs that inhibit its activity, thereby treating a variety of diseases. Furthermore, AlphaFold's ability to accurately predict the structure of proteins that are difficult to study experimentally can help scientists discover new therapeutic targets and better understand diseases.


This could herald a new era in pharmaceuticals and therapeutics if properly utilised.


A glimpse into the future

Humans' work, learning, and interactions are being significantly transformed by artificial intelligence, and we may be entering a new era in which we can achieve astonishing feats by bypassing the natural process of evolution.


While Salesforce's experiment was not intended to create artificial proteins from scratch, ProGen offers researchers an opportunity to develop improved, next-generation proteins. A former Salesforce research assistant, Ali Madani, has already used the AI system's source code to start a company, ProFluent Bio, to further the research.