Over the past few months, it’s become clear that AI can be trained to imitate human language; just look at ChatGPT. Now, researchers have shown that similar language models, if properly trained, can mimic evolution and human biology.
In a study published Thursday in Nature Biotechnology, researchers tested the ability of a language model, Salesforce’s ProGen, to generate amino acid sequences, specifically enzymes, that could potentially work in real-life scenarios. The project was a collaboration among several groups, including Salesforce Research and researchers at the University of California, San Francisco and the University of California, Berkeley.
But why use a language model, something that’s been used to generate essays and articles, to generate biology? Proteins can be represented as a language made up of amino acids, the 20 molecules that make up every protein.
“In much the same manner that text sentences are made by stringing together words, so are amino acids to create proteins,” Nikhil Naik, Director of AI Research at Salesforce Research, wrote to Motherboard. “Building on this insight, we apply neural language modeling to proteins for generating realistic, yet novel protein sequences.”
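To make the "protein as language" idea concrete, here is a minimal Python sketch of how a protein string could be turned into tokens, one per amino acid. The vocabulary, the function names, and the example string `MKTAYIAK` are illustrative assumptions, not ProGen's actual tokenizer or a sequence from the study.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each amino acid gets an integer id.
# (Illustrative only; this is not ProGen's real tokenizer.)
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
ID_TO_TOKEN = {i: aa for aa, i in TOKEN_TO_ID.items()}


def encode(sequence):
    """Turn a protein string into a list of integer token ids."""
    return [TOKEN_TO_ID[aa] for aa in sequence.upper()]


def decode(token_ids):
    """Turn a list of token ids back into a protein string."""
    return "".join(ID_TO_TOKEN[i] for i in token_ids)


# An arbitrary short string of amino acid letters, used only as a demo.
example = "MKTAYIAK"
ids = encode(example)
print(ids)          # [10, 8, 16, 0, 19, 7, 0, 8]
print(decode(ids))  # MKTAYIAK
```

Once a protein is a list of ids like this, it can be fed to the same kind of next-token model that is used for words in a sentence.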
Basically, instead of teaching an AI English, the team developed one to learn the language of proteins, Ali Madani, Ph.D., a former scientist at Salesforce Research involved with the study, explained in an email to Motherboard.
Like other AI programs, ProGen had to be trained. It began its training on 280 million proteins. After two weeks, the team fine-tuned the model by introducing it to a dataset of about 56,000 proteins from five different families. The model then generated one million synthetic sequences. The team focused on 100 of those proteins to see how they compared to natural proteins, and whether they had adequately followed the so-called “grammar” of amino acid composition.
Of those 100 proteins, the team created five and tested their functionality in cells, seeing how well they compared to an enzyme found in chicken eggs, aptly named “hen egg white lysozyme” (HEWL). Two of the proteins showed activity similar to that of HEWL. They were able to break down the cell walls and cells of bacteria.
The enzymes function “out of the box” as well as proteins that evolved over many millions of years, Madani stated. The team also found that the model could capture evolutionary patterns without being specifically trained to do so.
While AI has been used to generate proteins, this study differs a bit from prior research and further expands the idea of what is possible with language models.
“Our work uses conditional language models that allow for significantly more control over what types of sequences are generated, making them more useful for designing proteins with specific properties,” Naik wrote. “We have also validated our results in a wet lab.”
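As a rough illustration of what "conditional" means here, the Python sketch below trains a toy model that picks the next amino acid based on both the previous one and a control tag naming a protein family. Everything in it is an assumption made for illustration: the tags `family_A` and `family_B`, the tiny made-up training strings, and the simple counting model. ProGen itself is a large neural network, not this.

```python
import random
from collections import defaultdict


def train(tagged_sequences):
    """Count next-residue frequencies separately for each control tag."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for tag, seq in tagged_sequences:
        padded = "^" + seq + "$"  # "^" marks the start, "$" the end
        for prev, nxt in zip(padded, padded[1:]):
            counts[tag][prev][nxt] += 1
    return counts


def generate(counts, tag, rng, max_len=50):
    """Sample a sequence conditioned on the given control tag."""
    if tag not in counts:
        raise KeyError(f"unknown control tag: {tag}")
    out, prev = [], "^"
    while len(out) < max_len:
        options = counts[tag][prev]
        residues = list(options)
        weights = [options[r] for r in residues]
        nxt = rng.choices(residues, weights=weights)[0]
        if nxt == "$":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Made-up toy data: two hypothetical families with different "vocabularies".
toy_data = [
    ("family_A", "AAAA"), ("family_A", "AAGA"),
    ("family_B", "KKKK"), ("family_B", "KRKK"),
]
model = train(toy_data)
rng = random.Random(0)  # fixed seed so runs are repeatable
print(generate(model, "family_A", rng))  # uses only A and G
print(generate(model, "family_B", rng))  # uses only K and R
```

Changing the tag changes what kind of sequence comes out, which is the sort of control Naik describes, here in a drastically simplified form.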
The methods described in the paper are also available on GitHub so that the research community can build on this work and accelerate research on AI for protein design. Madani stressed that proteins are essential for life.
“Everything that can go wrong or right in a human body is reliant on proteins, and so designing new ones can allow us to more effectively treat diseases or even avoid them in the first place,” Madani wrote. “We can use AI to engineer these solutions.”
The post AI Replicated Evolution and Generated New Enzymes as Good as Natural Ones appeared first on VICE.