Apr 16, 2022 |
|
(Nanowerk News) Proteins are integral components of all living organisms. They are composed of a sequence of building blocks called amino acids. That sequence determines their function, which can range from setting the structure of cells to regulating metabolism.
|
Scientists can change a protein sequence and experimentally test if and how that change alters its function. However, there are too many possible amino acid sequence changes to test them all in the laboratory.
|
Instead, researchers build highly complex computational models that predict protein function based on their amino acid sequence. This is critical for engineering proteins with novel functions.
|
Scientists have now combined multiple machine learning approaches for building a simple predictive model that often works better than established, complex methods (Nature Biotechnology, “Learning protein fitness models from evolutionary and assay-labeled data”).
|
|
Scientists create a “probability density model” using evolutionary data from related proteins. They also experimentally measure functions of related protein variants to “augment” the model and predict how to modify a protein to improve its function. (Image: University of California, Berkeley)
|
Naturally occurring proteins serve many crucial functions in maintaining life. But scientists can also engineer natural proteins for desired purposes such as gene editing and the synthesis of valuable chemicals.
|
This new combined modeling approach to predict protein function will aid in the design and engineering of novel proteins. This approach will allow scientists to easily redesign proteins for a huge range of applications such as new enzymes to convert plant matter into biofuels or bioproducts or to create new biomaterials.
|
Scientists have several approaches to predict functional properties of a given protein that use the protein’s amino acid sequence to build a computational model. Scientists create such models employing both classical statistical methods and modern-day machine learning computational approaches.
|
One of those statistical methods, called regression analysis, associates a given amino acid sequence with an experimentally measured functional property of a protein. To increase the amount of data available to make functional predictions for a protein, researchers include sequences of evolutionarily-related proteins as additional input.
|
In general, those evolutionarily-related proteins are likely to share the property of the protein of interest, albeit often without direct experimental evidence. Researchers use a machine learning modeling approach based on the statistical properties of those sequences.
|
In the study highlighted here, researchers combined regression analysis and evolutionary data to propose a simple, effective machine learning approach. The researchers found that this simple combination approach is competitive with, and often outperforms, more sophisticated methods.
|