Teams of Prof. Yang and Prof. Lin Develop a Deep Learning-Guided Algorithm for Directed Protein Evolution

Recently, a research team led by Associate Professor Xiaofeng Yang from the School of Biology and Biological Engineering at South China University of Technology, in collaboration with Professor Zhanglin Lin’s team from the School of Biomedical and Pharmaceutical Sciences at Guangdong University of Technology, published a research article entitled “An iterative deep learning-guided algorithm for directed protein evolution” in iScience, a well-recognized Cell Press journal.
In this study, the authors introduced DeepDE, a robust iterative deep learning-guided algorithm for directed protein evolution. DeepDE integrates unsupervised, weak-positive-only, and supervised learning methods. Notably, the algorithm requires only a modest supervised training dataset of 1,000 single or double mutants. In each round of evolution, the mutation radius was set to three. Although this radius presents a significant challenge, as it results in a combinatorial library of approximately 1.5×10¹⁰ variants, it also allowed the authors to use a standard mutagenesis kit to experimentally explore a focused set of mutants based on the predicted triple mutation sites. As a proof of concept, the authors selected the green fluorescent protein from Aequorea victoria (avGFP) as the model protein. Over four rounds of evolution, DeepDE achieved a remarkable 73-fold increase in activity, reaching nearly double the activity of the hallmark sfGFP, a widely recognized benchmark that was established through a multi-year protein engineering effort and is thought to approach the upper functional limit of avGFP. In conclusion, these results demonstrate that DeepDE, when coupled with limited experimental screening, offers a pragmatic and scalable approach for AI-guided protein engineering.
Proteins (enzymes) represent foundational “chip” technologies in biotechnology, with critical applications in industry, medicine, and agriculture. However, the sequence space of a target protein is extraordinarily vast. For example, an average protein with 300 residues can yield an astronomical 3.1×10¹⁰ possible combinations from just three substitutions. This immense combinatorial complexity presents a formidable challenge in exploring the functional landscape of a target protein, severely impeding the practice of protein engineering. The classical approach of directed evolution, recognized by the 2018 Nobel Prize in Chemistry, is more powerful than rational design; however, it is often labor-intensive, time-consuming, and inefficient. In recent years, artificial intelligence (AI), particularly deep learning, has rapidly emerged as a promising toolkit for protein optimization. So far, however, the success of deep learning-guided protein engineering remains limited, particularly with respect to improving activity. A critical hurdle is that many existing methods rely predominantly on interpolative (in-distribution) testing, without rigorous extrapolative (out-of-distribution) validation, which can be misleading when addressing complex biological problems such as protein engineering. In contrast, the success of DeepDE stems from its explicit emphasis on extrapolation capability and experimental validation. Over just four rounds of evolution, DeepDE effectively explored a sequence space on the order of 10³⁵ variants while experimentally screening only 4,000 mutants in total, demonstrating exceptional extrapolative power. This capability highlights the strong potential of DeepDE for broad applications in enzyme engineering, drug development, and biotechnology.
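As a quick check on the library sizes quoted in this article, the number of triple mutants can be counted directly: choose three positions out of the sequence length, then pick one of the 19 alternative amino acids at each. The short Python sketch below is only an illustration and is not taken from the paper. The function name `count_variants` is hypothetical, and the avGFP length of 238 residues is an assumption not stated in this article.

```python
from math import comb


def count_variants(length: int, radius: int = 3, alphabet: int = 20) -> int:
    """Count sequences that differ from the parent at exactly `radius` positions.

    Illustrative helper (hypothetical name, not from the paper):
    choose `radius` positions, then one of (alphabet - 1) substitutions at each.
    """
    return comb(length, radius) * (alphabet - 1) ** radius


# An "average" 300-residue protein, as in the example above.
print(f"{count_variants(300):.2e}")  # 3.06e+10, i.e. about 3.1×10¹⁰

# avGFP, assuming a length of 238 residues (assumption, not given in the article).
print(f"{count_variants(238):.2e}")  # 1.52e+10, i.e. about 1.5×10¹⁰
```

Under these assumptions, both results agree with the figures quoted for a 300-residue protein and for the avGFP triple-mutant library.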
The authors also note that there is still considerable room to refine and optimize the DeepDE algorithm, and that further testing is needed to assess its generalizability across proteins with diverse structures, functions, and evolutionary landscapes. To this end, the research team is actively developing DeepDE 2.0.

The first author of this article is Xiaofan Li, a PhD student (Class of 2019) at the School of Biology and Biological Engineering, South China University of Technology. The study was supported by the National Key R&D Program of China and the Guangdong S&T Program.
Paper link: https://www.cell.com/iscience/fulltext/S2589-0042(25)01585-8#sec-8