All-Atom Protein Sequence Design Based on Geometric Deep Learning.
Liu, J., Guo, Z., You, H., Zhang, C., Lai, L. (2024) Angew Chem Int Ed Engl: e202411461
- PubMed: 39295564 
- DOI: https://doi.org/10.1002/anie.202411461
- Primary Citation of Related Structures: 8XYR, 8XYS, 8XYT, 8XYU, 8XYV, 8XYW
- PubMed Abstract:
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce the complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues. GeoSeqBuilder achieves a native residue type recovery rate of 51.6%, comparable to ProteinMPNN and other leading methods, while accurately predicting side chain conformations. We first used GeoSeqBuilder to design sequences for thioredoxin and a hallucinated three-helical bundle protein. All 15 tested sequences expressed as soluble monomeric proteins with high thermal stability, and the two high-resolution crystal structures solved closely match the designed models. The generated protein sequences exhibit low similarity (minimum 23%) to the original sequences, with significantly altered hydrophobic cores. We further redesigned the hydrophobic core of glutathione peroxidase 4, and 3 of the 5 designs showed improved enzyme activity. Although further testing is needed, the high experimental success rate in our testing demonstrates that GeoSeqBuilder is a powerful tool for designing novel sequences for predefined protein structures with atomic details. GeoSeqBuilder is available at https://github.com/PKUliujl/GeoSeqBuilder.
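Native residue type recovery, the metric behind the 51.6% figure, is commonly computed as the fraction of backbone positions where the designed residue equals the native one. The sketch below is not GeoSeqBuilder's evaluation code; it assumes both sequences are one-letter strings for the same fixed backbone (equal length, position-for-position). The toy sequences are hypothetical. How the paper aggregates recovery across its test set (mean, median, per-residue) is not stated in this abstract.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one.

    Assumption: both sequences describe the same fixed backbone, so they have
    equal length and index i refers to the same residue position in each.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must be the same length (same backbone)")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count positions with an identical one-letter residue code.
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical toy sequences, not taken from the paper.
native = "MKTAYIAKQR"
designed = "MKSAYLAEQR"
print(f"recovery = {sequence_recovery(designed, native):.1%}")  # prints "recovery = 70.0%"
```

Because the backbone is fixed, no alignment step is needed; comparing a designed sequence to an unrelated protein of different length would instead require a pairwise alignment before computing identity.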
Organizational Affiliation: Peking University, Center for Life Sciences, 5 Yiheyuan Road, 100871, Beijing, China.