The molecular characterization of new protein structures using eBlocks™ Gene Fragments

Wicky BIM, Milles LF, Courbet A, et al. Hallucinating symmetric protein assemblies. Science. 2022;378(6615):56-61.

eBlocks Gene Fragments were designed from amino acid sequences to characterize new protein structure designs. In the protein design field, researchers seek to discover new and complex protein structures, which are relevant for designing and optimizing protein-based applications. Researchers also need to know the amino acid sequence before they can explore the complete protein structure. Wicky et al. developed an approach called deep network hallucination to explore protein structure space from design specifications, and then confirmed the resulting designs experimentally, including by crystallizing them. The authors demonstrated that this approach led to the discovery of diverse new protein structure designs.

eBlocks Gene Fragments can be synthesized for a variety of protein design applications

eBlocks Gene Fragments are uniquely suited for high-throughput screening of multiple constructs in protein design research. The authors used eBlocks Gene Fragments that code for the hallucinated proteins to identify new protein structures. Researchers can clone IDT Gene Fragments into vectors using cloning strategies such as seamless cloning.
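To make the step from a designed amino acid sequence to an orderable coding sequence concrete, here is a minimal Python sketch of reverse translation. It is an illustration only: the one-codon-per-residue table, the function names, and the choice of stop codon are assumptions made for this example, not IDT's codon optimization or the authors' workflow. A real design would also add the flanking sequences needed for the chosen cloning strategy and check synthesis constraints.

```python
# Illustrative codon table: one codon per amino acid (an assumption for this
# sketch, not a vendor codon-optimization tool). Every codon encodes the listed
# residue in the standard genetic code.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def reverse_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence for a one-letter amino acid sequence."""
    protein = protein.strip().upper()
    unknown = sorted(set(protein) - set(PREFERRED_CODON))
    if unknown:
        raise ValueError(f"Unsupported residue(s): {', '.join(unknown)}")
    dna = "".join(PREFERRED_CODON[aa] for aa in protein)
    return dna + STOP_CODON if add_stop else dna


def translate(dna: str) -> str:
    """Decode DNA produced by reverse_translate (only this table's codons)."""
    codon_to_aa = {codon: aa for aa, codon in PREFERRED_CODON.items()}
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon == STOP_CODON:
            break
        residues.append(codon_to_aa[codon])
    return "".join(residues)


if __name__ == "__main__":
    coding = reverse_translate("MKV")
    print(coding)             # ATGAAAGTGTAA
    print(translate(coding))  # MKV
```

Translating the DNA back to protein is a cheap sanity check to run on every construct before ordering, since a high-throughput screen may involve many sequences.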

An overview of protein design research

Protein design research focuses on editing naturally occurring peptides or designing oligomers de novo for a variety of applications, including therapeutics, industrial applications, agriculture, and sustainability. These proteins play important roles in biology, ranging from molecular binding to enzymatic catalysis and other protein functions. Protein oligomers that assemble from several identical subunits, also referred to as homo-oligomers, can also be used for studying protein structure because of their size and wide range of uses. Researchers are interested in studying these subunits to understand the overall structure of the protein and identify new protein structure designs. The challenge is that researchers have needed to specify the protein structure before design and to be able to confirm that structure experimentally. This has limited protein design to what is already known, rather than allowing exploration of the greater expanse of structural possibilities.

Identifying protein structures using the ProteinMPNN sequence design method, docking and computational approaches

Protein design typically starts with a hierarchical docking approach, a set of protocols for predicting complex protein structures that begins by characterizing monomers and then moves to higher-order structures. Predicting protein structures with docking-based in silico models poses several challenges, particularly in identifying not just the subunit protomers but also the structure of the multi-subunit oligomer assembly. Further, hierarchical docking can be insufficient because of limits on how well protein-protein interactions are understood.

ProteinMPNN is a deep learning protein sequence design method that is applicable to protein structure research. The authors used ProteinMPNN to generate new protein sequences that might express better in E. coli. One challenge they faced was overfitting of the initial designs: when synthesized and cloned into expression systems, those sequences produced virtually no proteins with appreciable soluble expression. Ordering the protein-coding DNA sequences as eBlocks Gene Fragments enabled them to rapidly screen various sequences using reliable synthetic biology techniques. Given the high accuracy of ProteinMPNN, researchers can use this approach to design novel sequences that can be experimentally validated.

The authors used deep network hallucination to gain insight into the subunits of the protein. They hallucinated across the space of protein oligomeric structures, specified by the chain length and oligomer valency. This computational approach interpolates and extends native fold space rather than recapitulating known protein structures. By using Monte Carlo optimization, the authors were able to identify well-defined states of new structures.
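For readers unfamiliar with Monte Carlo optimization over sequences, the following Python sketch shows the general idea: start from a random sequence of a given chain length, propose single-residue mutations, and accept or reject each one with a Metropolis criterion while the temperature is lowered. Everything specific here is an assumption for illustration. In particular, `toy_loss` is a made-up stand-in for the deep network score used in the paper, oligomer valency is not modeled, and the temperature schedule and parameter names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")


def toy_loss(seq: str) -> float:
    """Toy stand-in for a network-derived loss (assumption, not the paper's).

    Returns the fraction of positions that break an alternating
    hydrophobic/polar pattern; lower is better.
    """
    violations = sum((aa in HYDROPHOBIC) != (i % 2 == 0) for i, aa in enumerate(seq))
    return violations / len(seq)


def mc_optimize(length, steps=2000, t_start=0.1, t_end=0.001, seed=0, loss_fn=toy_loss):
    """Metropolis Monte Carlo with an exponentially decaying temperature.

    Returns the best sequence seen and its loss.
    """
    rng = random.Random(seed)
    current = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    current_loss = loss_fn(current)
    best, best_loss = current, current_loss

    for step in range(steps):
        # Temperature decays from t_start to t_end over the run.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))

        # Propose a single-residue mutation at a random position.
        pos = rng.randrange(length)
        candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        candidate_loss = loss_fn(candidate)

        # Always accept improvements; accept worse moves with Boltzmann probability.
        delta = candidate_loss - current_loss
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss

    return best, best_loss


if __name__ == "__main__":
    sequence, loss = mc_optimize(length=30, seed=1)
    print(sequence, loss)
```

Accepting some worse moves early on lets the search escape poor local optima, while the falling temperature makes it increasingly greedy toward the end. Swapping `toy_loss` for a structure-prediction-based score is where the real method differs from this sketch.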

Confirming and characterizing new protein structures using x-ray crystallography and cryogenic electron microscopy (cryo-EM)

There are several techniques for studying protein structures. Researchers can use x-ray crystallography to study protein structures at an atomic level. Crystallization, which is also used for the separation and purification of proteins, provides the ordered crystals that this technique requires. The authors generated crystal structures to evaluate their design accuracy and solved 7 out of 19 designs.

Additionally, the authors performed cryo-EM, an imaging technique used to analyze and confirm protein structures, carrying out high-resolution single-particle cryo-EM characterization of their protein designs.

Overall, Wicky et al. combined machine learning, synthetic biology with eBlocks Gene Fragments, and structural characterization techniques to identify new protein design structures. Deep network hallucination of novel protein structures, confirmed experimentally, can deepen our knowledge of protein structure and expand the arsenal of engineered proteins available for many different applications. Learn more about how IDT Gene Fragments can help you with your protein design research.

Disclosure: For research use only. Not for use in diagnostic procedures. Unless otherwise agreed to in writing, IDT does not intend these products to be used in clinical applications and does not warrant their fitness or suitability for any clinical diagnostic use. Purchaser is solely responsible for all decisions regarding the use of these products and any associated regulatory or legal obligations. Doc ID: RUO22-1450_001

Published Oct 10, 2022