HomeresearchPeopleGeneral InfoSeminarsResources
| Alg & App Group| Home | Research | Publications | People | Resources | News

Algorithms & Applications Group
Using Motion Planning to Study Protein Folding and Motion


supported by NSF
Xinyu Tang, Lydia Tapia, Shawna Thomas, Nancy Amato,
Ken Dill (UCSF), Lawrence Rauchwerger, Marty Scholtz (Medical Biochemistry & Genetics)
Project Alumni: Luke Hunter, Bonnie Kirkpatrick, Kasia Leyk,
Guang Song, Annette Stowasser


Visit our Protein Folding Server!
Our Protein Folding Server is available to analyze your proteins with our motion planning technique.


Despite the explosion in protein structural and functional data, our understanding of protein folding and movement is still very limited. Experimental methods cannot operate at the time scales necessary to record protein folding and motions, and traditional simulation techniques such as molecular dynamics and Monte Carlo methods are too computationally expensive to simulate long enough time periods for anything other than small peptide fragments. However, it is critical that we better understand protein motion and the folding process for several reasons: understanding the folding process can give insight into how to develop better structure prediction algorithms, treatments for diseases such as Alzheimer's and Mad Cow disease can be found by studying protein misfolding, and many biochemical processes are regulated by protein motion.

Our technique for computing protein folding pathways and protein motions is based on the successful probabilistic roadmap (PRM) method for robotics motion planning. We were inspired to apply PRMs to protein folding based on our success in applying it to paper folding. The figure below demonstrates the parallelism between paper folding and protein folding.

Folding snapshots of a periscope (movie) and staphylococcal Protein A, B domain,
computed by our technique. Also available is a folding box movie.

We have obtained promising results for several small proteins (about 60-100 amino acids), and we have validated our pathways by comparing their secondary structure formation order with known experimental results, examining folding rates, and studying population kinetics. A major feature of our technique is that it produces many folding pathways in just a few hours on a desktop PC. These pathways provide global information about the protein's energy landscape.

Secondary Structure Formation Order Analysis

In order to study larger proteins and more challenging protein motion problems, we applied rigidity analysis to bias our sampling in a more physically realistic way. Given a protein structure, we use rigidity analysis to identify the rigid and flexible components of the protein. We sample the flexible regions with a high probability and sample the rigid regions with a low probability. We then performed a detailed study of protein G (streptococcal protein G, B1 immunoglobulin-binding domain), protein L (peptostreptococcal protein L, B1 immunoglobulin-binding domain), and two mutated forms of protein G, NuG1 and NuG2, to further validate our technique. These proteins provide a good test case for our technique because they are known to fold differently even though they have similar structures:

Protein G Protein L NuG1 NuG2
Native 3D structures of proteins G, L, NuG1, and NuG2. Mutated residues in NuG1 and NuG2 are indicated in wireframe.

All four proteins are composed of an alpha-helix (green) and two beta-hairpin turns, hairpin 1 (orange) and hairpin 2 (blue). Native state out-exchange experiments, pulse labeling/competition experiments, and phi-value analysis indicate that hairpin 1 forms first in protein L and hairpin 2 forms first in protein G. Protein G was then mutated by substituting a few residues in hairpin 2 to decrease its relative stability. Phi-value analysis indicates that the mutations switch the hairpin formation order from the wildtype to that of protein L. We were able to successfully replicate these results with our technique. For each protein, we analyzed hundreds of folding pathways. The following table shows our simulation results:

Comparison of secondary structure formation orders for proteins G, L, NuG1, and NuG2 with experimental results.

In addition to detecting the correct folding behavior of each protein, our rigidity-based technique can also help explain the stability shift in NuG1 and NuG2 from the wildtype. Consider their native state rigidity maps:

Protein G Protein L NuG1 NuG2
Rigidity maps of the native structure for proteins G, L, NuG1, and NuG2.

A rigidity map is similar to a contact map except residues pairs are marked according to their rigidity relationship: black if they are in the same rigid component, green if they are in the same dependently flexible set, and white otherwise. In all four proteins, the central alpha-helix remains completely rigid, as expected. We also see increased rigidity from protein G to NuG1 and NuG2 in hairpin 1 as suggested by experiment. For the other regions, the rigidity maps show more similarity between the mutated forms of protein G and protein L than with protein G, reflecting the similarity/difference in folding behaviors.

Population Kinetics/Folding Rate Analysis

Our roadmaps provide an approximate view of a protein's folding landscape. In our recent work, we have explored various methodologies for studying the kinetics of protein folding in these approximate models. Our two new methodologies, map-based Master Equation (MME) and map-based Monte Carlo (MMC), allow us to study two important features of protein folding kinetics: relative folding rates and population kinetics. While master equation calculation and Monte Carlo simulation have been performed on protein models before, their computational cost have prohibited them from being used for large protein models. An important benefit of our map-based approximation approach is that it enables us to study the kinetics of much larger proteins than can be handled by traditional master equation solution or Monte Carlo simulation.

We can use the MME approach to study relative folding rates as extracted from our roadmaps. For this study, we looked at Protein G and its variants NuG1 and NuG2. Experimental studies have shown that these two mutants folds 100 times faster than protein G. We applied the MME approach to study the relative folding rates of these three proteins.

In the figure above, the simulated folding rates are shown for Protein G and its mutants, NuG1 and NuG2. The plotted eigenvalues were calculated using MME. Note that the smallest non-zero eigenvalues correspond to the folding rate (index 2). Our simulations show that Protein G folds much slower than its mutants NuG1 and NuG2. This what has been seen in lab experiments.

Also shown above is the performance of MME for roadmaps ranging in size from 2000 to 15000 nodes. The running time of MME scales linearly with roadmap size (i.e. the size of the landscape model).

  Population Kinetics Tryptophan Contact Formation Helix Formation
Protein ACBP

Displayed in the figure above is the 86 residue all alpha protein, Acyl-coenzyme A Binding Protein (ACBP), we have studied through the MMC technique. This protein has five helices and two tryptophans in the core of the protein. The folding of ACBP has been studied in the lab through tryptophan fluorescence, and it has been shown that it has a fast, two-state folder. From our MMC kinetics, we see that ACBP exhibits a property similar to other all alpha proteins we have studied: continual formation of helix contacts and reaching the native state after the formation of many helix contacts. Also, since ACBP has two tryptophans in the core of the protein, we see a quick increase in the formation of these contacts around the same time we see the native state beginning to be populated, around time step 100. This could correspond to the packing of the structure and the formation of long-range interactions in the core of the protein. We also show this protein has similar kinetics to other all alpha-proteins and other lab experiments on alpha proteins. Through the MMC technique we have been able to study population kinetics for proteins of various lengths and mixed structures.

Exchange Rate Analysis and Folding Core Identification

We have also used our roadmaps, or approximate landscape models, to simulate relative hydrogen exchange rates and infer the folding core. The folding core is the set of residues that are the first to form structure during folding and the last to lose structure during denaturation. One of the most prevalent experimental methods to infer the folding core is hydrogen exchange. It measures the rate at which residues exchange their hydrogens for Deuterium during either folding or unfolding.

We use rigidity analysis to examine the folding pathways in our roadmaps (typically extracted by MMC) and compute an approximate exchange rate for each residue. We developed two approaches for approximating exchange: one looks at the individual residue's rigidity/flexibility and one looks at the formation of mutually rigid, non-local subsequences. We found good correlation between the residues identified experimentally as part of the folding core and residues we identify with slow exchange rates:

Continuous Labeling EX [Hilton78] Continuous Labeling EX [Richarz79] Pulse Labeling EX [Roder86] Simulated Relative EX (by rigidity/flexibility) Simulated Relative EX (by mutually rigid subsequences)
Visual comparison of simulated exchange rates to experimental data for bovine pancreatic trypsin inhibitor. For experimental data, residues identified in the folding core are red. Grey residues were either not identified or not measured. For simulated data, residues are shaded by exchange rate from fastest/blue to slowest/red.


For additional analysis results or to submit your
own proteins for study, please visit our protein folding server.



We've been featured in the news!

Related Projects

Applications of Motion Planning to Computational Biology
Ligand Binding
RNA Folding


Papers

A Motion Planning Approach to Studying Molecular Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Technical Report, TR08-006, Parasol Laboratory, Department of Computer Science, Texas A&M University, Nov 2008.
Technical Report(abstract)

Protein Folding Core Identification from Rigidity Analysis and Motion Planning, Shawna Thomas, Lydia Tapia, Nancy M. Amato, Technical Report, TR08-001, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2008.
Technical Report(ps, pdf, abstract)

Using Dimensionality Reduction to Better Capture RNA and Protein Folding Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Technical Report, TR08-005, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, U.S.A., Oct 2008.
Technical Report(ps, pdf, abstract)

Techniques for Modeling and Analyzing RNA and Protein Folding Energy Landscapes, Xinyu Tang, Ph.D. Thesis, Department of Computer Science, Texas A&M University, Dec 2007.
Ph.D. Thesis(ps, pdf, abstract)

Elucidating the Protein Folding Core, Shawna Thomas, Lydia Tapia, Nancy M. Amato, Technical Report, TR07-006, Parasol Laboratory, Department of Computer Science, Texas A&M University, Sep 2007.
Technical Report(ps, pdf, abstract)

Kinetics Analysis Methods For Approximate Folding Landscapes, Lydia Tapia, Xinyu Tang, Shawna Thomas, Nancy M. Amato, In Int. Conf. on Int. Sys. for Mol. Bio. (ISMB)/European Conf. on Comp. Bio.(ECCB), Vienna, Austria, Jul 2007. Also, Bioinformatics, 23(13):i539-i548, Jul 2007. Also, Technical Report, TR07-002, Parasol Laboratory, Department of Computer Science, Texas A&M University, Feb 2007.
Journal(pdf, abstract) Technical Report(ps, pdf, abstract)

Simulating Protein Motions with Rigidity Analysis, Shawna Thomas, Xinyu Tang, Lydia Tapia, Nancy M. Amato, Journal of Computational Biology, 14(6):839-855, Jul 2007. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 394-409, Apr 2006. Also, Technical Report, TR05-008, Parasol Laboratory, Department of Computer Science, Texas A&M University, Sep 2005.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)

Roadmap-Based Methods for Studying Protein Folding Kinetics, Lydia Tapia, Xinyu Tang, Shawna Thomas, Nancy M. Amato, Technical Report, TR06-011, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2006.
Technical Report(ps, pdf, abstract)

Parallel Protein Folding with STAPL, Shawna Thomas, Gabriel Tanase, Lucia K. Dale, Jose M. Moreira, Lawrence Rauchwerger, Nancy M. Amato, Concurrency and Computation: Practice and Experience, 17(14):1643-1656, Dec 2005.
Journal(ps, pdf, abstract)

Protein Folding by Motion Planning, Shawna Thomas, Guang Song, Nancy M. Amato, Physical Biology, 2:S148-S155, Nov 2005.
Journal(ps, pdf, abstract)

Parallel Protein Folding with STAPL, Shawna Thomas, Nancy M. Amato, In Proc. IEEE Int. Wkshp. on High Performance Computational Biology, Santa Fe, NM, Apr 2004.
Proceedings(ps, pdf, abstract)

A Motion Planning Approach to Folding: From Paper Craft to Protein Folding, Guang Song, Nancy M. Amato, IEEE Transactions on Robotics and Automation, 20(1):60-71, Feb 2004. Also, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 948-953, Seoul, Korea, May 2001. Also, Technical Report, TR00-017, Department of Computer Science, Texas A&M University, Jul 2000.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)

A Motion Planning Approach to Protein Folding, Guang Song, Ph.D. Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, Dec 2003.
Ph.D. Thesis(ps, abstract)

A Path Planning-based Study of Protein Folding With a Case Study of Hairpin Formation in Protein G and L, Guang Song, Shawna Thomas, Ken A. Dill, J. Martin Scholtz, Nancy M. Amato, In Proc. Pac. Symp. of Biocomputing (PSB), pp. 240-251, Lihue, HI, Jan 2003.
Proceedings(ps, pdf, abstract)

Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures, Nancy M. Amato, Ken Dill, Guang Song, Journal of Computational Biology, 10(2):239-255, Nov 2002. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 2-11, Apr 2002.
Journal(ps, pdf, abstract) Proceedings(pdf, abstract)

Using Motion Planning to Study Protein Folding Pathways, Guang Song, Nancy M. Amato, Journal of Computational Biology, 9(2):149-168, Nov 2001. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 287-296, Apr 2001. Also, Technical Report, TR00-026, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2000.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)

Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures, Nancy M. Amato, Guang Song, Technical Report, TR01-001, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2001.
Technical Report(ps, pdf, abstract)

A Motion Planning Approach to Folding: From Paper Craft to Protein Structure Prediction, Guang Song, Nancy M. Amato, Technical Report, TR00-001, Department of Computer Science, Texas A&M University, Jan 2000.
Technical Report(ps)



Parasol Home | Research | People | General info | Seminars | Resources  

Parasol Lab, 301 Harvey R. Bright Bldg, 3112 TAMU, College Station, TX 77843-3112 
Contact Webmaster      Phone 979.458.0722     Fax 979.458.0718 
Dwight Look College of Engineering
Department of Computer Science and Engineering | Dwight Look College of Engineering | Texas A&M University
    
Privacy statement: Computer Science and Engineering Engineering TAMU