Type: Python Script
Author(s): Troels E. Linnet
License: BSD
This code has been put under version control in the project Pymol-script-repo


This script is an experimental surface cysteine pKa predictor.
The script is solely based on the work by:

Predicting Reactivities of Protein Surface Cysteines as Part of a Strategy for Selective Multiple Labeling.
Maik H. Jacob, Dan Amir, Vladimir Ratner, Eugene Gussakowsky, and Elisha Haas.
Biochemistry. Vol 44, p. 13664-13672, doi:10.1021/bi051205t

Questions about the article should be sent to Maik Jacob.



Jacob et al. described a computational algorithm that predicts the reactivity of surface cysteines. The algorithm was based on reaction rates with Ellman's reagent (Riddles et al.) measured on 26 single-cysteine mutants of adenylate kinase. The authors could predict the reactivity of the cysteines with a Pearson correlation coefficient of 0.92. The algorithm predicts the pKa values of cysteines by calculating the electrostatic interactions with the backbone and side chains of the protein, plus an energetic solvation effect derived from the number of neighbouring atoms. It differs from other pKa algorithms in that it calculates a Boltzmann energy distribution over the rotational states of the cysteine. The reaction rate with Ellman's reagent was set proportional to the fraction of negatively charged cysteines (Bulaj et al.).
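The link between a predicted pKa and the fraction of negatively charged cysteine can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch and is not taken from the script itself; the function names `thiolate_fraction` and `relative_rate` are hypothetical and chosen only for illustration, and the rate model assumes that nothing but the charged fraction differs between two cysteines.

```python
def thiolate_fraction(pka, ph):
    """Fraction of a thiol that is deprotonated (negatively charged) at a given pH.

    Henderson-Hasselbalch for an acid: f = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def relative_rate(pka_a, pka_b, ph):
    """Ratio of expected reaction rates of cysteine A over cysteine B.

    Assumption (illustrative only): the rate is proportional to the
    thiolate fraction and all other factors are identical.
    """
    return thiolate_fraction(pka_a, ph) / thiolate_fraction(pka_b, ph)
```

At a pH equal to the pKa, `thiolate_fraction` returns 0.5; lowering the pKa by one unit at fixed pH raises the fraction to 10/11.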

Ratner et al. used the prediction of cysteine reactivity to selectively label several two-cysteine mutants of adenylate kinase. A double mutant was selected with one highly reactive and one weakly reactive cysteine. A dye was first reacted with the more reactive cysteine, and the product was purified. Subsequently, the second dye was attached to the less reactive cysteine under unfolding conditions. The strategy is interesting because it meets the double challenge of site-specificity and double labeling of proteins.

The prediction algorithm was tested against PROPKA, a popular web-based pKa prediction program (see Sanchez et al., Sundaramoorthy et al.). A script was developed to send a structure from PyMOL to the server and to fetch and display the calculated pKa values in PyMOL. The adenylate kinase protein was virtually mutated at the 26 positions described by Jacob et al.; the PROPKA predictions showed a lower Pearson correlation coefficient of 0.7. In collaboration with Dr. Maik Jacob, the computational algorithm was developed into a general cysteine pKa prediction algorithm for PyMOL. The script is published here on the PyMOLWiki, but validation of the algorithm against other experimental pKa values has not been finished. The validation was hindered by the limited amount of available cysteine pKa data. For example, a search in the "PPD a database of protein ionization constants" revealed only 12 candidates, of which several were dimerized and only 2 cysteines were surface exposed.

Algorithm development

The algorithm is based on electrostatic calculations, where some parameters have been fine-tuned.
The electrostatic model considers the distance from the sulphur atom (SG) of the cysteine to the nearest backbone amide groups and to residues with a partial charge.
The model includes an evaluation of the Boltzmann distribution for the rotation of the SG atom around the CA->CB bond.
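The two ingredients named above, a distance-dependent electrostatic term and a Boltzmann distribution over rotational states, can be sketched generically as follows. This is not the script's implementation: the function names, the uniform dielectric, and the use of plain Coulomb's law are assumptions made for illustration, and the fine-tuned parameters of the published model are not reproduced here.

```python
import math

COULOMB_KCAL = 332.06          # kcal*Angstrom/(mol*e^2), standard conversion constant
KT_KCAL = 0.0019872 * 298.15   # gas constant times temperature, kcal/mol at 298.15 K


def coulomb_energy(q1, q2, distance, dielectric):
    """Coulomb energy in kcal/mol for two charges (units of e) separated by
    `distance` Angstrom in a uniform dielectric (an illustrative assumption)."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance)


def boltzmann_weights(energies, kt=KT_KCAL):
    """Normalized Boltzmann populations for a list of state energies (kcal/mol)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(energies, kt=KT_KCAL):
    """Population-weighted mean energy over the rotational states."""
    weights = boltzmann_weights(energies, kt)
    return sum(w * e for w, e in zip(weights, energies))
```

In such a scheme, each rotational position of the SG atom would get its own summed interaction energy, and `boltzmann_weights` would then give the relative population of each position.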

Twenty-six mutants of Escherichia coli adenylate kinase (4AKE) were produced, each containing a single cysteine at the protein surface, and the rates of the reaction with Ellman's reagent were measured. The reaction rate was taken to be proportional to the fraction of negatively charged cysteine, which is determined by the pKa, and the measured rates were used to fine-tune the parameters in the electrostatic model.

Correction to article

There is a typographical error in equation 6 of the article: a minus sign "-" is missing. The equation should read:

W_{MC,SC(i)} = - \left( \sum W_{MC(i)} + \sum W_{SC(i)} \right)

Example of use

Escherichia coli adenylate kinase.

import cyspka

fetch 4AKE, async=0
create 4AKE-A, /4AKE//A and not resn HOH
delete 4AKE
hide everything
show cartoon, 4AKE-A
cyspka 4AKE-A, A, 18

### You can loop over several residues. 
loopcyspka 4AKE-A, A, residue=18.25.41-42

### Or run the original 26 residues. This takes a long time, so do not run too many at a time.
#loopcyspka 4AKE-A, A, residue=18.25.41-
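The `residue=18.25.41-42` argument in the example suggests a dot-separated list in which a dash denotes an inclusive range. The following sketch shows how such a specification could be expanded into individual residue numbers. The helper `expand_residue_spec` is hypothetical and written only to illustrate the apparent syntax; the parsing inside the cyspka script may differ.

```python
def expand_residue_spec(spec):
    """Expand a spec such as '18.25.41-42' into a list of residue numbers.

    Assumed syntax (inferred from the usage example, not from the script):
    entries are separated by '.', and 'a-b' means every residue from a to b.
    """
    residues = []
    for token in spec.split("."):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing '.'
        if "-" in token:
            start, end = token.split("-", 1)
            residues.extend(range(int(start), int(end) + 1))
        else:
            residues.append(int(token))
    return residues


print(expand_residue_spec("18.25.41-42"))  # [18, 25, 41, 42]
```

An incomplete range with a missing end number raises a `ValueError` in this sketch instead of being guessed.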


  • Ellman's Reagent: 5,5'-Dithiobis(2-nitrobenzoic Acid) a Reexamination. Peter W. Riddles, Robert L. Blakeley, and Burt Zerner. Analytical Biochemistry. Vol 94, p. 75-81, 1979
  • Ionization Reactivity Relationships for Cysteine Thiols in Polypeptides. Grzegorz Bulaj, Tanja Kortemme, and David P. Goldenberg. Biochemistry. Vol 37, p. 8965-8972, 1998
  • A General Strategy for Site-Specific Double Labelling of Globular Proteins for Kinetic FRET Studies. V. Ratner, E. Kahana, M. Eichler, and E. Haas. Bioconjugate Chem. Vol 13, p. 1163-1170, 2002
  • Prediction of reversibly oxidized protein cysteine thiols using protein structure properties. Ricardo Sanchez, Megan Riddle, Jongwook Woo, and Jamil Momand. Protein Science. Vol 17, p. 473-481, 2008
  • Predicting protein homocysteinylation targets based on dihedral strain energy and pKa of cysteines. Elayanambi Sundaramoorthy, Souvik Maiti, Samir K. Brahmachari, and Shantanu Sengupta. Proteins. December, p. 1475-1483, 2007 doi:10.1002/prot.21846