Advanced Scripting

From PyMOLWiki

On this page, we discuss more complex scripting. Python is great, but it is much slower at mathematics than C/C++/Java/FORTRAN. For that reason, you may find it more useful to export your data to another language, operate on it there and then import the results back into PyMOL. We discuss the Python API and the general operating procedure for successfully writing your own scripts.

Advanced Scripting

Python, while incredibly useful, is much slower at math than compiled, statically typed languages, and sometimes the libraries we need are written in those languages. For complicated problems it is often faster to package your data, send it to C, do the math there, and pass the results back to Python than to do everything in Python. The beauty of the Python API is that we can do just that.

This is more advanced scripting and requires some knowledge of the Python API and of some outside language. The example shown here is in C; C++ extensions are very similar.

Python, PyMOL and C

Here, I will show you how to write a C-module that plugs into Python and talks nicely with PyMOL. The example actually shows how to make a generic C-function and use it in Python.

First, let's assume that we want to call a function; let's call it funName. Let's assume funName will take a Python list of lists and return a list: for example, passing the C program the XYZ coordinates of each atom and getting back a list of certain atoms with some property. I will also assume we have funName.h and funName.c as the C code files. I have provided this more complex example to show a real-world problem. If you were just sending an integer or float instead of packaged lists, the code would be simpler; if you understand unpacking the lists, then you'll certainly understand unpacking a simple scalar.


If you tell Python that you're using C++ code (see the setup below) then it'll automatically call the C++ compiler instead of the C compiler. There are warnings you may want to be aware of though.

My experience with this has been pretty easy. I simply renamed my ".c" files to ".cpp", caught the few errors (darn it, I didn't typecast a few pointers from malloc) and the code compiled fine. My experience with this is also quite limited, so YMMV.

Calling the External Function

So, to start, let's look at the Python code that will call the C-function:

# -- in
# Call funName.  Pass it a tuple () of two lists.  (sel1 and sel2 are lists.)
# Get the return value into rValFromC.
rValFromC = funName( (sel1, sel2) )

where sel1 and sel2 could be any list of atom coordinates, say, from PyMOL. (See above.)
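Before the C module exists, it can help to test the calling convention against a pure-Python stand-in. The sketch below is only that: the body of funName here (indices of atoms in the first list within a cutoff of any atom in the second) and the cutoff parameter are assumptions made up for illustration, not what the real C function does. What matters is the shape of the data: each selection is a list of [x, y, z] lists, and the two selections travel together in one tuple.

```python
def funName(selections, cutoff=4.0):
    """Hypothetical pure-Python stand-in for the C function.

    Takes a tuple of two coordinate lists and returns the indices of the
    atoms in the first list that lie within `cutoff` of any atom in the
    second list.
    """
    sel1, sel2 = selections
    hits = []
    for i, a in enumerate(sel1):
        for b in sel2:
            # squared distance, so no square root is needed
            d2 = sum((p - q) ** 2 for p, q in zip(a, b))
            if d2 <= cutoff ** 2:
                hits.append(i)
                break
    return hits


# Each selection is a list of [x, y, z] lists, one entry per atom.
sel1 = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
sel2 = [[0.0, 0.0, 1.0]]

rValFromC = funName((sel1, sel2))
```

Once the C module is built, swapping the stand-in for the real import should leave the calling code unchanged.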

Ok, this isn't hard. Now, we need to see what the code that receives this function call looks like in C. First, we need to let C know we're integrating with Python. So, in the header file funName.h we put:

// in funName.h
#include <Python.h>

Next, by convention your C function's name is funName_funName (and that needs to be set up; I'll show how later). So, let's define funName:

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
        /* ...more code... */
}

This is the generic call. funName takes two pointers to PyObjects and returns a pointer to a PyObject. This is how you get the Python data into and out of C. The data shows up in args, a tuple of packaged Python objects, which we then unpack into C using some helper functions. Once unpacking is complete, we perform our C/C++ procedure with the data, package up the results using the Python API, and send the results back to Python/PyMOL.

Unpacking the Data

Let's unpack the data in args. Remember, args holds a single argument: a tuple of two lists. So, to unpack that we do the following inside of funName:

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
       PyObject *listA, *listB;

       if ( ! PyArg_ParseTuple(args, "(OO)", &listA, &listB) ) {
                printf("Could not unparse objects\n");
                return NULL;
        }

        // let Python know we made two lists
        Py_INCREF(listA);
        Py_INCREF(listB);
 ... more code ...

The declaration PyObject *listA, *listB; creates the two C variables that we will unpack the lists into. They are pointers to PyObjects. The call to PyArg_ParseTuple is where the magic happens: we pass it the args we got from Python. The "(OO)" is Python's code for "I'm expecting two objects inside a tuple ()". Were it three objects, it would be "(OOO)". The first object will be put into listA and the second into listB, which is why we pass their addresses, &listA and &listB. The exact format specifications are listed in the Python C API documentation on parsing arguments and are very useful.
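If the parentheses in "(OO)" seem mysterious, a rough Python analogue may help. This is only an illustration of the matching rule, not how PyArg_ParseTuple is implemented, and parse_pair is a made-up name. The outer level stands for the argument tuple args; the parenthesized level stands for one argument that itself holds two objects.

```python
def parse_pair(*args):
    """Rough analogue of PyArg_ParseTuple(args, "(OO)", &listA, &listB)."""
    # `args` is the tuple of positional arguments, like the C-level `args`.
    # The outer unpacking demands exactly one argument; the inner one
    # demands that this argument holds exactly two objects.
    (listA, listB), = args
    return listA, listB


# One argument that is a pair: matches the "(OO)" pattern.
ok = parse_pair(([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))

# Two separate arguments: does not match "(OO)"; a plain "OO" would.
try:
    parse_pair([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    failed = False
except ValueError:
    failed = True
```

This is why the Python call above wraps sel1 and sel2 in an extra pair of parentheses.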

Reference Counting

Next, we check for success. Unpacking could fail. If it does, complain and quit. Otherwise, listA and listB now have data in them. To avoid memory problems we need to manually keep track of the PyObjects we're tooling around with. Objects unpacked with "O" are borrowed references: Python still owns them and does not know that our C code is holding on to them. Likewise, if I create PyObjects in C (being sneaky and not telling Python) and never release them, nothing will ever clean them up (making a leak). So, we let Python know about each list with Py_INCREF(listA) and Py_INCREF(listB), and later give each one back with a matching Py_DECREF. This is reference counting.
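You can watch reference counts move from plain Python, which may make the C calls feel less abstract. The sketch below assumes CPython (the interpreter PyMOL uses), where sys.getrefcount reports the count; the variable names are just for illustration. Binding a second name to the list plays the role of Py_INCREF, and deleting that name plays the role of Py_DECREF.

```python
import sys

coords = [[0.0, 0.0, 0.0]]
before = sys.getrefcount(coords)

held = coords                      # take another reference, like Py_INCREF(listA)
during = sys.getrefcount(coords)

del held                           # give it back, like Py_DECREF(listA)
after = sys.getrefcount(coords)
```

The absolute numbers are not meaningful (getrefcount itself holds a temporary reference); only the change of one up and one down is.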

Now, just for safety, let's check the lists to make sure they actually were passed something. A tricky user could have given us empty lists, looking to hose the program. So, we do:

     // handle empty selections (should probably do this in Python, it's easier)
     const int lenA = PyList_Size(listA);
     if ( lenA < 1 ) {
             // set an exception so that returning NULL raises a proper Python error
             PyErr_SetString(PyExc_ValueError, "First selection didn't have any atoms.  Please check your selection.");
             // let Python remove the lists
             Py_DECREF(listA);
             Py_DECREF(listB);
             return NULL;
     }

We check the list size with PyList_Size, and if it's 0, we quit. But, before quitting, we give control of the lists back to Python so it can clean up after itself. We do that with Py_DECREF.
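As the comment in the C code admits, this kind of check is easier on the Python side, before the data ever reaches C. A minimal sketch follows; check_selection and its messages are hypothetical, and the XYZ-triple check is an extra assumption beyond the empty-list test described above.

```python
def check_selection(coords, label):
    """Hypothetical helper: validate one coordinate list before calling C."""
    if len(coords) < 1:
        raise ValueError(
            "%s didn't have any atoms. Please check your selection." % label)
    for xyz in coords:
        if len(xyz) != 3:
            raise ValueError(
                "%s contains an entry that is not an XYZ triple." % label)
    return coords


good = check_selection([[1.0, 2.0, 3.0]], "First selection")

try:
    check_selection([], "Second selection")
    empty_rejected = False
except ValueError:
    empty_rejected = True
```

Validating early keeps the C code short and turns a tricky user's empty list into an ordinary Python exception.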

More Complex Unpacking

If you're dealing with simple scalars, then you might be able to skip this portion.

Now, we should have access to the data the user sent us, in listA and listB, and it should be there and be clean. But, not forgetting that listA and listB are lists of 3D coordinates, let's unpack them further into sets of coordinates. Because we know the length of the lists, we can do something like the following:

       // make space for the current coords; pcePoint points to cePoint,
       // a struct holding the x, y and z of one atom.
       // length is the number of coordinates in listA (lenA, from above).
       pcePoint coords = (pcePoint) malloc(sizeof(cePoint)*length);

       // loop through the arguments, pulling out the
       // XYZ coordinates.
       int i;
       for ( i = 0; i < length; i++ ) {
               PyObject* curCoord = PyList_GetItem(listA,i);
               PyObject* curVal = PyList_GetItem(curCoord,0);
               coords[i].x = PyFloat_AsDouble(curVal);

               curVal = PyList_GetItem(curCoord,1);
               coords[i].y = PyFloat_AsDouble(curVal);

               curVal = PyList_GetItem(curCoord,2);
               coords[i].z = PyFloat_AsDouble(curVal);
       }

 ... more code ...

Here, pcePoint is effectively a float[3]: one x, y and z per atom. The malloc call gets some memory ready for the 3 x length list of coordinates. Then, for each item from 0 to length-1, we unpack the list using PyList_GetItem into curCoord. This then gets further unpacked into coords. PyList_GetItem returns a borrowed reference, so curCoord and curVal need no Py_DECREF.

We now have the data in C++/C data structures that the user passed from PyMOL. Now, perform your task in C/C++ and then return the data to PyMOL.

Sending the Results back to Python/PyMOL

Once you're done with your calculations and want to send your data back to PyMOL, you need to package it up into a Python object, using the Python API, and then return it. You should be aware of the expected return value and how you're packaging the results. If your user calls,

(results1,results2) = someCFunction(parameters1,parameters2)

then you need to package a list with two values. To build values for returning to PyMOL, use Py_BuildValue. Py_BuildValue takes a format string indicating the types, followed by the values themselves. The format codes for building return values are documented very well in the Python C API docs. Consider an example: if I want to package a pair of integers, the format string for a list of two ints is "[i,i]", so my call could be:

// Package the two ints into a Python pair of ints.
PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );

// Py_BuildValue returns a new reference, so Python already knows about the object.

If you need to make a list of things to return, you iterate through a list and make a bunch of thePairs and add them to a Python list as follows:

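// Make the Python list; PyList_New returns a new reference that we own
PyObject* theList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
  // PyList_Append takes its own reference, so release ours afterwards
  PyList_Append(theList, thePair);
  Py_DECREF(thePair);
}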

To add a list of lists, just make an outer list,

PyObject* outerList = PyList_New(0);

and iteratively add to it your inner lists:

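PyObject* outerList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  // make the inner list, called curList
  PyObject* curList = PyList_New(0);

  // fill the inner list, using PyList_Append with some data, shown above

  // add the inner list to the outer list, then release our reference to it
  PyList_Append(outerList, curList);
  Py_DECREF(curList);
}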
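It can help to picture what these C objects look like once they arrive back in Python. The sketch below uses made-up numbers and a made-up stub, someCFunction_stub, standing in for a compiled function; it only shows the shapes involved. A value built with the "[i,i]" format is an ordinary two-element list, and a list filled with PyList_Append is an ordinary list of those.

```python
# What Py_BuildValue("[i,i]", 3, 7) looks like once it reaches Python:
thePair = [3, 7]

# A list of such pairs, as built with PyList_New(0) and PyList_Append:
theList = [[0, 5], [1, 6], [2, 9]]


def someCFunction_stub():
    """Hypothetical stand-in returning the same shape a C function could."""
    return [thePair, theList]


# A two-element return value can be unpacked directly by the caller.
(results1, results2) = someCFunction_stub()
first_indices = [pair[0] for pair in results2]
```

Deciding on this shape first, and then writing the Py_BuildValue and PyList_Append calls to match it, tends to be easier than the other way around.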


Great, now we can extract data from Python, use it in C/C++, and package it back up for returning to Python. Now, we need to learn about the minimal baggage needed for C to operate with Python. Keep reading; almost done.


We need to discuss how our functions will be called from Python. First, we need to create a method table.

static PyMethodDef CEMethods[] = {
        {"ccealign", ccealign_ccealign, METH_VARARGS, "Align two proteins using the CE Algorithm."},
        {NULL, NULL, 0, NULL}     /* Always use this as the last line in your table. */
};

METH_VARARGS tells C to expect a simple tuple of positional arguments, which we unpack with PyArg_ParseTuple. Combining it with METH_KEYWORDS (written METH_VARARGS | METH_KEYWORDS) tells C to also accept arguments by name, which we unpack with PyArg_ParseTupleAndKeywords. When using METH_KEYWORDS, your function needs to accept a third parameter, a PyObject* that is the dictionary of keyword arguments. For more information check out the Python method table docs.

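Each module undergoes initialization. By default, the module's initialization function is initNAME(); so, for our example above, initccealign(). During this initialization step, we need to call Py_InitModule. (This is the Python 2 convention; Python 3 names the function PyInit_NAME and builds the module with PyModule_Create.) For our above example, we'd have,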

    (void) Py_InitModule("ccealign", CEMethods);

Finally, the main function that starts the whole shebang should look something like:

main(int argc, char* argv[])

At this point, you should have a fully functioning program in C/C++ integrated with PyMOL/Python.

Installing Your Module


The Python distutils package is a great method for distributing your modules over various platforms. It handles platform-specific issues and simplifies the overall install process. For us module-builders, that means creating the distutils setup script, and, given the above, that's the last step. (distutils was removed from the standard library in Python 3.12; setuptools provides setup and Extension under the same names.)

More detailed information can be found on the Python documentation page for installing C/C++ modules. There is also information on how to build source and binary distribution packages.

As an example of how powerful distutils is, I have cealign set up to install as simply as:

python setup.py build cealign
python setup.py install cealign

PyMOL also uses distutils for its source install. If more people understood distutils, I think they would install PyMOL from source, since you get all the latest features.

The setup file needs to know the following (at the very least): what source files comprise the project, what include directories to scan, the project name. You can also add more metadata such as version number, author, author_email, url, etc. For this example, let's assume we have the following directory structure,

|-- build
|-- dist
|-- doc
|   `-- funName
|-- src
|   |-- etc
|   |   `-- tnt
|   |       |-- doxygen
|   |       |   `-- html
|   |       `-- html
|   `-- tnt

and we want to include all the .cpp files from the src directory, and all the include files in tnt. We start as follows,

# -- -- your module's install file

# import distutils
from distutils.core import setup, Extension
# for pasting together file lists
from glob import glob
# for handling path names in a os independent way
from os.path import join

# grab all of the .cpp files in src/
srcList = glob(join("src", "*.cpp"))
# set the include directories
incDirs = [ join( "src", "tnt") ]
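If some of your sources also live under src/tnt, the same glob approach extends naturally. The sketch below is one way to do it, using only the standard library; collect_sources and the file name helper.cpp are made up for illustration, and the temporary directory exists only so the example can run anywhere.

```python
import os
import tempfile
from glob import glob
from os.path import join


def collect_sources(root="."):
    """Hypothetical helper: .cpp files in src/ and src/tnt/ under root."""
    patterns = [join(root, "src", "*.cpp"),
                join(root, "src", "tnt", "*.cpp")]
    return sorted(path for pattern in patterns for path in glob(pattern))


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(join(root, "src", "tnt"))
    for rel in (("src", "funName.cpp"),
                ("src", "tnt", "helper.cpp"),
                ("src", "funName.h")):
        open(join(root, *rel), "w").close()
    # header files are skipped; only the .cpp files are collected
    found = [os.path.relpath(p, root) for p in collect_sources(root)]
```

In a real setup script you would call collect_sources() with no argument and pass the result as the sources list.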

Ok, now Python knows which files to include. Now we need to create a new Extension. We can simply call,

# create the extension given the module name, 'funName', the source list and include directories.
ccealignMods = Extension( 'funName', sources=srcList, include_dirs=incDirs  )

Lastly, all we have to do is call the final setup function, with the extension we just created and some metadata (if we want):

setup( name="funName",
        description="funName: A simple example to show users how to make C/C++ modules for PyMOL",
        author="Your Name Here",
        author_email="Your Email Goes Here",
        url="The URL of your work",
        ext_modules=[ccealignMods] )

And voila -- we're done. The users should now be able to execute,

python setup.py build
# drop the brackets around sudo if you need to be root to install; see "Installing PyMOL w/o Superuser access" (Linux_Install) for an example of installing without root.
[sudo] python setup.py install
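After installing, a quick way to confirm that Python can find the new module is to ask the import system, without actually importing anything. This is just a convenience sketch using the standard library; module_available is a made-up helper name, and "funName" is the example module name used throughout this page.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


# True only after a successful build and install of the example module.
installed = module_available("funName")

# A standard library module, as a sanity check that the helper works.
sanity = module_available("math")
```

If installed comes back False, check that you are running the same Python interpreter you installed into, which is a common stumbling block with PyMOL's bundled Python.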


  • discuss the pains of debugging


I hope you found this helpful and that it will spur you to actually write some PyMOL modules, or help you overcome the speed limitations inherent in Python's math (in comparison to compiled, statically typed languages).

I'm happy to hear any comments or questions you may have. Tree 09:14, 19 May 2008 (CDT)


See the source code for cealign.

See Also

stored, iterate_state, identify.