This script filters through all the PDBs in the parent directory (you can easily change the directory it scans). For each molecule, it saves just the ligands/heteroatoms (excluding the waters). This gives you a simple way to filter through a database of proteins looking only at their ligands.
This script, as noted below, works on objects at the level of a molecule. While we can iterate over atom number (ID), residue number (resi), etc., we do not have any such "MOLID" property, so we provide this simple workaround. You might need this script if you have a residue (like #111 from 3BEP) that consists of a molecule and a separate atom, because there is otherwise no way to save the separate pieces (molecule and atom) into two (or more) files. As you can see in the following listing, if we iterate over the hetero atoms (and not waters) in 3BEP we get:
PyMOL>iterate bymol het, print resi, resn, ID, chain, segi, alt
111 5CY 6473 C
111 5CY 6474 C
111 5CY 6476 C
111 5CY 6477 C
111 5CY 6478 C
111 5CY 6479 C
111 5CY 6480 C
111 5CY 6481 C
111 5CY 6482 C
111 5CY 6483 C
111 5CY 6484 C
111 5CY 6485 C
111 5CY 6486 C
111 5CY 6487 C
111 5CY 6488 C
111 5CY 6489 C
111 5CY 6490 C
which does not allow us to separate the two pieces.
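To see why working at the level of molecules helps, it can be useful to think of a molecule as a set of atoms connected by bonds. The following is a minimal conceptual sketch in plain Python, not PyMOL's actual implementation: `split_into_molecules`, `toy_atoms`, and `toy_bonds` are hypothetical names and data invented for illustration, and they do not come from 3BEP. It groups atoms that share one residue number into separate bonded pieces.

```python
from collections import defaultdict


def split_into_molecules(atom_ids, bonds):
    """Group atom IDs into bonded components (one group per molecule)."""
    # Build an undirected adjacency map from the bond list.
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    molecules = []
    for start in atom_ids:
        if start in seen:
            continue
        # Depth-first walk over bonds collects everything attached to `start`.
        stack = [start]
        component = []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        molecules.append(sorted(component))
    return molecules


# Hypothetical toy residue: atoms 1-3 form a bonded chain, atom 4 is unbonded.
toy_atoms = [1, 2, 3, 4]
toy_bonds = [(1, 2), (2, 3)]
print(split_into_molecules(toy_atoms, toy_bonds))  # [[1, 2, 3], [4]]
```

All four toy atoms would carry the same residue number, yet the bond graph splits them into two groups. This is the kind of separation that a residue-based selection cannot express.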
python
#
# This simple script will filter through all PDBs in a directory, and for each one
# save all the ligands/heteroatoms (that aren't waters) to their own file. This
# script operates at the level of molecules, not residues, atoms, etc. Thus, if
# you have a ligand that PyMOL is treating as ONE residue, but is actually two
# separate molecules, or a molecule and an atom, then you will get multiple files.
#
from glob import glob
from os import path
from pymol import cmd

# change this pattern to scan a different directory
theFiles = glob("../*.pdb")

for f in theFiles:
    # load the file
    cmd.load(f)
    # remove the protein and waters
    cmd.remove("polymer or resn HOH")

    # "input" holds atoms still to be written; "processed" holds atoms already written
    cmd.select("input", "all")
    cmd.select("processed", "none")
    mol_cnt = 0

    while cmd.count_atoms("input"):
        # pick the molecule containing the first remaining atom, then update the lists
        cmd.select("current", "bymolecule first input")
        cmd.select("processed", "processed or current")
        cmd.select("input", "input and not current")

        # prepare the output parameters
        curOut = path.basename(f).split(".")[0] + "_" + str(mol_cnt).zfill(5) + "_het.pdb"
        curSel = "current"

        # save the file
        cmd.save(curOut, curSel)
        print("Saved " + curSel + " to " + curOut)

        mol_cnt = mol_cnt + 1

    # remove all to move to next molecule
    cmd.delete("*")
python end
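The output names produced by the script follow the pattern of input name, five-digit molecule counter, and a `_het.pdb` suffix. As a small sketch of that naming step in isolation, the hypothetical helper `het_output_name` below builds the same kind of name in plain Python. It is an assumption-laden variant rather than the script's own code: it uses `os.path.splitext` instead of splitting on the first dot, so that file names containing extra dots keep their full stem. The paths shown are made-up examples.

```python
from os import path


def het_output_name(pdb_path, mol_index, width=5):
    """Build '<stem>_<zero-padded index>_het.pdb' for one extracted molecule."""
    # Strip the directory and only the final extension from the input path.
    stem = path.splitext(path.basename(pdb_path))[0]
    return "{}_{}_het.pdb".format(stem, str(mol_index).zfill(width))


# Hypothetical input paths, used only to show the resulting names.
print(het_output_name("../3bep.pdb", 0))         # 3bep_00000_het.pdb
print(het_output_name("../3bep.clean.pdb", 12))  # 3bep.clean_00012_het.pdb
```

Zero-padding the counter keeps the output files in molecule order when they are sorted by name.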