
== Overview ==
This script filters through all the PDBs in the parent directory (you can easily change the directory it scans). For each file, it saves '''just''' the ligands/heteroatoms (excluding waters), one molecule per output file. This gives you a simple way to filter through a database of proteins, looking only at their ligands.
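
The script hard-codes the pattern <code>"../*.pdb"</code>. As a minimal sketch of making the scanned directory easy to change, the lookup could be wrapped in a small helper. The name <code>pdb_files</code> and the example path <code>/data/structures</code> are hypothetical, and sorting the results is an added choice (not in the original script) so that files are processed in a stable order.
<source lang="python">
import os
from glob import glob


def pdb_files(directory=".."):
    """Return the .pdb files in `directory`, sorted for a stable processing order."""
    return sorted(glob(os.path.join(directory, "*.pdb")))


# The original script scans the parent directory ("../*.pdb"); to scan another
# folder, pass it in instead (this path is only a hypothetical example).
for f in pdb_files("/data/structures"):
    print(f)
</source>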

As noted below, this script works on objects at the level of a '''molecule'''. While we can [[iterate]] over atom number (ID), residue number (resi), etc., there is no equivalent "MOLID", so we provide this simple workaround. You might need it if you have a residue (like #111 from 3BEP) that consists of a molecule and an atom: there is otherwise no way to save the separate pieces (molecule and atom) into two (or more) files. As you can see in the following listing, if we iterate over the hetero atoms (and not waters) in 3BEP we get,
<source lang="python">
PyMOL>iterate bymol het, print resi, resn, ID, chain, segi, alt
111 5CY 6473 C  
111 5CY 6474 C  
111 5CY 6476 C  
111 5CY 6477 C  
111 5CY 6478 C  
111 5CY 6479 C  
111 5CY 6480 C  
111 5CY 6481 C  
111 5CY 6482 C  
111 5CY 6483 C  
111 5CY 6484 C  
111 5CY 6485 C  
111 5CY 6486 C  
111 5CY 6487 C  
111 5CY 6488 C  
111 5CY 6489 C  
111 5CY 6490 C  
</source>
Every atom reports the same resi (111), resn (5CY), and chain (C), so none of these properties lets us separate the two pieces.
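
What is missing is a per-atom molecule ID. A molecule is a connected component of the bond graph, so such an ID can be computed with a union-find pass over the bonds. The sketch below is plain Python on hypothetical toy data (atoms identified by ID, bonds given as ID pairs); it is not part of the PyMOL API. Feeding it real data, for example from <code>cmd.get_model()</code>, whose bonds presumably carry a pair of atom indices, is an assumption and is not shown.
<source lang="python">
def assign_mol_ids(atom_ids, bonds):
    """Return {atom_id: mol_id}, where bonded atoms share the same mol_id."""
    parent = {a: a for a in atom_ids}

    def find(a):
        # follow parent links to the root, halving the path as we go
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # merge the components of the two atoms in every bond
    for a, b in bonds:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # number the components 0, 1, 2, ... in order of first appearance
    labels = {}
    mol_ids = {}
    for a in atom_ids:
        root = find(a)
        if root not in labels:
            labels[root] = len(labels)
        mol_ids[a] = labels[root]
    return mol_ids


# Hypothetical toy residue: atoms 1-4 form one bonded fragment and atom 5 is
# isolated, yet all five would share resi/resn/chain in an iterate listing.
atoms = [1, 2, 3, 4, 5]
bonds = [(1, 2), (2, 3), (3, 4)]
print(assign_mol_ids(atoms, bonds))
</source>

For this toy input it prints <code>{1: 0, 2: 0, 3: 0, 4: 0, 5: 1}</code>: the same residue splits into two molecule IDs, which is exactly the distinction the listing above cannot make.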

== The Code ==
<source lang="python">
python

#
# This simple script filters through all PDBs in a directory and, for each one,
# saves every ligand/heteroatom (excluding waters) to its own file.  The script
# operates at the level of molecules, not residues or atoms.  Thus, if PyMOL
# treats a ligand as ONE residue but it is actually two separate molecules
# (or a molecule and an atom), you will get one file per piece.
#

import os
from glob import glob
from pymol import cmd

# change this pattern to scan a different directory
theFiles = glob("../*.pdb")

for f in theFiles:
    # load the file
    cmd.load(f)
    # remove the protein and the waters
    cmd.remove("polymer or resn HOH")

    cmd.select("input", "all")
    cmd.select("processed", "none")
    mol_cnt = 0

    while cmd.count_atoms("input"):
        # take the whole molecule containing the first unprocessed atom,
        # then update the bookkeeping selections
        cmd.select("current", "bymolecule first input")
        cmd.select("processed", "processed or current")
        cmd.select("input", "input and not current")

        # build the output name: <file name without extension>_<5-digit count>_het.pdb
        base = os.path.splitext(os.path.basename(f))[0]
        curOut = base + "_" + str(mol_cnt).zfill(5) + "_het.pdb"
        curSel = "current"

        # save the file (relative to the current working directory)
        cmd.save(curOut, curSel)
        print("Saved " + curSel + " to " + curOut)

        mol_cnt += 1

    # delete everything before moving on to the next file
    cmd.delete("*")

python end
</source>
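
To make the loop's logic easier to follow, here is a plain-Python sketch of the same strategy that needs no PyMOL: take the first remaining atom, expand it to its whole molecule by following bonds (standing in for <code>bymolecule first input</code>), record it under the script's file-naming scheme, and remove it from the pool until nothing is left. The atom IDs, bonds, file name <code>../3bep.pdb</code>, and the function names <code>molecule_of</code> and <code>split_file</code> are all hypothetical.
<source lang="python">
import os
from collections import deque


def molecule_of(start, remaining, neighbors):
    """Breadth-first search: every atom in `remaining` bonded (directly or
    indirectly) to `start`, analogous to 'bymolecule' on a single atom."""
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in neighbors.get(atom, ()):
            if nb in remaining and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen


def split_file(pdb_path, atom_ids, bonds):
    """Mimic the script's while loop; return a list of (output_name, atoms)."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    base = os.path.splitext(os.path.basename(pdb_path))[0]
    remaining = list(atom_ids)  # plays the role of the "input" selection
    results = []
    mol_cnt = 0
    while remaining:
        # "bymolecule first input": whole molecule of the first remaining atom
        current = molecule_of(remaining[0], set(remaining), neighbors)
        # "input and not current"
        remaining = [a for a in remaining if a not in current]
        out_name = base + "_" + str(mol_cnt).zfill(5) + "_het.pdb"
        results.append((out_name, sorted(current)))
        mol_cnt += 1
    return results


for name, atoms in split_file("../3bep.pdb", [1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4)]):
    print(name, atoms)
</source>

Run as-is, it prints <code>3bep_00000_het.pdb [1, 2, 3, 4]</code> and then <code>3bep_00001_het.pdb [5]</code>. Because each pass removes a whole molecule, the loop always terminates, and the zero-padded counter keeps the output files sorted in the order the molecules were found.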

[[Category:Script_Library]]
[[Category:System_Scripts]]