Finding Duplicate Molecules
Author: Chris Farmer
Version: 2.0
Created: 11/2002
Modified: 7/2007. Components upgraded; help text revised and added.
Purpose: Compare two libraries and identify the molecules that are present in both.
The two files containing the molecule libraries are read into Pipeline Pilot, and duplicates are removed from each file individually. This protocol uses the NCI and Maybridge libraries as examples, but any set of molecules can be substituted. Next, the Merge Molecules component merges identical molecules into a single data record. Note that Merge Molecules has its "OutputFrequency" parameter set to True. This causes the component to add a "Frequency" property to each output record, indicating how many source records were merged to make it up. The next step uses the Frequency property to find all molecules that have a Frequency > 1. Because duplicates were already removed within each library, a Frequency greater than 1 can only arise when the same molecule occurs in both input libraries (e.g. NCI and Maybridge).
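The same dedupe, merge, count, and filter logic can be sketched outside Pipeline Pilot. This is not the protocol's implementation or the Merge Molecules component's API. It is a minimal illustration that assumes each molecule is already represented by a canonical identifier (for example, canonical SMILES), so identical molecules compare equal as strings. The function name `find_duplicates` is hypothetical.

```python
from collections import Counter


def find_duplicates(library_a, library_b):
    """Return the molecules present in both libraries, sorted.

    Assumption: each library is an iterable of canonical molecule keys
    (e.g. canonical SMILES), so identical molecules have identical keys.
    """
    # Step 1: remove duplicates within each library individually.
    unique_a = set(library_a)
    unique_b = set(library_b)

    # Step 2: merge the libraries, counting how many source records
    # contribute to each molecule (analogous to the "Frequency" property).
    frequency = Counter()
    frequency.update(unique_a)
    frequency.update(unique_b)

    # Step 3: keep molecules with Frequency > 1, i.e. present in both.
    return sorted(key for key, count in frequency.items() if count > 1)


if __name__ == "__main__":
    nci = ["CCO", "c1ccccc1", "CCO"]
    maybridge = ["c1ccccc1", "CC(=O)O"]
    print(find_duplicates(nci, maybridge))  # prints ['c1ccccc1']
```

The per-library deduplication in step 1 matters: without it, "CCO" appearing twice in the first list would reach a Frequency of 2 and be wrongly reported as shared between the libraries.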
Requirements: Pipeline Pilot 6.1.1 (collections: Chemistry)
O/S: PP Server: Windows and Linux
PP Client: Windows