
Finding Duplicate Molecules

Author: Chris Farmer

Version: 2.0

Created: 11/2002

Modified: 7/2007. Components upgraded; help text revised and added.

Purpose: Compare two libraries and identify the molecules that are present in both.

The two files containing the molecule libraries are read into Pipeline Pilot, and duplicates are removed from each file individually. This protocol uses the NCI and Maybridge libraries as examples, but any set of molecules can be substituted. Next, the Merge Molecules component merges identical molecules into the same data record. Note that Merge Molecules has the "OutputFrequency" parameter set to True. This causes the component to add a "Frequency" property to each output record, recording how many source records were merged to form it. The next step uses the Frequency property to find all molecules with a Frequency > 1. Because each library was deduplicated first, these are exactly the molecules that occur in both input libraries (e.g. NCI and Maybridge).
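The dedupe, merge, and filter logic can be sketched outside Pipeline Pilot in plain Python. This is only an illustration of the idea, not the Merge Molecules component's actual implementation. It assumes each molecule is already represented by a canonical key (for example a canonical SMILES string), so that identical molecules produce identical strings; canonicalization itself would need a cheminformatics toolkit and is not shown. The function names `dedupe`, `merge_with_frequency`, and `find_shared` are hypothetical, and a `Counter` value stands in for the "Frequency" property.

```python
from collections import Counter
from typing import Iterable, List


def dedupe(keys: Iterable[str]) -> List[str]:
    """Remove duplicates within a single library, keeping first-seen order.

    Assumes each key is a canonical molecule identifier (e.g. canonical SMILES).
    """
    seen = set()
    unique = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def merge_with_frequency(*libraries: Iterable[str]) -> Counter:
    """Merge identical molecules across libraries.

    The count for each key plays the role of the "Frequency" property:
    the number of source records merged into that output record.
    """
    frequency: Counter = Counter()
    for library in libraries:
        frequency.update(library)
    return frequency


def find_shared(lib_a: Iterable[str], lib_b: Iterable[str]) -> List[str]:
    """Return the molecules present in both libraries, sorted for stable output."""
    # Deduplicate each library first, so a molecule repeated within one
    # library cannot reach a count above 1 on its own.
    a = dedupe(lib_a)
    b = dedupe(lib_b)
    frequency = merge_with_frequency(a, b)
    # With two deduplicated inputs, Frequency > 1 means "present in both".
    return sorted(key for key, count in frequency.items() if count > 1)


if __name__ == "__main__":
    # Hypothetical stand-ins for the NCI and Maybridge libraries.
    nci = ["CCO", "c1ccccc1", "CCO", "CC(=O)O"]
    maybridge = ["c1ccccc1", "CCN", "CC(=O)O", "CCN"]
    print(find_shared(nci, maybridge))  # prints ['CC(=O)O', 'c1ccccc1']
```

The per-library dedupe step is what makes the simple `count > 1` test correct. Without it, a molecule that appears twice in only one library would be reported as shared.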

Requirements: Pipeline Pilot 6.1.1 (collections: Chemistry)

O/S: PP Server: Windows and Linux; PP Client: Windows


Terms of Service | www.accelrys.com | ©2006 Accelrys, Inc.