© Copyright 2012-2014 by Silicos-it
Strip-it is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Strip-it is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Lesser General Public License for more details.
Strip-it is linked against OpenBabel version 2. OpenBabel is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2 of the License.
Join our Google Groups community to discuss inconsistencies and errors, to raise questions, or to make suggestions for improvement.
Strip-it™ is a program that identifies and extracts predefined scaffolds from small organic molecules. The program is linked against the open-source OpenBabel C++ library.
The program comes with a number of predefined molecular scaffolds for extraction. These scaffolds include, amongst others:
- Molecular frameworks as originally described by Bemis and Murcko (1996);
- Molecular frameworks and reduced molecular frameworks as described by Ansgar Schuffenhauer and coworkers (2007);
- Scaffold topologies as described by Sara Pollock and coworkers (2008).
All of these scaffolds are explained in the following sections.
Strip-it™ is controlled by command line options and a ‘scaffold’ file. The ‘scaffold’ file is where the user selects which of the predefined scaffolds are to be extracted. Figure 1 describes the data flow.
Command line interface
Strip-it™ is run from the command line as follows:
> strip-it [options]
One option is required:
- --input <file>
- Specifies the file containing the input molecules. The input file should contain one or more molecules, each specified as a connection table in a supported molecular file format. The format is determined by the input filename extension or by the --inputFormat option. All input formats supported by OpenBabel are allowed, and zipped files are also accepted.
Use obabel -L formats read and obabel -L formats write to get a list of all read and write formats recognized by OpenBabel.
All other options are optional:
- --output <file>
- Specifies the name of the file to which the generated scaffolds are written. The first line of the output file is a header containing the names of the generated scaffolds; these names are identical to the keywords that define the desired scaffolds (see the scaffold definitions section below). Each subsequent line contains the scaffolds generated for one input molecule, represented in SMILES notation. If a molecule cannot be converted to a scaffold, for example because it contains no rings, a ‘-’ character is written instead. If the --output option is not provided, all output is written to standard output. In the current version of Strip-it™, stereochemistry, whether implicitly or explicitly defined in the molecule, is removed before scaffold generation takes place.
- --inputFormat <format>
- Specifies the format of the input file. The format should be one of the formats recognised by OpenBabel. If this option is not specified, the format is derived from the input file extension; this is the default behavior.
- --scaffolds <file>
- Specifies the file containing the keywords of the scaffolds to be generated. The recognised scaffold keywords are described in detail in the scaffold definitions section. If the --scaffolds option is not provided, all scaffold types are calculated by default.
- --noLog
- Specifying this optional command line option suppresses the generation of additional information during the calculation process.
- --noHeader
- Specifying this optional command line option suppresses the header line in the output.
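The file-handling rules described for --input, --inputFormat and --output can be mimicked in a few lines of Python, for instance when post-processing Strip-it™ results. The sketch below is not part of Strip-it™: both function names are hypothetical, it assumes that a trailing .gz extension marks a zipped file, and it assumes that output columns are separated by whitespace (SMILES strings contain none), which this documentation does not state.

```python
from pathlib import PurePath
from typing import Dict, List, Optional


def infer_input_format(filename: str, input_format: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit --inputFormat value wins,
    otherwise the format is taken from the file extension."""
    if input_format:
        return input_format.lower()
    suffixes = [s.lower().lstrip(".") for s in PurePath(filename).suffixes]
    # Assumption: a trailing ".gz" marks a zipped file, so the format comes
    # from the extension in front of it ("mols.sdf.gz" -> "sdf").
    if suffixes and suffixes[-1] == "gz":
        suffixes.pop()
    if not suffixes:
        raise ValueError(f"cannot infer a format from {filename!r}; use --inputFormat")
    return suffixes[-1]


def read_scaffold_table(text: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical reader for output written with a header line.
    Assumes whitespace-separated columns; a '-' entry (no scaffold could
    be generated) is returned as None."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}: {line!r}")
        rows.append({name: (None if value == "-" else value)
                     for name, value in zip(header, fields)})
    return rows
```

Output written with --noHeader has no header line, so a reader for that case would have to take the column names from the scaffold definition file instead.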
Scaffold definitions
The scaffold definition file (--scaffolds) contains the names of all scaffolds that should be extracted from each input molecule. Scaffolds are defined by keywords, and the program recognizes only the keywords that are defined in this section.
Each line should contain only a single scaffold keyword. Blank lines and lines starting with ‘#’ or ‘//’ are ignored; the latter can be used as comment lines.
Keywords may be repeated in a single scaffold definition file, in which case the corresponding scaffold is generated and written out more than once. Scaffold keywords are case-insensitive: the MURCKO_1 keyword is interpreted as identical to Murcko_1 or murcko_1. However, these keywords are not the same as murcko1.
Each scaffold keyword is composed of a type specification and a number. The type specification describes the type of scaffold, and the number defines the specific flavour of that scaffold type. Type and number are joined by an underscore (‘_’) character.
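As an illustration of these rules, the following Python sketch reads a scaffold definition file and returns the requested keywords in order, repeats included. It is not Strip-it™ code: the function names are hypothetical, the set of accepted keywords is taken from the keywords listed in the timings table further down this page, and leading and trailing whitespace on a line is assumed to be ignored.

```python
from typing import List, Tuple

# Keywords as listed in the timings table of this documentation.
KNOWN_KEYWORDS = {
    "RINGS_WITH_LINKERS_1", "RINGS_WITH_LINKERS_2",
    "MURCKO_1", "MURCKO_2",
    "OPREA_1", "OPREA_2", "OPREA_3",
    *(f"SCHUFFENHAUER_{n}" for n in range(1, 6)),
}


def parse_scaffold_file(text: str) -> List[str]:
    """Return the requested keywords (upper-cased) in file order."""
    keywords = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith(("#", "//")):
            continue  # blank or comment line
        keyword = line.upper()  # keywords are case-insensitive
        if keyword not in KNOWN_KEYWORDS:
            raise ValueError(f"line {lineno}: unknown scaffold keyword {line!r}")
        keywords.append(keyword)  # repeated keywords are kept on purpose
    return keywords


def split_keyword(keyword: str) -> Tuple[str, int]:
    """Split 'TYPE_NUMBER' at the last underscore, since the type itself
    may contain underscores (e.g. RINGS_WITH_LINKERS_2)."""
    scaffold_type, number = keyword.rsplit("_", 1)
    return scaffold_type, int(number)
```

Splitting at the last underscore rather than the first keeps multi-word types such as RINGS_WITH_LINKERS intact.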
The murcko scaffold type was originally described by Bemis and Murcko (1996).
An example of the MURCKO_1 scaffold as implemented in the current version of Strip-it™ is given in Figure 4.
The MURCKO_2 scaffold differs from the MURCKO_1 scaffold in the way linkers are represented. In the MURCKO_2 scaffold, these linkers are all condensed to a single chain, as shown in Figure 5.
The oprea scaffold type was originally described by Pollock and coworkers (2008). In the current version of Strip-it™, several flavours of this scaffold type have been implemented.
In all cases, the minimal SMILES representation of the different scaffolds is returned.
Examples of the OPREA_1 scaffold type are given in Figure 6. Of all oprea scaffolds implemented in the current version of Strip-it™, this type is closest to the description in the original publication.
The OPREA_2 scaffold type differs from the OPREA_1 scaffold in that all hydrogen bond acceptor and donor information of the ring atoms in the original molecule is kept and transferred into the resulting scaffold. Hydrogen bond acceptor atoms in the original molecule are represented by oxygen atoms in the final scaffold, while hydrogen bond donor atoms are represented by nitrogen atoms. Amide or sulfoxide ring oxygens are treated as part of the ring and set to hydrogen bond acceptors. All other atoms are replaced by saturated carbon atoms, and all bond orders are set to one. Neighbouring ring atoms of the same type (hydrogen bond acceptor, hydrogen bond donor or carbon) are merged into a single atom of that type. Linkers are replaced by a single bond (Figure 7).
The OPREA_3 scaffold type differs from the OPREA_2 scaffold in that the hydrogen bond acceptor and donor information of both the linker and the ring atoms in the original molecule is kept and transferred into the resulting scaffold. Neighbouring atoms within rings and linkers that have the same type (hydrogen bond acceptor, hydrogen bond donor or carbon) are merged into a single atom of that type (Figure 8).
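The merging step can be pictured on the cyclic sequence of atom types around a single ring. The Python sketch below is a simplified illustration, not the Strip-it™ implementation: it assumes each ring atom has already been classified as 'O' (hydrogen bond acceptor), 'N' (hydrogen bond donor) or 'C' (carbon), and it only collapses runs of identical types, treating the first and last atom as neighbours. It does not build a molecule or a SMILES string, and fused ring systems are not handled.

```python
from typing import List


def merge_ring_types(ring: List[str]) -> List[str]:
    """Collapse neighbouring atoms of the same type around one ring.

    `ring` lists atom types in ring order: 'O' = acceptor,
    'N' = donor, 'C' = carbon (hypothetical encoding)."""
    merged: List[str] = []
    for atom_type in ring:
        if not merged or merged[-1] != atom_type:
            merged.append(atom_type)  # start of a new run
    # The ring is closed: a run wrapping from the last atom back to the
    # first atom is a single run, so drop its duplicate at the end.
    if len(merged) > 1 and merged[0] == merged[-1]:
        merged.pop()
    return merged
```

For example, a six-membered ring with a single donor atom, ['N', 'C', 'C', 'C', 'C', 'C'], collapses to ['N', 'C'].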
The schuffenhauer scaffold type was originally described by Ansgar Schuffenhauer and coworkers (2007).
SCHUFFENHAUER_1 scaffolds are generated by iteratively removing rings from the molecule until a single ring remains, unless it is not possible to tag the remaining rings in an unambiguous way. In such cases, where two or more choices are possible, the generated SCHUFFENHAUER_1 scaffold consists of two or more rings. Some examples of the SCHUFFENHAUER_1 scaffold as implemented in the current version of Strip-it™ are given in Figure 9.
When additional logging is requested (program option --noLog not set), Strip-it™ outputs each rule that is used in stripping each ring. The rule numbering corresponds to the steps described in the original Schuffenhauer publication.
SCHUFFENHAUER_2 scaffolds are generated in the same way as SCHUFFENHAUER_1 scaffolds, but the procedure stops as soon as two rings remain, unless it is not possible to tag the remaining rings in an unambiguous way. In such cases, the generated SCHUFFENHAUER_2 scaffold consists of three or more rings.
SCHUFFENHAUER_3 scaffolds are generated in the same way as SCHUFFENHAUER_1 scaffolds, but the procedure stops as soon as three rings remain, unless it is not possible to tag the remaining rings in an unambiguous way. In such cases, the generated SCHUFFENHAUER_3 scaffold consists of four or more rings.
SCHUFFENHAUER_4 scaffolds are generated in the same way as SCHUFFENHAUER_1 scaffolds, but the procedure stops as soon as four rings remain, unless it is not possible to tag the remaining rings in an unambiguous way. In such cases, the generated SCHUFFENHAUER_4 scaffold consists of five or more rings.
SCHUFFENHAUER_5 scaffolds are generated in the same way as SCHUFFENHAUER_1 scaffolds, but the procedure stops as soon as five rings remain, unless it is not possible to tag the remaining rings in an unambiguous way. In such cases, the generated SCHUFFENHAUER_5 scaffold consists of six or more rings.
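The stopping rule shared by the SCHUFFENHAUER_1 to SCHUFFENHAUER_5 scaffolds can be outlined generically in Python. This is an assumption-laden sketch, not the Strip-it™ implementation: the ring representation and the hypothetical removal_key callback stand in for the prioritisation rules of the Schuffenhauer publication (encoded as a tuple, so that later rules only break ties left by earlier ones), and a tie for the next ring to remove is treated as the ambiguous case that stops the procedure. The keep parameter is 1 for SCHUFFENHAUER_1 up to 5 for SCHUFFENHAUER_5.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Ring = TypeVar("Ring")


def prune_rings(
    rings: Sequence[Ring],
    keep: int,
    removal_key: Callable[[Ring, List[Ring]], Tuple],
) -> List[Ring]:
    """Iteratively remove the ring with the highest removal key until
    `keep` rings remain, or stop early when the choice is ambiguous."""
    remaining = list(rings)
    while len(remaining) > keep:
        # Keys are recomputed every round because removing a ring can
        # change how the remaining rings are judged.
        keys = [removal_key(ring, remaining) for ring in remaining]
        best = max(keys)
        if keys.count(best) > 1:
            break  # ambiguous: the scaffold keeps more than `keep` rings
        remaining.pop(keys.index(best))
    return remaining
```

Passing the remaining rings to removal_key lets a rule depend on the current state of the scaffold, which is why the keys are not computed once up front.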
Timings and number of scaffolds
The time needed to calculate the scaffolds depends strongly on the scaffold type. The following table summarises the timings for each implemented scaffold type, expressed as the number of compounds processed per second. These timings were obtained by analysing 100,000 randomly selected drug-like molecules. For each scaffold type, the table also gives the number of unique scaffolds generated from these 100,000 molecules:
Scaffold               Compounds per second   Number of unique scaffolds
--------------------   --------------------   --------------------------
RINGS_WITH_LINKERS_1   108                    49,128
RINGS_WITH_LINKERS_2   82                     52,620
MURCKO_1               106                    16,124
MURCKO_2               108                    5,349
OPREA_1                105                    475
OPREA_2                94                     7,328
OPREA_3                93                     16,243
SCHUFFENHAUER_1        7                      462
SCHUFFENHAUER_2        9                      7,595
SCHUFFENHAUER_3        10                     33,267
SCHUFFENHAUER_4        13                     47,786
SCHUFFENHAUER_5        16                     50,827
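Because these figures are throughputs, the expected run time for a dataset follows from dividing the number of molecules by the compounds-per-second value. The short Python sketch below does this arithmetic with the values from the table. It assumes the throughput carries over unchanged to other datasets and hardware (the machine used for the measurements is not stated), and that the times for several requested scaffold types simply add up, which ignores any work shared between them; treat the result as a rough estimate only. The names THROUGHPUT and estimated_seconds are illustrative.

```python
from typing import Iterable

# Compounds per second, copied from the timings table.
THROUGHPUT = {
    "RINGS_WITH_LINKERS_1": 108, "RINGS_WITH_LINKERS_2": 82,
    "MURCKO_1": 106, "MURCKO_2": 108,
    "OPREA_1": 105, "OPREA_2": 94, "OPREA_3": 93,
    "SCHUFFENHAUER_1": 7, "SCHUFFENHAUER_2": 9, "SCHUFFENHAUER_3": 10,
    "SCHUFFENHAUER_4": 13, "SCHUFFENHAUER_5": 16,
}


def estimated_seconds(n_molecules: int, keywords: Iterable[str]) -> float:
    """Rough run-time estimate: per-scaffold times are simply summed."""
    return sum(n_molecules / THROUGHPUT[k.upper()] for k in keywords)


# 100,000 molecules at 7 compounds/s: about 14,286 s, i.e. roughly 4 hours.
print(round(estimated_seconds(100_000, ["SCHUFFENHAUER_1"])))  # prints 14286
```

By the same arithmetic, MURCKO_1 at 106 compounds per second needs about 943 s, or roughly 16 minutes, for the same 100,000 molecules.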
Installation
Strip-it™ requires the libraries of OpenBabel version 2.3. Installation of OpenBabel is exemplified in the Configuring OS X for chemoinformatics section of this website.
The installation of Strip-it™ assumes that the environment variables BABEL_DATADIR, BABEL_LIBDIR and BABEL_INCLUDEDIR point to the directories where OpenBabel has been installed:
> echo $BABEL_INCLUDEDIR
/usr/local/openbabel/include/openbabel-2.0/
> echo $BABEL_LIBDIR
/usr/local/lib/openbabel/2.3.1/
> echo $BABEL_DATADIR
/usr/local/openbabel/share/openbabel/2.3.1/
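Before running cmake, the three variables can be checked with a small Python helper such as the one below. It is a hypothetical convenience script, not part of the Strip-it™ distribution; it only verifies that each variable is set and names an existing directory, not that the directory holds a usable OpenBabel 2.3 installation.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("BABEL_INCLUDEDIR", "BABEL_LIBDIR", "BABEL_DATADIR")


def check_babel_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a {variable: problem} dict; an empty dict means all is well."""
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = f"not an existing directory: {value}"
    return problems


if __name__ == "__main__":
    # Print one line per problem; no output means the variables look sane.
    for name, problem in check_babel_env().items():
        print(f"{name}: {problem}")
```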
Start by downloading Strip-it™ from our software section and extract the archive into the /usr/local/src directory:
> cd /usr/local/src
> sudo tar -xvf ~/Downloads/strip-it-1.0.2.tar.gz
Change into the extracted directory and start the build process:
> cd strip-it-1.0.2
> sudo mkdir build
> cd build
> sudo cmake ..
> sudo make
> sudo make install
The last command installs the Strip-it™ executable in the /usr/local/bin/ directory. Finally, check the installation by entering:
> make test
This should complete all tests without errors.
References
Bemis, G.W.; Murcko, M.A. (1996) ‘The properties of known drugs. 1. Molecular frameworks’, J. Med. Chem. 39, 2887-2893. PubMed 8709122.
Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M.A.; Waldmann, H. (2007) ‘The scaffold tree - visualization of the scaffold universe by hierarchical scaffold classification’, J. Chem. Inf. Model. 47, 47-58.
Pollock, S.N.; Coutsias, E.A.; Wester, M.J.; Oprea, T.I. (2008) ‘Scaffold topologies. 1. Exhaustive enumeration up to eight rings’, J. Chem. Inf. Model. 48, 1304-1310. PubMed 18605680.
Added the --noHeader option (based on a patch provided by Björn Grüning from the University of Freiburg).
Added the --inputFormat option (based on a patch provided by Björn Grüning from the University of Freiburg).
This is the first official release of Strip-it™. The program is the successor of the Stripper program from Silicos, and was branched from version 1.0.4 of Stripper.
Additions to the original Stripper version include:
- Porting the documentation to html and including some improvements.