Christian Borgelt's Web Pages

MoSS - Molecular Substructure Miner

(aka MoFa - Molecular Fragment Miner)


moss.jar (179 kb) executable Java archive
(1123 kb) Java sources
moss.tar.gz (1006 kb) Java sources
Version: 3.0 (2022.10.31) / 8.3 (2022.11.19) (GUI/Miner)
example1.smiles (1 kb) example input file in SMILES format
example2.smiles (1 kb) example input file in SMILES format
steroids.smiles (1 kb) example input file in SMILES format

The source package also contains some basic documentation in HTML format, in the directory moss/doc/user and javadoc documentation in the directory moss/doc/java.

Attention: Since version 6.15 (Miner) / 2.13 (GUI), converting Kekulé representations to aromatic rings when preprocessing molecular databases is no longer the default. That is, in these and newer program versions the option -K is inverted, and "Convert Kekulé representations" on the "Rings & Chains" tab of the GUI is deactivated by default.


MoSS (Molecular Substructure miner) is a program to find frequent molecular substructures and discriminative fragments in a database of molecule descriptions. The algorithm is inspired by the Eclat algorithm for frequent item set mining. Apart from the default MoSS/MoFa algorithm, this program contains the gSpan algorithm [Yan and Han 2002] (or rather its extension CloseGraph [Yan and Han 2003]) as a special processing mode.
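
To illustrate the idea that MoSS borrows from Eclat, the following is a minimal sketch of Eclat for plain item sets. It is not code from the MoSS sources, and it does not mine graphs. The class name EclatSketch, the method mine and the toy data (atom symbols used merely as item labels) are assumptions made for this example. Each item is stored with the set of transaction ids that contain it, and an item set is extended depth-first by intersecting these id sets, so its support is simply the size of the intersection.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative Eclat sketch for item sets (hypothetical names, not MoSS code). */
public class EclatSketch {

    /** Returns every item set contained in at least minSupport transactions. */
    public static Map<String, Integer> mine(List<List<String>> transactions,
                                            int minSupport) {
        // Vertical layout: for each item, the ids of the transactions containing it.
        TreeMap<String, BitSet> tidLists = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            for (String item : transactions.get(tid)) {
                tidLists.computeIfAbsent(item, k -> new BitSet()).set(tid);
            }
        }
        // Keep only the frequent single items, in lexicographic order.
        List<String> items = new ArrayList<>();
        List<BitSet> tids = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : tidLists.entrySet()) {
            if (e.getValue().cardinality() >= minSupport) {
                items.add(e.getKey());
                tids.add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        extend("", items, tids, minSupport, result);
        return result;
    }

    /** Depth-first extension of the item set named by prefix. */
    private static void extend(String prefix, List<String> items, List<BitSet> tids,
                               int minSupport, Map<String, Integer> result) {
        for (int i = 0; i < items.size(); i++) {
            String itemSet = prefix.isEmpty() ? items.get(i)
                                              : prefix + " " + items.get(i);
            result.put(itemSet, tids.get(i).cardinality());
            // Conditional database: intersect with the id sets of later items only,
            // so that every item set is generated exactly once.
            List<String> nextItems = new ArrayList<>();
            List<BitSet> nextTids = new ArrayList<>();
            for (int j = i + 1; j < items.size(); j++) {
                BitSet shared = (BitSet) tids.get(i).clone();
                shared.and(tids.get(j));
                if (shared.cardinality() >= minSupport) {
                    nextItems.add(items.get(j));
                    nextTids.add(shared);
                }
            }
            extend(itemSet, nextItems, nextTids, minSupport, result);
        }
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
            List.of("C", "N", "O"),
            List.of("C", "O"),
            List.of("C", "N"),
            List.of("N", "O"));
        // prints {C=3, C N=2, C O=2, N=3, N O=2, O=3}
        System.out.println(mine(db, 2));
    }
}
```

In this toy run the set "C N O" is not reported, because it occurs in only one transaction. A substructure miner has to replace the item sets by graph fragments and the id-set intersection by some way of tracking where a fragment occurs in each molecule; how MoSS does this is covered in the program description, not in this sketch.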

Call the program without any arguments to get a list of options. See the shell script run (included in the source package) for examples of how to invoke the program. The example input files made available above (also contained in the data directory in the source package) show one input format.

More example input files are contained in the source archive, in the directory moss/data.

A full description of this program is included in the source package.

The first version of this program was developed in cooperation with Tripos, Inc., Data Analysis Research Lab, South San Francisco, CA, USA. Furthermore, some extensions have been developed together with the Nycomed Chair for Bioinformatics and Information Mining (Michael R. Berthold) of the University of Konstanz.

Details about the application and the algorithm can be found in these papers:

Note that, unlike the version described in two of the above papers, this program version does not support wildcard atoms and does not have a graphical user interface. The version supporting these features is property of Tripos, Inc.