MoSS - Molecular Substructure Miner
(aka MoFa - Molecular Fragment Miner)
Download
The source package also contains some basic documentation in HTML
format, in the directory moss/doc/user, and javadoc documentation
in the directory moss/doc/java.
Attention: Since version 6.15 (Miner) / 2.13 (GUI)
preprocessing molecular databases by converting Kekulé
representations to aromatic rings is no longer the default
(that is, with these and newer program versions option -K
is inverted and "Convert Kekulé representations" on the
"Rings & Chains" tab of the GUI is deactivated by default).
Description
MoSS (Molecular Substructure Miner) is a
program to find frequent molecular substructures and discriminative
fragments in a database of molecule descriptions. The algorithm is
inspired by the Eclat algorithm for frequent item set mining. Apart
from the default MoSS/MoFa algorithm, this program contains the gSpan
algorithm [Yan and Han 2002] (or rather its extension CloseGraph
[Yan and Han 2003]) as a special processing mode.
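
To illustrate the Eclat idea mentioned above, here is a minimal sketch
of Eclat for plain item sets, written in Python. It is not the MoSS
implementation: MoSS works on molecular graphs rather than item sets,
and the function name eclat, its parameters, and the toy data are
assumptions made for illustration only. The sketch shows the two
ingredients that characterize Eclat: a vertical data layout (each item
maps to the set of transactions containing it) and a depth-first search
that computes support by intersecting these sets.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every item set contained in at least min_support transactions.

    Illustrative sketch only; names and signature are hypothetical.
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tid_sets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_sets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Each candidate is (item, ids of transactions containing prefix + item).
        for i, (item, tids) in enumerate(candidates):
            item_set = prefix | {item}
            frequent[item_set] = len(tids)
            # Combine only with later items so that each item set is built once.
            narrowed = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    narrowed.append((other, common))
            if narrowed:
                extend(item_set, narrowed)

    # Start the depth-first search from the frequent single items.
    start = sorted(
        ((item, tids) for item, tids in tid_sets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for item_set, support in eclat(data, min_support=3).items():
        print(sorted(item_set), support)
```

In this toy data set every single item and every pair is frequent at a
minimum support of 3, while the triple {a, b, c} occurs in only two
transactions and is therefore not reported.
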
Call the program without any arguments to get a list of options.
See the shell script run (included in the source package)
for examples of how to invoke the program. The example input files
made available above (also contained in the data directory
in the source package) show one input format.
More example input files are contained in the source archive,
in the directory moss/data.
Full description of this program (included in the source package).
The first version of this program was developed in cooperation
with Tripos, Inc., Data Analysis Research Lab,
South San Francisco, CA, USA.
Furthermore, some extensions have been developed together with
the Nycomed Chair for Bioinformatics and Information Mining
(Michael R. Berthold) of the University of Konstanz.
Details about the application and the algorithm can be found
in these papers:
- Support Computation
for Mining Frequent Subgraphs in a Single Graph
Mathias Fiedler and Christian Borgelt
Proc. 5th Int. Workshop on Mining and Learning with Graphs
(MLG 2007, Florence, Italy).
(to appear)
mlg_07.pdf (218 kb)
mlg_07.ps.gz (82 kb)
(6 pages)
- Full Perfect Extension Pruning for Frequent Graph Mining
Christian Borgelt and Thorsten Meinl
Proc. Workshop on Mining Complex Data
(MCD 2006 at ICDM 2006, Hong Kong, China).
IEEE Press, Piscataway, NJ, USA 2006
mcd_06.pdf (240 kb)
mcd_06.ps.gz (180 kb)
(6 pages)
- Combining Ring Extensions and Canonical Form Pruning
Christian Borgelt
Proc. 4th Int. Workshop on Mining and Learning with Graphs
(MLG 2006, Berlin, Germany).
ECML/PKDD Organization Committee 2006
mlg_06.pdf (266 kb)
mlg_06.ps.gz (131 kb)
(8 pages)
- On Canonical Forms for Frequent Graph Mining
Christian Borgelt
Workshop on Mining Graphs, Trees, and Sequences
(MGTS'05 at PKDD'05, Porto, Portugal), 1-12.
ECML/PKDD'05 Organization Committee, Porto, Portugal 2005.
mgts_05.pdf (210 kb)
mgts_05.ps.gz (152 kb)
(12 pages)
- MoSS: A Program for Molecular Substructure Mining
Christian Borgelt, Thorsten Meinl, and Michael R. Berthold
Workshop Open Source Data Mining Software
(OSDM'05, Chicago, IL), 6-15.
ACM Press, New York, NY, USA 2005.
moss_ecs.pdf (224 kb)
moss_ecs.ps.gz (138 kb)
(10 pages)
- Advanced Pruning Strategies to
Speed Up Mining Closed Molecular Fragments
Christian Borgelt, Thorsten Meinl, and Michael R. Berthold.
Proc. IEEE Conf. on Systems, Man and Cybernetics
(SMC 2004, The Hague, Netherlands), on CD-ROM.
IEEE Press, Piscataway, NJ, USA 2004
smc_04.pdf (122 kb)
smc_04.ps.gz (65 kb)
(6 pages)
- Large Scale Mining of Molecular Fragments with Wildcards
Heiko Hofer, Christian Borgelt and Michael R. Berthold.
Intelligent Data Analysis 8:495-504.
IOS Press, Amsterdam, Netherlands 2004
(10 pages)
- Mining Fragments with Fuzzy Chains in Molecular Databases
Thorsten Meinl, Christian Borgelt, and Michael R. Berthold.
Proc. 2nd Int. Workshop on Mining Graphs, Trees and Sequences
(MGTS 2004, Pisa, Italy), 49-60.
University of Pisa, Pisa, Italy 2004
mgts_04.pdf (546 kb)
mgts_04.ps.gz (211 kb)
(12 pages)
- Discriminative Closed Fragment Mining
and Perfect Extensions in MoFa
Thorsten Meinl, Christian Borgelt, and Michael R. Berthold
Proc. 2nd Starting AI Researchers' Symposium
(STAIRS 2004, Valencia, Spain), 3-14
IOS Press, Amsterdam, Netherlands 2004
stairs_04.pdf (382 kb)
stairs_04.ps.gz (205 kb)
(12 pages)
- Finding Discriminative Molecular Fragments
Christian Borgelt, Heiko Hofer, and Michael Berthold
Workshop Information Mining - Navigating Large Heterogeneous Spaces
of Multimedia Information
German Conference on Artificial Intelligence,
Hamburg, Germany 2003
wsim_03.pdf (303 kb)
wsim_03.ps.gz (143 kb)
(13 pages)
- Large Scale Mining of Molecular Fragments with Wildcards
Heiko Hofer, Christian Borgelt, and Michael Berthold.
Proc. 5th International Symposium on Intelligent Data Analysis
(IDA 2003, Berlin, Germany), 380-389.
Springer-Verlag, Heidelberg, Germany 2003
ida_03.pdf (187 kb)
ida_03.ps.gz (125 kb)
(10 pages)
- Mining Molecular Fragments:
Finding Relevant Substructures of Molecules
Christian Borgelt and Michael R. Berthold
IEEE International Conference on Data Mining
(ICDM 2002, Maebashi, Japan), 51-58
IEEE Press, Piscataway, NJ, USA 2002
icdm_02.pdf (112 kb)
icdm_02.ps.gz (69 kb)
(8 pages)
Note that this program version does not support wildcard atoms and,
unlike the version described in two of the above papers, does not
have a graphical user interface. The version supporting these
features is the property of Tripos, Inc.