Package moss

Class Miner

java.lang.Object
moss.Miner
All Implemented Interfaces:
Serializable, Runnable

public class Miner extends Object implements Runnable, Serializable
Class for the molecular substructure miner.
Since:
2002.03.11
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    ALLEXTS
    flag for generating all extensions
    static final int
    AROMATIZE
    flag for converting Kekulé representations
    protected moss.RepoElem[]
    bins
    the repository of processed substructures (hash table)
    protected long
    canonic
    for benchmarking: canonical form pruning counter
    static final int
    CHAINEXT
    flag for extensions by chains
    protected long
    chains
    for benchmarking: invalid chains counter
    static final int
    CLASSES
    flag for using node equivalence classes
    static final int
    CLOSED
    flag for restriction to closed fragments
    static final int
    CLOSERINGS
    flag for filtering open rings
    protected CanonicalForm
    cnf
    the canonical form and restricted extension generator
    protected int[]
    cnts
    the numbers of graphs in focus and complement
    protected Recoder
    coder
    the recoder for the node types
    protected int
    comp
    the maximum support in the complement as an absolute value
    protected double
    conf
    the minimum confidence of an association rule as a fraction
    static final String
    the copyright information for this program
    protected NamedGraph
    curr
    the current insertion point for the focus
    static final int
    DEFAULT
    default search mode flags: edge extensions, embeddings, canonical form and full perfect extension pruning
    static final String
    DESCRIPTION
    the program description
    static final int
    DIRECTED
    flag for directed graphs/fragments
    protected long
    duplic
    for benchmarking: duplicate fragments counter
    static final int
    EDGEEXT
    flag for extensions by single edges
    protected long
    embcmp
    for benchmarking: the number of comparisons with embeddings
    protected long
    embcnt
    for benchmarking: the number of created embeddings
    protected int
    emblvl
    the level at which to switch to embeddings
    protected EdgePatternMgr
    epmgr
    the edge pattern manager (if needed, for rule generation)
    protected long
    equiv
    for benchmarking: equivalent fragment pruning counter
    static final int
    EQVARS
    flag for extensions by equivalent variants of rings
    protected Graph
    exseed
    the node types that are excluded as seeds
    protected Graph
    extype
    the excluded node types
    protected double
    fcomp
    the maximum support in the complement as a fraction
    protected Fragment
    frag
    the initial fragment (embedded seed structure)
    protected long
    fragcmp
    for benchmarking: the number of fragment comparisons
    protected long
    fragcnt
    for benchmarking: the number of created fragments
    protected double
    fsupp
    the minimum support in the focus as a fraction
    protected NamedGraph
    graphs
    the list of graphs to mine (database)
    protected int
    group
    the group for graphs with a value below the threshold
    protected long
    invalid
    for benchmarking: invalid fragments counter
    protected long
    isocnt
    for benchmarking: the number of isomorphism tests
    protected PrintStream
    log
    stream to write progress messages to
    static final int
    LOGIC
    flag for conversion to logic representation
    protected long
    lowsupp
    for benchmarking: insufficient support pruning counter
    protected int[]
    masks
    the masks for nodes and edges
    protected int
    max
    the maximum size of substructures to report (number of nodes)
    protected int
    maxdepth
    for benchmarking: the maximum depth of the search tree
    protected int
    maxepg
    the maximum number of embeddings per graph
    static final int
    MERGERINGS
    flag for merging ring extensions with the same first edge
    protected int
    min
    the minimum size of substructures to report (number of nodes)
    protected int
    mode
    the search mode flags
    protected long
    nodecnt
    for benchmarking: the number of search tree nodes
    protected long
    nonclsd
    for benchmarking: non-closed fragments counter
    protected CanonicalForm
    norm
    the canonical form for normalizing the output
    static final int
    NORMFORM
    flag for normalized substructure output
    static final int
    NOSTATS
    flag for no search statistics output
    protected long
    openrgs
    for benchmarking: open ring fragments counter
    static final int
    ORBITS
    flag for extension filtering with node orbits
    protected moss.GraphPattern
    pats
    the list of found patterns for rule generation
    protected long
    perfect
    for benchmarking: perfect extension pruning counter
    static final int
    PR_CANONIC
    flag for canonical form pruning
    static final int
    PR_EQUIV
    flag for equivalent sibling extension pruning
    static final int
    PR_PARTIAL
    flag for partial perfect extension pruning
    static final int
    PR_PERFECT
    flag for full perfect extension pruning
    static final int
    PR_UNCLOSE
    flag for pruning fragments with unclosable rings
    protected GraphReader
    reader
    the graph data set file reader
    protected int
    recnt
    the size of the repository (number of substructures)
    protected long
    repcnt
    for benchmarking: the number of repository accesses
    protected int
    rgmax
    the maximum size of rings (number of nodes/edges)
    protected int
    rgmin
    the minimum size of rings (number of nodes/edges)
    static final int
    RINGEXT
    flag for extensions by rings
    protected long
    ringord
    for benchmarking: ring order pruning counter
    protected Graph
    seed
    the seed structure to start the search from
    protected int
    subcnt
    the number of reported substructures
    protected int
    supp
    the minimum support in the focus as an absolute value
    protected NamedGraph
    tail
    the tail of the list of graphs (insertion point for complement)
    protected double
    thresh
    the threshold for the split into focus and complement
    static final int
    TRANSFORM
    flag for conversion to another description format
    protected int
    type
    the type of support to use
    static final int
    UNEMBED
    flag for unembedding siblings of the current search tree nodes
    static final int
    VERBOSE
    flag for verbose reporting
    static final String
    VERSION
    the version of this program
    protected Notation
    vntn
    the notation for verbose output
    protected Writer
    wrids
    the identifier file writer
    protected GraphWriter
    writer
    the substructure file writer
    protected Writer
    wrules
    the graph (association) rule file writer
  • Constructor Summary

    Constructors
    Constructor
    Description
    Miner()
    Create an empty miner with default parameter settings.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    abort()
    Abort the miner (if running as a thread).
    void
    addGraph(NamedGraph graph)
    Add a graph to the database.
    int
    embed()
    Embed the seed structure into all graphs.
    int
    getCurrent()
    Get the number of substructures that have been found up to now.
    Throwable
    getError()
    Get the error status of the search process.
    void
    init(String[] args)
    Initialize the miner from command line arguments.
    static void
    main(String[] args)
    Command line invocation of the molecular substructure miner.
    protected void
    mine()
    Preprocess the graphs, embed the seed, and start the search.
    protected boolean
    report(Fragment frag)
    Check and report a found fragment/substructure.
    protected void
    rules()
    Generate (association) rules from the collected patterns.
    void
    run()
    Run the miner and clean up after the search finished.
    void
    setCnF(CanonicalForm cnf)
    Set the canonical form.
    void
    setConf(double conf)
    Set the minimum rule confidence.
    void
    setEmbed(int level, int maxepg)
    Set the embeddings parameters.
    void
    setExcluded(String extype, String exseed, String format)
    Set the excluded nodes and excluded seeds.
    void
    setExcluded(Graph extype, Graph exseed)
    Set the excluded nodes and excluded seeds.
    void
    setGrouping(double thresh, boolean invert)
    Set the grouping parameters.
    void
    setInput(String fname, String format)
    Set the input reader.
    void
    setInput(GraphReader reader)
    Set the input reader.
    void
    setLimits(double supp, double comp)
    Set the support limits.
    void
    setLog(PrintStream stream)
    Sets the stream to which progress messages are written.
    void
    setMasks(int node, int edge, int ringnode, int ringedge)
    Set the node and edge masks.
    void
    setMode(int mode)
    Set the search mode.
    void
    setOutput(String fname, String format)
    Set the output writer.
    void
    setOutput(String fn_sub, String format, String fn_ids)
    Set the output writers.
    void
    setOutput(String fn_sub, String format, String fn_ids, String fn_rules)
    Set the output writers.
    void
    setOutput(GraphWriter writer)
    Set the output writer.
    void
    setOutput(GraphWriter writer, Writer wrids)
    Set the output writers.
    void
    setOutput(GraphWriter writer, Writer wrids, Writer wrules)
    Set the output writers.
    void
    setRingSizes(int min, int max)
    Set the minimum and maximum ring size.
    void
    setSeed(String desc, String format)
    Set the seed structure to start the search from.
    void
    setSeed(Graph seed)
    Set the seed structure to start the search from.
    void
    setSizes(int min, int max)
    Set the minimum and maximum fragment size.
    void
    setType(int type)
    Set the support type.
    void
    stats()
    Print statistics about the search.
    protected void
    term()
    Clean up after the search finished or was aborted.
    void
    writeGraphs()
    Write all graphs of the database.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DESCRIPTION

      public static final String DESCRIPTION
      the program description
    • VERSION

      public static final String VERSION
      the version of this program
    • DIRECTED

      public static final int DIRECTED
      flag for directed graphs/fragments
    • EDGEEXT

      public static final int EDGEEXT
      flag for extensions by single edges
    • RINGEXT

      public static final int RINGEXT
      flag for extensions by rings
    • CHAINEXT

      public static final int CHAINEXT
      flag for extensions by chains
    • EQVARS

      public static final int EQVARS
      flag for extensions by equivalent variants of rings
    • ORBITS

      public static final int ORBITS
      flag for extension filtering with node orbits
    • CLASSES

      public static final int CLASSES
      flag for using node equivalence classes
    • ALLEXTS

      public static final int ALLEXTS
      flag for generating all extensions
    • CLOSED

      public static final int CLOSED
      flag for restriction to closed fragments
    • CLOSERINGS

      public static final int CLOSERINGS
      flag for filtering open rings
    • MERGERINGS

      public static final int MERGERINGS
      flag for merging ring extensions with the same first edge
    • PR_UNCLOSE

      public static final int PR_UNCLOSE
      flag for pruning fragments with unclosable rings
    • PR_PARTIAL

      public static final int PR_PARTIAL
      flag for partial perfect extension pruning
    • PR_PERFECT

      public static final int PR_PERFECT
      flag for full perfect extension pruning
    • PR_EQUIV

      public static final int PR_EQUIV
      flag for equivalent sibling extension pruning
    • PR_CANONIC

      public static final int PR_CANONIC
      flag for canonical form pruning
    • UNEMBED

      public static final int UNEMBED
      flag for unembedding siblings of the current search tree nodes
    • NORMFORM

      public static final int NORMFORM
      flag for normalized substructure output
    • VERBOSE

      public static final int VERBOSE
      flag for verbose reporting
    • AROMATIZE

      public static final int AROMATIZE
      flag for converting Kekulé representations
    • TRANSFORM

      public static final int TRANSFORM
      flag for conversion to another description format
    • LOGIC

      public static final int LOGIC
      flag for conversion to logic representation
    • NOSTATS

      public static final int NOSTATS
      flag for no search statistics output
    • DEFAULT

      public static final int DEFAULT
      default search mode flags: edge extensions, embeddings, canonical form and full perfect extension pruning
    • mode

      protected int mode
      the search mode flags
    • type

      protected int type
      the type of support to use
    • fsupp

      protected double fsupp
      the minimum support in the focus as a fraction
    • supp

      protected int supp
      the minimum support in the focus as an absolute value
    • fcomp

      protected double fcomp
      the maximum support in the complement as a fraction
    • comp

      protected int comp
      the maximum support in the complement as an absolute value
    • conf

      protected double conf
      the minimum confidence of an association rule as a fraction
    • min

      protected int min
      the minimum size of substructures to report (number of nodes)
    • max

      protected int max
      the maximum size of substructures to report (number of nodes)
    • rgmin

      protected int rgmin
      the minimum size of rings (number of nodes/edges)
    • rgmax

      protected int rgmax
      the maximum size of rings (number of nodes/edges)
    • masks

      protected int[] masks
      the masks for nodes and edges
    • coder

      protected Recoder coder
      the recoder for the node types
    • seed

      protected Graph seed
      the seed structure to start the search from
    • extype

      protected Graph extype
      the excluded node types
    • exseed

      protected Graph exseed
      the node types that are excluded as seeds
    • graphs

      protected NamedGraph graphs
      the list of graphs to mine (database)
    • curr

      protected NamedGraph curr
      the current insertion point for the focus
    • tail

      protected NamedGraph tail
      the tail of the list of graphs (insertion point for complement)
    • cnts

      protected int[] cnts
      the numbers of graphs in focus and complement
    • emblvl

      protected int emblvl
      the level at which to switch to embeddings
    • frag

      protected Fragment frag
      the initial fragment (embedded seed structure)
    • maxepg

      protected int maxepg
      the maximum number of embeddings per graph
    • bins

      protected moss.RepoElem[] bins
      the repository of processed substructures (hash table)
    • recnt

      protected int recnt
      the size of the repository (number of substructures)
    • epmgr

      protected EdgePatternMgr epmgr
      the edge pattern manager (if needed, for rule generation)
    • pats

      protected moss.GraphPattern pats
      the list of found patterns for rule generation
    • cnf

      protected CanonicalForm cnf
      the canonical form and restricted extension generator
    • norm

      protected CanonicalForm norm
      the canonical form for normalizing the output
    • subcnt

      protected int subcnt
      the number of reported substructures
    • reader

      protected GraphReader reader
      the graph data set file reader
    • thresh

      protected double thresh
      the threshold for the split into focus and complement
    • group

      protected int group
      the group for graphs with a value below the threshold
    • writer

      protected GraphWriter writer
      the substructure file writer
    • wrids

      protected transient Writer wrids
      the identifier file writer
    • wrules

      protected transient Writer wrules
      the graph (association) rule file writer
    • vntn

      protected Notation vntn
      the notation for verbose output
    • log

      protected transient PrintStream log
      stream to write progress messages to
    • maxdepth

      protected int maxdepth
      for benchmarking: the maximum depth of the search tree
    • nodecnt

      protected long nodecnt
      for benchmarking: the number of search tree nodes
    • fragcnt

      protected long fragcnt
      for benchmarking: the number of created fragments
    • embcnt

      protected long embcnt
      for benchmarking: the number of created embeddings
    • lowsupp

      protected long lowsupp
      for benchmarking: insufficient support pruning counter
    • perfect

      protected long perfect
      for benchmarking: perfect extension pruning counter
    • equiv

      protected long equiv
      for benchmarking: equivalent fragment pruning counter
    • ringord

      protected long ringord
      for benchmarking: ring order pruning counter
    • canonic

      protected long canonic
      for benchmarking: canonical form pruning counter
    • duplic

      protected long duplic
      for benchmarking: duplicate fragments counter
    • nonclsd

      protected long nonclsd
      for benchmarking: non-closed fragments counter
    • openrgs

      protected long openrgs
      for benchmarking: open ring fragments counter
    • chains

      protected long chains
      for benchmarking: invalid chains counter
    • invalid

      protected long invalid
      for benchmarking: invalid fragments counter
    • repcnt

      protected long repcnt
      for benchmarking: the number of repository accesses
    • fragcmp

      protected long fragcmp
      for benchmarking: the number of fragment comparisons
    • isocnt

      protected long isocnt
      for benchmarking: the number of isomorphism tests
    • embcmp

      protected long embcmp
      for benchmarking: the number of comparisons with embeddings
  • Constructor Details

    • Miner

      public Miner()
      Create an empty miner with default parameter settings.
      Since:
      2002.03.11 (Christian Borgelt)
  • Method Details

    • setMode

      public void setMode(int mode)
      Set the search mode.

      The search mode is a combination of the search mode flags, e.g. RINGEXT or PR_CANONIC.

      Parameters:
      mode - the search mode
      Since:
      2006.10.26 (Christian Borgelt)
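
      Since the mode is described as a combination of flags, a mode value is presumably built with bitwise OR. The following is a minimal sketch of that idea, not the actual implementation: the class ModeFlagsSketch and its helpers are hypothetical, and the bit values are placeholders because the real constant values are not listed in this documentation.

```java
/** Sketch of combining and testing search mode flags (bit values are hypothetical). */
final class ModeFlagsSketch {
    // Hypothetical stand-ins for Miner.EDGEEXT, Miner.RINGEXT and Miner.PR_CANONIC;
    // the real constant values are not given in this documentation.
    static final int EDGEEXT    = 0x01;
    static final int RINGEXT    = 0x02;
    static final int PR_CANONIC = 0x04;

    /** Combine several flags into one mode value with bitwise OR. */
    static int combine(int... flags) {
        int mode = 0;
        for (int f : flags) mode |= f;
        return mode;
    }

    /** Check whether all bits of a flag are set in a mode value. */
    static boolean isSet(int mode, int flag) {
        return (mode & flag) == flag;
    }

    public static void main(String[] args) {
        int mode = combine(EDGEEXT, PR_CANONIC);
        System.out.println(isSet(mode, PR_CANONIC)); // prints true
        System.out.println(isSet(mode, RINGEXT));    // prints false
    }
}
```

      With the real class, the corresponding call would presumably look like miner.setMode(Miner.RINGEXT | Miner.PR_CANONIC).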
    • setSizes

      public void setSizes(int min, int max)
      Set the minimum and maximum fragment size.
      Parameters:
      min - the minimum fragment size (number of nodes)
      max - the maximum fragment size (number of nodes)
      Since:
      2006.10.26 (Christian Borgelt)
    • setType

      public void setType(int type)
      Set the support type.

      Constants for support types are defined in the class Fragment.

      Parameters:
      type - the support type to use
      Since:
      2006.06.21 (Christian Borgelt)
    • setLimits

      public void setLimits(double supp, double comp)
      Set the support limits.

      Positive values are interpreted as fractions of the focus or complement set; negative values are interpreted as absolute numbers.

      Parameters:
      supp - the minimum support in the focus
      comp - the maximum support in the complement
      Since:
      2006.10.26 (Christian Borgelt)
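
      The sign convention for the limits can be illustrated with a small helper that resolves a limit to an absolute count. This is only a sketch: the class SupportLimitSketch and the method toAbsolute are hypothetical, and rounding fractions up is an assumption, since this documentation does not say how fractions are converted.

```java
/** Sketch of the sign convention for support limits (rounding is an assumption). */
final class SupportLimitSketch {
    /**
     * Resolve a support limit to an absolute number.
     * @param limit     a fraction of the group if positive,
     *                  a negated absolute number if negative
     * @param groupSize the number of graphs in the focus or the complement
     * @return the limit as an absolute number
     */
    static int toAbsolute(double limit, int groupSize) {
        if (limit < 0) return (int) Math.round(-limit);
        // Assumption: a fractional limit is rounded up to the next whole number.
        return (int) Math.ceil(limit * groupSize);
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(0.5, 7)); // prints 4 (half of 7, rounded up)
        System.out.println(toAbsolute(-3, 7));  // prints 3 (absolute number)
    }
}
```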
    • setConf

      public void setConf(double conf)
      Set the minimum rule confidence.
      Parameters:
      conf - the minimum confidence of association rules
      Since:
      2020.10.04 (Christian Borgelt)
    • setRingSizes

      public void setRingSizes(int min, int max)
      Set the minimum and maximum ring size.
      Parameters:
      min - the minimum ring size (number of nodes/edges)
      max - the maximum ring size (number of nodes/edges)
      Since:
      2006.10.26 (Christian Borgelt)
    • setMasks

      public void setMasks(int node, int edge, int ringnode, int ringedge)
      Set the node and edge masks.
      Parameters:
      node - the mask for nodes outside (marked) rings
      edge - the mask for edges outside (marked) rings
      ringnode - the mask for nodes in (marked) rings
      ringedge - the mask for edges in (marked) rings
      Since:
      2006.06.26 (Christian Borgelt)
    • setEmbed

      public void setEmbed(int level, int maxepg)
      Set the embeddings parameters.

      Restricting the maximum number of embeddings per graph can reduce the amount of memory needed in the search, but slows down the operation (sometimes considerably).

      Parameters:
      level - the level at which to switch to embeddings
      maxepg - the maximum number of embeddings per graph
      Since:
      2010.01.27 (Christian Borgelt)
    • setExcluded

      public void setExcluded(Graph extype, Graph exseed)
      Set the excluded nodes and excluded seeds.

      Excluded nodes are completely removed from the search, that is, no substructure containing such a node will be reported. Nodes that are only excluded as seeds may appear in reported fragments, but are not used as seeds. This can be useful, for example, in the case where carbon is the most frequent element and one is not interested in fragments containing only carbon nodes.

      Parameters:
      extype - the node types to exclude from the search
      exseed - the node types to exclude as seeds
      Since:
      2006.06.26 (Christian Borgelt)
    • setExcluded

      public void setExcluded(String extype, String exseed, String format) throws IOException
      Set the excluded nodes and excluded seeds.

      The arguments extype and exseed are parsed as graph descriptions in the notation given by the argument format.

      Parameters:
      extype - the description of the excluded nodes
      exseed - the description of the nodes to exclude as seeds
      format - the format of the descriptions
      Throws:
      IOException - if writing the log file failed
      Since:
      2006.06.26 (Christian Borgelt)
    • setSeed

      public void setSeed(Graph seed) throws IOException
      Set the seed structure to start the search from.
      Parameters:
      seed - the seed structure for the search
      Throws:
      IOException - if the seed is not connected
      Since:
      2006.06.26 (Christian Borgelt)
    • setSeed

      public void setSeed(String desc, String format) throws IOException
      Set the seed structure to start the search from.

      The argument desc is parsed as a graph description in the notation given by the argument format.

      Parameters:
      desc - the description of the seed structure
      format - the format of the seed description
      Throws:
      IOException - if the seed is not connected or writing the log file failed
      Since:
      2006.06.26 (Christian Borgelt)
    • setGrouping

      public void setGrouping(double thresh, boolean invert)
      Set the grouping parameters.

      If invert == false, all graphs having an associated value smaller than the threshold thresh are placed into the focus and all other graphs are the complement. If invert == true, this split is inverted, that is, all graphs having an associated value no less than the threshold thresh are placed into the focus and all other graphs are the complement.

      Parameters:
      thresh - the threshold for the grouping
      invert - whether to invert the grouping
      Since:
      2007.03.05 (Christian Borgelt)
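
      The split rule described above can be written as a single predicate. The sketch below only mirrors the wording of this description; the class GroupingSketch and the method inFocus are hypothetical and are not part of the moss package.

```java
/** Sketch of the focus/complement split controlled by a threshold and an invert flag. */
final class GroupingSketch {
    /**
     * Decide whether a graph belongs to the focus.
     * @param value  the value associated with the graph
     * @param thresh the threshold for the grouping
     * @param invert whether to invert the grouping
     * @return true for the focus, false for the complement
     */
    static boolean inFocus(double value, double thresh, boolean invert) {
        boolean below = value < thresh;   // not inverted: values below the threshold
        return invert ? !below : below;   // inverted: values no less than the threshold
    }

    public static void main(String[] args) {
        System.out.println(inFocus(0.3, 0.5, false)); // prints true
        System.out.println(inFocus(0.5, 0.5, false)); // prints false
        System.out.println(inFocus(0.5, 0.5, true));  // prints true
    }
}
```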
    • setLog

      public void setLog(PrintStream stream)
      Sets the stream to which progress messages are written.

      By default all messages are written to System.err.

      Parameters:
      stream - the stream to write to
      Since:
      2007.05.30 (Christian Borgelt)
    • setInput

      public void setInput(GraphReader reader)
      Set the input reader.
      Parameters:
      reader - the reader from which to read the graphs
      Since:
      2007.03.05 (Christian Borgelt)
    • setInput

      public void setInput(String fname, String format) throws IOException
      Set the input reader.
      Parameters:
      fname - the name of the input data file
      format - the format of the input data
      Throws:
      IOException - if creating the input reader failed
      Since:
      2007.03.05 (Christian Borgelt)
    • setOutput

      public void setOutput(GraphWriter writer)
      Set the output writer.
      Parameters:
      writer - the writer to write the found substructures
      Since:
      2002.03.11 (Christian Borgelt)
    • setOutput

      public void setOutput(GraphWriter writer, Writer wrids)
      Set the output writers.
      Parameters:
      writer - the writer to write the found substructures
      wrids - the writer to write the graph identifiers
      Since:
      2002.03.11 (Christian Borgelt)
    • setOutput

      public void setOutput(GraphWriter writer, Writer wrids, Writer wrules)
      Set the output writers.
      Parameters:
      writer - the writer to write the found substructures
      wrids - the writer to write the graph identifiers
      wrules - the writer to write the graph rules
      Since:
      2020.10.05 (Christian Borgelt)
    • setOutput

      public void setOutput(String fname, String format) throws IOException
      Set the output writer.
      Parameters:
      fname - the name of the file for the found substructures
      format - the format for the output
      Throws:
      IOException - if creating the output writer failed
      Since:
      2007.07.01 (Christian Borgelt)
    • setOutput

      public void setOutput(String fn_sub, String format, String fn_ids) throws IOException
      Set the output writers.
      Parameters:
      fn_sub - the name of the file for the found fragments
      format - the format for the output
      fn_ids - the name of the file for the graph identifiers
      Throws:
      IOException - if creating an output writer failed
      Since:
      2006.06.26 (Christian Borgelt)
    • setOutput

      public void setOutput(String fn_sub, String format, String fn_ids, String fn_rules) throws IOException
      Set the output writers.
      Parameters:
      fn_sub - the name of the file for the found fragments
      format - the format for the output
      fn_ids - the name of the file for the graph identifiers
      fn_rules - the name of the file for the graph rules
      Throws:
      IOException - if creating an output writer failed
      Since:
      2006.06.26 (Christian Borgelt)
    • setCnF

      public void setCnF(CanonicalForm cnf)
      Set the canonical form.
      Parameters:
      cnf - the canonical form to set
      Since:
      2009.08.04 (Christian Borgelt)
    • addGraph

      public void addGraph(NamedGraph graph)
      Add a graph to the database.

      When the graph is added, its group is evaluated and it is added to the list in such a way that all focus graphs are at the beginning of the list and all complement graphs at the end. Hence the group of a graph must not be changed after it has been added to a miner. Note that the order in which the graphs are added is preserved in the focus and the complement lists.

      Parameters:
      graph - the graph to add
      Since:
      2002.03.11 (Christian Borgelt)
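
      The ordering guarantee described above (focus graphs first, complement graphs last, insertion order preserved within each part) can be obtained with a singly linked list and two insertion points. The following sketch is an assumption about how such a list could work, loosely modeled on the documented fields graphs, curr and tail. The classes GraphListSketch and Node are hypothetical stand-ins and do not use the real NamedGraph API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a graph list that keeps focus graphs before complement graphs. */
final class GraphListSketch {
    /** Hypothetical list node; stands in for NamedGraph, whose API is not shown here. */
    static final class Node {
        final String name;
        Node succ;
        Node(String name) { this.name = name; }
    }

    private Node head;  // first graph of the list
    private Node curr;  // last focus graph (insertion point for the focus)
    private Node tail;  // last graph (insertion point for the complement)

    /** Add a graph: focus graphs go after curr, complement graphs after tail. */
    void add(String name, boolean focus) {
        Node n = new Node(name);
        if (focus) {
            if (curr == null) { n.succ = head;      head = n; }
            else              { n.succ = curr.succ; curr.succ = n; }
            if (tail == curr) tail = n;  // no complement graphs yet
            curr = n;
        } else {
            if (tail == null) head = n;
            else              tail.succ = n;
            tail = n;
        }
    }

    /** Collect the graph names in list order. */
    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.succ) out.add(n.name);
        return out;
    }

    public static void main(String[] args) {
        GraphListSketch list = new GraphListSketch();
        list.add("c1", false);
        list.add("f1", true);
        list.add("c2", false);
        list.add("f2", true);
        System.out.println(list.names()); // prints [f1, f2, c1, c2]
    }
}
```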
    • embed

      public int embed()
      Embed the seed structure into all graphs.
      Returns:
      the number of graphs that contain the seed
      Since:
      2002.03.11 (Christian Borgelt)
    • report

      protected boolean report(Fragment frag) throws IOException
      Check and report a found fragment/substructure.

      In order to be actually reported (written to the output file), the fragment must be valid (Fragment.isValid()), meet the maximum support requirement for the complement part of the database, be closed (Fragment.isClosed()) and must not have open rings if only fragments with closed rings are to be reported.

      Parameters:
      frag - the fragment to report
      Returns:
      whether the fragment has been reported
      Throws:
      IOException - if a file operation failed
      Since:
      2002.03.21 (Christian Borgelt)
    • writeGraphs

      public void writeGraphs() throws IOException
      Write all graphs of the database.
      Throws:
      IOException - if writing the graphs failed
      Since:
      2002.03.11 (Christian Borgelt)
    • init

      public void init(String[] args) throws IOException
      Initialize the miner from command line arguments.
      Parameters:
      args - the command line arguments
      Throws:
      IOException - if some file operation failed
      Since:
      2006.03.01 (Christian Borgelt)
    • mine

      protected void mine() throws IOException
      Preprocess the graphs, embed the seed, and start the search.
      Throws:
      IOException - if some file operation failed
      Since:
      2006.03.01 (Christian Borgelt)
    • rules

      protected void rules() throws IOException
      Generate (association) rules from the collected patterns.
      Throws:
      IOException - if some file operation failed
      Since:
      2020.10.03 (Christian Borgelt)
    • term

      protected void term() throws IOException
      Clean up after the search finished or was aborted.
      Throws:
      IOException - if some file operation failed
      Since:
      2006.03.01 (Christian Borgelt)
    • run

      public void run()
      Run the miner and clean up after the search finished.
      Specified by:
      run in interface Runnable
      Since:
      2006.03.01 (Christian Borgelt)
    • abort

      public void abort()
      Abort the miner (if running as a thread).
      Since:
      2006.03.01 (Christian Borgelt)
    • getCurrent

      public int getCurrent()
      Get the number of substructures that have been found up to now.

      This function enables progress reporting by another thread. It is used in the graphical user interface (class MoSS).

      If the return value is negative, it indicates the number of graphs that have been loaded, otherwise the number of substructures that have been found.

      Returns:
      the number of loaded graphs (if negative) or the number of found substructures (if non-negative)
      Since:
      2006.03.01 (Christian Borgelt)
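
      A progress display could decode the return value as sketched below. The class ProgressSketch and the method describe are hypothetical, and treating a negative value as the negated number of loaded graphs is an assumption based on the description above.

```java
/** Sketch of decoding a progress value as returned by a method like getCurrent(). */
final class ProgressSketch {
    /** Turn a progress value into a human-readable message. */
    static String describe(int current) {
        // Assumption: a negative value is the negated number of loaded graphs.
        if (current < 0) return "loaded " + (-current) + " graphs";
        return "found " + current + " substructures";
    }

    public static void main(String[] args) {
        System.out.println(describe(-12)); // prints: loaded 12 graphs
        System.out.println(describe(3));   // prints: found 3 substructures
    }
}
```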
    • getError

      public Throwable getError()
      Get the error status of the search process.

      With this function it can be checked, after the search with the run() method has terminated, whether an error occurred in the search. Note that an external abort with the function abort() does not trigger an exception to be thrown.

      Returns:
      the exception that occurred in the search or null if the search was successful
      Since:
      2007.03.05 (Christian Borgelt)
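
      The run-then-check pattern described above can be sketched with a stand-in worker. The classes ErrorCheckSketch and Worker are hypothetical; Worker only imitates the documented behavior that run() throws nothing itself and that the error is queried afterwards. It is not the Miner implementation.

```java
import java.io.IOException;

/** Sketch of the run-then-check pattern with a hypothetical stand-in for Miner. */
final class ErrorCheckSketch {
    /** Hypothetical worker that records a failure instead of throwing it from run(). */
    static final class Worker implements Runnable {
        private final boolean fail;
        private volatile Throwable error;  // null as long as no error occurred

        Worker(boolean fail) { this.fail = fail; }

        @Override
        public void run() {
            try {
                if (fail) throw new IOException("simulated file error");
            } catch (Throwable t) {
                error = t;                 // remember the error for getError()
            }
        }

        Throwable getError() { return error; }
    }

    /** Run a worker in its own thread, wait for it, and return its error status. */
    static Throwable runAndCheck(Worker w) {
        Thread t = new Thread(w);
        t.start();
        try {
            t.join();                      // wait until the worker has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
        return w.getError();
    }

    public static void main(String[] args) {
        System.out.println(runAndCheck(new Worker(false)));             // prints null
        System.out.println(runAndCheck(new Worker(true)).getMessage()); // prints simulated file error
    }
}
```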
    • stats

      public void stats()
      Print statistics about the search.
      Since:
      2006.03.01 (Christian Borgelt)
    • main

      public static void main(String[] args)
      Command line invocation of the molecular substructure miner.
      Parameters:
      args - the command line arguments
      Since:
      2002.03.15 (Christian Borgelt)