Class Miner
- All Implemented Interfaces:
Serializable, Runnable
- Since: 2002.03.11
-
Field Summary
Fields
| Modifier and Type | Field | Description |
|---|---|---|
| static final int | ALLEXTS | flag for generating all extensions |
| static final int | AROMATIZE | flag for converting Kekulé representations |
| protected moss.RepoElem[] | bins | the repository of processed substructures (hash table) |
| protected long | canonic | for benchmarking: canonical form pruning counter |
| static final int | CHAINEXT | flag for extensions by chains |
| protected long | chains | for benchmarking: invalid chains counter |
| static final int | CLASSES | flag for using node equivalence classes |
| static final int | CLOSED | flag for restriction to closed fragments |
| static final int | CLOSERINGS | flag for filtering open rings |
| protected CanonicalForm | cnf | the canonical form and restricted extension generator |
| protected int[] | cnts | the numbers of graphs in focus and complement |
| protected Recoder | coder | the recoder for the node types |
| protected int | comp | the maximum support in the complement as an absolute value |
| protected double | conf | the minimum confidence of an association rule as a fraction |
| static final String | COPYRIGHT | the copyright information for this program |
| protected NamedGraph | curr | the current insertion point for the focus |
| static final int | DEFAULT | default search mode flags: edge extensions, embeddings, canonical form and full perfect extension pruning |
| static final String | DESCRIPTION | the program description |
| static final int | DIRECTED | flag for directed graphs/fragments |
| protected long | duplic | for benchmarking: duplicate fragments counter |
| static final int | EDGEEXT | flag for extensions by single edges |
| protected long | embcmp | for benchmarking: the number of comparisons with embeddings |
| protected long | embcnt | for benchmarking: the number of created embeddings |
| protected int | emblvl | the level at which to switch to embeddings |
| protected EdgePatternMgr | epmgr | the edge pattern manager (if needed, for rule generation) |
| protected long | equiv | for benchmarking: equivalent fragment pruning counter |
| static final int | EQVARS | flag for extensions by equivalent variants of rings |
| protected Graph | exseed | the node types that are excluded as seeds |
| protected Graph | extype | the excluded node types |
| protected double | fcomp | the maximum support in the complement as a fraction |
| protected Fragment | frag | the initial fragment (embedded seed structure) |
| protected long | fragcmp | for benchmarking: the number of fragment comparisons |
| protected long | fragcnt | for benchmarking: the number of created fragments |
| protected double | fsupp | the minimum support in the focus as a fraction |
| protected NamedGraph | graphs | the list of graphs to mine (database) |
| protected int | group | the group for graphs with a value below the threshold |
| protected long | invalid | for benchmarking: invalid fragments counter |
| protected long | isocnt | for benchmarking: the number of isomorphism tests |
| protected PrintStream | log | stream to write progress messages to |
| static final int | LOGIC | flag for conversion to logic representation |
| protected long | lowsupp | for benchmarking: insufficient support pruning counter |
| protected int[] | masks | the masks for nodes and edges |
| protected int | max | the maximum size of substructures to report (number of nodes) |
| protected int | maxdepth | for benchmarking: the maximum depth of the search tree |
| protected int | maxepg | the maximum number of embeddings per graph |
| static final int | MERGERINGS | flag for merging ring extensions with the same first edge |
| protected int | min | the minimum size of substructures to report (number of nodes) |
| protected int | mode | the search mode flags |
| protected long | nodecnt | for benchmarking: the number of search tree nodes |
| protected long | nonclsd | for benchmarking: non-closed fragments counter |
| protected CanonicalForm | norm | the canonical form for normalizing the output |
| static final int | NORMFORM | flag for normalized substructure output |
| static final int | NOSTATS | flag for no search statistics output |
| protected long | openrgs | for benchmarking: open ring fragments counter |
| static final int | ORBITS | flag for extension filtering with node orbits |
| protected moss.GraphPattern | pats | the list of found patterns for rule generation |
| protected long | perfect | for benchmarking: perfect extension pruning counter |
| static final int | PR_CANONIC | flag for canonical form pruning |
| static final int | PR_EQUIV | flag for equivalent sibling extension pruning |
| static final int | PR_PARTIAL | flag for partial perfect extension pruning |
| static final int | PR_PERFECT | flag for full perfect extension pruning |
| static final int | PR_UNCLOSE | flag for pruning fragments with unclosable rings |
| protected GraphReader | reader | the graph data set file reader |
| protected int | recnt | the size of the repository (number of substructures) |
| protected long | repcnt | for benchmarking: the number of repository accesses |
| protected int | rgmax | the maximum size of rings (number of nodes/edges) |
| protected int | rgmin | the minimum size of rings (number of nodes/edges) |
| static final int | RINGEXT | flag for extensions by rings |
| protected long | ringord | for benchmarking: ring order pruning counter |
| protected Graph | seed | the seed structure to start the search from |
| protected int | subcnt | the number of reported substructures |
| protected int | supp | the minimum support in the focus as an absolute value |
| protected NamedGraph | tail | the tail of the list of graphs (insertion point for complement) |
| protected double | thresh | the threshold for the split into focus and complement |
| static final int | TRANSFORM | flag for conversion to another description format |
| protected int | type | the type of support to use |
| static final int | UNEMBED | flag for unembedding siblings of the current search tree nodes |
| static final int | VERBOSE | flag for verbose reporting |
| static final String | VERSION | the version of this program |
| protected Notation | vntn | the notation for verbose output |
| protected Writer | wrids | the identifier file writer |
| protected GraphWriter | writer | the substructure file writer |
| protected Writer | wrules | the graph (association) rule file writer |
-
Constructor Summary
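Constructors
| Constructor | Description |
|---|---|
| Miner() | Create an empty miner with default parameter settings. |
-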
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | abort() | Abort the miner (if running as a thread). |
| void | addGraph(NamedGraph graph) | Add a graph to the database. |
| int | embed() | Embed the seed structure into all graphs. |
| int | getCurrent() | Get the substructures that have been found up to now. |
| Exception | getError() | Get the error status of the search process. |
| void | init(String[] args) | Initialize the miner from command line arguments. |
| static void | main(String[] args) | Command line invocation of the molecular substructure miner. |
| protected void | mine() | Preprocess the graphs, embed the seed, and start the search. |
| protected boolean | report(Fragment frag) | Check and report a found fragment/substructure. |
| protected void | rules() | Generate (association) rules from the collected patterns. |
| void | run() | Run the miner and clean up after the search finished. |
| void | setCnF(CanonicalForm cnf) | Set the canonical form. |
| void | setConf(double conf) | Set the minimum rule confidence. |
| void | setEmbed(int level, int maxepg) | Set the embeddings parameters. |
| void | setExcluded(String extype, String exseed, String format) | Set the excluded nodes and excluded seeds. |
| void | setExcluded(Graph extype, Graph exseed) | Set the excluded nodes and excluded seeds. |
| void | setGrouping(double thresh, boolean invert) | Set the grouping parameters. |
| void | setInput(String fname, String format) | Set the input reader. |
| void | setInput(GraphReader reader) | Set the input reader. |
| void | setLimits(double supp, double comp) | Set the support limits. |
| void | setLog(PrintStream stream) | Sets the stream to which progress messages are written. |
| void | setMasks(int node, int edge, int ringnode, int ringedge) | Set the node and edge masks. |
| void | setMode(int mode) | Set the search mode. |
| void | setOutput(String fname, String format) | Set the output writer. |
| void | setOutput(String fn_sub, String format, String fn_ids) | Set the output writers. |
| void | setOutput(String fn_sub, String format, String fn_ids, String fn_rules) | Set the output writers. |
| void | setOutput(GraphWriter writer) | Set the output writer. |
| void | setOutput(GraphWriter writer, Writer wrids) | Set the output writers. |
| void | setOutput(GraphWriter writer, Writer wrids, Writer wrules) | Set the output writers. |
| void | setRingSizes(int min, int max) | Set the minimum and maximum ring size. |
| void | setSeed(String desc, String format) | Set the seed structure to start the search from. |
| void | setSeed(Graph seed) | Set the seed structure to start the search from. |
| void | setSizes(int min, int max) | Set the minimum and maximum fragment size. |
| void | setType(int type) | Set the support type. |
| void | stats() | Print statistics about the search. |
| protected void | term() | Clean up after the search finished or was aborted. |
| void | writeGraphs() | Write all graphs of the database. |
-
Field Details
-
DESCRIPTION
public static final String DESCRIPTION
the program description
-
VERSION
public static final String VERSION
the version of this program
-
COPYRIGHT
public static final String COPYRIGHT
the copyright information for this program
-
DIRECTED
public static final int DIRECTED
flag for directed graphs/fragments
-
EDGEEXT
public static final int EDGEEXT
flag for extensions by single edges
-
RINGEXT
public static final int RINGEXT
flag for extensions by rings
-
CHAINEXT
public static final int CHAINEXT
flag for extensions by chains
-
EQVARS
public static final int EQVARS
flag for extensions by equivalent variants of rings
-
ORBITS
public static final int ORBITS
flag for extension filtering with node orbits
-
CLASSES
public static final int CLASSES
flag for using node equivalence classes
-
ALLEXTS
public static final int ALLEXTS
flag for generating all extensions
-
CLOSED
public static final int CLOSED
flag for restriction to closed fragments
-
CLOSERINGS
public static final int CLOSERINGS
flag for filtering open rings
-
MERGERINGS
public static final int MERGERINGS
flag for merging ring extensions with the same first edge
-
PR_UNCLOSE
public static final int PR_UNCLOSE
flag for pruning fragments with unclosable rings
-
PR_PARTIAL
public static final int PR_PARTIAL
flag for partial perfect extension pruning
-
PR_PERFECT
public static final int PR_PERFECT
flag for full perfect extension pruning
-
PR_EQUIV
public static final int PR_EQUIV
flag for equivalent sibling extension pruning
-
PR_CANONIC
public static final int PR_CANONIC
flag for canonical form pruning
-
UNEMBED
public static final int UNEMBED
flag for unembedding siblings of the current search tree nodes
-
NORMFORM
public static final int NORMFORM
flag for normalized substructure output
-
VERBOSE
public static final int VERBOSE
flag for verbose reporting
-
AROMATIZE
public static final int AROMATIZE
flag for converting Kekulé representations
-
TRANSFORM
public static final int TRANSFORM
flag for conversion to another description format
-
LOGIC
public static final int LOGIC
flag for conversion to logic representation
-
NOSTATS
public static final int NOSTATS
flag for no search statistics output
-
DEFAULT
public static final int DEFAULT
default search mode flags: edge extensions, embeddings, canonical form and full perfect extension pruning
-
mode
protected int mode
the search mode flags
-
type
protected int type
the type of support to use
-
fsupp
protected double fsupp
the minimum support in the focus as a fraction
-
supp
protected int supp
the minimum support in the focus as an absolute value
-
fcomp
protected double fcomp
the maximum support in the complement as a fraction
-
comp
protected int comp
the maximum support in the complement as an absolute value
-
conf
protected double conf
the minimum confidence of an association rule as a fraction
-
min
protected int min
the minimum size of substructures to report (number of nodes)
-
max
protected int max
the maximum size of substructures to report (number of nodes)
-
rgmin
protected int rgmin
the minimum size of rings (number of nodes/edges)
-
rgmax
protected int rgmax
the maximum size of rings (number of nodes/edges)
-
masks
protected int[] masks
the masks for nodes and edges
-
coder
protected Recoder coder
the recoder for the node types
-
seed
protected Graph seed
the seed structure to start the search from
-
extype
protected Graph extype
the excluded node types
-
exseed
protected Graph exseed
the node types that are excluded as seeds
-
graphs
protected NamedGraph graphs
the list of graphs to mine (database)
-
curr
protected NamedGraph curr
the current insertion point for the focus
-
tail
protected NamedGraph tail
the tail of the list of graphs (insertion point for complement)
-
cnts
protected int[] cnts
the numbers of graphs in focus and complement
-
emblvl
protected int emblvl
the level at which to switch to embeddings
-
frag
protected Fragment frag
the initial fragment (embedded seed structure)
-
maxepg
protected int maxepg
the maximum number of embeddings per graph
-
bins
protected moss.RepoElem[] bins
the repository of processed substructures (hash table)
-
recnt
protected int recnt
the size of the repository (number of substructures)
-
epmgr
protected EdgePatternMgr epmgr
the edge pattern manager (if needed, for rule generation)
-
pats
protected moss.GraphPattern pats
the list of found patterns for rule generation
-
cnf
protected CanonicalForm cnf
the canonical form and restricted extension generator
-
norm
protected CanonicalForm norm
the canonical form for normalizing the output
-
subcnt
protected int subcnt
the number of reported substructures
-
reader
protected GraphReader reader
the graph data set file reader
-
thresh
protected double thresh
the threshold for the split into focus and complement
-
group
protected int group
the group for graphs with a value below the threshold
-
writer
protected GraphWriter writer
the substructure file writer
-
wrids
protected Writer wrids
the identifier file writer
-
wrules
protected Writer wrules
the graph (association) rule file writer
-
vntn
protected Notation vntn
the notation for verbose output
-
log
protected PrintStream log
stream to write progress messages to
-
maxdepth
protected int maxdepth
for benchmarking: the maximum depth of the search tree
-
nodecnt
protected long nodecnt
for benchmarking: the number of search tree nodes
-
fragcnt
protected long fragcnt
for benchmarking: the number of created fragments
-
embcnt
protected long embcnt
for benchmarking: the number of created embeddings
-
lowsupp
protected long lowsupp
for benchmarking: insufficient support pruning counter
-
perfect
protected long perfect
for benchmarking: perfect extension pruning counter
-
equiv
protected long equiv
for benchmarking: equivalent fragment pruning counter
-
ringord
protected long ringord
for benchmarking: ring order pruning counter
-
canonic
protected long canonic
for benchmarking: canonical form pruning counter
-
duplic
protected long duplic
for benchmarking: duplicate fragments counter
-
nonclsd
protected long nonclsd
for benchmarking: non-closed fragments counter
-
openrgs
protected long openrgs
for benchmarking: open ring fragments counter
-
chains
protected long chains
for benchmarking: invalid chains counter
-
invalid
protected long invalid
for benchmarking: invalid fragments counter
-
repcnt
protected long repcnt
for benchmarking: the number of repository accesses
-
fragcmp
protected long fragcmp
for benchmarking: the number of fragment comparisons
-
isocnt
protected long isocnt
for benchmarking: the number of isomorphism tests
-
embcmp
protected long embcmp
for benchmarking: the number of comparisons with embeddings
-
Constructor Details
-
Miner
public Miner()
Create an empty miner with default parameter settings.
- Since: 2002.03.11 (Christian Borgelt)
-
Method Details
-
setMode
public void setMode(int mode)
Set the search mode. The search mode is a combination (bitwise OR) of the search mode flags, e.g. RINGEXT or PR_CANONIC.
- Parameters:
mode - the search mode
- Since: 2006.10.26 (Christian Borgelt)
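Because the search mode is a set of bit flags, combining and testing flags follows the usual bitwise pattern. The sketch below is self-contained and does not use the real Miner constants. ModeFlags and its flag values are hypothetical powers of two chosen for illustration, and the numeric values defined in Miner may differ.

```java
/** Hypothetical stand-in for the Miner mode flags (values are illustrative only). */
final class ModeFlags {
    static final int EDGEEXT    = 1 << 0;  // extensions by single edges
    static final int RINGEXT    = 1 << 1;  // extensions by rings
    static final int CLOSED     = 1 << 2;  // restriction to closed fragments
    static final int PR_PERFECT = 1 << 3;  // full perfect extension pruning
    static final int PR_CANONIC = 1 << 4;  // canonical form pruning

    private ModeFlags() {}

    /** Returns true if every bit of flag is set in mode. */
    static boolean has(int mode, int flag) {
        return (mode & flag) == flag;
    }

    /** Returns mode with the given flag bits cleared. */
    static int without(int mode, int flag) {
        return mode & ~flag;
    }

    public static void main(String[] args) {
        // Combine flags with bitwise OR, as one would before calling setMode(mode).
        int mode = EDGEEXT | RINGEXT | PR_CANONIC;
        System.out.println(has(mode, RINGEXT));  // true
        System.out.println(has(mode, CLOSED));   // false
        mode = without(mode, RINGEXT);           // switch ring extensions off again
        System.out.println(has(mode, RINGEXT));  // false
    }
}
```

Only the flag names carry over to the real class. With Miner, the combined value would be passed to setMode.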
-
setSizes
public void setSizes(int min, int max)
Set the minimum and maximum fragment size.
- Parameters:
min - the minimum fragment size (number of nodes)
max - the maximum fragment size (number of nodes)
- Since: 2006.10.26 (Christian Borgelt)
-
setType
public void setType(int type)
Set the support type. Constants for support types are defined in the class Fragment.
- Parameters:
type - the support type to use
- Since: 2006.06.21 (Christian Borgelt)
-
setLimits
public void setLimits(double supp, double comp)
Set the support limits. Positive values are fractions of the focus or the complement set, respectively; negative values are absolute numbers of graphs.
- Parameters:
supp - the minimum support in the focus
comp - the maximum support in the complement
- Since: 2006.10.26 (Christian Borgelt)
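The sign convention can be illustrated with a helper that converts a limit into an absolute number of graphs. This is a hypothetical sketch. SupportLimits, its method names, and the rounding (up for the minimum support in the focus, down for the maximum support in the complement) are assumptions and are not taken from the Miner source.

```java
/** Hypothetical helper illustrating the sign convention of setLimits(supp, comp). */
final class SupportLimits {
    private SupportLimits() {}

    /** Minimum support: a fraction (>= 0) of n graphs, or an absolute count (< 0). */
    static int minSupport(double supp, int n) {
        if (supp < 0) return (int) Math.round(-supp);  // negative: absolute number
        return (int) Math.ceil(supp * n);              // assumption: round up
    }

    /** Maximum support: a fraction (>= 0) of n graphs, or an absolute count (< 0). */
    static int maxSupport(double comp, int n) {
        if (comp < 0) return (int) Math.round(-comp);  // negative: absolute number
        return (int) Math.floor(comp * n);             // assumption: round down
    }

    public static void main(String[] args) {
        int focus = 40, complement = 160;
        System.out.println(minSupport(0.25, focus));       // 10
        System.out.println(minSupport(-3, focus));         // 3
        System.out.println(maxSupport(0.02, complement));  // 3
    }
}
```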
-
setConf
public void setConf(double conf)
Set the minimum rule confidence.
- Parameters:
conf - the minimum confidence of association rules
- Since: 2020.10.04 (Christian Borgelt)
-
setRingSizes
public void setRingSizes(int min, int max)
Set the minimum and maximum ring size.
- Parameters:
min - the minimum ring size (number of nodes/edges)
max - the maximum ring size (number of nodes/edges)
- Since: 2006.10.26 (Christian Borgelt)
-
setMasks
public void setMasks(int node, int edge, int ringnode, int ringedge)
Set the node and edge masks.
- Parameters:
node - the mask for nodes outside (marked) rings
edge - the mask for edges outside (marked) rings
ringnode - the mask for nodes in (marked) rings
ringedge - the mask for edges in (marked) rings
- Since: 2006.06.26 (Christian Borgelt)
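Masks of this kind are typically applied with a bitwise AND, so that type codes that differ only in masked-out bits compare as equal. The sketch assumes that Miner uses its masks this way. The class TypeMasks and the packing of an element number and a charge code into one int are hypothetical.

```java
/** Hypothetical illustration of type masking with a bitwise AND. */
final class TypeMasks {
    // Hypothetical encoding: low 8 bits = element number, bits 8..15 = charge code.
    static final int ELEMENT_BITS = 0x00ff;
    static final int ALL_BITS     = 0xffff;

    private TypeMasks() {}

    static int encode(int element, int chargeCode) {
        return (chargeCode << 8) | element;
    }

    /** Two types match under a mask if they agree on all unmasked bits. */
    static boolean sameUnderMask(int a, int b, int mask) {
        return (a & mask) == (b & mask);
    }

    public static void main(String[] args) {
        int nitrogen        = encode(7, 0);  // N, uncharged
        int chargedNitrogen = encode(7, 1);  // N with a (hypothetical) charge code
        System.out.println(sameUnderMask(nitrogen, chargedNitrogen, ALL_BITS));      // false
        System.out.println(sameUnderMask(nitrogen, chargedNitrogen, ELEMENT_BITS));  // true
    }
}
```

Because setMasks takes separate masks for nodes and edges inside and outside (marked) rings, the same type could be compared more or less strictly depending on where it occurs.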
-
setEmbed
public void setEmbed(int level, int maxepg)
Set the embeddings parameters. Restricting the maximum number of embeddings per graph can reduce the amount of memory needed in the search, but slows down the operation (sometimes considerably).
- Parameters:
level - the level at which to switch to embeddings
maxepg - the maximum number of embeddings per graph
- Since: 2010.01.27 (Christian Borgelt)
-
setExcluded
public void setExcluded(Graph extype, Graph exseed)
Set the excluded nodes and excluded seeds. Excluded nodes are completely removed from the search, that is, no substructure containing such a node will be reported. Nodes that are only excluded as seeds may appear in reported fragments, but are not used as seeds. This can be useful, for example, if carbon is the most frequent element and one is not interested in fragments containing only carbon nodes.
- Parameters:
extype - the node types to exclude from the search
exseed - the node types to exclude as seeds
- Since: 2006.06.26 (Christian Borgelt)
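The difference between excluding a node type completely and excluding it only as a seed can be sketched with node types represented as element symbols. SeedSelection and its methods are hypothetical and only mirror the rules stated above. They are not part of the Miner API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of excluded node types versus types excluded only as seeds. */
final class SeedSelection {
    private SeedSelection() {}

    /** Node types that may appear in reported fragments at all. */
    static Set<String> allowedTypes(List<String> types, Set<String> extype) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : types)
            if (!extype.contains(t)) out.add(t);  // excluded types vanish entirely
        return out;
    }

    /** Node types from which the search may start. */
    static Set<String> seedTypes(List<String> types, Set<String> extype, Set<String> exseed) {
        Set<String> out = allowedTypes(types, extype);
        out.removeAll(exseed);                    // still allowed inside fragments
        return out;
    }

    public static void main(String[] args) {
        List<String> types = List.of("C", "N", "O", "H");
        Set<String> extype = Set.of("H");         // hypothetical: ignore hydrogen completely
        Set<String> exseed = Set.of("C");         // carbon allowed, but not as a seed
        System.out.println(allowedTypes(types, extype));       // [C, N, O]
        System.out.println(seedTypes(types, extype, exseed));  // [N, O]
    }
}
```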
-
setExcluded
public void setExcluded(String extype, String exseed, String format) throws IOException
Set the excluded nodes and excluded seeds. The arguments extype and exseed are parsed as graph descriptions in the notation given by the argument format.
- Parameters:
extype - the description of the excluded nodes
exseed - the description of the nodes to exclude as seeds
format - the format of the descriptions
- Throws:
IOException - if writing the log file failed
- Since: 2006.06.26 (Christian Borgelt)
-
setSeed
public void setSeed(Graph seed) throws IOException
Set the seed structure to start the search from.
- Parameters:
seed - the seed structure for the search
- Throws:
IOException - if the seed is not connected
- Since: 2006.06.26 (Christian Borgelt)
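A seed is rejected if it is not connected. A check of this kind can be written as a breadth-first search. The sketch below works on a plain node count and edge list, which is a hypothetical representation rather than the moss Graph class. It treats edges as undirected, so for directed graphs it tests weak connectivity. It is not the library's own check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical BFS connectivity check on a graph given as a node count and edge list. */
final class Connectivity {
    private Connectivity() {}

    static boolean isConnected(int nodeCount, int[][] edges) {
        if (nodeCount <= 1) return true;             // empty or single node: connected
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {                      // undirected: add both directions
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        boolean[] seen = new boolean[nodeCount];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        seen[0] = true;
        int reached = 1;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.get(u)) {
                if (!seen[v]) { seen[v] = true; reached++; queue.add(v); }
            }
        }
        return reached == nodeCount;                 // all nodes reachable from node 0
    }

    public static void main(String[] args) {
        System.out.println(isConnected(3, new int[][] {{0, 1}, {1, 2}}));  // true
        System.out.println(isConnected(3, new int[][] {{0, 1}}));          // false
    }
}
```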
-
setSeed
public void setSeed(String desc, String format) throws IOException
Set the seed structure to start the search from. The argument desc is parsed as a graph description in the notation given by the argument format.
- Parameters:
desc - the description of the seed structure
format - the format of the seed description
- Throws:
IOException - if the seed is not connected or writing the log file failed
- Since: 2006.06.26 (Christian Borgelt)
-
setGrouping
public void setGrouping(double thresh, boolean invert)
Set the grouping parameters. If invert == false, all graphs having an associated value smaller than the threshold thresh are placed into the focus and all other graphs form the complement. If invert == true, this split is inverted, that is, all graphs having an associated value no less than the threshold thresh are placed into the focus and all other graphs form the complement.
- Parameters:
thresh - the threshold for the grouping
invert - whether to invert the grouping
- Since: 2007.03.05 (Christian Borgelt)
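The split rule reduces to a single comparison. The class Grouping is a hypothetical illustration. A value exactly equal to thresh goes to the complement when invert == false and to the focus when invert == true.

```java
/** Hypothetical sketch of the focus/complement split described for setGrouping. */
final class Grouping {
    private Grouping() {}

    /** Returns true if a graph with the given value belongs to the focus. */
    static boolean inFocus(double value, double thresh, boolean invert) {
        return invert ? value >= thresh : value < thresh;
    }

    public static void main(String[] args) {
        double thresh = 0.5;
        System.out.println(inFocus(0.2, thresh, false));  // true  (below the threshold)
        System.out.println(inFocus(0.5, thresh, false));  // false (not below)
        System.out.println(inFocus(0.5, thresh, true));   // true  (no less than the threshold)
    }
}
```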
-
setLog
public void setLog(PrintStream stream)
Sets the stream to which progress messages are written. By default all messages are written to System.err.
- Parameters:
stream - the stream to write to
- Since: 2007.05.30 (Christian Borgelt)
-
setInput
public void setInput(GraphReader reader)
Set the input reader.
- Parameters:
reader - the reader from which to read the graphs
- Since: 2007.03.05 (Christian Borgelt)
-
setInput
public void setInput(String fname, String format) throws IOException
Set the input reader.
- Parameters:
fname - the name of the input data file
format - the format of the input data
- Throws:
IOException - if creating the input reader failed
- Since: 2007.03.05 (Christian Borgelt)
-
setOutput
public void setOutput(GraphWriter writer)
Set the output writer.
- Parameters:
writer - the writer to write the found substructures
- Since: 2002.03.11 (Christian Borgelt)
-
setOutput
public void setOutput(GraphWriter writer, Writer wrids)
Set the output writers.
- Parameters:
writer - the writer to write the found substructures
wrids - the writer to write the graph identifiers
- Since: 2002.03.11 (Christian Borgelt)
-
setOutput
public void setOutput(GraphWriter writer, Writer wrids, Writer wrules)
Set the output writers.
- Parameters:
writer - the writer to write the found substructures
wrids - the writer to write the graph identifiers
wrules - the writer to write the graph rules
- Since: 2020.10.05 (Christian Borgelt)
-
setOutput
public void setOutput(String fname, String format) throws IOException
Set the output writer.
- Parameters:
fname - the name of the file for the found substructures
format - the format for the output
- Throws:
IOException - if creating the output writer failed
- Since: 2007.07.01 (Christian Borgelt)
-
setOutput
public void setOutput(String fn_sub, String format, String fn_ids) throws IOException
Set the output writers.
- Parameters:
fn_sub - the name of the file for the found fragments
format - the format for the output
fn_ids - the name of the file for the graph identifiers
- Throws:
IOException - if creating an output writer failed
- Since: 2006.06.26 (Christian Borgelt)
-
setOutput
public void setOutput(String fn_sub, String format, String fn_ids, String fn_rules) throws IOException
Set the output writers.
- Parameters:
fn_sub - the name of the file for the found fragments
format - the format for the output
fn_ids - the name of the file for the graph identifiers
fn_rules - the name of the file for the graph rules
- Throws:
IOException - if creating an output writer failed
- Since: 2006.06.26 (Christian Borgelt)
-
setCnF
public void setCnF(CanonicalForm cnf)
Set the canonical form.
- Parameters:
cnf - the canonical form to set
- Since: 2009.08.04 (Christian Borgelt)
-
addGraph
public void addGraph(NamedGraph graph)
Add a graph to the database. When the graph is added, its group is evaluated and it is added to the list in such a way that all focus graphs are at the beginning of the list and all complement graphs are at the end. Hence the group of a graph must not be changed after it has been added to a miner. Note that the order in which the graphs are added is preserved in the focus and the complement lists.
- Parameters:
graph - the graph to add
- Since: 2002.03.11 (Christian Borgelt)
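This ordering can be reproduced with one list and an insertion index for the focus. OrderedDatabase is a hypothetical sketch built on java.util.ArrayList. The fields curr and tail suggest that Miner instead keeps a linked list of NamedGraph objects with two insertion points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep focus graphs before complement graphs, preserving order. */
final class OrderedDatabase<G> {
    private final List<G> graphs = new ArrayList<>();
    private int focusCount = 0;  // insertion point for the next focus graph

    void add(G graph, boolean focus) {
        if (focus) graphs.add(focusCount++, graph);  // right after the last focus graph
        else       graphs.add(graph);                // at the end of the complement
    }

    List<G> focus()      { return graphs.subList(0, focusCount); }
    List<G> complement() { return graphs.subList(focusCount, graphs.size()); }
    List<G> all()        { return graphs; }

    public static void main(String[] args) {
        OrderedDatabase<String> db = new OrderedDatabase<>();
        db.add("g1", false);
        db.add("g2", true);
        db.add("g3", false);
        db.add("g4", true);
        System.out.println(db.all());  // [g2, g4, g1, g3]
    }
}
```

As the method documentation notes, changing a graph's group after insertion would break this layout, because its position was fixed when it was added.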
-
embed
public int embed()
Embed the seed structure into all graphs.
- Returns: the number of graphs that contain the seed
- Since: 2002.03.11 (Christian Borgelt)
-
report
protected boolean report(Fragment frag) throws IOException
Check and report a found fragment/substructure. To be actually reported (written to the output file), the fragment must be valid (Fragment.isValid()), meet the maximum support requirement for the complement part of the database, be closed (Fragment.isClosed()), and not have open rings if only fragments with closed rings are to be reported.
- Parameters:
frag - the fragment to report
- Returns: whether the fragment has been reported
- Throws:
IOException - if a file operation failed
- Since: 2002.03.21 (Christian Borgelt)
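The reporting conditions can be written as one predicate. In this hypothetical sketch, FragmentInfo stands in for the moss Fragment class. The closedness and closed-ring requirements are switched by boolean parameters, which is an assumption based on the CLOSED and CLOSERINGS mode flags. The real report method additionally writes the fragment to the output.

```java
/** Hypothetical sketch of the reporting conditions described for report(Fragment). */
final class ReportCheck {
    /** Stand-in for the properties of a fragment that report() inspects. */
    static final class FragmentInfo {
        final boolean valid;          // cf. Fragment.isValid()
        final int complementSupport;  // number of complement graphs containing it
        final boolean closed;         // cf. Fragment.isClosed()
        final boolean hasOpenRings;

        FragmentInfo(boolean valid, int complementSupport, boolean closed, boolean hasOpenRings) {
            this.valid = valid;
            this.complementSupport = complementSupport;
            this.closed = closed;
            this.hasOpenRings = hasOpenRings;
        }
    }

    private ReportCheck() {}

    /**
     * @param comp       maximum support in the complement (absolute number)
     * @param onlyClosed assumption: closedness required only in closed-fragment mode
     * @param closeRings whether only fragments with closed rings are reported
     */
    static boolean shouldReport(FragmentInfo f, int comp, boolean onlyClosed, boolean closeRings) {
        if (!f.valid) return false;
        if (f.complementSupport > comp) return false;  // too frequent in the complement
        if (onlyClosed && !f.closed) return false;
        if (closeRings && f.hasOpenRings) return false;
        return true;
    }

    public static void main(String[] args) {
        FragmentInfo f = new FragmentInfo(true, 2, true, false);
        System.out.println(shouldReport(f, 5, true, true));  // true
        System.out.println(shouldReport(f, 1, true, true));  // false
    }
}
```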
-
writeGraphs
public void writeGraphs() throws IOException
Write all graphs of the database.
- Throws:
IOException - if writing the graphs failed
- Since: 2002.03.11 (Christian Borgelt)
-
init
public void init(String[] args) throws IOException
Initialize the miner from command line arguments.
- Parameters:
args - the command line arguments
- Throws:
IOException - if some file operation failed
- Since: 2006.03.01 (Christian Borgelt)
-
mine
protected void mine() throws IOException
Preprocess the graphs, embed the seed, and start the search.
- Throws:
IOException - if some file operation failed
- Since: 2006.03.01 (Christian Borgelt)
-
rules
protected void rules() throws IOException
Generate (association) rules from the collected patterns.
- Throws:
IOException - if some file operation failed
- Since: 2020.10.03 (Christian Borgelt)
-
term
protected void term() throws IOException
Clean up after the search finished or was aborted.
- Throws:
IOException - if some file operation failed
- Since: 2006.03.01 (Christian Borgelt)
-
run
public void run()
Run the miner and clean up after the search finished.
-
abort
public void abort()
Abort the miner (if running as a thread).
- Since: 2006.03.01 (Christian Borgelt)
-
getCurrent
public int getCurrent()
Get the number of substructures that have been found up to now. This function enables progress reporting by another thread; it is used in the graphical user interface (class MoSS). If the return value is negative, it indicates the number of graphs that have been loaded; otherwise it is the number of substructures that have been found.
- Returns: the number of loaded graphs (if negative) or the number of found substructures (if non-negative)
- Since: 2006.03.01 (Christian Borgelt)
-
getError
public Exception getError()
Get the error status of the search process. With this function it can be checked, after the search with the run() method has terminated, whether an error occurred in the search. Note that an external abort with the function abort() does not trigger an exception to be thrown.
- Returns: the exception that occurred in the search or null if the search was successful
- Since: 2007.03.05 (Christian Borgelt)
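Because Miner implements Runnable, a caller can run the search in a background thread and call getError() after the thread has terminated. The hypothetical class MiningTask below mimics only this documented contract: errors are recorded rather than thrown, and an abort is not an error. It does not call the real Miner, and its loop merely stands in for the search.

```java
/** Hypothetical stand-in that follows the run()/abort()/getError() contract of Miner. */
final class MiningTask implements Runnable {
    private volatile boolean aborted = false;  // set by abort() from another thread
    private volatile Exception error = null;   // recorded failure, or null
    private volatile int found = 0;            // progress counter, written by one thread only
    private final int steps;
    private final int failAt;                  // step at which to simulate an error, or -1

    MiningTask(int steps, int failAt) { this.steps = steps; this.failAt = failAt; }

    @Override public void run() {
        try {
            for (int i = 0; i < steps && !aborted; i++) {
                if (i == failAt) throw new java.io.IOException("simulated read error");
                found++;                           // pretend a substructure was reported
            }
        } catch (Exception e) {
            error = e;                             // stored for getError(), not rethrown
        }
    }

    void abort()         { aborted = true; }       // stops the loop without an error
    int getCurrent()     { return found; }
    Exception getError() { return error; }

    public static void main(String[] args) throws InterruptedException {
        MiningTask task = new MiningTask(1000, -1);
        Thread t = new Thread(task);
        t.start();
        t.join();                                  // wait for the search to terminate
        System.out.println(task.getError() == null ? "ok: " + task.getCurrent() : "failed");
    }
}
```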
-
stats
public void stats()
Print statistics about the search.
- Since: 2006.03.01 (Christian Borgelt)
-
main
public static void main(String[] args)
Command line invocation of the molecular substructure miner.
- Parameters:
args - the command line arguments
- Since: 2002.03.15 (Christian Borgelt)
-