Christian Borgelt's Web Pages

Accretion and Frequent Item Set Mining

Download

accfim.pdf (1582 kb) Accretion and FIM result diagrams
accfim.zip (31 kb) accfim.tar.gz (25 kb) scripts and other source files
detect.pdf (4006 kb) surrogate filtering result diagrams
detect.zip (41 kb) detect.tar.gz (34 kb) scripts and other source files

Description

The document accfim.pdf contains the result diagrams for the complete set of experiments on Accretion and its possible extensions by frequent item set mining (other statistical tests, subset conditions, maximal versus closed frequent item sets, etc.) that were conducted for the paper Picado-Muiño et al. 2013 referenced below. Only a few of these diagrams appear in the paper due to lack of space. For the theory underlying the methods, please consult the paper.
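To make the distinction between closed and maximal frequent item sets concrete, here is a minimal brute-force sketch in Python. It is not taken from the accfim scripts; the function names and the toy transactions are assumptions chosen only for illustration, and the exhaustive enumeration is feasible only for very small item bases.

```python
from itertools import combinations

def frequent_item_sets(transactions, min_supp):
    """Enumerate all item sets contained in at least min_supp transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            # support = number of transactions that contain the candidate
            supp = sum(1 for t in transactions if set(cand) <= t)
            if supp >= min_supp:
                result[frozenset(cand)] = supp
    return result

def closed_sets(freq):
    """Closed: no proper superset has the same support."""
    return {s: n for s, n in freq.items()
            if not any(s < t and n == m for t, m in freq.items())}

def maximal_sets(freq):
    """Maximal: no proper superset is frequent at all."""
    return {s: n for s, n in freq.items()
            if not any(s < t for t in freq)}

# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
freq = frequent_item_sets(transactions, min_supp=2)

print(sorted("".join(sorted(s)) for s in closed_sets(freq)))   # ['a', 'ab', 'abc']
print(sorted("".join(sorted(s)) for s in maximal_sets(freq)))  # ['abc']
```

Every maximal item set is also closed, but not vice versa: here {a} and {a, b} are closed because each has a higher support than any of its supersets, while only {a, b, c} is maximal.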

The archives accfim.{zip,tar.gz} contain the scripts and other source files with which the experiments were conducted and the document with the result diagrams was created.

The document detect.pdf contains the diagrams for the complete set of experiments on the surrogate-based assembly detection method suggested in the paper Picado-Muiño et al. 2013 referenced below. Only a few of these diagrams appear in the paper. This document also contains results of various pattern set reduction methods that are discussed in detail in the paper Torre et al. 2013 referenced below, although the diagrams here differ from those used in that paper. For the theory underlying the methods, please consult the two papers.

The archives detect.{zip,tar.gz} contain the scripts and other source files with which the experiments were conducted and the document with the result diagrams was created.

Note that the scripts were developed on and for a GNU/Linux system (Ubuntu 12.10) and are therefore directly executable on such a system or a similar one (that is, some other GNU/Linux distribution). Most of the Python scripts should also work on a Windows system (with the possible exception of the parallelization scripts), but most of the other scripts (like the run script, which is the main control script, and the makefile, which controls generating the diagrams from the result data) may need porting to batch files or something similar.

On a GNU/Linux system, the following software needs to be installed to run the experiments:

On such a system, the experiments can be run by simply calling the main script run (in the directory accfim or detect, respectively) on the command line; this script does everything. The experiments are executed with 4-fold parallelization and thus make full use of the quad-core processors found in most current computers. Progress can be followed on the command line, to which regular progress messages are written. Even on a modern computer system, the experiments can take a long time: more than 40 hours for the accfim scripts (mainly because of the huge number of individual experimental runs, namely in the hundreds of thousands, and the high cost of Fisher's exact test used in some of them) and about 60 minutes for the detect scripts. Once all experiments are completed, the result diagrams are created and compiled into the final documents, which are also directly available above.
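The idea of distributing many independent experimental runs over four worker processes can be sketched as follows. This is an illustration only and not the parallelization code from the archives: the function names, the placeholder workload, and the use of Python's standard concurrent.futures module are all assumptions.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_experiment(seed, n_trans=1000, n_items=20, prob=0.1):
    """Hypothetical single run: generate random transactions and count
    how many of them contain both item 0 and item 1."""
    rng = random.Random(seed)  # per-run generator keeps each run reproducible
    count = 0
    for _ in range(n_trans):
        trans = {i for i in range(n_items) if rng.random() < prob}
        if 0 in trans and 1 in trans:
            count += 1
    return seed, count

def run_all(seeds, workers=4, executor_cls=ProcessPoolExecutor):
    """Run one experiment per seed on a pool of `workers` workers."""
    with executor_cls(max_workers=workers) as pool:
        # map() returns the results in the order of the input seeds
        return list(pool.map(run_experiment, seeds))

if __name__ == "__main__":
    for seed, count in run_all(range(8)):
        print(f"run {seed}: {count} transactions contain items 0 and 1")
```

Because each run depends only on its own seed, the runs can be scheduled in any order without changing the results. Passing executor_cls=ThreadPoolExecutor runs the same code in threads, which can be convenient for debugging but gives no speed-up for CPU-bound work in standard Python.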

References