Christian Borgelt's Web Pages

JIM - Jaccard Item Set Mining


32 bit              64 bit                 (32/64 bit only for executables)
jim (282 kb)        jim (296 kb)           GNU/Linux executable
jim.exe (180 kb)    jim.exe (212 kb)       Windows console executable
(168 kb)            jim.tar.gz (149 kb)    C sources, version 3.16 (2016.10.15)
(382 kb)                                   census data set (UCI ML repository)
census (2 kb)                              shell script used for the conversion


JIM is a program that finds Jaccard item sets with an extension of the Eclat algorithm. Frequent item set mining looks for item sets whose support in a database of transactions exceeds a user-specified threshold (minimum support). By analogy, a Jaccard item set is an item set for which the (generalized) Jaccard index of its item covers exceeds a user-specified threshold. This measure assesses the association strength of the items much better than simple support does. Since the (generalized) Jaccard index is anti-monotone, like the support, the same basic search approach can be used, provided it is extended to compute the denominator of the Jaccard index.
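To make the idea concrete, here is a minimal Python sketch. It is not the JIM implementation, which is written in C. It assumes the usual definitions: the cover of an item is the set of transactions that contain it, and the generalized Jaccard index of an item set is the size of the intersection of its item covers (the support) divided by the size of their union. The function names `item_covers`, `jaccard_index` and `mine_jaccard` are hypothetical, and the sketch treats the threshold as inclusive (index >= threshold), which is also an assumption.

```python
def item_covers(transactions):
    """Map each item to the set of ids of the transactions containing it."""
    cov = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            cov.setdefault(item, set()).add(tid)
    return cov


def jaccard_index(itemset, cov):
    """Generalized Jaccard index: |intersection of covers| / |union of covers|."""
    sets = [cov[item] for item in itemset]
    support = len(set.intersection(*sets))  # transactions containing all items
    extent = len(set.union(*sets))          # transactions containing any item
    return support / extent if extent else 0.0


def mine_jaccard(transactions, jmin):
    """Depth-first search for item sets (size >= 2) with Jaccard index >= jmin.

    Because the index is anti-monotone, a set that falls below jmin
    is not extended any further.
    """
    cov = item_covers(transactions)
    items = sorted(cov)
    found = {}

    def extend(prefix, inter, union, start):
        for k in range(start, len(items)):
            new_inter = inter & cov[items[k]]  # numerator, as in Eclat
            new_union = union | cov[items[k]]  # denominator (the extension)
            j = len(new_inter) / len(new_union)
            if j >= jmin:
                itemset = prefix + (items[k],)
                found[itemset] = j
                extend(itemset, new_inter, new_union, k + 1)

    for k, item in enumerate(items):
        extend((item,), cov[item], cov[item], k + 1)
    return found
```

Calling `mine_jaccard(transactions, 0.5)` on a list of item sets returns a dictionary that maps each qualifying item set (as a sorted tuple) to its index. The sketch keeps the union of the covers next to their intersection, which is one simple way to obtain the denominator; an efficient implementation would organize this computation differently.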

In addition to the (generalized) Jaccard index, this program offers a large variety of other (generalized) similarity measures, which may also be used to find item sets based on cover similarity. These include the measures defined by Kulczynski, Dice, Sokal & Sneath, Sokal & Michener, Faith, Rogers & Tanimoto and others. All of these measures can also be shown to be anti-monotone.
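As an illustration, several of these measures can be written in terms of four counts for an item set: n (all transactions), s (transactions containing all items), z (transactions containing none of the items) and x = n - s - z (transactions containing some but not all items). The Python sketch below assumes that the generalization takes the standard pairwise formulas and replaces the "both present" count by s, the "exactly one present" count by x and the "both absent" count by z. This mapping, the function name `counts` and the dictionary `MEASURES` are assumptions made for illustration; the exact definitions used by JIM may differ.

```python
def counts(itemset, transactions):
    """Return (n, s, x, z) for an item set over a list of transactions."""
    items = set(itemset)
    n = len(transactions)
    s = sum(1 for t in transactions if items <= set(t))       # all items present
    z = sum(1 for t in transactions if not (items & set(t)))  # no item present
    return n, s, n - s - z, z


def _ratio(num, den):
    """Divide, mapping 0/0 to 0.0 and a positive value over 0 to infinity."""
    if den:
        return num / den
    return float("inf") if num else 0.0


# Assumed generalized forms of some classic similarity measures.
MEASURES = {
    "jaccard":         lambda n, s, x, z: _ratio(s, s + x),
    "kulczynski":      lambda n, s, x, z: _ratio(s, x),
    "dice":            lambda n, s, x, z: _ratio(2 * s, 2 * s + x),
    "sokal_sneath":    lambda n, s, x, z: _ratio(s, s + 2 * x),
    "sokal_michener":  lambda n, s, x, z: _ratio(s + z, n),
    "faith":           lambda n, s, x, z: _ratio(2 * s + z, 2 * n),
    "rogers_tanimoto": lambda n, s, x, z: _ratio(s + z, n + x),
}
```

For example, `MEASURES["dice"](*counts({"a", "b"}, transactions))` evaluates the Dice measure for the item set {a, b}. Adding an item to a set can only decrease s and z and increase x, which is why none of these ratios can grow when the set is extended.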

If you have trouble executing the program on Microsoft Windows, check whether you have the Microsoft Visual C++ Redistributable Packages for Visual Studio 2015 installed, as the program was compiled with Microsoft Visual Studio 2015.

The algorithm used in this program is described in the following paper:

More information about frequent item set mining, implementations of other algorithms as well as test data sets can be found at the Frequent Itemset Mining Implementations Repository.