Christian Borgelt's Web Pages

Hubness - Analysis of the Hubness Phenomenon

Download

hubness (42 kb) GNU/Linux executable
hubness.exe (175 kb) Windows console executable
hubness.zip (26 kb) C sources, version 1.7 (2014.10.24)
hubness.tar.gz (21 kb)

Description

Hubness is a program to analyze the hubness phenomenon: as the dimensionality of a data set increases, the distribution of the number of times a data point occurs among the k nearest neighbors of other data points becomes increasingly skewed to the right. As a consequence, so-called hubs emerge, that is, data points that appear in the k-nearest-neighbor lists of other data points much more often than others. This program, together with the accompanying Python scripts in the source package, was used to run the experiments reported in the paper referenced below. The paper challenges the hypothesis that the hubness phenomenon is an effect of the dimensionality of the data set and provides evidence that it is rather a boundary effect or, more generally, an effect of a density gradient. As such, it may be seen as an artifact of the process that generates the data used to demonstrate the phenomenon.
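To make the quantity discussed above concrete, here is a minimal Python sketch of how the k-occurrence counts and their skewness could be computed. It is not the code of the hubness program or of its accompanying scripts. The function names (k_occurrences, skewness, uniform_cube), the brute-force neighbor search, and the choice of points drawn uniformly from a unit hypercube are all assumptions made for illustration.

```python
import random


def k_occurrences(points, k):
    """Count how often each point appears among the k nearest
    neighbors of the other points (brute force, Euclidean distance)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # squared Euclidean distances from point i to every other point
        dists = [(sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
                 for j in range(n) if j != i]
        dists.sort()                      # ties are broken by point index
        for _, j in dists[:k]:
            counts[j] += 1                # j is one of i's k nearest neighbors
    return counts


def skewness(values):
    """Standardized third moment (population form); 0.0 if all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def uniform_cube(n, dim, rng):
    """Hypothetical data generator: n points drawn uniformly from [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed for repeatability
    for dim in (2, 20):
        counts = k_occurrences(uniform_cube(300, dim, rng), k=5)
        print(dim, max(counts), round(skewness(counts), 3))
```

A positive skewness together with a large maximum count indicates that a few points act as hubs. Comparing the output across dimensionalities, or across data generators with and without a boundary, is one simple way to explore the question the paper raises.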

Call the program with no arguments to see a list of options.

If you have trouble executing the program on Microsoft Windows, check whether you have the Microsoft Visual C++ Redistributable for Visual Studio 2022 (see under "Other Tools and Frameworks") installed, as the program was compiled with Microsoft Visual Studio 2022.

Check the directory "hubness/ex" in the source package for Python scripts that were used to execute the experiments reported in this paper: