On Class Distributions Induced by Nearest Neighbor Graphs for Node Classification of Tabular Data

Advances in Neural Information Processing Systems

In this paper I try to address the following question: “Is it really worth to build a k-Nearest Neighbor graph from tabular data and then apply deep graph nets on top of it to classify each sample in the table?” This is something people tend to do, and I have always been skeptical about it. Now we have a very good indication, which I also tried to justify theoretically, that a k-NN graph is not the best structure to use if we want to gain some real advantages (under some mild conditions). I also argue that new methods of building synthetic graphs are necessary if we want to exploit some “latent” structure between the samples of a dataset.

Federico Errica
Federico Errica
Research Scientist

My research interests include distributed robotics, mobile computing and programmable matter.