| Title: | Classification in Networked Data: A Toolkit and a Univariate Case Study |
| Authors: | Macskassy, Sofus; Provost, Foster |
| Keywords: | relational learning; network learning; collective inference; collective classification; networked data; probabilistic relational models; network analysis; network data |
| Issue Date: | May-2007 |
| Publisher: | Journal of Machine Learning Research |
| Citation: | Journal of Machine Learning Research 8(May):935–983, 2007 |
| Series/Report no.: | CeDER-PP-2007-07 |
| Abstract: | This paper is about classifying entities that are interlinked with
entities for which the class is known. After surveying prior work, we
present NetKit, a modular toolkit for classification in networked data,
and a case study of its application to networked data used in prior
machine learning research. NetKit is based on a node-centric framework
in which classifiers comprise a local classifier, a relational
classifier, and a collective inference procedure. Various existing
node-centric relational learning algorithms can be instantiated with
appropriate choices for these components, and new combinations of
components realize new algorithms. The case study focuses on univariate
network classification, for which the only information used is the
structure of class linkage in the network (i.e., only links and some
class labels). To our knowledge, no previous work has systematically
evaluated the power of class linkage alone for classification in
machine learning benchmark data sets. The results demonstrate that very
simple network-classification models perform quite well—well
enough that they should be used regularly as baseline classifiers for
studies of learning with networked data. The simplest method (which
performs remarkably well) highlights the close correspondence between
several existing methods introduced for different purposes—that
is, Gaussian-field classifiers, Hopfield networks, and
relational-neighbor classifiers. The case study also shows that there
are two sets of techniques that are preferable in different situations,
namely when few versus many labels are known initially. We also
demonstrate that link selection plays an important role similar to
traditional feature selection. |
| URI: | http://hdl.handle.net/2451/27814 |
| Appears in Collections: | CeDER Published Papers |
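The node-centric decomposition described in the abstract (a local classifier, a relational classifier, and a collective inference procedure) can be illustrated with a minimal Python sketch of a weighted-vote relational-neighbor classifier in the univariate setting, where only links and some known labels are used. This is not NetKit code: the function `wvrn_relaxation`, its arguments, the class-prior initialization, and the plain synchronous update loop are hypothetical choices for illustration, and the collective inference schedule used in the paper may differ. At a fixed point, each unlabeled node's score equals the weighted average of its neighbors' scores with the known labels held fixed, which is the harmonic condition that Gaussian-field classifiers solve.

```python
from collections import defaultdict


def wvrn_relaxation(edges, known, classes, iterations=100):
    """Hypothetical sketch (not NetKit's API) of a weighted-vote
    relational-neighbor classifier with iterative collective inference.

    edges:   dict mapping (u, v) -> positive weight, treated as undirected
    known:   dict mapping node -> known class label (must be non-empty)
    classes: list of class labels
    Returns a dict mapping every node that appears in `edges`
    to a dict {class: score}; scores sum to 1 up to rounding.
    """
    # Undirected weighted adjacency; parallel edges have weights summed.
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w

    # Univariate "local classifier" (assumption): with no node attributes,
    # unknown nodes start from the class prior of the known labels.
    counts = {c: 0 for c in classes}
    for label in known.values():
        counts[label] += 1
    prior = {c: counts[c] / len(known) for c in classes}

    probs = {}
    for node in nbrs:
        if node in known:
            # Known labels are one-hot and stay clamped throughout.
            probs[node] = {c: float(c == known[node]) for c in classes}
        else:
            probs[node] = dict(prior)

    # Collective inference: synchronous updates in which every unknown node
    # takes the weighted average of its neighbors' current estimates
    # (the weighted-vote "relational classifier").
    for _ in range(iterations):
        new = {}
        for node, adj in nbrs.items():
            if node in known:
                new[node] = probs[node]
                continue
            total = sum(adj.values())
            new[node] = {
                c: sum(w * probs[m][c] for m, w in adj.items()) / total
                for c in classes
            }
        probs = new
    return probs


# Usage: path a - b - c - d with the two endpoints labeled.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
known = {"a": "+", "d": "-"}
scores = wvrn_relaxation(edges, known, ["+", "-"])
print(round(scores["b"]["+"], 3), round(scores["c"]["+"], 3))  # 0.667 0.333
```

On this four-node path, node `b` (adjacent to the `+` node) converges to about 2/3 for `+` and node `c` to about 1/3, matching the hand-solved harmonic values b = (1 + c)/2 and c = b/2. Clamping known labels instead of re-estimating them is a design choice of this sketch.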
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.