Full metadata record
DC Field | Value | Language
dc.contributor.author | Macskassy, Sofus | -
dc.contributor.author | Provost, Foster | -
dc.identifier.citation | Journal of Machine Learning Research 8(May):935-983, 200 | en
dc.description.abstract | This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no prior work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes: Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. | en
dc.description.sponsorship | NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research | en
dc.format.extent | 452337 bytes | -
dc.publisher | Journal of Machine Learning Research | en
dc.subject | relational learning | en
dc.subject | network learning | en
dc.subject | collective inference | en
dc.subject | collective classification | en
dc.subject | networked data | en
dc.subject | probabilistic relational models | en
dc.subject | network analysis | en
dc.subject | network data | en
dc.title | Classification in Networked Data: A Toolkit and a Univariate Case Study | en
Appears in Collections: CeDER Published Papers

Files in This Item:
File | Description | Size | Format
CPP-07-07.pdf | | 441.74 kB | Adobe PDF

Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.