|
|
Archive@NYU >
Stern School of Business >
CeDER Published Papers >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/27812
|
| Title: | Learning and Inference in Massive Social Networks |
| Authors: | Hill, Shawndra Provost, Foster Volinsky, Chris |
| Keywords: | social networks collective inference viral marketing |
| Issue Date: | Aug-2007 |
| Citation: | The 5th International Workshop on Mining and Learning with Graphs,
August 2007 |
| Series/Report no.: | CeDER-PP-2007-05 |
| Abstract: | Researchers and practitioners increasingly are gaining access to data on
explicit social networks. For example, telecommunications and technology
firms record data on consumer networks (via phone calls, emails,
voice-over-IP, instant messaging), and social-network portal sites such
as MySpace, Friendster and Facebook record consumer-generated data on
social networks. Inference for fraud detection [5, 3, 8], marketing [9],
and other tasks can be improved with learned models that take social
networks into account and with collective inference [12], which allows
inferences about nodes in the network to affect each other. However,
these socialnetwork graphs can be huge, comprising millions to billions
of nodes and one or two orders of magnitude more links. This paper
studies the application of collective inference to improve prediction
over a massive graph. Faced initially with a social network comprising
hundreds of millions of nodes and a few billion edges, our goal is: to
produce an approximate consumer network that is orders of magnitude
smaller, but still facilitates improved performance via collective
inference. We introduce a sampling technique designed to reduce the size
of the network by many orders of magnitude, but to keep linkages that
facilitate improved prediction via collective inference. In short, the
sampling scheme operates as follows: (1) choose a set of nodes of
interest; (2) then, in analogy to snowball sampling [14], grow local
graphs around these nodes, adding their social networks, their
neighbors’ social networks, and so on; (3) next, prune these local
graphs of edges which are expected to contribute little to the
collective inference; (4) finally, connect the local graphs together to
form a graph with (hopefully) useful inference connectivity. We apply
this sampling method to assess whether collective inference can improve
learned targeted-marketing models for a social network of consumers of
telecommunication services. Prior work [9] has shown improvement to the
learning of targeting models by including social-neighborhood
information—in particular, information on existing customers in
the immediate social network of a potential target. However, the
improvement was restricted to the “network neighbors”, those
targets linked to a prior customer thought to be good candidates for the
new service. Collective inference techniques may extend the predictive
influence of existing customers beyond their immediate neighborhoods.
For the present work, our motivating conjecture has been that this
influence can improve prediction for consumers who are not strongly
connected to existing customers. Our results show that this is indeed
the case: collective inference on the approximate network enables
significantly improved predictive performance for non-network-neighbor
consumers, and for consumers who have few links to existing customers.
In the rest of this extended abstract we motivate our approach, describe
our sampling method, present results on applying our approach to a large
real-world target marketing campaign in the telecommunications industry,
and finally discuss our findings. |
| URI: | http://hdl.handle.net/2451/27812 |
| Appears in Collections: | CeDER Published Papers
|
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.
|