<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Collection: IOMS: Information Systems Working Papers</title>
    <link>http://hdl.handle.net/2451/14090</link>
    <description />
    <textInput>
      <title>The Collection's search engine</title>
      <description>Search the Channel</description>
      <name>search</name>
      <link>http://archive.nyu.edu/simple-search</link>
    </textInput>
    <item>
      <title>A Quality-Aware Optimizer for Information Extraction</title>
      <link>http://hdl.handle.net/2451/25886</link>
      <description>Title: A Quality-Aware Optimizer for Information Extraction
&lt;br/&gt;
&lt;br/&gt;Jain, Alpa; Ipeirotis, Panagiotis G.
&lt;br/&gt;
&lt;br/&gt;Abstract: Large amounts of structured information are buried in unstructured text. Information extraction systems can extract structured relations from the documents and enable sophisticated, SQL-like queries over unstructured text. Information extraction systems are not perfect and their output has imperfect precision and recall (i.e., contains spurious tuples and misses good tuples). Typically, an extraction system has a set of parameters that can be used as “knobs” to tune the system to be either precision- or recall-oriented. Furthermore, the choice of documents processed by the extraction system also affects the quality of the extracted relation. So far, estimating the output quality of an information extraction task has been an ad hoc procedure, based mainly on heuristics. In this paper, we show how to use receiver operating characteristic (ROC) curves to estimate the extraction quality in a statistically robust way and show how to use ROC analysis to select the extraction parameters in a principled manner. Furthermore, we present analytic models that reveal how different document retrieval strategies affect the quality of the extracted relation. Finally, we present our maximum likelihood approach for estimating, on the fly, the parameters required by our analytic models to predict the run time and the output quality of each execution plan. Our experimental evaluation demonstrates that our optimization approach accurately predicts the output quality and selects the fastest execution plan that satisfies the output quality restrictions.</description>
      <pubDate>Sat, 08 Mar 2008 01:24:05 GMT</pubDate>
    </item>
    <item>
      <title>Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers</title>
      <link>http://hdl.handle.net/2451/25882</link>
      <description>Title: Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
&lt;br/&gt;
&lt;br/&gt;Sheng, Victor; Provost, Foster; Ipeirotis, Panagiotis G.
&lt;br/&gt;
&lt;br/&gt;Abstract: This paper addresses the repeated acquisition of labels for
data items when the labeling is imperfect. We
examine the improvement (or lack thereof) in data quality via repeated
labeling, and focus especially on the improvement of training labels
for supervised induction.
With the outsourcing of small tasks becoming
easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it
is often possible to obtain less-than-expert labeling at low cost.
With low-cost labeling, preparing the unlabeled part of the data can
become considerably more expensive than labeling. We present
repeated-labeling strategies of increasing complexity, and show
several main results. (i) Repeated labeling can improve label quality
and model quality, but not always. (ii) When labels are noisy,
repeated labeling can be preferable to single labeling even in the
traditional setting where labels are not particularly cheap. (iii) As
soon as processing the unlabeled data is not free, even
the simple strategy of labeling everything multiple times can give
considerable advantage. (iv) Repeatedly labeling a carefully chosen
set of points is generally preferable, and we present a robust
technique that combines different notions of uncertainty to select
data points for which quality should be improved. The bottom line: the
results show clearly that when labeling is not perfect, selective
acquisition of multiple labels is a strategy that data miners should
have in their repertoire; for certain label-quality/cost regimes, the
benefit is substantial.</description>
      <pubDate>Thu, 06 Mar 2008 02:14:03 GMT</pubDate>
    </item>
    <item>
      <title>Does Chatter Matter?  The Impact of User-Generated Content on Music Sales</title>
      <link>http://hdl.handle.net/2451/23783</link>
      <description>Title: Does Chatter Matter?  The Impact of User-Generated Content on Music Sales
&lt;br/&gt;
&lt;br/&gt;Dhar, Vasant; Chang, Elaine
&lt;br/&gt;
&lt;br/&gt;Abstract: The Internet has enabled the era of user-generated content, potentially breaking the
hegemony of traditional content generators as the primary sources of “legitimate” information.
Prime examples of user-generated content are blogs and social networking sites, which allow easy
publishing of and access to information. In this study, we examine the usefulness of such content,
consisting of data from blogs and social networking sites, in predicting sales in the music industry.
We track the changes in online chatter for a sample of 108 albums for four weeks before and after
their release dates. We use linear and nonlinear regression to identify the relative significance of
online variables on their observation date in predicting future album unit sales two weeks ahead.
Our findings are as follows: (a) the volume of blog posts about an album is positively correlated
with future sales, (b) greater increases in an artist’s Myspace friends week over week have a
weaker correlation to higher future sales, (c) traditional factors are still relevant – albums released
by major labels and albums with a number of reviews from mainstream sources like Rolling Stone
also tended to have higher future sales. More generally, the study provides some preliminary
answers for marketing managers interested in assessing the relative importance of the burgeoning
number of “Web 2.0” information metrics that are becoming available on the Internet, and how
looking at interactions among them could provide predictive value beyond viewing them in
isolation. The study also provides a framework for thinking about when user-generated content
influences decision making.</description>
      <pubDate>Wed, 24 Oct 2007 14:36:47 GMT</pubDate>
    </item>
    <item>
      <title>Deriving the Pricing Power of Product Features by Mining Consumer Reviews</title>
      <link>http://hdl.handle.net/2451/23604</link>
      <description>Title: Deriving the Pricing Power of Product Features by Mining Consumer Reviews
&lt;br/&gt;
&lt;br/&gt;Archak, Nikolay; Ghose, Anindya; Ipeirotis, Panagiotis G.
&lt;br/&gt;
&lt;br/&gt;Abstract: The increasing pervasiveness of the Internet has dramatically changed the way that consumers shop for
goods. Consumer-generated product reviews have become a valuable source of information for customers,
who read the reviews and decide whether to buy the product based on the information provided. In this
paper, we use techniques that decompose the reviews into segments that evaluate the individual characteristics
of a product (e.g., image quality and battery life for a digital camera). Then, as a major contribution of
this paper, we adapt methods from the econometrics literature, specifically the hedonic regression concept, to
estimate: (a) the weight that customers place on each individual product feature, (b) the implicit evaluation
score that customers assign to each feature, and (c) how these evaluations affect the revenue for a given
product. Towards this goal, we develop a novel hybrid technique combining text mining and econometrics
that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. We
then impute the quantitative impact of consumer reviews on product demand as a linear functional from
this tensor product space. We demonstrate how to use a low-dimension approximation of this functional to
significantly reduce the number of model parameters, while still providing good experimental results. We
evaluate our technique using a data set from Amazon.com consisting of sales data and the related consumer
reviews posted over a 15-month period for 242 products. Our experimental evaluation shows that we can
extract actionable business intelligence from the data and better understand the customer preferences and
actions. We also show that the textual portion of the reviews can improve product sales prediction compared
to a baseline technique that simply relies on numeric data.</description>
      <pubDate>Mon, 08 Oct 2007 14:26:21 GMT</pubDate>
    </item>
    <item>
      <title>Social Network Collaborative Filtering</title>
      <link>http://hdl.handle.net/2451/23407</link>
      <description>Title: Social Network Collaborative Filtering
&lt;br/&gt;
&lt;br/&gt;Zheng, Rong; Provost, Foster; Ghose, Anindya
&lt;br/&gt;
&lt;br/&gt;Abstract: This paper reports on a preliminary empirical study comparing methods for collaborative filtering (CF) using explicit data on consumers’ social networks. To our knowledge, it is the first study to carefully evaluate the potential of explicit, publicly represented social networks for making product recommendations. Understanding social-network CF is important because traditional CF over a large consumer base is tremendously expensive computationally. An often-ignored aspect of CF is the selection of the set of users from which to make recommendations. Social theory tells us that social relationships are likely to connect similar people. If this similarity is in line with the recommendation task, they may provide a small, dense set of “recommenders” for CF. We examine a unique dataset from Amazon.com that contains a social network of consumer-selected friends. We examine two ways to incorporate social-network information into CF: using the social network to restrict the set of recommenders selected, and (further) using proximity in the social network to modify the traditional CF calculation. The results show that CF with social-network members selected as recommenders can be remarkably superior to collaborative filtering with the recommenders not socially connected. Once the social network is selected, social network proximity does not seem to improve recommendations.</description>
      <pubDate>Fri, 14 Sep 2007 16:03:31 GMT</pubDate>
    </item>
    <item>
      <title>Leveraging Aggregate Ratings for Better Recommendations</title>
      <link>http://hdl.handle.net/2451/23402</link>
      <description>Title: Leveraging Aggregate Ratings for Better Recommendations
&lt;br/&gt;
&lt;br/&gt;Umyarov, Akhmed; Tuzhilin, Alexander
&lt;br/&gt;
&lt;br/&gt;Abstract: The paper presents a method that uses aggregate ratings
provided by various segments of users for various categories
of items to derive better estimations of unknown individual
ratings. This is achieved by converting the aggregate ratings
into constraints on the parameters of a rating estimation
model presented in the paper. The paper also demonstrates
theoretically that these additional constraints reduce rating
estimation errors, resulting in better rating predictions.</description>
      <pubDate>Thu, 13 Sep 2007 20:59:07 GMT</pubDate>
    </item>
    <item>
      <title>Collective Inference for Consumer Networks</title>
      <link>http://hdl.handle.net/2451/15026</link>
      <description>Title: Collective Inference for Consumer Networks
&lt;br/&gt;
&lt;br/&gt;Hill, Shawndra; Provost, Foster; Volinsky, Chris</description>
      <pubDate>Thu, 17 May 2007 15:42:58 GMT</pubDate>
    </item>
  </channel>
</rss>

