<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Collection: CeDER Published Papers</title>
    <link>http://hdl.handle.net/2451/27738</link>
    <description />
    <textInput>
      <title>The Collection's search engine</title>
      <description>Search the Channel</description>
      <name>search</name>
      <link>http://archive.nyu.edu/simple-search</link>
    </textInput>
    <item>
      <title>Which Came First, IT or Productivity? The Virtuous Cycle of Investment and Use in Enterprise Systems</title>
      <link>http://hdl.handle.net/2451/27759</link>
      <description>Title: Which Came First, IT or Productivity? The Virtuous Cycle of Investment and Use in Enterprise Systems&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Brynjolfsson, Erik; Wu, D.J.&lt;br/&gt;&lt;br/&gt;Abstract: While it is now well established that IT intensive firms are more productive, a critical question remains: Does IT cause productivity or are productive firms simply willing to spend more on IT? We address this question by examining the productivity and performance effects of enterprise systems investments in a uniquely detailed and comprehensive data set of 623 large, public U.S. firms. The data represent all U.S. customers of a large vendor during 1998&amp;ndash;2005 and include the vendor&amp;rsquo;s three main enterprise system suites: Enterprise Resource Planning (ERP), Supply Chain Management (SCM), and Customer Relationship Management (CRM). A particular benefit of our data is that they distinguish the purchase of enterprise systems from their installation and use. Since enterprise systems often take years to implement, firm performance at the time of purchase often differs markedly from performance after the systems &amp;ldquo;go live.&amp;rdquo; Specifically, in our ERP data, we find that purchase events are uncorrelated with performance while go-live events are positively correlated. This indicates that the use of ERP systems actually causes performance gains rather than strong performance driving the purchase of ERP. In contrast, for SCM and CRM, we find that performance is correlated with both purchase and go-live events. Because SCM and CRM are installed after ERP, these results imply that firms that experience performance gains from ERP go on to purchase SCM and CRM. Our results are robust against several alternative explanations and specifications and suggest that a causal relationship between ERP and performance triggers additional IT adoption in firms that derive value from their initial investment. These results provide an explanation of simultaneity in IT value research that fits with rational economic decision-making: Firms that successfully implement IT, react by investing in more IT. Our work suggests replacing &amp;ldquo;either-or&amp;rdquo; views of causality with a positive feedback loop conceptualization in which successful IT investments initiate a &amp;ldquo;virtuous cycle&amp;rdquo; of investment and gain. Our work also reveals other important estimation issues that can help researchers identify relationships between IT and business value.</description>
      <pubDate>Mon, 10 Nov 2008 21:31:30 GMT</pubDate>
    </item>
    <item>
      <title>Tree Induction vs. Logistic Regression: A Learning-Curve Analysis</title>
      <link>http://hdl.handle.net/2451/27770</link>
      <description>Title: Tree Induction vs. Logistic Regression: A Learning-Curve Analysis&lt;br/&gt;&lt;br/&gt;Perlich, Claudia; Provost, Foster; Simonoff, Jeffrey&lt;br/&gt;&lt;br/&gt;Abstract: Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several things. (1) Contrary to some prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (that is, the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the separability of signal from noise.</description>
      <pubDate>Sat, 31 May 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Tree Induction for Probability-based Ranking</title>
      <link>http://hdl.handle.net/2451/27764</link>
      <description>Title: Tree Induction for Probability-based Ranking&lt;br/&gt;&lt;br/&gt;Provost, Foster; Domingos, Pedro</description>
      <pubDate>Mon, 17 Nov 2008 16:21:42 GMT</pubDate>
    </item>
    <item>
      <title>Towards Intelligent Assistance for a Data Mining Process:-</title>
      <link>http://hdl.handle.net/2451/27804</link>
      <description>Title: Towards Intelligent Assistance for a Data Mining Process:-&lt;br/&gt;&lt;br/&gt;Provost, Foster; Hill, Shawndra; Bernstein, Abraham&lt;br/&gt;&lt;br/&gt;Abstract: A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems we present a prototype Intelligent Discovery Assistant (IDA), which provides users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a comprehensive demonstration of cost-sensitive classification using a more involved process and data from the 1998 KDDCUP competition.</description>
      <pubDate>Tue, 29 Mar 2005 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Towards a Query Optimizer for Text-Centric Tasks</title>
      <link>http://hdl.handle.net/2451/27821</link>
      <description>Title: Towards a Query Optimizer for Text-Centric Tasks&lt;br/&gt;&lt;br/&gt;Ipeirotis, Panagiotis; Agichtein, Eugene; Jain, Pranay; Gravano, Luis&lt;br/&gt;&lt;br/&gt;Abstract: Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive structured relations from unstructured text; as another example, focused crawlers explore the Web to locate pages about specific topics. Execution plans for text-centric tasks follow two general paradigms for processing a text database: either we can scan, or &amp;ldquo;crawl,&amp;rdquo; the text database or, alternatively, we can exploit search engine indexes and retrieve the documents of interest via carefully crafted queries constructed in task-specific ways. The choice between crawl- and query-based execution plans can have a substantial impact on both execution time and output &amp;ldquo;completeness&amp;rdquo; (e.g., in terms of recall). Nevertheless, this choice is typically ad hoc and based on heuristics or plain intuition. In this article, we present fundamental building blocks to make the choice of execution plans for text-centric tasks in an informed, cost-based way. Towards this goal, we show how to analyze query- and crawl-based plans in terms of both execution time and output completeness. We adapt results from random-graph theory and statistics to develop a rigorous cost model for the execution plans. Our cost model reflects the fact that the performance of the plans depends on fundamental task-specific properties of the underlying text databases. We identify these properties and present efficient techniques for estimating the associated parameters of the cost model. We also present two optimization approaches for text-centric tasks that rely on the cost-model parameters and select efficient execution plans. Overall, our optimization approaches help build efficient execution plans for a task, resulting in significant efficiency and output completeness benefits. We complement our results with a large-scale experimental evaluation for three important text-centric tasks and over multiple real-life data sets.</description>
      <pubDate>Mon, 29 Oct 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Productivity Payoff of Computers</title>
      <link>http://hdl.handle.net/2451/27840</link>
      <description>Title: The Productivity Payoff of Computers&lt;br/&gt;&lt;br/&gt;Bakos, Yannis&lt;br/&gt;&lt;br/&gt;Abstract: This is a review of 'The Computer Revolution: An Economic Perspective' by Daniel E. Sichel</description>
      <pubDate>Thu, 02 Jul 1998 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Myth of the Double-Blind Review? Author Identification Using Only Citations</title>
      <link>http://hdl.handle.net/2451/27766</link>
      <description>Title: The Myth of the Double-Blind Review? Author Identification Using Only Citations&lt;br/&gt;&lt;br/&gt;Provost, Foster; Hill, Shawndra&lt;br/&gt;&lt;br/&gt;Abstract: Prior studies have questioned the degree of anonymity of the double-blind review process for scholarly research articles. For example, one study based on a survey of reviewers concluded that authors often could be identified by reviewers using a combination of the author's reference list and the referee's personal background knowledge. For the KDD Cup 2003 competition's &amp;quot;Open Task&amp;quot; we examined how well various automatic matching techniques could identify authors within the competition's very large archive of research papers. This paper describes the issues surrounding author identification, how these issues motivated our study, and the results we obtained. The best method, based on discriminative self-citations, identified authors correctly 40-45% of the time. One main motivation for double-blind review is to eliminate bias in favor of well-known authors. However, identification accuracy for authors with substantial publication history is even better (60% accuracy for the top-10% most prolific authors, 85% for authors with 100 or more prior papers).</description>
      <pubDate>Sat, 31 May 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Impact of Internet Referral Services on a Supply Chain</title>
      <link>http://hdl.handle.net/2451/27744</link>
      <description>Title: The Impact of Internet Referral Services on a Supply Chain&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Mukhopadhyay, Tridas; Rajan, Uday&lt;br/&gt;&lt;br/&gt;Abstract: In many industries, Internet referral services, hosted either by independent third-party infomediaries or by manufacturers, serve as digitally enabled lead generators in electronic markets, directing consumer traffic to downstream retailers in a distribution network. This reshapes the extended enterprise from the traditional network of upstream manufacturers and downstream retailers to include midstream third-party and manufacturer-owned referral services in the supply chain. We model competition between retailers in a supply chain with such digitally enabled institutions and consider their impact on the optimal contracts among the manufacturer, referral intermediary, and the retailers. Offline, retailers face a higher customer discovery cost. In return, they can engage in price discrimination based on consumer valuations. Online, they save on the discovery costs but lose the ability to identify consumer valuations. This critical trade-off drives firms&amp;rsquo; equilibrium strategies. We derive the optimal contracts for different entities in the supply chain and highlight how these contracts change with the entry of independent and manufacturer-owned referral services. The establishment of a referral service is a strategic decision by the manufacturer. It leads to diversion of supply chain profit from a third-party infomediary to the manufacturer. Further, it enables the manufacturer to respond to an infomediary, by giving itself greater flexibility in setting the unit wholesale fee to the profit-maximizing level. Both third-party and manufacturer-sponsored referral services play a critical role in enabling retailers to discriminate across consumers&amp;rsquo; different valuations. Retailers use online referral services to screen out low-valuation consumers and sell only to high-valuation consumers in the online channel. Our model thus endogenously derives a correlation between consumer valuation and online purchase behavior. Finally, we show that under some circumstances, it is too costly for the manufacturer to eliminate the referral infomediary.</description>
      <pubDate>Thu, 06 Nov 2008 16:12:39 GMT</pubDate>
    </item>
    <item>
      <title>The Gift of Gab: Evidence TelE-Commerce Firms Can Profit from Viral Marketing</title>
      <link>http://hdl.handle.net/2451/27803</link>
      <description>Title: The Gift of Gab: Evidence TelE-Commerce Firms Can Profit from Viral Marketing&lt;br/&gt;&lt;br/&gt;Hill, Shawndra; Provost, Foster; Volinsky, Chris&lt;br/&gt;&lt;br/&gt;Abstract: Viral or buzz marketing takes advantage of communication linkages to propagate positive influence regarding a product or service. TelE-commerce is an ideal domain within which to study viral marketing, because communication linkages can be observed. In this paper, we follow a new telE-commerce service. In particular, we observe how the communication networks of existing customers influence the rate of product diffusion. The main contribution of this paper is evidence that consumers are more likely to purchase a service if they have previously spoken to a person who has the service. In addition, we offer the following three contributions: 1) the clarification that this need not be evidence of viral influence, we suggest different explanations; 2) we also describe the relation of these explanations to theories of purchasing behavior; and 3) we present some evidence to discern from among the explanations.</description>
      <pubDate>Thu, 28 Apr 2005 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Emerging Role of Electronic Marketplaces on the Internet</title>
      <link>http://hdl.handle.net/2451/27839</link>
      <description>Title: The Emerging Role of Electronic Marketplaces on the Internet&lt;br/&gt;&lt;br/&gt;Bakos, Yannis</description>
      <pubDate>Wed, 29 Jul 1998 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Economic Incentives for Sharing Security Information</title>
      <link>http://hdl.handle.net/2451/27739</link>
      <description>Title: The Economic Incentives for Sharing Security Information&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Gal-Or, Esther&lt;br/&gt;&lt;br/&gt;Abstract: Given that information technology (IT) security has emerged as an important issue in the last few years, the subject of security information sharing among firms, as a tool to minimize security breaches, has gained the interest of practitioners and academics. To promote the disclosure and sharing of cyber security information among firms, the U.S. federal government has encouraged the establishment of many industry-based Information Sharing and Analysis Centers (ISACs) under Presidential Decision Directive (PDD) 63. Sharing security vulnerabilities and technological solutions related to methods for preventing, detecting, and correcting security breaches is the fundamental goal of the ISACs. However, there are a number of interesting economic issues that will affect the achievement of this goal. Using game theory, we develop an analytical framework to investigate the competitive implications of sharing security information and investments in security technologies. We find that security technology investments and security information sharing act as &amp;ldquo;strategic complements&amp;rdquo; in equilibrium. Our results suggest that information sharing is more valuable when product substitutability is higher, implying that such sharing alliances yield greater benefits in more competitive industries. We also highlight that the benefits from such information-sharing alliances increase with the size of the firm. We compare the levels of information sharing and technology investments obtained when firms behave independently (Bertrand-Nash) to those selected by an ISAC, which maximizes social welfare or joint industry profits. Our results help us predict the consequences of establishing organizations such as ISACs, Computer Emergency Response Team (CERT), or InfraGard by the federal government.</description>
      <pubDate>Mon, 29 May 2006 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>The Economic Impact of User-Generated and Firm-Published Online Content: Directions for Advancing the Frontiers in Electronic Commerce Research</title>
      <link>http://hdl.handle.net/2451/27749</link>
      <description>Title: The Economic Impact of User-Generated and Firm-Published Online Content: Directions for Advancing the Frontiers in Electronic Commerce Research&lt;br/&gt;&lt;br/&gt;Ghose, Anindya</description>
      <pubDate>Thu, 06 Nov 2008 18:22:42 GMT</pubDate>
    </item>
    <item>
      <title>The Dimensions of Reputation in Electronic Markets</title>
      <link>http://hdl.handle.net/2451/27740</link>
      <description>Title: The Dimensions of Reputation in Electronic Markets&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Ipeirotis, Panagiotis; Sundararajan, Arun&lt;br/&gt;&lt;br/&gt;Abstract: We analyze how different dimensions of a seller's reputation affect pricing power in electronic markets. We do so by using text mining techniques to identify and structure dimensions of importance from feedback posted on reputation systems, by aggregating and scoring these dimensions based on the sentiment they contain, and using them to estimate a series of econometric models associating reputation with price premiums. We find that different dimensions do indeed affect pricing power differentially, and that a negative reputation hurts more than a positive one helps on some dimensions but not on others. We provide the first evidence that sellers of identical products in electronic markets differentiate themselves based on a distinguishing dimension of strength, and that buyers vary in the relative importance they place on different fulfilment characteristics. We highlight the importance of textual reputation feedback further by demonstrating it substantially improves the performance of a classifier we have trained to predict future sales. This paper is the first study that integrates econometric, text mining and predictive modeling techniques toward a more complete analysis of the information captured by reputation systems, and it presents new evidence of the importance of their effective and judicious design.</description>
      <pubDate>Wed, 05 Nov 2008 21:03:13 GMT</pubDate>
    </item>
    <item>
      <title>Telecommunications Network Diagnosis</title>
      <link>http://hdl.handle.net/2451/27754</link>
      <description>Title: Telecommunications Network Diagnosis&lt;br/&gt;&lt;br/&gt;Danyluk, Andrea; Provost, Foster; Carr, Brian&lt;br/&gt;&lt;br/&gt;Abstract: The Scrubber 3 system monitors problems in the local loop of the telephone network, making automated decisions on tens of millions of cases a year, many of which lead to automated actions. Scrubber saves Bell Atlantic millions of dollars annually, by reducing the number of inappropriate technician dispatches. Scrubber's core knowledge base, the Trouble Isolation Module (TIM), is a probability estimation tree constructed via several data mining processes. TIM currently is deployed in the Delphi system, which serves knowledge to multiple applications. As compared to previous approaches, TIM is more general, more robust, and easier to update when the network or user requirements change. Under certain circumstances it also provides better classifications. In fact, TIM's knowledge is general enough that it now serves a second deployed application. One of the most interesting aspects of the construction of TIM is that data mining was used not only in the traditional sense, namely, building a model from a warehouse of actual historical cases. Data mining also was used to produce an understandable model of the knowledge contained in an earlier, successful diagnostic system.</description>
      <pubDate>Fri, 07 Nov 2008 21:25:07 GMT</pubDate>
    </item>
    <item>
      <title>Suspicion scoring based on guilt-by-association, collective inference, and focused</title>
      <link>http://hdl.handle.net/2451/27806</link>
      <description>Title: Suspicion scoring based on guilt-by-association, collective inference, and focused&lt;br/&gt;&lt;br/&gt;Macskassy, Sofus; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: We describe a guilt-by-association system that can be used to rank entities by their suspiciousness. We demonstrate the algorithm on a suite of data sets generated by a terrorist-world simulator developed under a DoD program. The data sets consist of thousands of people and some known links between them. We show that the system ranks truly malicious individuals highly, even if only relatively few are known to be malicious ex ante. When used as a tool for identifying promising data-gathering opportunities, the system focuses on gathering more information about the most suspicious people and thereby increases the density of linkage in appropriate parts of the network. We assess performance under conditions of noisy prior knowledge (score quality varies by data set under moderate noise), and whether augmenting the network with prior scores based on profiling information improves the scoring (it doesn&amp;rsquo;t). Although the level of performance reported here would not support direct action on all data sets, it does recommend the consideration of network-scoring techniques as a new source of evidence in decision making. For example, the system can operate on networks far larger and more complex than could be processed by a human analyst.</description>
      <pubDate>Fri, 29 Oct 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>ROC Confidence Bands: An Empirical Evaluation</title>
      <link>http://hdl.handle.net/2451/27805</link>
      <description>Title: ROC Confidence Bands: An Empirical Evaluation&lt;br/&gt;&lt;br/&gt;Macskassy, Sofus; Provost, Foster; Rosset, Saharon&lt;br/&gt;&lt;br/&gt;Abstract: This paper is about constructing confidence bands around ROC curves. We first introduce to the machine learning community three band-generating methods from the medical field, and evaluate how well they perform. Such confidence bands represent the region where the &amp;ldquo;true&amp;rdquo; ROC curve is expected to reside, with the designated confidence level. To assess the containment of the bands we begin with a synthetic world where we know the true ROC curve&amp;mdash;specifically, where the class-conditional model scores are normally distributed. The only method that attains reasonable containment out-of-the-box produces non-parametric, &amp;ldquo;fixed-width&amp;rdquo; bands (FWBs). Next we move to a context more appropriate for machine learning evaluations: bands that with a certain confidence level will bound the performance of the model on future data. We introduce a correction to account for the larger uncertainty, and the widened FWBs continue to have reasonable containment. Finally, we assess the bands on 10 relatively large benchmark data sets. We conclude by recommending these FWBs, noting that being non-parametric they are especially attractive for machine learning studies, where the score distributions (1) clearly are not normal, and (2) even for the same data set vary substantially from learning method to learning method.</description>
      <pubDate>Fri, 29 Oct 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Robust Classification for Imprecise Environments</title>
      <link>http://hdl.handle.net/2451/27765</link>
      <description>Title: Robust Classification for Imprecise Environments&lt;br/&gt;&lt;br/&gt;Provost, Foster; Fawcett, Tom&lt;br/&gt;&lt;br/&gt;Abstract: In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analysis. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.</description>
      <pubDate>Wed, 28 Feb 2001 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Reducing Buyer Search Costs: Implications for Electronic Marketplaces</title>
      <link>http://hdl.handle.net/2451/27833</link>
      <description>Title: Reducing Buyer Search Costs: Implications for Electronic Marketplaces&lt;br/&gt;&lt;br/&gt;Bakos, Yannis&lt;br/&gt;&lt;br/&gt;Abstract: Information systems can serve as intermediaries between the buyers and the sellers in a market, creating an &amp;quot;electronic marketplace&amp;quot; that lowers the buyers' cost to acquire information about seller prices and product offerings. As a result, electronic marketplaces reduce the inefficiencies caused by buyer search costs, in the process reducing the ability of sellers to extract monopolistic profits while increasing the ability of markets to optimally allocate resources. This article models the role of buyer search costs in markets with differentiated product offerings. The impact of reducing these search costs is analyzed in the context of an electronic marketplace, and the allocational efficiencies such a reduction can bring to a differentiated market are formalized. The resulting implications for the incentives of buyers, sellers, and independent intermediaries to invest in electronic marketplaces are explored. Finally, the possibility to separate price information from product attribute information is introduced, and the implications of designing markets promoting competition along each of these dimensions are discussed.</description>
      <pubDate>Fri, 28 Nov 1997 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Recent Applications of Economic Theory in Information Technology Research</title>
      <link>http://hdl.handle.net/2451/27830</link>
      <description>Title: Recent Applications of Economic Theory in Information Technology Research&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Kemerer, C.F.&lt;br/&gt;&lt;br/&gt;Abstract: Academicians and practitioners are becoming increasingly interested in the economics of Information Technology (IT). In part, this interest stems from the increased role that IT now plays in the strategic thinking of most large organizations, and from the significant dollar costs expended by these organizations on IT. Naturally enough, researchers are turning to economics as a reference discipline in their attempt to answer questions concerning both the value added by IT and the true cost of providing IT resources. This increased interest in the economics of IT is manifested in the application of a number of aspects of economic theory in recent information systems research, leading to results that have appeared in a wide variety of publication outlets. This article reviews this work and provides a systematic categorization as a first step in establishing a common research tradition, and to serve as an introduction for researchers beginning work in this area. Six areas of economic theory are represented: information economics, production economics, economic models of organizational performance, industrial organization, institutional economics (agency theory and transaction cost theory), and macroeconomic studies of IT impact. For each of these areas, recent work is reviewed and suggestions for future research are provided.</description>
      <pubDate>Sat, 28 Nov 1992 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>QProber: A System for Automatic Classification of Hidden-Web Databases</title>
      <link>http://hdl.handle.net/2451/27820</link>
      <description>Title: QProber: A System for Automatic Classification of Hidden-Web Databases&lt;br/&gt;&lt;br/&gt;Ipeirotis, Panagiotis; Gravano, Luis&lt;br/&gt;&lt;br/&gt;Abstract: The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web &amp;ldquo;crawlers.&amp;rdquo; Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.</description>
      <pubDate>Tue, 31 Dec 2002 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Productivity Effects of Information Diffusion in Networks</title>
      <link>http://hdl.handle.net/2451/27760</link>
      <description>Title: Productivity Effects of Information Diffusion in Networks&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Brynjolfsson, Erik; Van Alstyne, Marshall&lt;br/&gt;&lt;br/&gt;Abstract: We examine what drives the diffusion of different types of information through email networks and the effects of these diffusion patterns on the productivity and performance of information workers. In particular, we ask: What predicts the likelihood of an individual becoming aware of a strategic piece of information, or becoming aware of it sooner? Do different types of information exhibit different diffusion patterns, and do different characteristics of social structure, relationships and individuals in turn affect access to different kinds of information? Does better access to information predict an individual&amp;rsquo;s ability to complete projects or generate revenue? We characterize the social network of a medium sized executive recruiting firm using accounting data on project co-work relationships and ten months of email traffic. We identify two distinct types of information diffusing over this network &amp;ndash; &amp;lsquo;event news&amp;rsquo; and &amp;lsquo;discussion topics&amp;rsquo; &amp;ndash; by their usage characteristics, and observe several thousand diffusion processes of each type of information. We find the diffusion of news, characterized by a spike in communication and rapid, pervasive diffusion through the organization, is influenced by demographic and network factors but not by functional relationships (e.g. prior co-work, authority) or the strength of ties. In contrast, diffusion of discussion topics, which exhibit shallow diffusion characterized by &amp;lsquo;back-and-forth&amp;rsquo; conversation, is heavily influenced by functional relationships and the strength of ties, as well as demographic and network factors. Discussion topics are more likely to diffuse vertically up and down the organizational hierarchy, across relationships with a prior working history, and across stronger ties, while news is more likely to diffuse laterally as well as vertically, and without regard to the strength or function of relationships. We also find access to information strongly predicts project completion and revenue generation. The effects are economically significant, with each additional &amp;ldquo;word seen&amp;rdquo; correlated with about $70 of additional revenue generated. Our findings provide some of the first evidence of the economic significance of information diffusion in email networks.</description>
      <pubDate>Mon, 10 Nov 2008 21:33:50 GMT</pubDate>
    </item>
    <item>
      <title>Predicting citation rates for physics papers: Constructing features for
an ordered probit model</title>
      <link>http://hdl.handle.net/2451/27767</link>
      <description>Title: Predicting citation rates for physics papers: Constructing features for an ordered probit model&lt;br/&gt;&lt;br/&gt;Perlich, Claudia; Provost, Foster; Macskassy, Sofus&lt;br/&gt;&lt;br/&gt;Abstract: Gehrke et al. introduce the citation prediction task in their paper &amp;quot;Overview of the KDD Cup 2003&amp;quot; (in this issue). The objective was to predict the &lt;i&gt;change&lt;/i&gt; in the number of citations a paper will receive, not the absolute number of citations. There are obvious factors affecting the number of citations including the quality and the topic of the paper, and the reputation of the authors. However, it is not clear which factors might influence the change in citations between quarters, rendering the construction of predictive features a challenging task. A high quality and timely paper will be cited more often than a lower quality paper, but that does not suggest the change in citation counts. The selection of training data was critical, as the evaluation would only be on papers that received more than 5 citations in the quarter following the submission of results. After considering several modeling approaches, we used a modified version of an ordered probit model. We describe each of these steps in turn.</description>
      <pubDate>Sat, 31 May 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Pointwise ROC Confidence Bounds: An Empirical Evaluation</title>
      <link>http://hdl.handle.net/2451/27808</link>
      <description>Title: Pointwise ROC Confidence Bounds: An Empirical Evaluation&lt;br/&gt;&lt;br/&gt;Macskassy, Sofus; Provost, Foster; Rosset, Saharon&lt;br/&gt;&lt;br/&gt;Abstract: This paper is about constructing and evaluating pointwise confidence bounds on an ROC curve. We describe four confidence-bound methods, two from the medical field and two used previously in machine learning research. We evaluate whether the bounds indeed contain the relevant operating point on the &amp;ldquo;true&amp;rdquo; ROC curve with a confidence of 1&amp;minus; . We then evaluate pointwise confidence bounds on the region where the future performance of a model is expected to lie. For evaluation we use a synthetic world representing &amp;ldquo;binormal&amp;rdquo; distributions&amp;ndash;the classification scores for positive and negative instances are drawn from (separate) normal distributions. For the &amp;ldquo;true-curve&amp;rdquo; bounds, all methods are sensitive to how well the distributions are separated, which corresponds directly to the area under the ROC curve. One method produces bounds that are universally too loose, another universally too tight, and the remaining two are close to the desired containment although containment breaks down at the extremes of the ROC curve. As would be expected, all methods fail when used to contain &amp;ldquo;future&amp;rdquo; ROC curves. Widening the bounds to account for the increased uncertainty yields identical qualitative results to the &amp;ldquo;true-curve&amp;rdquo; evaluation. We conclude by recommending a simple, very efficient method (vertical averaging) for large sample sizes and a more computationally expensive method (kernel estimation) for small sample sizes.</description>
      <pubDate>Fri, 29 Oct 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Personalized Pricing and Quality Differentiation</title>
      <link>http://hdl.handle.net/2451/27742</link>
      <description>Title: Personalized Pricing and Quality Differentiation&lt;br/&gt;&lt;br/&gt;Choudhary, Vidyanand; Ghose, Anindya; Mukhopadhyay, Tridas; Rajan, Uday&lt;br/&gt;&lt;br/&gt;Abstract: We develop an analytical framework to investigate the competitive implications of personalized pricing (PP), whereby firms charge different prices to different consumers based on their willingness to pay. We embed PP in a model of vertical product differentiation and show how it affects firms&amp;rsquo; choices over quality. We show that firms&amp;rsquo; optimal pricing strategies with PP may be nonmonotonic in consumer valuations. When the PP firm has high quality, both firms raise their qualities relative to the uniform pricing case. Conversely, when the PP firm has low quality, both firms lower their qualities. Although many firms are trying to implement such pricing policies, we find that a higher-quality firm can actually be worse off with PP. While it is optimal for the firm adopting PP to increase product differentiation, the non-PP firm seeks to reduce differentiation by moving in closer in the quality space. While PP results in a wider market coverage, it also leads to aggravated price competition between firms. Because this entails a change in equilibrium qualities, the nature of the cost function determines whether firms gain or lose by implementing such PP policies. Despite the threat of first-degree price discrimination, we find that PP with competing firms can lead to an overall increase in consumer welfare.</description>
      <pubDate>Thu, 06 Nov 2008 15:37:25 GMT</pubDate>
    </item>
    <item>
      <title>Personalized Pricing and Quality Design</title>
      <link>http://hdl.handle.net/2451/27741</link>
      <description>Title: Personalized Pricing and Quality Design&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Huang, Ke-Wei&lt;br/&gt;&lt;br/&gt;Abstract: We develop an analytical framework to investigate the competitive implications of personalized pricing and quality allocation (PPQ), whereby firms charge different prices and offer different qualities to different consumers, based on their willingness to pay. We embed PPQ in a model of spatial differentiation, and show how information about consumer preferences affects multi-product firms&amp;rsquo; choices over pricing schedules and product line offerings. We show that consumer surplus with PPQ will be non-monotonic in consumer valuations. Our model sheds light on the different product quality schedules offered by firms, given that one or both firms implement PPQ. Contrary to prior literature on one-to-one marketing, we show that even symmetric firms can avoid the well-known Prisoner&amp;rsquo;s Dilemma problem due to the quality enhancement effect at the individual consumer level. The rent extraction effect due to quality enhancement dominates the adverse effect of price competition. Moreover, this result is stronger when firms have a larger proportion of loyal consumers. When both firms have PPQ, consumer surplus is non-monotonic in valuations such that some low valuation consumers get higher surplus than high valuation consumers. We extend our analysis to asymmetric firms and show that when one firm adopts PPQ, it always increases its quality level while the other firm keeps its quality schedule unchanged compared to the case when neither firm has PPQ. We demonstrate that a firm with an ex-ante, smaller loyal segment can be better off with PPQ.</description>
      <pubDate>Thu, 06 Nov 2008 15:33:09 GMT</pubDate>
    </item>
    <item>
      <title>Ownership and Investment in Electronic Networks</title>
      <link>http://hdl.handle.net/2451/27836</link>
      <description>Title: Ownership and Investment in Electronic Networks&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Nault, Barrie&lt;br/&gt;&lt;br/&gt;Abstract: We employ the theory of incomplete contracts to examine the relationship between ownership and investment in electronic networks such as the Internet and interorganizational information systems. Electronic networks represent an institutional structure that has resulted from the introduction of information technology in industrial and consumer markets. Ownership of electronic networks is important because it affects the level of network-specific investments, which in turn determine the profitability and in some cases the viability of these networks. In our analysis we define an electronic network as a set of participants and a portfolio of assets. The salient concept in this perspective is the degree to which network participants are indispensable in making network assets productive. We derive three main results: First, if one or more assets are essential to all network participants, then all the assets should be owned together. Second, participants that are indispensable to an asset essential to all participants should own all network assets. Third and most important, in the absence of an indispensable participant, and as long as the cooperation of at least two participants is necessary to create value, sole ownership is never the best form of ownership for an electronic network. This latter result implies that as the leading network participants become more dispensable, we should see an evolution towards forms of joint ownership.</description>
      <pubDate>Wed, 07 Jan 2009 21:02:35 GMT</pubDate>
    </item>
    <item>
      <title>Organizational Partnerships and the Virtual Corporation</title>
      <link>http://hdl.handle.net/2451/27837</link>
      <description>Title: Organizational Partnerships and the Virtual Corporation&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Brynjolfsson, Erik&lt;br/&gt;&lt;br/&gt;Abstract: Organizations are transforming their relationships with their business partners. For example, instead of playing off dozens or even hundreds of competing suppliers against each other, many firms are finding it more profitable to work closely with only a small number of &amp;quot;partners&amp;quot;. While these firms generally increase their amount of outsourcing, by focusing on a small number of partners they create value networks that are often referred to as &amp;quot;value-added-partnerships&amp;quot;, &amp;quot;virtual organizations&amp;quot; or &amp;quot;modular corporations&amp;quot;. In this work we explore some causes and consequences of this transformation. We apply the economic theory of incomplete contracts to study the optimal number of business partners, with particular attention to the role of information technology. Surprisingly, we find that organizations will often maximize profits by limiting their options and reducing their own bargaining power. This may seem paradoxical in an age of cheap communications costs and aggressive competition. However, unlike earlier studies that focused on coordination costs, we focus on the critical importance of providing incentives for business partners. Our results spring from the need to make it worthwhile for business partners to invest in &amp;quot;non-contractibles&amp;quot; like innovation, responsiveness and information sharing. Such incentives will be stronger when the number of competing partners is small. The findings of the theoretical models appear to be consistent with observations from empirical research which highlight the key role of information technology in enabling this transformation.</description>
      <pubDate>Tue, 29 Oct 1996 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Nonlinear Pricing of Information Goods</title>
      <link>http://hdl.handle.net/2451/27798</link>
      <description>Title: Nonlinear Pricing of Information Goods&lt;br/&gt;&lt;br/&gt;Sundararajan, Arun&lt;br/&gt;&lt;br/&gt;Abstract: This paper analyzes optimal pricing for information goods under incomplete information, when both unlimited-usage (fixed-fee) pricing and usage-based pricing are feasible and administering usage-based pricing may involve transaction costs. It is shown that offering fixed-fee pricing in addition to a nonlinear usage-based pricing scheme is always profit improving in the presence of nonzero transaction costs, and there may be markets in which a pure fixed-fee is optimal. This implies that the optimal pricing strategy for information goods is almost never fully revealing. Moreover, it is proved that the optimal usage-based pricing schedule is independent of the value of the fixed fee, a result that simplifies the simultaneous design of pricing schedules considerably and provides a simple procedure for determining the optimal combination of fixed-fee and nonlinear usage-based pricing. The introduction of fixed-fee pricing is shown to increase both consumer surplus and total surplus. The differential effects of setup costs, fixed transaction costs, and variable transaction costs on pricing policy are described. These results suggest a number of managerial guidelines for designing pricing schedules. For instance, in nascent information markets, firms may profit from low fixed-fee penetration pricing, but as these markets mature, the optimal pricing mix should expand to include a wider range of usage-based pricing options. Minimum fees, quantity discounts, and adoption levels across the different pricing schemes are characterized, strategic pricing responses to changes in market characteristics are described, and the implications of the paper&amp;rsquo;s results for bundling and vertical differentiation of information goods are discussed.</description>
      <pubDate>Sun, 28 Nov 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Nonlinear pricing and type-dependent network effects</title>
      <link>http://hdl.handle.net/2451/27799</link>
      <description>Title: Nonlinear pricing and type-dependent network effects&lt;br/&gt;&lt;br/&gt;Sundararajan, Arun&lt;br/&gt;&lt;br/&gt;Abstract: This paper analyzes optimal monopoly pricing under incomplete information for a good that displays positive network effects. In contrast with standard models of network effects (Katz and Shapiro, 1985), the good modeled in this paper is consumed in variable quantities by heterogeneous customers, and the magnitude of the network effects therefore depends on the total quantity consumed across customers, rather than the total number of adopters. In addition, the value each customer gets on account of the network effects depends on the customer&amp;rsquo;s individual consumption, as well as the customer&amp;rsquo;s type. Examples of products that fit this description at least partially include corporate desktop software (where customers are corporations of varying sizes, with varying intensity of software usage across employees) and online trading services (such as those offered by eBay, where network effects increase with increased trading volume).</description>
      <pubDate>Sun, 28 Sep 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Networks, Information &amp;amp; Social Capital</title>
      <link>http://hdl.handle.net/2451/27757</link>
      <description>Title: Networks, Information &amp;amp; Social Capital&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Van Alstyne, Marshall&lt;br/&gt;&lt;br/&gt;Abstract: This paper investigates how information flows enable individuals to generate social capital from their social networks. By combining social network and performance data with the information content encoded in email communication, we examine the long held but empirically untested assumption that diverse networks drive performance by providing access to novel information. We demonstrate empirically that diverse networks do provide diverse, novel information, and that access to novel information predicts performance. But whether diverse networks deliver novel information depends on a tradeoff between network diversity and relationship channel bandwidth: as networks become more diverse, communication over each channel contracts. As network diversity and channel bandwidth both enable access to more novel information, diverse networks provide more novel information (a) when the topic space is large, (b) when topics are distributed non-uniformly across nodes and (c) when information in the network changes frequently. Diverse networks are not just pipes into diverse knowledge pools, but also inspire non-redundant communication even when the knowledge endowments of contacts are homogeneous. Consistent with theories of cognitive capacity, bounded rationality, and information overload, there are positive but diminishing performance returns to novel information. Network diversity also contributes to performance when controlling for the performance effects of novel information, suggesting additional non-information based benefits to structural diversity. These analyses unpack the mechanisms that enable information advantages in networks and serve as a &amp;lsquo;proof-of-concept&amp;rsquo; for using email content to analyze relationships among information, networks and social capital.</description>
      <pubDate>Mon, 10 Nov 2008 21:26:23 GMT</pubDate>
    </item>
    <item>
      <title>Network-Based Marketing: Identifying Likely Adopters via Consumer Networks</title>
      <link>http://hdl.handle.net/2451/27811</link>
      <description>Title: Network-Based Marketing: Identifying Likely Adopters via Consumer Networks&lt;br/&gt;&lt;br/&gt;Hill, Shawndra; Provost, Foster; Volinsky, Chris&lt;br/&gt;&lt;br/&gt;Abstract: Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) &amp;ldquo;Network neighbors&amp;rdquo;&amp;mdash;those consumers linked to a prior customer&amp;mdash;adopt the service at a rate 3&amp;ndash;5 times greater than baseline groups selected by the best practices of the firm&amp;rsquo;s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption.</description>
      <pubDate>Wed, 03 Dec 2008 17:40:07 GMT</pubDate>
    </item>
    <item>
      <title>Modeling Complex Networks For (Electronic) Commerce</title>
      <link>http://hdl.handle.net/2451/27817</link>
      <description>Title: Modeling Complex Networks For (Electronic) Commerce&lt;br/&gt;&lt;br/&gt;Provost, Foster; Sundararajan, Arun</description>
      <pubDate>Mon, 11 Jun 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Modeling and Managing Changes in Text Databases</title>
      <link>http://hdl.handle.net/2451/27822</link>
      <description>Title: Modeling and Managing Changes in Text Databases&lt;br/&gt;&lt;br/&gt;Ipeirotis, Panagiotis; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis&lt;br/&gt;&lt;br/&gt;Abstract: Large amounts of (often valuable) information are stored in web-accessible text databases. &amp;ldquo;Metasearchers&amp;rdquo; provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not evolve over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this article, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use &amp;ldquo;survival analysis&amp;rdquo; techniques in general, and Cox&amp;rsquo;s proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases.</description>
      <pubDate>Sun, 29 Jul 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Mining Face-to-Face Interaction Networks Using Sociometric Badges:
Predicting Productivity in an IT Configuration Task</title>
      <link>http://hdl.handle.net/2451/27756</link>
      <description>Title: Mining Face-to-Face Interaction Networks Using Sociometric Badges: Predicting Productivity in an IT Configuration Task&lt;br/&gt;&lt;br/&gt;Wu, Lynn; Waber, Ben; Aral, Sinan; Brynjolfsson, Erik; Pentland, Alex&lt;br/&gt;&lt;br/&gt;Abstract: Social network theories (e.g. Granovetter 1973, Burt 1992) and information richness theory (Daft &amp;amp; Lengel 1987) have both been used independently to understand knowledge transfer in information intensive work settings. Social network theories explain how network structures covary with the diffusion and distribution of information, but largely ignore characteristics of the communication channels (or media) through which information and knowledge are transferred. Information richness theory on the other hand focuses explicitly on the communication channel requirements for different types of knowledge transfer but ignores the population level topology through which information is transferred in a network. This paper aims to bridge these two sets of theories to understand what types of social structures are most conducive to transferring knowledge and improving work performance in face-to-face communication networks. Using a novel set of data collection tools, techniques and methodologies, we were able to record precise data on the face-to-face interaction networks, tonal conversational variation and physical proximity of a group of IT configuration specialists over a one month period while they conducted their work. Linking these data to detailed performance and productivity metrics, we find four main results. First, the face-to-face communication networks of productive workers display very different topological structures compared to those discovered for email networks in previous research. In face-to-face networks, network cohesion is positively correlated with higher worker productivity, while the opposite is true in email communication. Second, network cohesion in face-to-face networks is associated with even higher work performance when executing complex tasks. This result suggests that network cohesion may complement information-rich communication media for transferring the complex or tacit knowledge needed to complete complex tasks. Third, the most effective network structures for &amp;ldquo;latent&amp;rdquo; social networks (those that characterize the network of available communication partners) differ from &amp;ldquo;in-task&amp;rdquo; social networks (those that characterize the network of communication partners that are actualized during the execution of a particular task). Finally, the effect of cohesion is much stronger in face-to-face networks than in physical proximity networks, demonstrating that information flows in actual conversations (rather than mere physical proximity) are driving our results. Our work bridges two influential bodies of research in order to contrast face-to-face network structure with network structure in electronic communication. We also contribute a novel set of tools and techniques for discovering and recording precise face-to-face interaction data in real world work settings.</description>
      <pubDate>Mon, 10 Nov 2008 21:21:15 GMT</pubDate>
    </item>
    <item>
      <title>Managing Digital Piracy: Pricing and Protection</title>
      <link>http://hdl.handle.net/2451/27797</link>
      <description>Title: Managing Digital Piracy: Pricing and Protection&lt;br/&gt;&lt;br/&gt;Sundararajan, Arun&lt;br/&gt;&lt;br/&gt;Abstract: This paper analyzes the optimal choice of pricing schedules and technological deterrence levels in a market with digital piracy where sellers can influence the degree of piracy by implementing digital rights management (DRM) systems. It is shown that a monopolist&amp;rsquo;s optimal pricing schedule can be characterized as a simple combination of the zero-piracy pricing schedule and a piracy-indifferent pricing schedule that makes all customers indifferent between legal usage and piracy. An increase in the quality of pirated goods, while lowering prices and profits, increases total surplus by expanding both the fraction of legal users and the volume of legal usage. In the absence of price discrimination, a seller&amp;rsquo;s optimal level of technology-based protection against piracy is shown to be at the technologically maximal level, which maximizes the difference between the quality of the legal and pirated goods. However, when a seller can price discriminate, its optimal choice is always a strictly lower level of technology-based protection. These results are based on the following digital rights conjecture: that granting digital rights increases the incidence of digital piracy, and that managing digital rights therefore involves restricting the rights of usage that contribute to customer value. Moreover, if a digital rights management system weakens over time due to the underlying technology being progressively hacked, a seller&amp;rsquo;s optimal strategic response may involve either increasing or decreasing its level of technology-based protection. This direction of change is related to whether the DRM technology implementing each marginal reduction in piracy is increasingly less or more vulnerable to hacking. Pricing and technology choice guidelines are presented, and some welfare implications are discussed.</description>
      <pubDate>Sun, 29 Aug 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Machine Learning from Imbalanced Data Sets 101</title>
      <link>http://hdl.handle.net/2451/27763</link>
      <description>Title: Machine Learning from Imbalanced Data Sets 101&lt;br/&gt;&lt;br/&gt;Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: For research to progress most effectively, we first should establish common ground regarding just what is the problem that imbalanced data sets present to machine learning systems. Why and when should imbalanced data sets be problematic? When is the problem simply an artifact of easily rectified design choices? I will try to pick the low-hanging fruit and share them with the rest of the workshop participants. Specifically, I would like to discuss what the problem is not. I hope this will lead to a profitable discussion of what the problem indeed is, and how it might be addressed most effectively.&lt;br/&gt;&lt;br/&gt;Description: Invited paper for the AAAI'2000 Workshop on Imbalanced Data Sets.</description>
      <pubDate>Mon, 17 Nov 2008 16:16:35 GMT</pubDate>
    </item>
    <item>
      <title>Learning When Training Data are Costly: The Effect of Class Distribution
on Tree Induction</title>
      <link>http://hdl.handle.net/2451/27769</link>
      <description>Title: Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction&lt;br/&gt;&lt;br/&gt;Weiss, Gary; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from this data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a &amp;quot;budget-sensitive&amp;quot; progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.</description>
      <pubDate>Tue, 30 Sep 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Learning and Inference in Massive Social Networks</title>
      <link>http://hdl.handle.net/2451/27812</link>
      <description>Title: Learning and Inference in Massive Social Networks&lt;br/&gt;&lt;br/&gt;Hill, Shawndra; Provost, Foster; Volinsky, Chris&lt;br/&gt;&lt;br/&gt;Abstract: Researchers and practitioners increasingly are gaining access to data on explicit social networks. For example, telecommunications and technology firms record data on consumer networks (via phone calls, emails, voice-over-IP, instant messaging), and social-network portal sites such as MySpace, Friendster and Facebook record consumer-generated data on social networks. Inference for fraud detection [5, 3, 8], marketing [9], and other tasks can be improved with learned models that take social networks into account and with collective inference [12], which allows inferences about nodes in the network to affect each other. However, these social-network graphs can be huge, comprising millions to billions of nodes and one or two orders of magnitude more links. This paper studies the application of collective inference to improve prediction over a massive graph. Faced initially with a social network comprising hundreds of millions of nodes and a few billion edges, our goal is: to produce an approximate consumer network that is orders of magnitude smaller, but still facilitates improved performance via collective inference. We introduce a sampling technique designed to reduce the size of the network by many orders of magnitude, but to keep linkages that facilitate improved prediction via collective inference. In short, the sampling scheme operates as follows: (1) choose a set of nodes of interest; (2) then, in analogy to snowball sampling [14], grow local graphs around these nodes, adding their social networks, their neighbors&amp;rsquo; social networks, and so on; (3) next, prune these local graphs of edges which are expected to contribute little to the collective inference; (4) finally, connect the local graphs together to form a graph with (hopefully) useful inference connectivity. We apply this sampling method to assess whether collective inference can improve learned targeted-marketing models for a social network of consumers of telecommunication services. Prior work [9] has shown improvement to the learning of targeting models by including social-neighborhood information&amp;mdash;in particular, information on existing customers in the immediate social network of a potential target. However, the improvement was restricted to the &amp;ldquo;network neighbors&amp;rdquo;, those targets linked to a prior customer thought to be good candidates for the new service. Collective inference techniques may extend the predictive influence of existing customers beyond their immediate neighborhoods. For the present work, our motivating conjecture has been that this influence can improve prediction for consumers who are not strongly connected to existing customers. Our results show that this is indeed the case: collective inference on the approximate network enables significantly improved predictive performance for non-network-neighbor consumers, and for consumers who have few links to existing customers. In the rest of this extended abstract we motivate our approach, describe our sampling method, present results on applying our approach to a large real-world target marketing campaign in the telecommunications industry, and finally discuss our findings.</description>
      <pubDate>Sun, 29 Jul 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>IT Assets, Organizational Capabilities, and Firm Performance: How
Resource Allocations and Organizational Differences Explain Performance Variation</title>
      <link>http://hdl.handle.net/2451/27761</link>
      <description>Title: IT Assets, Organizational Capabilities, and Firm Performance: How Resource Allocations and Organizational Differences Explain Performance Variation&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Weill, Peter&lt;br/&gt;&lt;br/&gt;Abstract: Despite evidence of a positive relationship between information technology (IT) investments and firm performance, results still vary across firms and performance measures. We explore two organizational explanations for this variation: differences in firms&amp;rsquo; IT investment allocations and their IT capabilities. We develop a theoretical model of IT resources, defined as the combination of specific IT assets and organizational IT capabilities. We argue that investments into different IT assets are guided by firms&amp;rsquo; strategies (e.g., cost leadership or innovation) and deliver value along performance dimensions consistent with their strategic purpose. We hypothesize that firms derive additional value per IT dollar through a mutually reinforcing system of organizational IT capabilities built on complementary practices and competencies. Empirically, we test the impact of IT assets, IT capabilities, and their combination on four dimensions of firm performance: market valuation, profitability, cost, and innovation. Our results&amp;mdash;based on data on IT investment allocations and IT capabilities in 147 U.S. firms from 1999 to 2002&amp;mdash;demonstrate that IT investment allocations and organizational IT capabilities drive differences in firm performance. Firms&amp;rsquo; total IT investment is not associated with performance, but investments in specific IT assets explain performance differences along dimensions consistent with their strategic purpose. In addition, a system of organizational IT capabilities strengthens the performance effects of IT assets and broadens their impact beyond their intended purpose. 
The results help explain variance in returns to IT capital across firms and expand our understanding of alignment between IT and organizations. We illustrate our findings with examples from a case study of 7-Eleven Japan.</description>
      <pubDate>Wed, 29 Aug 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Internet Exchanges for Used Goods: An Empirical Analysis of Trade
Patterns and Adverse Selection</title>
      <link>http://hdl.handle.net/2451/27747</link>
      <description>Title: Internet Exchanges for Used Goods: An Empirical Analysis of Trade Patterns and Adverse Selection&lt;br/&gt;&lt;br/&gt;Ghose, Anindya&lt;br/&gt;&lt;br/&gt;Abstract: The past few years have witnessed the increasing ubiquity of user-generated content on seller reputation and product condition in Internet-based used-good markets. Recent theoretical models of trading and sorting in used-good markets provide testable predictions to use to examine the presence of adverse selection and trade patterns in such dynamic markets. A key aspect of such empirical analyses is to distinguish between product-level uncertainty and seller-level uncertainty, an aspect the extant literature has largely ignored. Based on a unique, 5-month panel dataset of user-generated content on used-good quality and seller reputation feedback collected from Amazon, this paper examines trade patterns in online used-good markets across four product categories (PDAs, digital cameras, audio players, and laptops). Drawing on two different empirical tests and using content analysis to mine the textual feedback of seller reputations, the paper provides evidence that adverse selection continues to exist in online markets. First, it is shown that after controlling for price and other product and seller-related factors, higher quality goods take a longer time to sell compared to lower quality goods. Second, this result also holds when the relationship between sellers&amp;rsquo; reputation scores and time to sell is examined. Third, it is shown that price declines are larger for more unreliable products, and that products with higher levels of intrinsic unreliability exhibit a more negative relationship between price decline and volume of used good trade. 
Together, our findings suggest that despite the presence of signaling mechanisms such as reputation feedback and product condition disclosures, the information asymmetry problem between buyers and sellers persists in online markets due to both product-based and seller-based information uncertainty. No consistent evidence of substitution or complementarity effects between product-based and seller-level uncertainty is found. Implications for research and practice are discussed.</description>
      <pubDate>Thu, 06 Nov 2008 18:12:23 GMT</pubDate>
    </item>
    <item>
      <title>Internet Exchanges for Used Books: An Empirical Analysis of Product
Cannibalization and Welfare Impact</title>
      <link>http://hdl.handle.net/2451/27750</link>
      <description>Title: Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Impact&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Smith, Michael; Telang, Rahul&lt;br/&gt;&lt;br/&gt;Abstract: Information systems and the Internet have facilitated the creation of used-product markets that feature a dramatically wider selection, lower search costs, and lower prices than their brick-and-mortar counterparts do. The increased viability of these used-product markets has caused concern among content creators and distributors, notably the Association of American Publishers and Author&amp;rsquo;s Guild, who believe that used-product markets will significantly cannibalize new product sales. This proposition, while theoretically possible, is based on speculation as opposed to empirical evidence. In this paper, we empirically analyze the degree to which used products cannibalize new-product sales for books&amp;mdash;one of the most prominent used-product categories sold online. To do this, we use a unique data set collected from Amazon.com&amp;rsquo;s new and used book marketplaces to measure the degree to which used products cannibalize new-product sales. We then use these estimates to measure the resulting first-order changes in publisher welfare and consumer surplus. Our analysis suggests that used books are poor substitutes for new books for most of Amazon&amp;rsquo;s customers. The cross-price elasticity of new-book demand with respect to used-book prices is only 0.088. As a result, only 16% of used-book sales at Amazon cannibalize new-book purchases. The remaining 84% of used-book sales apparently would not have occurred at Amazon&amp;rsquo;s new-book prices. Further, our estimates suggest that this increase in book readership from Amazon&amp;rsquo;s used-book marketplace increases consumer surplus by approximately $67.21 million annually. 
This increase in consumer surplus, together with an estimated $45.05 million loss in publisher welfare and a $65.76 million increase in Amazon&amp;rsquo;s profits, leads to an increase in total welfare to society of approximately $87.92 million annually from the introduction of used-book markets at Amazon.com.</description>
      <pubDate>Thu, 06 Nov 2008 18:26:53 GMT</pubDate>
    </item>
    <item>
      <title>Intelligent agents in electronic markets for information goods:
customization, preference revelation and pricing</title>
      <link>http://hdl.handle.net/2451/27809</link>
      <description>Title: Intelligent agents in electronic markets for information goods: customization, preference revelation and pricing&lt;br/&gt;&lt;br/&gt;Aron, Ravi; Sundararajan, Arun; Viswanathan, Siva&lt;br/&gt;&lt;br/&gt;Abstract: Electronic commerce has enabled the use of intelligent agent technologies that can evaluate buyers, customize products, and price in real-time. Our model of an electronic market with customizable products analyzes the pricing, profitability and welfare implications of agent-based technologies that price dynamically based on product preference information revealed by consumers. We find that in making the trade-off between better prices and better customization, consumers invariably choose less-than-ideal products. Furthermore, this trade-off has a higher impact on buyers on the higher end of the market and causes a transfer of consumer surplus towards buyers with a lower willingness to pay. As buyers adjust their product choices in response to better demand agent technologies, seller revenues decrease since the gains from better buyer information are dominated by the lowering of the total value created from the transactions. We study the strategic and welfare implications of these findings, and discuss managerial and technology development guidelines.</description>
      <pubDate>Fri, 29 Oct 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Information, Technology and Information Worker Productivity</title>
      <link>http://hdl.handle.net/2451/27758</link>
      <description>Title: Information, Technology and Information Worker Productivity&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Brynjolfsson, Erik; Van Alstyne, Marshall&lt;br/&gt;&lt;br/&gt;Abstract: We study the fine-grained relationships among information flows, IT use, and individual information-worker productivity, by analyzing work at a midsize executive recruiting firm. We analyze both project-level and individual-level performance using: (1) direct observation of over 125,000 e-mail messages over a period of 10 months by individual workers, (2) detailed accounting data on revenues, compensation, project completion rates, and team membership for over 1300 projects spanning 5 years, and (3) survey data on a matched set of the same workers&amp;rsquo; IT skills, IT use and information sharing. These detailed data permit us to econometrically evaluate a multistage model of production and interaction activities at the firm, and to analyze the relationships among communications flows, key technologies, work practices, and output. We find that (a) the structure and size of workers&amp;rsquo; communication networks are highly correlated with their performance; (b) IT use is strongly correlated with productivity but mainly by allowing multitasking rather than by speeding up work; (c) productivity is greatest for small amounts of multitasking but beyond an optimum, multitasking is associated with declining project completion rates and revenue generation; and (d) asynchronous information seeking such as email and database use promotes multitasking while synchronous information seeking over the phone shows a negative correlation. Overall, these data show statistically significant relationships among social networks, technology use, completed projects, and revenues for project-based information workers. 
Results are consistent with simple production models of queuing and multitasking and these methods can be replicated in other settings, suggesting new frontiers for bridging the research on social networks and IT value.</description>
      <pubDate>Mon, 10 Nov 2008 21:28:46 GMT</pubDate>
    </item>
    <item>
      <title>Information Technology Spending and Economic Productivity: A review of
'The Trouble with Computers' by Thomas K. Landauer</title>
      <link>http://hdl.handle.net/2451/27832</link>
      <description>Title: Information Technology Spending and Economic Productivity: A review of 'The Trouble with Computers' by Thomas K. Landauer&lt;br/&gt;&lt;br/&gt;Bakos, Yannis</description>
      <pubDate>Thu, 29 Aug 1996 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Information Technology and Corporate Strategy: A Research Perspective</title>
      <link>http://hdl.handle.net/2451/27825</link>
      <description>Title: Information Technology and Corporate Strategy: A Research Perspective&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Treacy, Michael&lt;br/&gt;&lt;br/&gt;Abstract: The use of information technology (IT) as a competitive weapon has become a popular cliche; but there is still a marked lack of understanding of the issues that determine the influence of information technology on a particular organization and the processes that will allow a smooth coordination of technology and corporate strategy. This article surveys the major efforts to arrive at a relevant framework and attempts to integrate them in a more comprehensive viewpoint. The focus then turns to the major research issues in understanding the impact of information technology on competitive strategy.</description>
      <pubDate>Thu, 29 May 1986 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Information Disclosure and Regulatory Compliance: Economic Issues and
Research Directions</title>
      <link>http://hdl.handle.net/2451/27745</link>
      <description>Title: Information Disclosure and Regulatory Compliance: Economic Issues and Research Directions&lt;br/&gt;&lt;br/&gt;Ghose, Anindya&lt;br/&gt;&lt;br/&gt;Abstract: The Sarbanes Oxley Act (SOA) introduced significant changes to financial practice and corporate governance regulation, including stringent new rules designed to protect investors by improving the accuracy and reliability of corporate disclosures. Briefly speaking, it requires management to submit a report containing an assessment of the effectiveness of the internal control structure, a description of material weaknesses in such internal controls and of any material noncompliance. Such mandatory regulations can have some broader ramifications on firm profitability, market structure and social welfare, many of which were unintended when policy makers first formulated this Act. Moreover, the tight coupling between compliance activities, information disclosure and IT investments can have implications for IT governance because of its potential to change relationships between technology investments and business. This article aims to provide some intuitive insights into the trade-offs involved for firms in disclosure of such information, and gives an overview of some research questions that would be of interest to academics, industry executives and policy makers alike.</description>
      <pubDate>Thu, 06 Nov 2008 16:14:27 GMT</pubDate>
    </item>
    <item>
      <title>Handling Missing Values when Applying Classification Models</title>
      <link>http://hdl.handle.net/2451/27813</link>
      <description>Title: Handling Missing Values when Applying Classification Models&lt;br/&gt;&lt;br/&gt;Saar-Tsechansky, Maytal; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods&amp;mdash;predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models&amp;mdash;for applying classification trees to instances with missing values (and also shows evidence that the results generalize to bagged trees and to logistic regression). The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods can scale gracefully to the amount of investment in computation/storage, and that they outperform imputation even for small investments.</description>
      <pubDate>Thu, 28 Jun 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Get Another Label? Improving Data Quality and Data Mining</title>
      <link>http://hdl.handle.net/2451/27818</link>
      <description>Title: Get Another Label? Improving Data Quality and Data Mining&lt;br/&gt;&lt;br/&gt;Sheng, Victor; Provost, Foster; Ipeirotis, Panagiotis&lt;br/&gt;&lt;br/&gt;Abstract: This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity and show several main results: (i) Repeated-labeling can improve label and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.</description>
      <pubDate>Mon, 29 Oct 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>From Vendors to Partners: Information Technology and Incomplete
Contracts in Buyer-Supplier Relationships</title>
      <link>http://hdl.handle.net/2451/27831</link>
      <description>Title: From Vendors to Partners: Information Technology and Incomplete Contracts in Buyer-Supplier Relationships&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Brynjolfsson, Erik&lt;br/&gt;&lt;br/&gt;Abstract: As search costs and other coordination costs decline, theory predicts that firms should optimally increase the number of suppliers with which they do business. Despite recent declines in these costs due to information technology, there is little evidence of an increase in the number of suppliers used. On the contrary, in many industries, firms are working with fewer suppliers. This suggests that other forces must be accounted for in a more complete model of buyer-supplier relationships. This article uses the theory of incomplete contracts to illustrate that incentive considerations can motivate a buyer to limit the number of employed suppliers. To induce suppliers to make investments that cannot be specified and enforced in a satisfactory manner via contractual mechanism, the buyer must commit not to expropriate the ex post surplus from such investments. Under reasonable bargaining mechanisms, such a commitment will be more credible if the buyer can choose from fewer alternative suppliers. Information technology increases the importance of noncontractible investments by suppliers, such as quality, responsiveness, and innovation; it is shown that when such investments are particularly important, firms will employ fewer suppliers, and this will be true even when search and transaction costs are very low.</description>
      <pubDate>Sat, 29 May 1993 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Examining the Relationship Between Reviews and Sales: The Role of
Reviewer Identity Disclosure in Electronic Markets</title>
      <link>http://hdl.handle.net/2451/27748</link>
      <description>Title: Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets&lt;br/&gt;&lt;br/&gt;Forman, Chris; Ghose, Anindya; Wiesenfeld, Batia&lt;br/&gt;&lt;br/&gt;Abstract: Consumer-generated product reviews have proliferated online, driven by the notion that consumers&amp;rsquo; decision to purchase or not purchase a product is based on the positive or negative information about that product they obtain from fellow consumers. Using research on information processing as a foundation, we suggest that in the context of an online community, reviewer disclosure of identity-descriptive information is used by consumers to supplement or replace product information when making purchase decisions and evaluating the helpfulness of online reviews. Using a unique data set based on both chronologically compiled ratings as well as reviewer characteristics for a given set of products and geographical location-based purchasing behavior from Amazon, we provide evidence that community norms are an antecedent to reviewer disclosure of identity-descriptive information. Online community members rate reviews containing identity-descriptive information more positively, and the prevalence of reviewer disclosure of identity information is associated with increases in subsequent online product sales. In addition, we show that shared geographical location increases the relationship between disclosure and product sales, thus highlighting the important role of geography in electronic commerce. Taken together, our results suggest that identity-relevant information about reviewers shapes community members&amp;rsquo; judgment of products and reviews. Implications for research on the relationship between online word-of-mouth (WOM) and sales, peer recognition and reputation systems, and conformity to online community norms are discussed.</description>
      <pubDate>Thu, 06 Nov 2008 18:17:40 GMT</pubDate>
    </item>
    <item>
      <title>Evaluating Pricing Strategy Using e-Commerce Data: Evidence and
Estimation Challenges</title>
      <link>http://hdl.handle.net/2451/27743</link>
      <description>Title: Evaluating Pricing Strategy Using e-Commerce Data: Evidence and Estimation Challenges&lt;br/&gt;&lt;br/&gt;Sundararajan, Arun; Ghose, Anindya&lt;br/&gt;&lt;br/&gt;Abstract: As Internet-based commerce becomes increasingly widespread, large data sets about the demand for and pricing of a wide variety of products become available. These present exciting new opportunities for empirical economic and business research, but also raise new statistical issues and challenges. In this article, we summarize research that aims to assess the optimality of price discrimination in the software industry using a large e-commerce panel data set gathered from Amazon.com. We describe the key parameters that relate to demand and cost that must be reliably estimated to accomplish this research successfully, and we outline our approach to estimating these parameters. This includes a method for &amp;ldquo;reverse engineering&amp;rdquo; actual demand levels from the sales ranks reported by Amazon, and approaches to estimating demand elasticity, variable costs and the optimality of pricing choices directly from publicly available e-commerce data. Our analysis raises many new challenges to the reliable statistical analysis of e-commerce data and we conclude with a brief summary of some salient ones.</description>
      <pubDate>Thu, 06 Nov 2008 15:41:08 GMT</pubDate>
    </item>
    <item>
      <title>Effect of Electronic Secondary Markets on the Supply Chain</title>
      <link>http://hdl.handle.net/2451/27751</link>
      <description>Title: Effect of Electronic Secondary Markets on the Supply Chain&lt;br/&gt;&lt;br/&gt;Ghose, Anindya; Telang, Rahul; Krishnan, Ramayya&lt;br/&gt;&lt;br/&gt;Abstract: We present a model to investigate the competitive implications of electronic secondary markets that promote concurrent selling of new and used goods on a supply chain. In secondary markets where suppliers cannot directly utilize used goods for practicing intertemporal price discrimination and where transaction costs of resales are negligible, the threat of cannibalization of new goods by used goods becomes significant. We examine conditions under which it is optimal for suppliers to operate in such markets, explaining why these markets may not always be detrimental for them. Intuitively, secondary markets provide an active outlet for some high-valuation consumers to sell their used goods. The potential for such resales leads to an increase in consumers&amp;rsquo; valuation for a new good, leading them to buy an additional new good. Given sufficient heterogeneity in consumer&amp;rsquo;s affinity across multiple suppliers&amp;rsquo; products, the &amp;ldquo;market expansion effect&amp;rdquo; accruing from consumers&amp;rsquo; cross-product purchase affinity can mitigate the losses incurred by suppliers from the direct &amp;ldquo;cannibalization effect.&amp;rdquo; We also highlight the strategic role that used goods commission set by the retailer plays in determining profits for suppliers. We conclude the paper by empirically testing some implications of our model using a unique data set from the online book industry, which has a flourishing secondary market.</description>
      <pubDate>Thu, 06 Nov 2008 19:27:05 GMT</pubDate>
    </item>
    <item>
      <title>Economical Active Feature-value Acquisition through Expected Utility Estimation</title>
      <link>http://hdl.handle.net/2451/27807</link>
      <description>Title: Economical Active Feature-value Acquisition through Expected Utility Estimation&lt;br/&gt;&lt;br/&gt;Melville, Prem; Saar-Tsechansky, Maytal; Mooney, Raymond; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: In many classification tasks training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be most effective. The goal of active feature-value acquisition is to incrementally select feature values that are most cost-effective for improving the model&amp;rsquo;s accuracy. We present two policies, Sampled Expected Utility and Expected Utility-ES, that acquire feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. A comparison of the two policies to each other and to alternative policies demonstrates that Sampled Expected Utility is preferable as it effectively reduces the cost of producing a model of a desired accuracy and exhibits a consistent performance across domains.</description>
      <pubDate>Fri, 29 Jul 2005 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Duplicate Record Detection: A Survey</title>
      <link>http://hdl.handle.net/2451/27823</link>
      <description>Title: Duplicate Record Detection: A Survey&lt;br/&gt;&lt;br/&gt;Elmagarmid, Ahmed; Ipeirotis, Panagiotis; Verykios, Vassilios&lt;br/&gt;&lt;br/&gt;Abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area.</description>
      <pubDate>Fri, 29 Dec 2006 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Distribution-based aggregation for relational learning with identifier attributes</title>
      <link>http://hdl.handle.net/2451/27810</link>
      <description>Title: Distribution-based aggregation for relational learning with identifier attributes&lt;br/&gt;&lt;br/&gt;Perlich, Claudia; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: Identifier attributes&amp;mdash;very high-dimensional categorical attributes such as particular product ids or people&amp;rsquo;s names&amp;mdash;rarely are incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper&amp;rsquo;s main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating&amp;mdash;for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. 
In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures.</description>
      <pubDate>Thu, 26 Jan 2006 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Discover Interesting Patterns for Investment Decision Making with GLOWER
- A Genetic Learner Overlaid With Entropy Reduction</title>
      <link>http://hdl.handle.net/2451/27752</link>
      <description>Title: Discover Interesting Patterns for Investment Decision Making with GLOWER - A Genetic Learner Overlaid With Entropy Reduction&lt;br/&gt;&lt;br/&gt;Dhar, Vasant; Chou, Dashin; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&amp;amp;P500 stock returns), using the results to identify one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.</description>
      <pubDate>Fri, 07 Nov 2008 21:14:36 GMT</pubDate>
    </item>
    <item>
      <title>Decision-centric Active Learning of Binary-Outcome Models</title>
      <link>http://hdl.handle.net/2451/27815</link>
      <description>Title: Decision-centric Active Learning of Binary-Outcome Models&lt;br/&gt;&lt;br/&gt;Saar-Tsechansky, Maytal; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: It can be expensive to acquire the data required for businesses to employ data-driven predictive modeling, for example to model consumer preferences to optimize targeting. Prior research has introduced &amp;ldquo;active learning&amp;rdquo; policies for identifying data that are particularly useful for model induction, with the goal of decreasing the statistical error for a given acquisition cost (error-centric approaches). However, predictive models are used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper introduces a new approach for active data acquisition that targets decision-making specifically. The new decision-centric approach departs from traditional active learning by placing emphasis on acquisitions that are more likely to affect decision-making. We describe two different types of decision-centric techniques. Next, using direct-marketing data, we compare various data-acquisition techniques. We demonstrate that strategies for reducing statistical error can be wasteful in a decision-making context, and show that one decision-centric technique in particular can improve targeting decisions significantly. We also show that this method is robust in the face of decreasing quality of utility estimations, eventually converging to uniform random sampling, and that it can be extended to situations where different data acquisitions have different costs. The results suggest that businesses should consider modifying their strategies for acquiring information through normal business transactions.
For example, a firm such as Amazon.com that models consumer preferences for customized marketing may accelerate learning by proactively offering recommendations&amp;mdash;not merely to induce immediate sales, but for improving recommendations in the future.</description>
      <pubDate>Mon, 26 Feb 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Data acquisition and cost-effective predictive modeling: targeting
offers for electronic commerce</title>
      <link>http://hdl.handle.net/2451/27816</link>
      <description>Title: Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce&lt;br/&gt;&lt;br/&gt;Provost, Foster; Melville, Prem; Saar-Tsechansky, Maytal&lt;br/&gt;&lt;br/&gt;Abstract: Electronic commerce is revolutionizing the way we think about data modeling, by making it possible to integrate the processes of (costly) data acquisition and model induction. The opportunity for improving modeling through costly data acquisition presents itself for a diverse set of electronic commerce modeling tasks, from personalization to customer lifetime value modeling; we illustrate with the running example of choosing offers to display to web-site visitors, which captures important aspects in a familiar setting. Considering data acquisition costs explicitly can allow the building of predictive models at significantly lower costs, and a modeler may be able to improve performance via new sources of information that previously were too expensive to consider. However, existing techniques for integrating modeling and data acquisition cannot deal with the rich environment that electronic commerce presents. We discuss several possible data acquisition settings, the challenges involved in the integration with modeling, and various research areas that may supply parts of an ultimate solution. We also present and demonstrate briefly a unified framework within which one can integrate acquisitions of different types, with any cost structure and any predictive modeling objective.</description>
      <pubDate>Sun, 29 Jul 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Confidence Bands for ROC Curves: Methods and an Empirical Study</title>
      <link>http://hdl.handle.net/2451/27802</link>
      <description>Title: Confidence Bands for ROC Curves: Methods and an Empirical Study&lt;br/&gt;&lt;br/&gt;Macskassy, Sofus; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: In this paper we study techniques for generating and evaluating confidence bands on ROC curves. ROC curve evaluation is rapidly becoming a commonly used evaluation metric in machine learning, although evaluating ROC curves has thus far been limited to studying the area under the curve (AUC) or generation of one-dimensional confidence intervals by freezing one variable&amp;mdash;the false-positive rate, or threshold on the classification scoring function. Researchers in the medical field have long been using ROC curves and have many well-studied methods for analyzing such curves, including generating confidence intervals as well as simultaneous confidence bands. In this paper we introduce these techniques to the machine learning community and show their empirical fitness on the Covertype data set&amp;mdash;a standard machine learning benchmark from the UCI repository. We show that some of these methods work remarkably well, that others are too loose, and that existing machine learning methods for generation of one-dimensional confidence intervals do not translate well to generation of simultaneous bands&amp;mdash;their bands are too tight.</description>
      <pubDate>Thu, 29 Jul 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Competition between Local and Electronic Markets: How the benefit of
buying online depends on where you live</title>
      <link>http://hdl.handle.net/2451/27746</link>
      <description>Title: Competition between Local and Electronic Markets: How the benefit of buying online depends on where you live&lt;br/&gt;&lt;br/&gt;Forman, Chris; Ghose, Anindya; Goldfarb, Avi&lt;br/&gt;&lt;br/&gt;Abstract: We empirically examine the trade-off between the benefits of buying online and the benefits of buying in a local retail store. How does a consumer&amp;rsquo;s physical location shape the relative benefits of buying from the online world? We explore this problem using data from Amazon on the top selling books for 1497 unique locations in the US for 10 months ending in January 2006. In particular, we examine what happens when a large bookstore opens and when a discount retailer opens. We show that even controlling for product-specific preferences by location, changes in local retail options have substantial effects on online purchases. When a store opens locally, we find evidence that people substitute away from online purchasing, demonstrating that consumers appear to respond to increased convenience in the offline channel. These estimates are economically large, suggesting that disutility costs of purchasing online are substantial and that offline transportation costs matter. We also show that offline entry decreases consumers&amp;rsquo; sensitivity to online price discounts. We find no consistent evidence that the breadth of the product line at a local retail store affects purchases although breadth seems to matter in university towns and larger cities. Our paper shows that the parameters in existing theoretical models of channel substitution such as offline transportation cost, online disutility cost, market coverage, and the prices of online and offline retailers interact to determine consumer choice of channels. In this way, our results provide empirical support for many such models.</description>
      <pubDate>Thu, 06 Nov 2008 18:08:17 GMT</pubDate>
    </item>
    <item>
      <title>Classification-Aware Hidden-Web Text Database Selection,</title>
      <link>http://hdl.handle.net/2451/27824</link>
      <description>Title: Classification-Aware Hidden-Web Text Database Selection,&lt;br/&gt;&lt;br/&gt;Ipeirotis, Panagiotis; Gravano, Luis&lt;br/&gt;&lt;br/&gt;Abstract: Many valuable text databases on the web have noncrawlable contents that are &amp;ldquo;hidden&amp;rdquo; behind search interfaces. Metasearchers are helpful tools for searching over multiple such &amp;ldquo;hidden-web&amp;rdquo; text databases at once through a unified query interface. An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query. The state-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel &amp;ldquo;focused-probing&amp;rdquo; sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are representative of the topic coverage of the database. Our algorithm is the first to construct content summaries that include the frequencies of the words in the database. Unfortunately, Zipf&amp;rsquo;s law practically guarantees that for any relatively large database, content summaries built from moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To enhance the sparse document samples and improve the database selection decisions, we exploit the fact that topically similar databases tend to have similar vocabularies, so samples extracted from databases with a similar topical focus can complement each other.
We have developed two database selection algorithms that exploit this observation. The first algorithm proceeds hierarchically and selects the best categories for a query, and then sends the query to the appropriate databases in the chosen categories. The second algorithm uses &amp;ldquo;shrinkage,&amp;rdquo; a statistical technique for improving parameter estimation in the face of sparse data, to enhance the database content summaries with category-specific words. We describe how to modify existing database selection algorithms to adaptively decide (at runtime) whether shrinkage is beneficial for a query. A thorough evaluation over a variety of databases, including 315 real web databases as well as TREC data, suggests that the proposed sampling methods generate high-quality content summaries and that the database selection algorithms produce significantly more relevant database selection decisions and overall search results than existing algorithms.</description>
      <pubDate>Wed, 27 Feb 2008 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Classification in Networked Data: A Toolkit and a Univariate Case Study</title>
      <link>http://hdl.handle.net/2451/27814</link>
      <description>Title: Classification in Networked Data: A Toolkit and a Univariate Case Study&lt;br/&gt;&lt;br/&gt;Macskassy, Sofus; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well&amp;mdash;well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes&amp;mdash;that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.</description>
      <pubDate>Sat, 28 Apr 2007 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Assessing Network Applications for Economic Development</title>
      <link>http://hdl.handle.net/2451/27755</link>
      <description>Title: Assessing Network Applications for Economic Development&lt;br/&gt;&lt;br/&gt;Aral, Sinan; Escobari, Marcela; Nishina, Randal&lt;br/&gt;&lt;br/&gt;Abstract: PAE Team&amp;rsquo;s Objectives: (1) create a survey instrument to assess the impact of technology intervention in rural India; (2) advise on potential applications for village-level Internet terminals. The aim of the Sustainable Access in Rural India (SARI) project is to improve the lives of individuals in poor rural communities by leveraging information and communications technologies to facilitate economic development. Ultimately, the project&amp;rsquo;s success will be measured by its social and economic impact and viability, which depends critically on the appropriateness of applications provided to end-users. Our conclusions and recommendations concerning applications are as follows: A price application that posts the daily price fluctuations of certain goods in order to promote competition among sellers and improve the economic decision making of villagers and traders may not increase efficiency or further economic development; its effectiveness depends critically on geographic scope and a focus on goods whose prices exhibit sufficient price volatility and differentiation. We recommend a central web site-based price application, with independent kiosk operators responsible for inputting price information from villages in which markets exist. A spot labor market application that aggregates supply and demand of jobs for clusters of villages holds much promise&amp;mdash;there are potential benefits from coordinating labor markets in the area studied. The relatively constant need for work coupled with unmet demand suggests that there is a significant willingness to pay for a service that matches supply and demand for labor in a timely and accountable way. We recommend a bulletin-board-type labor market application that connects small numbers of nearby villages.
An agriculture application that addresses the basic knowledge needs of farmers, providing weather forecasts and information on farming techniques, must include tailored content, given the diversity of crops grown and methods employed throughout the region. We recommend a local content creation mechanism, facilitating farmer access to agricultural expertise via simple voice or text communications, or a more robust web-based application. Deficiencies in the current state-provided healthcare infrastructure may limit the initial impact of IT within local Public Health Centers (PHCs). Instead, we suggest a health care application that delivers information and services to villagers directly through community centers. Based on villager awareness levels and needs, we recommend a government services application that would enable villagers to access information on relevant government programs and initiate online requests for necessary government documents. While applications to facilitate education (particularly adult learning) may be useful, there appear to be significant implementation barriers at the school level. The motivation for these proposed applications stems from several regional attributes, inferred from local economic data and extensive interviews with villagers, school representatives, health workers and NGO staff members: (1) Many if not most villages exhibit segregation along religious and/or caste lines. (2) While some data is readily available and disseminated (e.g., prices of heavily traded goods), other potentially critical pieces of information are not easily accessible to villagers (e.g., livestock prices, agricultural advice, government programs). (3) A majority of all economic activity either directly or indirectly involves agriculture, and much of a typical villager&amp;rsquo;s social activity relates to agriculture.
(4) A majority of laborers are without a regular source of employment&amp;mdash;unemployment is extremely cyclical, reaching high levels during the agricultural off-season.</description>
      <pubDate>Mon, 10 Nov 2008 21:18:35 GMT</pubDate>
    </item>
    <item>
      <title>Are Digital Rights Valuable? Theory and Evidence from the eBook Industry</title>
      <link>http://hdl.handle.net/2451/27796</link>
      <description>Title: Are Digital Rights Valuable? Theory and Evidence from the eBook Industry&lt;br/&gt;&lt;br/&gt;Sundararajan, Arun; Oestreicher-Singer, Gal&lt;br/&gt;&lt;br/&gt;Abstract: The effective management of digital rights is a crucial challenge in many industries making the transition from physical to digital products. We present an economic model that characterizes the value of digital rights when products are sold both embedded in tangible physical artifacts and as pure digital goods, and when granting digital rights may also affect the extent of digital piracy. Our model indicates that in the absence of piracy, digital rights should be unrestricted, since a seller can use their pricing strategy to optimally balance sales between physical and digital goods. However, the threat of piracy limits the extent to which digital rights should be granted: the value of digital rights is determined not only by their direct effect on the quality of legal digital goods, but by their effect on the differential quality of legal and pirated digital goods. When the latter effect is negative, granting digital rights may have a detrimental effect on value; our model indicates that this kind of effect is more likely to be observed for digital rights that aim to replicate the consumption experience of physical goods, rather than enhancing a customer&amp;rsquo;s digital experience. We test the predictions of our analytical model using data from the ebook industry. Our empirical evidence supports our theoretical results, showing that four separate digital rights each have a significant impact on ebook prices, and establishing that those two that are most strongly associated with digital piracy have a negative impact on seller value. We also show, both analytically and empirically, that sellers should increase the prices of digital goods as the prices of their physical counterparts increase, but decrease them as the technological sophistication of their potential customers increases.
Our results represent new evidence of the importance of an informed and judicious choice of the different digital rights permitted by one&amp;rsquo;s DRM platform, and provide a framework for guiding managers in industries that are progressively being &amp;ldquo;digitized.&amp;rdquo;</description>
      <pubDate>Wed, 29 Oct 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>An Exploratory Study of the Emerging Role of Electronic Intermediaries</title>
      <link>http://hdl.handle.net/2451/27838</link>
      <description>Title: An Exploratory Study of the Emerging Role of Electronic Intermediaries&lt;br/&gt;&lt;br/&gt;Bakos, Yannis; Bailey, Joseph&lt;br/&gt;&lt;br/&gt;Abstract: It is often argued that as electronic markets lower the cost of market transactions, traditional roles for intermediaries will be eliminated, leading to &amp;quot;disintermediation.&amp;quot; We discuss the findings of an exploratory study of intermediaries in electronic markets, which suggest that markets do not necessarily become disintermediated as they become facilitated by information technology. We explore thirteen case studies of firms participating in electronic commerce and find evidence of certain new emerging roles for electronic intermediaries, including: aggregating, matching suppliers and customers, providing trust, and providing inter-organizational market information. Two specific examples are discussed in greater detail to illustrate an unsuccessful strategy for electronic intermediation (BargainFinder) as well as a successful one (Firefly).</description>
      <pubDate>Tue, 29 Oct 1996 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Aggregation-Based Feature Invention and Relational Concept Classes</title>
      <link>http://hdl.handle.net/2451/27768</link>
      <description>Title: Aggregation-Based Feature Invention and Relational Concept Classes&lt;br/&gt;&lt;br/&gt;Perlich, Claudia; Provost, Foster&lt;br/&gt;&lt;br/&gt;Abstract: Model induction from relational data requires aggregation of values of attributes of related entities. This paper makes three contributions to the study of relational learning. (1) It presents a hierarchy of relational concepts of increasing complexity, using relational schema characteristics such as cardinality, and derives classes of aggregation operators that are needed to learn these concepts. (2) Expanding one level of the hierarchy, it introduces new aggregation operators that model the distribution of the values to be aggregated and (for classification problems) the differences in these distributions by class. (3) It demonstrates empirically on a noisy business domain that more-complex aggregation methods can increase generalization performance. Constructing features using target-dependent aggregations can transform relational prediction tasks so that well-understood feature-vector-based modeling algorithms can be applied successfully.</description>
      <pubDate>Sat, 23 Aug 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Active Sampling for Class Probability Estimation and Ranking</title>
      <link>http://hdl.handle.net/2451/27800</link>
      <description>Title: Active Sampling for Class Probability Estimation and Ranking&lt;br/&gt;&lt;br/&gt;Provost, Foster; Saar-Tsechansky, Maytal&lt;br/&gt;&lt;br/&gt;Abstract: In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example&amp;rsquo;s informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain estimation accuracy and provide insights to the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.</description>
      <pubDate>Wed, 29 Oct 2003 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>Active Feature-Value Acquisition for Classifier Induction</title>
      <link>http://hdl.handle.net/2451/27801</link>
      <description>Title: Active Feature-Value Acquisition for Classifier Induction&lt;br/&gt;&lt;br/&gt;Melville, Prem; Saar-Tsechansky, Maytal; Provost, Foster; Mooney, Raymond&lt;br/&gt;&lt;br/&gt;Abstract: Many induction problems include missing data that can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often expensive or unnecessary, while acquiring information for a random subset of instances may not be most effective. Active feature-value acquisition tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach in which instances are selected for acquisition based on the current model&amp;rsquo;s accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to alternative policies.</description>
      <pubDate>Fri, 29 Oct 2004 22:58:59 GMT</pubDate>
    </item>
    <item>
      <title>A Simple Relational Classifier</title>
      <link>http://hdl.handle.net/2451/27771</link>
      <description>Title: A Simple Relational Classifier&lt;br/&gt;&lt;br/&gt;Provost, Foster; Macskassy, Sofus&lt;br/&gt;&lt;br/&gt;Abstract: We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work. We argue that a simple model such as this should be used as a baseline to assess the performance of relational learners.</description>
      <pubDate>Wed, 19 Nov 2008 21:56:36 GMT</pubDate>
    </item>
  </channel>
</rss>

