Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/27822
| Title: | Modeling and Managing Changes in Text Databases |
| Authors: | Ipeirotis, Panagiotis; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis |
| Keywords: | metasearching; text database selection; distributed information retrieval |
| Issue Date: | Aug-2007 |
| Publisher: | ACM Transactions on Database Systems |
| Citation: | ACM Transactions on Database Systems (TODS), vol. 32, no. 3, article 14, August 2007 |
| Series/Report no.: | CeDER-PP-2007-14 |
| Abstract: | Large amounts of (often valuable) information are stored in
web-accessible text databases. “Metasearchers” provide
unified interfaces to query multiple such databases at once. For
efficiency, metasearchers rely on succinct statistical summaries of the
database contents to select the best databases for each query. So far,
database selection research has largely assumed that databases are
static, so the associated statistical summaries do not evolve over time.
However, databases are rarely static and the statistical summaries that
describe their contents need to be updated periodically to reflect
content changes. In this article, we first report the results of a study
showing how the content summaries of 152 real web databases evolved over
a period of 52 weeks. Then, we show how to use “survival
analysis” techniques in general, and Cox’s proportional
hazards regression in particular, to model database changes over time
and predict when we should update each content summary. Finally, we
exploit our change model to devise update schedules that keep the
summaries up to date by contacting databases only when needed, and then
we evaluate the quality of our schedules experimentally over real web databases. |
| URI: | http://hdl.handle.net/2451/27822 |
| Appears in Collections: | CeDER Published Papers |
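
As a rough illustration of the scheduling idea in the abstract, the sketch below uses a proportional hazards form to estimate how many weeks may pass before a content summary should be refreshed. It is not the article's implementation. Cox's model leaves the baseline hazard unspecified; this sketch assumes a constant baseline rate (an exponential model) so that the answer has a closed form. The function names, covariates, and coefficient values are all hypothetical.

```python
import math


def relative_risk(coefficients, covariates):
    """Proportional hazards multiplier exp(beta . x) for one database."""
    return math.exp(sum(b * x for b, x in zip(coefficients, covariates)))


def survival_probability(t_weeks, baseline_rate, coefficients, covariates):
    """Probability that a summary is still considered current after t_weeks.

    Assumes a constant baseline hazard (an assumption of this sketch), so
    S(t | x) = exp(-baseline_rate * t * exp(beta . x)).
    """
    risk = relative_risk(coefficients, covariates)
    return math.exp(-baseline_rate * t_weeks * risk)


def weeks_until_update(threshold, baseline_rate, coefficients, covariates):
    """Time at which the survival probability falls to `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if baseline_rate <= 0.0:
        raise ValueError("baseline_rate must be positive")
    risk = relative_risk(coefficients, covariates)
    return -math.log(threshold) / (baseline_rate * risk)


if __name__ == "__main__":
    # Hypothetical coefficient and covariate values, for illustration only.
    beta = [0.8]
    slow_db = [0.0]   # covariate value for a slowly changing database
    fast_db = [1.0]   # covariate value for a rapidly changing database
    print(weeks_until_update(0.5, 0.1, beta, slow_db))
    print(weeks_until_update(0.5, 0.1, beta, fast_db))
```

Under these assumptions, a database whose covariates imply a higher relative risk reaches the threshold sooner, so it would be contacted more often than a slowly changing one.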