Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/14801
| Title: | Modeling and Managing Changes in Text Databases |
| Authors: | Ipeirotis, Panagiotis G.; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis |
| Keywords: | Metasearching; Text Database Selection; Distributed Information Retrieval; Update Scheduling |
| Issue Date: | 23-Jun-2006 |
| Series/Report no.: | CeDER-06-07 |
| Abstract: | Large amounts of (often valuable) information are stored in
web-accessible text databases. "Metasearchers" provide unified
interfaces to query multiple such databases at once. For efficiency,
metasearchers rely on succinct statistical summaries of the database
contents to select the best databases for each query. So far, database
selection research has largely assumed that databases are static, so the
associated statistical summaries do not need to change over time.
However, databases are rarely static and the statistical summaries that
describe their contents need to be updated periodically to reflect
content changes. In this article, we first report the results of a
study showing how the content summaries of 152 real web databases
evolved over a period of 52 weeks. Then, we show how to use "survival
analysis" techniques in general, and Cox's proportional hazards
regression in particular, to model database changes over time and
predict when we should update each content summary. Finally, we exploit
our change model to devise update schedules that keep the summaries up
to date by contacting databases only when needed, and then we evaluate
the quality of our schedules experimentally over real web databases. |
| URI: | http://hdl.handle.net/2451/14801 |
| Appears in Collections: | CeDER Working Papers; IOMS: Information Systems Working Papers |
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.