| Field | Value |
| --- | --- |
| Title: | QProber: A System for Automatic Classification of Hidden-Web Databases |
| Authors: | Ipeirotis, Panagiotis; Gravano, Luis |
| Keywords: | database classification; web databases; hidden web |
| Issue Date: | 1-Jan-2003 |
| Publisher: | Information Systems |
| Citation: | ACM Transactions on Information Systems (TOIS), vol. 21, no. 1, January 2003 |
| Series/Report no.: | CeDER-PP-2003-08 |
| Abstract: | The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web “crawlers.” Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases. |
| URI: | http://hdl.handle.net/2451/27820 |
| Appears in Collections: | CeDER Published Papers |
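The abstract's central idea is classifying a hidden-Web database into a hierarchy using only the number of matches that query probes return. The sketch below illustrates that idea and is a simplified version under stated assumptions. Everything in it is hypothetical: the two-level hierarchy, the probe queries, the `count_matches` callback standing in for a database's search interface, and the two thresholds (`min_matches`, `min_share`). It is not QProber's actual probe-generation or classification procedure.

```python
from typing import Callable, Dict, List

# Hypothetical two-level hierarchy: each category maps to its subcategories.
HIERARCHY: Dict[str, List[str]] = {
    "Root": ["Health", "Sports"],
    "Health": ["Diseases", "Fitness"],
    "Sports": ["Soccer", "Baseball"],
}

# Hypothetical probe queries per category. These are assumed examples standing
# in for probes that a document classifier might produce.
PROBES: Dict[str, List[str]] = {
    "Health": ["cancer treatment", "blood pressure"],
    "Sports": ["world cup", "home run"],
    "Diseases": ["cancer treatment", "diabetes"],
    "Fitness": ["weight training"],
    "Soccer": ["world cup"],
    "Baseball": ["home run"],
}

# Stand-in for a search interface: query -> number of matching documents.
CountFn = Callable[[str], int]


def category_matches(count_matches: CountFn, category: str) -> int:
    """Sum the match counts reported for a category's probes (no documents fetched)."""
    return sum(count_matches(q) for q in PROBES.get(category, []))


def classify(count_matches: CountFn, node: str = "Root",
             min_matches: int = 50, min_share: float = 0.4) -> List[str]:
    """Descend the hierarchy, keeping children that pass both illustrative
    thresholds; a node none of whose children pass is returned as the result."""
    children = HIERARCHY.get(node, [])
    counts = {c: category_matches(count_matches, c) for c in children}
    total = sum(counts.values())
    selected = [c for c, n in counts.items()
                if total > 0 and n >= min_matches and n / total >= min_share]
    if not selected:
        return [node]
    result: List[str] = []
    for child in selected:
        result.extend(classify(count_matches, child, min_matches, min_share))
    return result


# Usage with a fake database that only reports match counts.
fake_db = {"cancer treatment": 300, "blood pressure": 150, "diabetes": 90,
           "weight training": 20, "world cup": 10, "home run": 4}
print(classify(lambda q: fake_db.get(q, 0)))  # prints ['Diseases']
```

Only match counts cross the interface and no documents are retrieved, which mirrors the property the abstract emphasizes. Requiring both an absolute count and a relative share is one plausible way to avoid assigning a database to a category because of a few stray matches. The criteria used in the paper may differ.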
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.