Faculty Digital Archive


Please use this identifier to cite or link to this item: http://hdl.handle.net/2451/27820

Title: QProber: A System for Automatic Classification of Hidden-Web Databases
Authors: Ipeirotis, Panagiotis; Gravano, Luis
Keywords: database classification; web databases; hidden web
Issue Date: 1-Jan-2003
Publisher: Information Systems
Citation: ACM Transactions on Information Systems (TOIS), vol. 21, no. 1, January 2003
Series/Report no.: CeDER-PP-2003-08
Abstract: The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web “crawlers.” Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
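The core idea in the abstract is to classify a database using only the number of matches each query probe returns, without retrieving any documents. The sketch below is illustrative and does not reproduce QProber's actual implementation. It rests on several assumptions that are not stated in the abstract: a small hypothetical category tree (HIERARCHY); hand-written probe queries per category (PROBES), whereas QProber derives its probes from trained document classifiers; and a caller-supplied count_matches function standing in for the database's search interface. Scoring is loosely modeled on the coverage and specificity measures described in the full paper, though the exact definitions there may differ. Here, coverage is the summed match count of a category's probes, and specificity is that category's share of the coverage among its siblings. The thresholds min_coverage and min_specificity are hypothetical values.

```python
from typing import Callable, Dict, List

# Hypothetical category tree: each node maps to its child categories.
HIERARCHY: Dict[str, List[str]] = {
    "Root": ["Computers", "Health", "Sports"],
    "Health": ["Diseases", "Fitness"],
}

# Hypothetical query probes per category. In QProber the probes are
# generated by document classifiers; here they are hard-coded.
PROBES: Dict[str, List[str]] = {
    "Computers": ["linux", "software AND bug"],
    "Health": ["patient", "diabetes"],
    "Sports": ["soccer", "baseball"],
    "Diseases": ["cancer", "diabetes AND insulin"],
    "Fitness": ["workout", "aerobic"],
}


def classify(
    count_matches: Callable[[str], int],
    node: str = "Root",
    min_coverage: int = 100,       # hypothetical threshold
    min_specificity: float = 0.4,  # hypothetical threshold
) -> List[str]:
    """Return the deepest categories the database is assigned to.

    Only match counts are consulted; no documents are retrieved.
    """
    children = HIERARCHY.get(node, [])
    if not children:
        return [node]
    # Coverage: sum of match counts of each child's probes.
    coverage = {c: sum(count_matches(q) for q in PROBES[c]) for c in children}
    total = sum(coverage.values())
    result: List[str] = []
    for c in children:
        # Specificity: this child's share of its siblings' total coverage.
        specificity = coverage[c] / total if total else 0.0
        if coverage[c] >= min_coverage and specificity >= min_specificity:
            # Push the database down into every qualifying subtree.
            result.extend(classify(count_matches, c, min_coverage, min_specificity))
    # If no child qualifies, the database stays at the current node.
    return result or [node]


# Simulated database: fixed match counts per probe (made-up numbers).
FAKE_COUNTS = {
    "patient": 900, "diabetes": 700, "cancer": 500,
    "diabetes AND insulin": 300, "workout": 40, "aerobic": 10,
    "linux": 5, "software AND bug": 2, "soccer": 3, "baseball": 1,
}

print(classify(lambda q: FAKE_COUNTS.get(q, 0)))  # prints ['Diseases']
```

With these made-up counts, Health dominates at the top level (1600 of 1611 matches), and Diseases dominates within Health (800 of 850), so the database is assigned to Diseases. Recursing only into subtrees that clear both thresholds keeps the number of probes sent small, which fits the low-overhead goal stated in the abstract.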
Appears in Collections: CeDER Published Papers

Files in This Item:

CeDER-PP-2003-08.pdf (3.54 MB, Adobe PDF)

All items in Faculty Digital Archive are protected by copyright, with all rights reserved.
