QProber: A System for Automatic Classification of Hidden-Web Databases

Ipeirotis, Panagiotis; Gravano, Luis

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ipeirotis, Panagiotis	-
dc.contributor.author	Gravano, Luis	-
dc.date.accessioned	2008-12-09T15:50:01Z	-
dc.date.available	2008-12-09T15:50:01Z	-
dc.date.issued	2003-01-01	-
dc.identifier.citation	ACM Transactions on Information Systems (TOIS), vol. 21, no. 1, January 2003	en
dc.identifier.uri	http://hdl.handle.net/2451/27820	-
dc.description.abstract	The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web “crawlers.” Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.	en
dc.description.sponsorship	NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research	en
dc.format.extent	3624351 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	en
dc.publisher	Information Systems	en
dc.relation.ispartofseries	CeDER-PP-2003-08	en
dc.subject	database classification	en
dc.subject	web databases	en
dc.subject	hidden web	en
dc.title	QProber: A System for Automatic Classification of Hidden-Web Databases	en
dc.type	Article	en
Appears in Collections:	CeDER Published Papers

Files in This Item:

File	Description	Size	Format
CeDER-PP-2003-08.pdf		3.54 MB	Adobe PDF
