Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support

Hristidis, Vagelis; Hu, Yuheng; Ipeirotis, Panagiotis G.

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hristidis, Vagelis	-
dc.contributor.author	Hu, Yuheng	-
dc.contributor.author	Ipeirotis, Panagiotis G.	-
dc.date.accessioned	2009-09-21T21:06:11Z	-
dc.date.available	2009-09-21T21:06:11Z	-
dc.date.issued	2009-09-21T21:06:11Z	-
dc.identifier.uri	http://hdl.handle.net/2451/28302	-
dc.description.abstract	Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an Information Retrieval (IR) ranking function. The naive approach would be to submit a disjunctive query with all query keywords, retrieve the returned documents, and then re-rank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface with no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to a relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC dataset shows that we achieve an order-of-magnitude improvement over the current baseline approaches.	en
dc.description.sponsorship	Vagelis Hristidis was partly supported by NSF grant IIS-0811922 and DHS grant 2009-ST-062-000016. Panagiotis G. Ipeirotis was supported by the National Science Foundation under Grant No. IIS-0643846.	en
dc.format.extent	923947 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	en
dc.relation.ispartofseries	CeDER-09-05	en
dc.title	Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support	en
Appears in Collections:	CeDER Working Papers
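
The contrast drawn in the abstract, between one large disjunctive query and a series of narrower conjunctive queries, can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a toy in-memory source (the hypothetical `BooleanSource` class), a deliberately simple monotonic score (the sum of fixed weights of the query keywords present in a document), and invented sample documents and weights. Under those assumptions, issuing conjunctive queries for keyword subsets in decreasing order of total weight retrieves documents in decreasing score order, so the client can stop as soon as it holds k documents.

```python
from itertools import combinations


class BooleanSource:
    """Toy stand-in for a source with a Boolean interface and no relevance ranking."""

    def __init__(self, docs):
        self.docs = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
        self.transferred = 0  # documents returned to the client, duplicates included

    def _answer(self, predicate):
        hits = {d: words for d, words in self.docs.items() if predicate(words)}
        self.transferred += len(hits)
        return hits

    def search_any(self, keywords):
        """Disjunctive query: documents containing at least one keyword."""
        return self._answer(lambda words: any(k in words for k in keywords))

    def search_all(self, keywords):
        """Conjunctive query: documents containing every keyword."""
        return self._answer(lambda words: all(k in words for k in keywords))


def score(words, weights):
    """Assumed monotonic score: total weight of the query keywords present."""
    return sum(w for k, w in weights.items() if k in words)


def rank(scored, k):
    """Sort (doc_id, score) pairs by descending score, then by id, and keep k."""
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def naive_top_k(source, weights, k):
    """Baseline: one disjunctive query, then re-rank everything on the client."""
    hits = source.search_any(list(weights))
    return rank({d: score(words, weights) for d, words in hits.items()}, k)


def conjunctive_top_k(source, weights, k):
    """Issue conjunctive queries for keyword subsets, heaviest subset first."""
    keywords = list(weights)
    subsets = [c for r in range(1, len(keywords) + 1) for c in combinations(keywords, r)]
    subsets.sort(key=lambda s: -sum(weights[t] for t in s))
    seen = {}
    for subset in subsets:
        if len(seen) >= k:
            break  # any unseen document can only match a lighter subset
        for doc_id, words in source.search_all(subset).items():
            seen.setdefault(doc_id, score(words, weights))
    return rank(seen, k)


# Invented sample data, for illustration only.
DOCS = {
    "d1": "hidden web ranking survey",
    "d2": "hidden web crawler",
    "d3": "web ranking",
    "d4": "ranking functions",
    "d5": "web server logs",
    "d6": "hidden markov model",
    "d7": "database systems",
    "d8": "web design",
    "d9": "ranking of universities",
}
WEIGHTS = {"hidden": 4.0, "web": 2.0, "ranking": 1.0}
```

On this sample, both functions return the same top two documents, but the conjunctive variant stops after two queries while the disjunctive baseline has to transfer every document that mentions any keyword. The subset enumeration is exponential in the number of keywords, so the sketch is only meant for short queries, and ties at the cutoff score may be broken differently by the two functions.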

Files in This Item:

File	Description	Size	Format
paper.pdf		795.91 kB	Adobe PDF
