|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/28302
|
| Title: | Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support |
| Authors: | Hristidis, Vagelis; Hu, Yuheng; Ipeirotis, Panagiotis G. |
| Issue Date: | 21-Sep-2009 |
| Series/Report no.: | CeDER-09-05 |
| Abstract: | Many online or local data sources provide powerful querying mechanisms
but limited ranking capabilities. For instance, PubMed allows users to
submit highly expressive Boolean keyword queries, but ranks the query
results by date only. However, a user would typically prefer a ranking
by relevance, measured by an Information Retrieval (IR) ranking
function. The naive approach would be to submit a disjunctive query with
all query keywords, retrieve the returned documents, and then re-rank
them. Unfortunately, such an operation would be very expensive due to
the large number of results returned by disjunctive queries. In this
paper we present algorithms that return the top results for a query,
ranked according to an IR-style ranking function, while operating on top
of a source with a Boolean query interface with no ranking capabilities
(or a ranking capability of no interest to the end user). The algorithms
generate a series of conjunctive queries that return only documents that
are candidates for being highly ranked according to a relevance metric.
Our approach can also be applied to other settings where the ranking is
monotonic on a set of factors (query keywords in IR) and the source
query interface is a Boolean expression of these factors. Our
comprehensive experimental evaluation on the PubMed database and a TREC
dataset shows that we achieve an order-of-magnitude improvement over
the current baseline approaches. |
| URI: | http://hdl.handle.net/2451/28302 |
| Appears in Collections: | CeDER Working Papers
|
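The contrast the abstract draws between the naive disjunctive approach and a series of conjunctive queries can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy corpus `DOCS`, the simulated source functions `boolean_and` and `boolean_or`, and the scoring function (number of distinct query keywords matched, chosen only because it is monotonic in the keywords) are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy corpus standing in for a hidden-web text database.
DOCS = {
    1: "gene therapy for cancer",
    2: "cancer screening guidelines",
    3: "gene expression in yeast",
    4: "therapy dogs in hospitals",
    5: "cancer gene therapy trial results",
}

def boolean_and(terms):
    """Simulated source: ids of documents containing ALL terms, unranked."""
    return {d for d, text in DOCS.items() if all(t in text.split() for t in terms)}

def boolean_or(terms):
    """Simulated source: ids of documents containing ANY term, unranked."""
    return {d for d, text in DOCS.items() if any(t in text.split() for t in terms)}

def score(doc_id, terms):
    """Illustrative monotonic score: number of distinct query terms matched."""
    words = set(DOCS[doc_id].split())
    return sum(1 for t in terms if t in words)

def rank(doc_ids, terms):
    """Sort by descending score, breaking ties by document id."""
    return sorted(doc_ids, key=lambda d: (-score(d, terms), d))

def naive_top_k(terms, k):
    """Baseline: one disjunctive query, retrieve everything, then re-rank."""
    candidates = boolean_or(terms)
    return rank(candidates, terms)[:k], len(candidates)

def conjunctive_top_k(terms, k):
    """Probe keyword subsets from largest to smallest with conjunctive queries.

    After all subsets of a given size have been queried, every document
    matching at least that many keywords has been retrieved, so once k
    documents are in hand no unseen document can outscore them.
    """
    found = set()
    for size in range(len(terms), 0, -1):
        for subset in combinations(terms, size):
            found |= boolean_and(subset)
        if len(found) >= k:
            break
    return rank(found, terms)[:k], len(found)

query = ["gene", "therapy", "cancer"]
naive_result = naive_top_k(query, 2)
conjunctive_result = conjunctive_top_k(query, 2)
print(naive_result)        # top-2 ids and number of documents retrieved
print(conjunctive_result)
```

On this toy data both functions return the same two documents, but the disjunctive baseline retrieves all five documents while the conjunctive probing retrieves only two. The sketch enumerates every keyword subset, which grows exponentially with query length, and it relies on a much simpler scoring function than an IR-style one, so it should be read only as an intuition for why conjunctive queries can reduce the number of retrieved documents.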