Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/14760
| Title: | Duplicate Record Detection: A Survey |
| Authors: | Elmagarmid, Ahmed; Ipeirotis, Panagiotis G.; Verykios, Vassilios |
| Keywords: | entity resolution; duplicate detection; record matching; record linkage; instance identification; deduplication; merge-purge; coreference resolution; database hardening |
| Issue Date: | 6-Sep-2006 |
| Publisher: | Stern School of Business, New York University |
| Series/Report no.: | CeDER-06-05 |
| Abstract: | Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats or any combination of these factors. In this article, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with a coverage of existing tools and with a brief discussion of the big open problems in the area. |
| URI: | http://hdl.handle.net/2451/14760 |
| Appears in Collections: | CeDER Working Papers; IOMS: Information Systems Working Papers |
Items in Faculty Digital Archive are protected by copyright, with all rights reserved, unless otherwise indicated.