Faculty Digital Archive


Please use this identifier to cite or link to this item: http://hdl.handle.net/2451/27823

Title: Duplicate Record Detection: A Survey
Authors: Elmagarmid, Ahmed
Ipeirotis, Panagiotis
Verykios, Vassilios
Keywords: duplicate detection
data cleaning
data integration
record linkage
instance identification
database hardening
name matching
identity uncertainty
entity resolution
fuzzy duplicate detection
entity matching
Issue Date: Jan-2007
Publisher: IEEE
Citation: IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 19, no. 1, January 2007
Series/Report no.: CeDER-PP-2007-15
Abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area.
URI: http://hdl.handle.net/2451/27823
Appears in Collections:CeDER Published Papers

Files in This Item:

File: CeDER-PP-2007-15.pdf
Size: 350.32 kB
Format: Adobe PDF

Items in Faculty Digital Archive are protected by copyright, with all rights reserved, unless otherwise indicated.

 
