Full metadata record
DC Field | Value | Language
dc.contributor.author | Kapan, Almazhan | -
dc.contributor.author | Kirmizialtin, Suphan | -
dc.contributor.author | Kukreja, Rhythm | -
dc.contributor.author | Wrisley, David Joseph | -
dc.date.accessioned | 2022-10-07T09:37:17Z | -
dc.date.available | 2022-10-07T09:37:17Z | -
dc.date.issued | 2022-10-06 | -
dc.identifier.citation | Kapan, Almazhan, Suphan Kirmizialtin, Rhythm Kukreja and David Joseph Wrisley. (2022). Fine-Tuning NER with spaCy for Transliterated Entities Found in Digital Collections From the Multilingual Persian Gulf. Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022) Uppsala, Sweden, March 15-18, 2022. Eds. Karl Berglund, Matti La Mela and Inge Zwart. 288-296. | en
dc.identifier.issn | 1613-0073 | -
dc.identifier.uri | http://ceur-ws.org/Vol-3232/ | -
dc.identifier.uri | http://hdl.handle.net/2451/63943 | -
dc.description.abstract | Text recognition technologies increase access to global archives and make possible their computational study using techniques such as Named Entity Recognition (NER). In this paper, we present an approach to extracting a variety of named entities (NE) in unstructured historical datasets from open digital collections dealing with a space of informal British empire: the Persian Gulf region. The sources are largely concerned with people, places and tribes as well as economic and diplomatic transactions in the region. Since models in state-of-the-art NER systems function with limited tag sets and are generally trained on English-language media, they struggle to capture entities of interest to the historian and do not perform well with entities transliterated from other languages. We build custom spaCy-based NER models trained on domain-specific annotated datasets. We also extend the set of named entity labels provided by spaCy and focus on detecting entities of non-Western origin, particularly from Arabic and Farsi. We test and compare performance of the blank, pre-trained and merged spaCy-based models, suggesting further improvements. Our study makes an intervention into thinking beyond Western notions of the entity in digital historical research by creating more inclusive models using non-metropolitan corpora in English. | en
dc.language.iso | en_US | en
dc.publisher | CEUR Workshop Proceedings | en
dc.rights | Creative Commons License Attribution 4.0 International (CC BY 4.0) | en
dc.subject | Named Entity Recognition, Gulf Studies, Colonial Archives, Persian Gulf, spaCy, Transliterated Names | en
dc.title | Fine-Tuning NER with spaCy for Transliterated Entities Found in Digital Collections From the Multilingual Persian Gulf | en
dc.type | Article | en
Appears in Collections: David Wrisley's Collection

Files in This Item:
File | Description | Size | Format
Kapanetal_FineTuningNER.pdf | DHNB_2022 | 342.52 kB | Adobe PDF

