Full metadata record
DC Field | Value | Language
dc.contributor.author | Kirmizialtin, Suphan | -
dc.contributor.author | Wrisley, David Joseph | -
dc.identifier.citation | Kirmizialtin, Suphan and David Joseph Wrisley (2022). Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive. DHQ 16.2. | -
dc.description.abstract | Our study discusses the automated transcription, using deep learning methods, of a digital newspaper collection printed in a historical language, Arabic-script Ottoman Turkish (OT), and dating to the late nineteenth and early twentieth centuries. We situate OT text collections within a larger history of the digitization of periodicals, underscoring the special challenges faced by Arabic-script languages. Our paper approaches the question of automated transcription of non-Latin-script languages, such as OT, from the broader perspective of debates surrounding OCR use for historical archives. In our work with OT, we opted to train handwritten text recognition (HTR) models that generate transcriptions in the left-to-right Latin writing system familiar to contemporary readers of Turkish, and not, as some scholars may expect, in right-to-left Arabic-script text. Because no one-to-one correspondence exists between the writing systems of OT and modern Turkish, we also discuss approaches to transcription and to the creation of ground truth, and we argue that the challenges faced in training HTR models also call into question straightforward notions of transcription, especially where divergent writing systems are involved. Finally, we reflect on the potential domain bias of HTR models for other historical languages that exhibit spatio-temporal variance, as well as on the significance of working between writing systems for language communities that have also experienced language reform and script change. | en
dc.rights | Creative Commons Attribution-NoDerivatives 4.0 International License | en
dc.subject | Handwritten Text Recognition (HTR) | en
dc.subject | Ottoman Turkish | en
dc.subject | Optical Character Recognition (OCR) | en
dc.subject | Right-to-Left Scripts | en
dc.title | Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive | en
Appears in Collections: David Wrisley's Collection

Files in This Item:
File | Description | Size | Format
Kirmizialtin_Wrisley_AutomatedTranscription.pdf | OT_Transkribus | 1.51 MB | Adobe PDF

Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.