Title: Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive

Authors: Kirmizialtin, Suphan; Wrisley, David Joseph
Keywords: Transkribus; Handwritten Text Recognition (HTR); Ottoman Turkish; Optical Character Recognition (OCR); Right to Left Scripts
Issue Date: 2022
Citation: Kirmizialtin, Suphan and David Joseph Wrisley. (2022). Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive. DHQ 16.2. http://www.digitalhumanities.org/dhq/vol/16/2/000577/000577.html
Abstract: Our study discusses the automated transcription with deep learning methods of a digital newspaper collection printed in a historical language, Arabic-script Ottoman Turkish (OT), dating to the late nineteenth- and early twentieth-century. We situate OT text collections within a larger history of digitization of periodicals, underscoring special challenges faced by Arabic script languages. Our paper approaches the question of automated transcription of non-Latin script languages, such as OT, from the broader perspective of debates surrounding OCR use for historical archives. In our study with OT, we have opted for training handwritten text recognition (HTR) models that generate transcriptions in the left-to-right, Latin writing system familiar to contemporary readers of Turkish, and not, as some scholars may expect, in right-to-left Arabic script text. As a one-to-one correspondence between the writing systems of OT and modern Turkish does not exist, we also discuss approaches to transcription and the creation of ground truth and argue that the challenges faced in the training of HTR models also draw into question straightforward notions of transcription, especially where divergent writing systems are involved. Finally, we reflect on potential domain bias of HTR models in other historical languages exhibiting spatio-temporal variance as well as the significance of working between writing systems for language communities that also have experienced language reform and script change.
URI: http://www.digitalhumanities.org/dhq/vol/16/2/000577/000577.html
http://hdl.handle.net/2451/63909
Rights: Creative Commons Attribution-NoDerivatives 4.0 International License
Appears in Collections: David Wrisley's Collection

Files in This Item:
File | Description | Size | Format
Kirmizialtin_Wrisley_AutomatedTranscription.pdf | OT_Transkribus | 1.51 MB | Adobe PDF


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.