Title: Exploring Gulf Manumission Documents with Word Vectors

Authors: Kirmizialtin, Suphan; Wrisley, David Joseph
Keywords: Handwritten Text Recognition (HTR); word vector models (WVM); India Office Records (IOR); manumission; Gulf Studies; colonial archives; slavery
Issue Date: 27-Dec-2024
Publisher: Brill
Citation: Kirmizialtin, S. and D.J. Wrisley. (2024). Exploring Gulf Manumission Documents with Word Vectors. Journal of Digital Islamicate Research 2: 1-29.
Abstract: In this article we analyze a corpus related to manumission and slavery in the Arabian Gulf in the late nineteenth and early twentieth centuries that we created using Handwritten Text Recognition (HTR). The corpus comes from India Office Records (IOR) R/15/1/199 File 5. Spanning the period from the 1890s to the early 1940s and composed of 977K words, it contains a variety of perspectives on manumission and slavery in the region, from manumission requests to administrative documents relevant to colonial approaches to the institution of slavery. We use word2vec with the wordVectors package in R to highlight how the method can uncover semantic relationships within historical texts, demonstrating some exploratory semantic queries, investigation of word analogies, and vector operations using the corpus content. We argue that advances in applied computer vision such as HTR are promising for historians working in colonial archives and that, while our method is reproducible, there are still issues related to language representation and limitations of scale within smaller datasets. Even though HTR corpus creation is labor intensive, word vector analysis remains a powerful tool of computational analysis for corpora where HTR error is present.
URI: http://hdl.handle.net/2451/74850
DOI: 10.1163/27732363-bja00005
Rights: CC BY 4.0 Open Access
Appears in Collections: David Wrisley's Collection

Files in This Item:
File: 27732363_002_01-02_s001_text.pdf
Description: Gulf Manumission Documents
Size: 1.18 MB
Format: Adobe PDF


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.