Full metadata record
DC Field | Value | Language
dc.contributor.author | Romein, C. Annemieke | -
dc.contributor.author | Hodel, Tobias | -
dc.contributor.author | Gordijn, Femke | -
dc.contributor.author | van Zundert, Joris | -
dc.contributor.author | Chagué, Alix | -
dc.contributor.author | et al | -
dc.contributor.author | Wrisley, David Joseph | -
dc.date.accessioned | 2024-10-29T10:42:32Z | -
dc.date.available | 2024-10-29T10:42:32Z | -
dc.date.issued | 2024-03-18 | -
dc.identifier.citation | Romein, C. A., Hodel, T., Gordijn, F., Zundert, J. J. van, Chagué, A., Lange, M. van, Jensen, H. S., Stauder, A., Purcell, J., Terras, M. M., Heuvel, P. van den, Keijzer, C., Rabus, A., Sitaram, C., Bhatia, A., Depuydt, K., Afolabi-Adeolu, M. A., Anikina, A., Bastianello, E., … Zweistra, R. (2024). Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done. Zenodo. 10.5281/ZENODO.10804745 | en
dc.identifier.other | 10.5281/ZENODO.10804745 | -
dc.identifier.uri | https://jdmdh.episciences.org/13242 | -
dc.identifier.uri | http://hdl.handle.net/2451/74651 | -
dc.description.abstract | This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, and ways to reference and acknowledge contributions to the creation and enrichment of data within these Machine Learning systems. We discuss how one can publish Ground Truth data in a repository and, subsequently, inform others. Furthermore, we suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of Machine Learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance. | en
dc.language.iso | en_US | en
dc.subject | Automatic Text Recognition, Handwritten Text Recognition, Data Publication, Open Data, Data Curation, Ground Truth, Sharing | en
dc.title | Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done | en
dc.type | Article | en
Appears in Collections: David Wrisley's Collection

Files in This Item:
File | Description | Size | Format
Exploring_Data_Provenance_11-3.pdf | JDHDM_dataprovenceHTR | 3.92 MB | Adobe PDF
Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.