PKP International Scholarly Publishing Conferences, PKP Scholarly Publishing Conference 2015

Towards an experimental framework for evaluating the quality of crowdsourced manuscript transcriptions
Anne Vikhrova

Last modified: 2015-08-03


Current editorial practices for manuscript transcription rely on the work of specialists. These practices make it possible to conserve precious documents and to prepare scholarly editions of historical and literary works for publication. However difficult the text may be, the resulting transcriptions are of a quality that is faithful to the original document. Producing a corpus in this way requires a high level of competence and, more importantly, time. The fields of natural language processing (NLP) and computer science for digital humanities (DH) propose several solutions to improve current working methods. One of them relies on optical character recognition (OCR), but this method remains problematic for complex documents. We are exploring another approach, one already present in social media practices, commercial campaigns and even non-commercial enterprises: crowdsourcing the transcription of manuscripts by inviting contributions from non-specialists.

The proposed analysis will develop an experimental framework that makes it possible to compare the existing methods, determine their efficacy for speeding up the digitization process, and evaluate the quality of the resulting documents against more traditional, expert-reliant editorial chains. Samples used in this study will be categorized according to their level of difficulty. Transcriptions will be performed by groups of experts and non-experts, and the resulting documents will then be compared. The conclusions will indicate the potential value of this approach for contemporary digital editorial practices and document conservation.
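
The abstract does not say which measure will be used to compare the transcriptions. As a purely illustrative sketch, one common option is the character error rate: the edit distance between a candidate transcription and a reference transcription, divided by the length of the reference. The function names and the choice of metric below are assumptions made for illustration, not the framework described by the author.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, candidate: str) -> float:
    """Hypothetical quality score: edit distance normalized by the
    length of the reference (for example, an expert transcription)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, candidate) / len(reference)
```

Under this assumption, a lower score means the candidate is closer to the reference. Scores could then be averaged per difficulty level and per group (experts, non-experts) to support the comparison described above.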


crowdsourcing; digital publishing; evaluation; manuscripts

Full Text: Prezi  |  Video