Building: Amoxcalli Buildings (Science Department)
Room: Anfiteatro Alfredo Barreda
Date: 2013-08-21 10:20 AM – 11:20 AM
Last modified: 2013-06-20
Abstract
One key to transforming content creation, consumption, and curation in the digital era, especially for text-intensive scholarly work, is the ability to readily manipulate the form of such texts. This serves two purposes: publishing and reading across multiple devices, and indexing the texts' elements for more efficient information retrieval and use. The leading technology for achieving this structural flexibility is marking up, or tagging, the text with XML (Extensible Markup Language). However, tagging texts currently requires expensive XML editing tools and skilled labor, because the texts are essentially marked up by hand. As a result, only a minority of journals use XML systematically, and even those use it in narrow ways. Much more could be done with a well-tagged text: automated re-flowing (or re-editing) of the text to fit different contexts and devices, seamless linking of inline references, and better semantic indexing, to name a few.
To date, Open Journal Systems (OJS) has had no special functionality for working with document markup. While it provides a very robust workflow in which authors, reviewers, and editors take the requisite turns in producing scholarly material, it is effectively blind to the materials’ form and content. This work seeks to remedy that by providing an automated XML conversion pipeline. The pipeline converts articles to the NLM XML standard and uses the result to produce parallel HTML and PDF copies for online or offline consumption by both readers and indexing robots.
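To illustrate the HTML half of such a pipeline, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It is not the project's converter. It assumes a tiny, hand-written article fragment whose element names (`sec`, `title`, `p`, `xref ref-type="bibr"`, `ref-list`, `ref`, `mixed-citation`) are modeled on the NLM journal tag set. The functions `nlm_to_html` and `render_inline` are hypothetical. The sketch shows how tagged inline citations can become working links to the reference list. PDF generation is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical NLM-style fragment; the content is invented for illustration.
SAMPLE = """<article>
  <body>
    <sec>
      <title>Introduction</title>
      <p>XML enables reuse <xref ref-type="bibr" rid="r1">[1]</xref>.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>Smith, J. (2010). On markup.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def render_inline(elem):
    """Render an element's mixed content, turning bibliographic xrefs into links."""
    parts = [escape(elem.text or "")]
    for child in elem:
        if child.tag == "xref" and child.get("ref-type") == "bibr":
            rid = escape(child.get("rid", ""), quote=True)
            label = escape("".join(child.itertext()))
            parts.append(f'<a href="#{rid}">{label}</a>')
        else:
            # Any other inline element is flattened to its escaped text.
            parts.append(escape("".join(child.itertext())))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def nlm_to_html(xml_text):
    """Convert sections and the reference list of the fragment to HTML."""
    root = ET.fromstring(xml_text)
    out = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        if title is not None:
            out.append(f"<h2>{escape(''.join(title.itertext()))}</h2>")
        for p in sec.findall("p"):
            out.append(f"<p>{render_inline(p)}</p>")
    refs = root.findall("./back/ref-list/ref")
    if refs:
        out.append("<ol>")
        for ref in refs:
            rid = escape(ref.get("id", ""), quote=True)
            text = escape("".join(ref.itertext()).strip())
            # The id attribute is the anchor target for the inline links.
            out.append(f'<li id="{rid}">{text}</li>')
        out.append("</ol>")
    return "\n".join(out)


if __name__ == "__main__":
    print(nlm_to_html(SAMPLE))
```

The same tagged source could feed a separate PDF renderer. Because both outputs derive from one XML file, the HTML and PDF copies stay in step.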
Currently, we have a standalone service that produces a 75% desirable result on article body text and a 95% desirable result on article references. We also have a functional, mediated interaction between Open Journal Systems' workflow and our parsing service. We have assisted with the preparation of a grant application at the University of Heidelberg which, if successful, will fund the development of a WYSIWYG XML editing system that can be seamlessly invoked at the end of our automated pipeline. Such an editor would minimize the time spent on manual editing. It would also let the output of our automated system receive the human attention that will likely be necessary to turn a 95% satisfactory result into production-quality output.
We have successfully implemented rudimentary copyediting functionality that checks the references listed at the end of an article against those cited in the article body, and vice versa. We also have an elegant find-as-you-type interface for selecting a preferred citation style from the Citation Style Language (CSL) repository. This interface lets us cater to virtually any journal's style guidelines without investing future effort in implementing new styles or style revisions.
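The reference check described above amounts to two set differences. The first finds citations in the body with no matching reference list entry. The second finds listed references that the body never cites. The sketch below is only an illustration of that idea and not the project's implementation. It assumes numeric bracketed citations such as `[1]` or `[2, 5]` and a reference list already parsed into numbers. The regex and the names `cited_ids` and `cross_check` are hypothetical.

```python
import re

# Hypothetical citation format: numeric brackets such as [1] or [2, 5].
CITE_PATTERN = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def cited_ids(body_text):
    """Collect every reference number cited in the article body."""
    ids = set()
    for match in CITE_PATTERN.finditer(body_text):
        # int() tolerates the surrounding spaces left by split(",").
        ids.update(int(n) for n in match.group(1).split(","))
    return ids


def cross_check(body_text, reference_ids):
    """Return (cited but not listed, listed but never cited), both sorted."""
    cited = cited_ids(body_text)
    listed = set(reference_ids)
    return sorted(cited - listed), sorted(listed - cited)


if __name__ == "__main__":
    body = "Markup helps [1]. Prior work [2, 4] agrees."
    missing, uncited = cross_check(body, [1, 2, 3])
    print("cited but not listed:", missing)
    print("listed but never cited:", uncited)
```

In this usage example, reference 4 is cited but absent from the list, and reference 3 is listed but never cited. A production check would also need to handle ranges such as `[2-4]` and author-year citations, which this sketch ignores.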
Work on this project is ongoing, independently of the main PKP team, thanks to funding from Stanford University’s MediaX. We are also exploring sustainable funding models for offering this functionality as a web service.