PKP International Scholarly Publishing Conferences, PKP Scholarly Publishing Conference 2013

Fair use: a case study in normalizing journal usage data
Andrea Kosavic, Ling He

Building: Amoxcalli Buildings (Science Department)
Room: Anfiteatro Alfredo Barreda
Date: 2013-08-21 05:00 PM – 06:05 PM
Last modified: 2013-06-20

Abstract

As academic libraries become more involved in formally supporting and shaping scholarly communications through journal publishing and hosting programs, the ability to provide reliable metrics to journal editors emerges as an imperative. The importance of statistical data is even more pronounced for open access journals, where traditional measures of a journal’s uptake, such as subscriber counts, are often not available. Government funding programs such as Canada’s Aid to Scholarly Journals Program are adjusting their criteria to accommodate electronic journals by looking at metrics such as unique site visitors and the geographic distribution of readers. Journal hosting and publishing services must stay nimble to ensure the provision of adequate and accurate statistical data to their stakeholders.

York Digital Journals (YDJ) is an online journal hosting service provided by York University Libraries for York-affiliated journals and hosts over 30 titles. YDJ makes use of Open Journal Systems (OJS), an open source journal management and publishing platform developed by the Public Knowledge Project and widely used by academic journals worldwide.

YDJ administrators use multiple methods to report statistics. This session focuses on two of them: a PHP script that extracts data directly from the YDJ database, and the COUNTER plugin made available through Open Journal Systems. Despite rapid growth and increased uptake of the hosting service, sitewide article downloads and abstract views spiked at intervals without logical explanation, and both metrics consistently showed an overall downtrend from 2007 to 2012.

This presentation describes the authors’ attempt to investigate and normalize observed discrepancies in journal usage data, with a focus on article downloads. It will describe the data analysis and strategy used to develop and test a series of scripts that filter and mine the web server access logs. These scripts were tested and refined against the YDJ data, as well as other sources, to improve accuracy. The identification and analysis of data anomalies, and the adjustments made to the scripts to account for these challenges, will also be discussed.
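The abstract does not include the authors’ scripts; the following is a minimal sketch of the general approach described — filtering web server access logs for article downloads while excluding requests from known robots. It assumes Apache’s “combined” log format and OJS-style download URLs containing `/article/download/`; the robot user-agent substrings and URL pattern are illustrative assumptions, not the authors’ actual filter rules.

```python
import re
from collections import Counter

# Apache "combined" log format: IP, identity, user, [timestamp],
# "request line", status, bytes, "referrer", "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative substrings only; a real filter would use a maintained robot list.
ROBOT_SUBSTRINGS = ("bot", "crawler", "spider", "slurp")


def count_downloads(lines):
    """Count successful article downloads per month, skipping robot agents."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # malformed or unexpected log line
        agent = m.group("agent").lower()
        if any(s in agent for s in ROBOT_SUBSTRINGS):
            continue  # self-identified robot traffic
        if m.group("status") != "200":
            continue  # only count completed responses
        if "/article/download/" not in m.group("path"):
            continue  # not an article galley download (assumed URL pattern)
        # Timestamp like "21/Aug/2012:10:00:00 -0400" -> bucket by "Aug/2012".
        month = m.group("time").split(":")[0].split("/", 1)[1]
        counts[month] += 1
    return counts
```

In practice, user-agent matching catches only self-identified robots; the presentation’s discussion of anomalies suggests additional heuristics (e.g. request-rate thresholds per IP) are needed for crawlers that masquerade as browsers.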

Finally, the unparsed YDJ data will be contrasted with the parsed YDJ data, showing a more normalized and expected usage curve. In light of these findings, this session will discuss the implications of these data variances at a local and broader scale, and will bring forward recommendations for journal publishers and hosting services charged with collecting and managing statistical usage data.

Keywords

Web usage mining, Web robots detection, Web log analysis, Open Journal Systems, data mining, electronic journals, best practices

Full Text: Presentation