PKP International Scholarly Publishing Conferences, PKP Scholarly Publishing Conference 2013

Towards an Open Access Dataset for Alternative Impact Metrics: Notes from the Digging into Connected Repositories (DiggiCORE) project
Petr Knoth, Zdenek Zdrahal

Building: Amoxcalli Buildings (Science Department)
Room: Anfiteatro Alfredo Barreda
Date: 2013-08-21 05:00 PM – 06:05 PM
Last modified: 2013-06-20

Abstract


There are many approaches to measuring the impact of publications and journals and the productivity of academics [Garfield, 2006]. They are typically based only on citations, which has many advantages but also significant shortcomings. For example, the criteria used to evaluate impact do not take into account the actual content of the publication, and the relationship between a publication and the literature it cites is determined solely by the author. Moreover, such impact calculations involve a time delay and can be artificially manipulated. Recent developments in information technology and the growing trend towards making research outputs Open Access make it possible to advance beyond the current state of the art by developing new alternative metrics [Priem et al., 2012]. To do this effectively, it is crucial to ease access to the relevant data available on the Internet, which is currently spread across thousands of systems.

The goal of the DiggiCORE (Digging into Connected REpositories) project, funded in the Digging into Data Challenge by JISC/AHRC/ESRC and NWO, is to aggregate, at the level of both metadata and content, a vast set of research publications from institutional repositories and archives (green OA route) and journals (gold OA route), and to provide novel tools for automatically enriching this content with relationships. These relationships should in turn be used to generate and publicly expose large, openly available networks of Open Access publications. Together with the full-text content, the networks can then be analysed using natural language processing and social network analysis methods to identify patterns in the behaviour of research communities, to recognise trends in research disciplines, to gain new insights into the citation behaviour of researchers, and to discover new features that distinguish high-impact papers.

To enable this analysis, the DiggiCORE project is developing a software infrastructure that builds on the results of the CORE system [Knoth & Zdrahal, 2012], which provides access to Open Access research outputs acquired by harvesting, cleaning, integrating and processing information from a very large and fast-growing collection of millions of research publications. The DiggiCORE project builds tools that make the raw textual content intended for machine processing, together with the extracted and generated networks (citation network, article relatedness, author citation network), available to the public via a set of web services and as a downloadable dataset, thus creating a single access point for Open Access research outputs.
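As a rough illustration of what two of the networks named above look like as data, the following minimal Python sketch derives a paper citation network and an author citation network from a handful of records. It is not the DiggiCORE or CORE implementation: the record layout, the sample data and the function names are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records (an assumption for this sketch):
# paper id -> (list of authors, list of ids of cited papers).
PAPERS = {
    "p1": (["Ada"], []),
    "p2": (["Ben"], ["p1"]),
    "p3": (["Ada", "Cy"], ["p1", "p2"]),
}


def citation_edges(papers):
    """Return directed (citing, cited) pairs.

    References to papers outside the collection are ignored.
    """
    return [
        (src, dst)
        for src, (_, refs) in papers.items()
        for dst in refs
        if dst in papers
    ]


def citation_counts(papers):
    """Count incoming citations (in-degree) for every paper."""
    counts = {pid: 0 for pid in papers}
    for _, dst in citation_edges(papers):
        counts[dst] += 1
    return counts


def author_citation_network(papers):
    """Project paper citations onto authors.

    weights[(a, b)] is the number of times a paper co-authored by `a`
    cites a paper co-authored by `b`. Self-citations are kept.
    """
    weights = defaultdict(int)
    for src, dst in citation_edges(papers):
        for citing_author in papers[src][0]:
            for cited_author in papers[dst][0]:
                weights[(citing_author, cited_author)] += 1
    return dict(weights)


if __name__ == "__main__":
    print(citation_counts(PAPERS))
    print(author_citation_network(PAPERS))
```

Plain citation counts of this kind are the conventional signal; alternative metrics would combine such networks with features drawn from the full text.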

The availability of a single integrated dataset of OA research outputs, and the ability of anybody to mine it, could significantly influence various disciplines. For example, it will allow researchers to run experiments that enable the development of better methods for exploratory search and browsing in digital collections, or new ways of evaluating the impact of research and of researchers. We believe the availability of these datasets can also facilitate the transition to Open Access (by demonstrating the advantages of uniform and free access to this huge distributed dataset) and can support experiments to find new impact metrics that improve scholarly communication.


Keywords


Open Access (OA), impact, digital libraries, text mining

References


[Garfield, 2006] Eugene Garfield. The history and meaning of the journal impact factor. Journal of the American Medical Association, 2006.

[Priem et al., 2012] Jason Priem, Paul Groth, Dario Taraborelli. The Altmetrics Collection. PLOS ONE, 2012. doi:10.1371/journal.pone.0048753

[Knoth & Zdrahal, 2012] Petr Knoth, Zdenek Zdrahal. CORE: Three Access Levels to Underpin Open Access. D-Lib Magazine, 18(11/12), 2012.

Full Text: Presentation