PKP International Scholarly Publishing Conferences, PKP Scholarly Publishing Conference 2013

The new OJS search features with Lucene/Solr
Bozana Bokan

Building: Amoxcalli Buildings (Science Department)
Room: Anfiteatro Alfredo Barreda
Date: 2013-08-21 10:20 AM – 11:20 AM
Last modified: 2013-06-20

Abstract


This presentation gives an insight into the new OJS search features. The search is based on the enterprise search server “Solr” and the underlying “Lucene” search framework 1. It was implemented as part of the project "Functional extensions and value added services for OJS", funded by the Deutsche Forschungsgemeinschaft (DFG).

OJS provides its own search across article metadata and full texts. Its text analysis for indexing and search is simple: the text is split on white space (“tokenization”), the words are converted to lowercase, and common words (“stopwords”) are removed from the token stream. This yields good search results for languages such as German or English, but it does not work for languages such as Japanese or Chinese, whose logographic writing systems do not separate words with white space.
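
The three analysis steps described above can be sketched in a few lines. This is only an illustration of the technique, not the OJS implementation: the function name `analyze` and the short stopword list are assumptions made for the example.

```python
# Illustrative stopword list (an assumption); the list OJS actually uses
# is not reproduced here.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is"}


def analyze(text):
    """Split on white space, lowercase the tokens, and drop stopwords."""
    tokens = text.split()                       # tokenization by white space
    tokens = [t.lower() for t in tokens]        # conversion to lowercase
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


# English text is split into useful search terms.
english = analyze("The Indexing of Journals in Solr")

# Chinese text contains no white space, so the whole phrase
# ends up as one single token that no shorter query will match.
chinese = analyze("开放期刊系统的全文检索")

print(english)
print(chinese)
```

The English sentence is reduced to three searchable terms, while the Chinese phrase stays one unbroken token. This is the failure that language-specific analysis components are meant to address.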

Lucene/Solr provides more sophisticated tokenizers and token filters as well as language-specific analysis components “out of the box”. It also supports many other advanced search features, such as auto-suggestion, alternative spelling proposals, paging, highlighting, ordering, faceting, a “more like this” feature, and improved ranking. The current integration of Lucene/Solr not only solves the problem of multilingual search support but also brings faster indexing and these additional search features to OJS.
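
As a rough sketch of how some of these features are requested from a Solr server, the following builds a query URL that combines paging, ordering, highlighting, and faceting through Solr's standard request parameters (`start`, `rows`, `sort`, `hl`, `facet`). The host, the core name `ojs`, and the field names `title`, `abstract`, and `journal` are assumptions for the example; they are not taken from the OJS integration, which may use a different schema.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, page=0, page_size=20):
    """Build a Solr select URL with paging, ordering, highlighting, faceting."""
    params = [
        ("q", text),                   # the user's search terms
        ("start", page * page_size),   # paging: offset of the first result
        ("rows", page_size),           # paging: number of results per page
        ("sort", "score desc"),        # ordering: best-ranked results first
        ("hl", "true"),                # highlighting on
        ("hl.fl", "title,abstract"),   # hypothetical fields to highlight
        ("facet", "true"),             # faceting on
        ("facet.field", "journal"),    # hypothetical facet field
        ("wt", "json"),                # response format
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance with a core named "ojs".
url = build_solr_query("http://localhost:8983/solr/ojs/select", "open access", page=2)
print(url)
```

The sketch only constructs the request; sending it requires a running Solr server whose schema defines the fields named above.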

The presentation introduces Lucene/Solr and some of the many search functionalities it offers. It explains the installation steps, the decisions to be made, and the configuration necessary for use in OJS. The new search functionality and features in OJS will be demonstrated, and the important maintenance operations shown.

Footnote

1  See http://lucene.apache.org/ and http://lucene.apache.org/solr/

