Afaan Oromo Hybrid Modelling: A Case based Optimized Intelligence in Information Retrieval System’s Localization


Amin Tuni, Sisay Tumsa, Durga Prasad Sharma, BhupeshKumar Singh, Mario A. Bochicchio


The data and information resources available on the Internet and the World Wide Web (WWW) grow in variety and velocity every day. This information overload makes retrieving the desired data or information complex, and users find it difficult to recognize and retrieve the relevant information that satisfies their needs. The diversity of data and information across many languages has become a common challenge. In Ethiopia, linguistic localization is paramount because an enormous amount of digital data and information in Afaan Oromo is generated every day. These large pools of documents also raise archival and search issues. The prime aim of this research study was to develop a hybrid information retrieval (IR) system localized for Afaan Oromo, enabling users to search for and retrieve relevant information efficiently. Prior research studies in the Afaan Oromo linguistic domain indicate that the performance of IR systems has not yet received adequate attention. Several attempts have been made to build an Afaan Oromo IR (AOIR) system using a hybrid approach, but their efficiency still needs significant improvement. To address these problems, this study integrated different types of IR approaches to improve the performance of the AOIR system. The developed prototype has basic IR subsystems for both indexing and searching. For experimental analysis, 1000 Afaan Oromo text documents were collected from print media news articles (i.e. Oromia Broadcasting Network, VOA Afaan Oromo), the Afaan Oromo Bible, websites, books, and online news.
Text operations such as tokenization, normalization, stop-word removal, and stemming were used to identify content-bearing terms and vocabulary. The tf-idf term weighting scheme was applied to compute term weights in the documents. In the experimental analysis, carried out in Python 3.7, the system achieved on average 96.6% precision, 90.0% recall, and a 93.3% F-measure. System performance was still affected by the problem of polysemy. The study therefore recommends further work, using different techniques, to improve the performance of the AOIR system.
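
As a rough illustration of the indexing, searching, and evaluation steps described above, the following minimal Python sketch tokenizes and normalizes text, removes stop words, weights terms with tf-idf, and scores results with precision, recall, and F-measure. It is not the authors' implementation. The stop-word set is a small placeholder, the documents are toy strings rather than the study corpus, stemming is omitted because it needs language-specific rules, and cosine similarity is an assumed ranking function that the abstract does not specify. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Placeholder stop-word set for illustration only; not a vetted Afaan Oromo list.
STOP_WORDS = {"fi", "kan", "akka"}


def preprocess(text):
    """Normalize to lowercase, tokenize, and drop stop words (no stemming)."""
    # Keep word-internal apostrophes, which Afaan Oromo orthography uses.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Return one tf-idf vector (dict) per document plus the idf table."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf):
    """Rank document indices by similarity to the query, best first."""
    toks = preprocess(query)
    tf = Counter(toks)
    q_vec = {t: (c / len(toks)) * idf[t] for t, c in tf.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0]


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-measure (harmonic mean) for one query."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy documents, not taken from the study corpus.
docs = [
    "Barnoota fi mana barumsaa",
    "Bishaan fi qonna",
    "Mana barumsaa kan Oromiyaa keessa",
]
vectors, idf = build_index(docs)
results = search("mana barumsaa", vectors, idf)
print(results, precision_recall_f(results, relevant=[0, 2]))
```

The reported figures are averages, presumably over a set of test queries, so a per-query function like `precision_recall_f` would be applied to each query and the results averaged. An F-measure averaged this way need not equal the F-measure computed from the averaged precision and recall.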
