Stemming Software Application of Shahmukhi Script Using Porter’s Algorithm

M. IRFAN, J. A. MAHAR, H. SHAIKH, F. A. SURAHIO

Abstract


Linguistic problems are pushing the computer industry to design and develop applications that can reduce the complexity of languages in which the same word carries more than one meaning. Several approaches have been proposed that provide good mechanisms and accuracy for the English language. In Pakistan, various languages are spoken by people living in its different provinces. Sindhi, Urdu, Balochi and Punjabi are the most common languages. Each language can assign different meanings to the same word, in addition to presenting diacritical complexities and morphemic problems. Morphological exceptions in written natural languages keep increasing, and many algorithms have been introduced to handle them; stemming is one of them. In this paper, the Punjabi language is selected to identify its morphological issues in prefix, suffix and prefix-suffix words. For this purpose, the Porter stemming algorithm is applied to a developed corpus of 23,962 words. The stemmer error rate (SER) is 5.64% for prefix words, 1.47% for suffix words and 3.92% for prefix-suffix words, giving a cumulative SER of 11.03% for the Punjabi language. The developed stemmer could be useful to programmers and is a further step forward for natural language processing projects.
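
As a quick consistency check on the reported figures, the three category-wise SER values can be summed and compared with the cumulative value. The sketch below is illustrative only: the function names are hypothetical, and it assumes each rate is a percentage of the whole 23,962-word corpus, which the abstract does not state explicitly.

```python
# Category-wise stemmer error rates (SER), in percent, as quoted in the abstract.
ser_by_category = {"prefix": 5.64, "suffix": 1.47, "prefix-suffix": 3.92}

# Corpus size quoted in the abstract.
CORPUS_SIZE = 23962


def cumulative_ser(rates):
    """Sum the category-wise rates, rounded to two decimals."""
    return round(sum(rates.values()), 2)


def approx_error_count(rate_percent, corpus_size):
    """Approximate number of mis-stemmed words for a given rate.

    Assumption (not stated in the abstract): the rate is measured
    against the whole corpus rather than against a per-category subset.
    """
    return round(rate_percent / 100 * corpus_size)


if __name__ == "__main__":
    total = cumulative_ser(ser_by_category)
    print(f"cumulative SER: {total}%")
    print(f"approx. mis-stemmed words: {approx_error_count(total, CORPUS_SIZE)}")
```

The summed rates match the cumulative 11.03% given in the abstract, which suggests the three categories are reported against a common denominator.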



Copyright (c) 2017 Sindh University Research Journal - SURJ (Science Series)

 Copyright © University of Sindh, Jamshoro. 2017 All Rights Reserved.
Printing and Publication by: Sindh University Press.