Statistical Approaches to Instant Diacritics Restoration for Sindhi Accent Prediction

H. SHAIKH, J. A. MAHAR, M. H. MAHAR

Abstract


Sindhi script abounds in homographic words, which create many difficulties for both human readers and machines. Because a single homographic form can carry several meanings, interpreting and understanding the text becomes very difficult. Even before interpretation, the pronunciation of such a form is ambiguous, and this is the leading cause of the difficulty. Diacritics remove this ambiguity and allow the text to be read easily and accurately. To save time, however, most people today do not write diacritics in routine writing. Besides making reading harder for human beings, the absence of diacritics also makes machine reading difficult. Text prediction systems provided the basis for the instant diacritics restoration approach. This instant diacritics restoration system is an entirely novel contribution to the field of natural language processing. A framework combining N-Grams and Maximum Entropy is proposed in this research work. The most notable result of this system, using unigrams, bigrams, trigrams and quad-grams, is an accuracy of 98.98% on a corpus of the Sindhi language. The main advantage of instant diacritics restoration is that it can serve as a leading step toward improving the performance of other natural language and speech processing applications.
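The abstract names the n-gram orders used (unigram up to quad-gram) but not how they are combined. The following is a minimal Python sketch of one common way to use such word n-grams for restoration, under stated assumptions: "instant" is taken to mean that each word is restored as soon as it is typed, using only the words before it; the model backs off from quad-gram to unigram context; the diacritics to restore are assumed to be the Arabic-script harakat U+064B to U+0652; the Maximum Entropy component is not modelled at all. `NGramDiacritizer`, `restore_next` and `restore_stream` are hypothetical names, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

# Assumption: the marks to restore are the Arabic-script harakat
# U+064B..U+0652 (fathatan .. sukun); Sindhi may use further marks.
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_diacritics(word: str) -> str:
    """Return the bare (undiacritized) form of a word."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


class NGramDiacritizer:
    """Hypothetical word-level restorer backing off from quad-gram to unigram.

    Only the n-gram part is sketched; no Maximum Entropy model is included.
    """

    def __init__(self, max_order: int = 4) -> None:
        self.max_order = max_order
        # counts[context][bare_word] is a Counter of diacritized forms, where
        # context is the tuple of the (order - 1) preceding bare words.
        self.counts: DefaultDict[Tuple[str, ...], DefaultDict[str, Counter]] = (
            defaultdict(lambda: defaultdict(Counter))
        )

    def train(self, sentences: List[List[str]]) -> None:
        """Count diacritized forms under every context of length 0..max_order-1."""
        for sentence in sentences:
            bare = [strip_diacritics(w) for w in sentence]
            for i, word in enumerate(sentence):
                for order in range(1, self.max_order + 1):
                    start = i - order + 1
                    if start < 0:
                        break  # not enough preceding words for this order
                    context = tuple(bare[start:i])
                    self.counts[context][bare[i]][word] += 1

    def restore_next(self, history: List[str], bare_word: str) -> str:
        """Diacritize bare_word using only the bare words typed before it."""
        for order in range(self.max_order, 0, -1):
            start = len(history) - order + 1
            if start < 0:
                continue  # history too short for this order
            context = tuple(history[start:])
            # .get() avoids creating empty entries in the defaultdicts.
            candidates = self.counts.get(context, {}).get(bare_word)
            if candidates:
                # Most frequent diacritized form seen in this context.
                return candidates.most_common(1)[0][0]
        return bare_word  # unseen word: leave it undiacritized

    def restore_stream(self, bare_words: List[str]) -> List[str]:
        """Restore word by word, as each word is typed ('instant' mode)."""
        history: List[str] = []
        restored: List[str] = []
        for word in bare_words:
            restored.append(self.restore_next(history, word))
            history.append(word)
        return restored
```

On a toy corpus, a bare word that was diacritized one way after a particular preceding word receives that form when typed after the same word; otherwise the model falls back to the word's most frequent unigram form, and words never seen in training are returned unchanged. Backing off from the longest matching context is only one design choice; interpolating the orders, or feeding the n-gram counts as features to a Maximum Entropy classifier, are alternatives this sketch does not attempt.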

Copyright (c) 2017 Sindh University Research Journal - SURJ (Science Series)

Copyright © University of Sindh, Jamshoro, 2017. All Rights Reserved.
Printing and Publication by: Sindh University Press.