
Optical Character Recognition System for Sindhi Text: A Survey

Waseem Javaid Soomro, Dil Nawaz Hakro, Imdad Ali Ismaili, Ghulam Mustafa Shoro


Optical character recognition (OCR) has been a popular research field over the last decade, and current systems can successfully convert scanned images of English text into editable text. However, OCR systems for regional languages such as Urdu, Arabic, and Sindhi still present major challenges and implementation problems. This paper therefore discusses and analyzes various OCR techniques for such regional languages. The survey consolidates these techniques and presents an overview to help researchers understand the methodology for building and implementing an OCR system for the Sindhi language.







Copyright (c) 2018 University of Sindh Journal of Information and Communication Technology

ISSN-E: 2523-1235, ISSN-P: 2521-5582

Copyright © University of Sindh, Jamshoro. 2017 All Rights Reserved.
Printing and Publication by: Sindh University Press.

Journal Office, Institute of Information and Communication Technology, 
University of Sindh, Jamshoro, Sindh, Pakistan. 76080