Computational Modelling of an Optical Character Recognition System for Yorùbá Printed Text Images

Óní, Ọlálékan; Asahiah, Franklin

dc.contributor.author	Óní, Ọlálékan
dc.contributor.author	Asahiah, Franklin
dc.date.accessioned	2023-06-10T21:00:57Z
dc.date.available	2023-06-10T21:00:57Z
dc.date.issued	2020-07
dc.description	Scientific African Volume 9, September 2020, e00415	en_US
dc.description.abstract	This study acquired a dataset of scanned images of Standard Yorùbá printed text, formulated a Yorùbá character image recognition model, implemented it, and evaluated its performance, with the aim of developing an Optical Character Recognition (OCR) model for Yorùbá printed text images. The image dataset, at 300 dots per inch (dpi), was acquired by generating text-line images from the Yorùbá New Testament Bible (Bibeli Mimo) corpus encoded in Unicode UTF-8. A Long Short-Term Memory (LSTM) model, a variant of the Recurrent Neural Network (RNN), was used to formulate the Standard Yorùbá character image recognition model, which was implemented with the Python OCRopus framework. Performance was evaluated using the Character Error Rate (CER) based on the Levenshtein edit distance algorithm. The results show a CER of 3.138% for the Times New Roman font, the best recognition among the fonts tested. The model achieved a CER of 7.435% on the DejaVuSans font image dataset and 15.141% on the Arial font image dataset. Introducing a language-model-based Standard Yorùbá spell-checker and corrector reduced the CER: Times New Roman recorded an error rate of 1.182%, DejaVuSans 4.098%, and Arial 5.87%. The study concluded that the farther an image text font is from the font(s) used in training the network, the higher the character error rate of the recognition, and that including a post-processing stage reduces the character error rate.	en_US
dc.description.sponsorship	ACE: ICT-Driven Knowledge Park	en_US
dc.identifier.citation	ONI, O. J., & ASAHIAH, F. O. (2020). Computational modelling of an optical character recognition system for Yorùbá printed text images. Scientific African, 9, e00415.	en_US
dc.identifier.issn	2468-2276
dc.identifier.uri	10.1016/j.sciaf.2020.e00415
dc.identifier.uri	http://hdl.handle.net/123456789/1973
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.subject	STEM	en_US
dc.subject	Obafemi Awolowo University	en_US
dc.subject	Optical character	en_US
dc.subject	Yorùbá	en_US
dc.subject	Orthography	en_US
dc.subject	Computational modelling	en_US
dc.subject	Spell-Check	en_US
dc.subject	correction	en_US
dc.subject	OCRopus	en_US
dc.title	Computational Modelling of an Optical Character Recognition System for Yorùbá Printed Text Images	en_US
dc.type	Article	en_US
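
The abstract reports results as a character error rate based on Levenshtein edit distance. As a minimal sketch of how such a figure can be computed (not the authors' evaluation code), the Python below divides the edit distance between a reference line and the OCR output by the reference length. The function names, the NFC normalization step, and the choice to count Unicode code points are assumptions made for illustration; the record does not say how the paper treats combining tone marks and under-dots.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent: edit distance divided by reference length.

    Assumption: both strings are NFC-normalized first, so that a precomposed
    letter and its base-plus-combining-mark spelling compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)
```

For example, `character_error_rate("abcd", "abed")` counts one substitution against four reference characters. Because some Yorùbá letters carrying both an under-dot and a tone mark have no single precomposed code point, a code-point-level CER can count one misread letter as more than one error; counting grapheme clusters instead would be a different, equally defensible convention.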

Files

Original bundle

Name: Computational_modelling_of_an_optical_character_recognition_system.pdf
Size: 6.36 MB
Format: Adobe Portable Document Format
Description: Main Article

Collections

STEM