Computational Modelling of an Optical Character Recognition System for Yorùbá Printed Text Images

dc.contributor.author: Óní, Ọlálékan
dc.contributor.author: Asahiah, Franklin
dc.date.accessioned: 2023-06-10T21:00:57Z
dc.date.available: 2023-06-10T21:00:57Z
dc.date.issued: 2020-07
dc.description: Scientific African, Volume 9, September 2020, e00415 [en_US]
dc.description.abstract: This study acquired a dataset of scanned images of Standard Yorùbá printed text, formulated a Yorùbá character image recognition model, implemented it, and evaluated its performance, with the aim of developing an Optical Character Recognition (OCR) model for Yorùbá printed text images. The image dataset, at 300 dots per inch (dpi), was acquired by generating text-line images from the Yorùbá New Testament Bible (Bibeli Mimo) corpus encoded in Unicode UTF-8. The Long Short-Term Memory (LSTM) model, a variant of the Recurrent Neural Network (RNN), was used to formulate the Standard Yorùbá character image recognition model. The Python OCRopus framework was used to implement the model. The model's performance was evaluated using the character error rate based on the Levenshtein edit distance algorithm. The results show a Character Error Rate (CER) of 3.138% for the Times New Roman font, the best recognition among the fonts tested. The model achieved a CER of 7.435% on the DejaVuSans font image dataset and 15.141% on the Arial font image dataset. Introducing a language-model-based Standard Yorùbá spell-checker and corrector reduced the Character Error Rate: Times New Roman recorded an error rate of 1.182%, DejaVuSans 4.098%, and Arial 5.87%. The study concluded that the farther an image's text font is from the font(s) used to train the network, the higher the character error rate of the recognition, and that including a post-processing stage reduces the character error rate. [en_US]
dc.description.sponsorship: ACE: ICT-Driven Knowledge Park [en_US]
dc.identifier.citation: ONI, O. J., & ASAHIAH, F. O. (2020). Computational modelling of an optical character recognition system for Yorùbá printed text images. Scientific African, 9, e00415. [en_US]
dc.identifier.issn: 2468-2276
dc.identifier.uri: 10.1016/j.sciaf.2020.e00415
dc.identifier.uri: http://hdl.handle.net/123456789/1973
dc.language.iso: en [en_US]
dc.publisher: Elsevier [en_US]
dc.subject: STEM [en_US]
dc.subject: Obafemi Awolowo University [en_US]
dc.subject: Optical character [en_US]
dc.subject: Yorùbá [en_US]
dc.subject: Orthography [en_US]
dc.subject: Computational modelling [en_US]
dc.subject: Spell-Check [en_US]
dc.subject: correction [en_US]
dc.subject: OCRopus [en_US]
dc.title: Computational Modelling of an Optical Character Recognition System for Yorùbá Printed Text Images [en_US]
dc.type: Article [en_US]
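
The abstract reports character error rates computed from the Levenshtein edit distance. As an illustration only, a minimal sketch of that metric might look like the following. The function names `levenshtein` and `character_error_rate` are hypothetical, and two choices here are assumptions not stated in the record: text is NFC-normalised before comparison, and "characters" are counted as Unicode code points. The authors' own evaluation code may differ on both points.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent: edit distance divided by the reference length."""
    # Assumption: NFC normalisation, so that precomposed and decomposed
    # forms of the same tone-marked letter compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Two tone marks lost by the recogniser: 2 substitutions over 6 characters.
    print(f"{character_error_rate('Yorùbá', 'Yoruba'):.2f}%")
```

Under these assumptions, an OCR output that drops both tone marks from "Yorùbá" scores two substitutions against six reference characters. Letters that carry both a dot below and a tone mark have no single precomposed code point, so a grapheme-level count would give slightly different rates than this code-point count.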
Files
Original bundle
Name:
Computational_modelling_of_an_optical_character_recognition_system.pdf
Size:
6.36 MB
Format:
Adobe Portable Document Format
Description:
Main Article