PROGRESS AND INNOVATION IN OCR FOR GUJARATI NEWSPAPER TEXT RECOGNITION

Authors

Abstract

Character recognition is a key technology in natural language processing (NLP) that converts printed text into machine-readable formats, supporting applications such as document digitization and information retrieval. This review paper examines the role of character recognition systems, with emphasis on their use in digitizing Gujarati newspapers. These newspapers are an extensive repository of cultural and informational content for millions of Gujarati speakers, so digitizing them with optical character recognition (OCR) is important for preservation, accessibility, and distribution. However, the distinctive characteristics of the Gujarati script pose specific challenges that demand customized OCR solutions. The paper reviews existing OCR techniques and methodologies, particularly those focused on the Gujarati language, and shows the limitations and obstacles faced in this area. Through a comprehensive literature review, it analyzes the evolution of character recognition systems and compares them, from traditional methods to advanced machine learning and deep learning approaches. It also identifies research gaps and recommends directions for future research, with the aim of improving the development and accuracy of Gujarati OCR systems. By addressing these gaps, the article aims to contribute to the improvement of OCR technology for Gujarati, increasing its accessibility and usefulness for digital preservation and educational purposes. The systematic review covers the introduction, background, available literature, problems, methodology, system assessments, and future directions, and concludes with a summary of significant findings and the need for continued research on this subject.

Published

2024-02-20

Section

Articles