Design and Development of Enhanced Deep Learning Methodology for Tamil Manuscripts Extraction using hybrid CNN-LSTM-CTC
Keywords:
Deep Learning, Tamil manuscripts, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Connectionist Temporal Classification (CTC)
Abstract
Extracting data from manuscripts using deep learning has emerged as an important task in fields such as historical document analysis, records digitization, and information retrieval. Deep learning has opened up new methods for efficient and accurate data extraction. This paper presents a systematic approach that combines segmentation, a Convolutional Neural Network (CNN), and classification techniques to extract meaningful information from handwritten and printed manuscripts. Tamil manuscripts, rich in cultural and historical significance, pose distinctive challenges, including their complex nature, varied writing styles, and degradation over time. It is therefore necessary to review the challenges, datasets, pre-processing methods, and deep learning approaches relevant to this work. A hybrid CNN-Long Short-Term Memory (LSTM)-Connectionist Temporal Classification (CTC) model is proposed to address these challenges. In this approach, preprocessing is first performed using Optical Character Recognition (OCR). Second, an LSTM-CNN network is used to capture sequential dependencies in the text. Finally, the CTC function supports handwritten text recognition and text extraction from Tamil manuscripts.
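The role of the final CTC step can be illustrated with a minimal sketch of greedy (best-path) CTC decoding. This is not the authors' implementation: the function name `ctc_greedy_decode`, the choice of index 0 for the blank symbol, the three-unit alphabet, and the per-frame scores are all assumptions made for illustration. In a full system, the scores would come from the LSTM output at each time step.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol in each score vector


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of class k at time step t, where class 0
    is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # Pick the highest-scoring class index at every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded: List[str] = []
    previous: Optional[int] = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


# Hypothetical alphabet of three Tamil text units (illustrative only).
alphabet = ["த", "மி", "ழ்"]

# Toy scores for six time steps; the best path is 1, 1, 0, 2, 3, 3.
example_frames = [
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.20, 0.10, 0.10, 0.60],
]

print(ctc_greedy_decode(example_frames, alphabet))
```

Because the blank separates repeated predictions, CTC lets the network label an unsegmented line image without a frame-by-frame alignment between pixels and characters, which is why it suits handwritten text where character boundaries are unclear.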
License
Copyright (c) 2025 Dr P. Jayapriya (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Articles published in Academic Research Journal of Science and Technology (ARJST) are open access under the Creative Commons Attribution 4.0 International License (CC BY, http://creativecommons.org/licenses/by/4.0/), which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.