Design and Development of Enhanced Deep Learning Methodology for Tamil Manuscripts Extraction using hybrid CNN-LSTM-CTC

Authors

  • Dr P. Jayapriya, Nallamuthu Gounder Mahalingam College, Pollachi-642 001, Tamilnadu

Keywords:

Deep Learning, Tamil manuscripts, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Connectionist Temporal Classification (CTC)

Abstract

Extracting data from manuscripts using deep learning has emerged as an intriguing task in fields such as historical document analysis, records digitization and information retrieval, and deep learning has opened up new methods for efficient and accurate data extraction. This paper presents a systematic approach that combines segmentation, a Convolutional Neural Network (CNN) and classification techniques to extract meaningful information from handwritten and printed manuscripts. Tamil manuscripts, rich in cultural and historical significance, pose distinctive challenges, including their complex nature, varied writing styles and degradation over time. It is therefore necessary to review the challenges, datasets and pre-processing methods relevant to applying deep learning approaches to this work. A hybrid CNN-Long Short-Term Memory (LSTM)-Connectionist Temporal Classification (CTC) model is proposed to address these challenges. In this approach, preprocessing is first performed using Optical Character Recognition (OCR). Second, a CNN-LSTM network is used to capture sequential dependencies in the text. Finally, the CTC function supports handwritten text recognition and text extraction from Tamil manuscripts.
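
The role of the CTC stage in the pipeline described above can be illustrated with a minimal Python sketch of greedy CTC decoding: the network emits one score vector per image column (frame), and the decoder picks the best label per frame, merges consecutive repeats and drops the blank symbol. Greedy decoding, the blank index of 0, the five-symbol `ALPHABET` and the `one_hot` helper are all assumptions made for illustration; the abstract does not specify the decoding strategy or the label set used by the authors.

```python
from typing import List, Sequence

BLANK = 0  # assumed convention: index 0 is the CTC blank symbol

# Hypothetical label set for illustration only (not the paper's character set).
ALPHABET = [
    "<blank>",
    "\u0b85",  # Tamil letter A
    "\u0bae",  # Tamil letter MA
    "\u0bcd",  # Tamil sign virama (pulli)
    "\u0bbe",  # Tamil vowel sign AA
]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Best label per frame, then merge repeats, then remove blanks."""
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank; a blank between two equal labels keeps both.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return decoded


def labels_to_text(labels: Sequence[int]) -> str:
    """Map decoded label indices back to characters."""
    return "".join(ALPHABET[i] for i in labels)


def one_hot(index: int, size: int = len(ALPHABET)) -> List[float]:
    """Hypothetical stand-in for a per-frame network output."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Nine frames whose best path is 1,1,0,2,3,0,2,2,4.
frames = [one_hot(i) for i in (1, 1, 0, 2, 3, 0, 2, 2, 4)]
labels = ctc_greedy_decode(frames)
text = labels_to_text(labels)
```

Here `labels` is `[1, 2, 3, 2, 4]`: the repeated frames collapse to a single label, while the two occurrences of label 2 both survive because other symbols separate them. In a trained system the per-frame scores would come from the CNN-LSTM network rather than from `one_hot`.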

Published

2025-05-15

How to Cite

Design and Development of Enhanced Deep Learning Methodology for Tamil Manuscripts Extraction using hybrid CNN-LSTM-CTC. (2025). Academic Research Journal of Science and Technology (ARJST), 1(07), 14-24. https://publications.ngmc.ac.in/journal/index.php/arjst/article/view/53
