Plant Recognition by AI: Deep Neural Nets, Transformers, and kNN in Deep Embeddings

Abstract

Plant species recognition is a critical area in biodiversity conservation, agricultural management, and ecological studies. Recent advancements in artificial intelligence (AI), particularly in deep learning techniques, have significantly enhanced the accuracy of automated image-based plant identification. This paper reviews existing machine learning methodologies, focusing on Deep Neural Networks (DNNs), Vision Transformers (ViTs), and k-Nearest Neighbors (kNN) in deep embeddings. We propose a novel retrieval-based method utilizing nearest neighbor classification within a deep embedding space, trained via the Recall@k surrogate loss. The effectiveness of this approach is benchmarked against state-of-the-art classifiers on three large datasets: PlantCLEF 2017, ExpertLifeCLEF 2018, and iNaturalist 2018. The retrieval-based method outperforms the benchmark classifiers on all three, with accuracy margins ranging from 0.28% to 10.25%.

1. Introduction

The recognition of plant species from images is increasingly vital for both ecological and agricultural domains. While traditional taxonomic methods are labor-intensive and rely heavily on expert knowledge, AI-driven approaches can automate this process, making plant species identification faster and more accessible. In recent years, Convolutional Neural Networks (CNNs) and Vision Transformers have been at the forefront of image classification tasks. This paper explores performance enhancements derived from these architectures and reports on our proposed retrieval-based method.

2. Background

2.1 Plant Recognition Challenges

The task of plant recognition is inherently complex due to variations in species morphology, environmental conditions, and image quality. Key challenges include:

High intra-class variability and low inter-class variability in plant images.
The presence of occlusions, background noise, and changing perspectives in images.

2.2 State-of-the-Art Methods

Convolutional image classifiers have improved considerably through architectures such as ResNeSt, a residual network variant built from split-attention blocks. Meanwhile, Vision Transformers have offered a new paradigm by treating an image as a sequence of fixed-size patches and applying self-attention across that sequence, which lets every layer draw on global context. Fine-tuning these models with various augmentation techniques has also contributed to enhanced performance.
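As a minimal sketch of the sequence view of an image, the snippet below splits an image into non-overlapping 16x16 patches and flattens each one into a vector. The function name and the 224x224 input size are assumptions chosen for illustration; a real Vision Transformer would additionally apply a learned linear projection and position embeddings, which are omitted here.

```python
import numpy as np

def image_to_patch_sequence(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Illustrative helper (not from the paper): returns an array of shape
    (num_patches, patch_size * patch_size * C), patches in raster order.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One flattened vector per patch.
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Hypothetical input: a 224x224 RGB image filled with random values.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = image_to_patch_sequence(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

With 16x16 patches, a 224x224 RGB image becomes 196 tokens of 768 values each, which is the sequence the transformer layers operate on.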

3. Methodology

3.1 Proposed Retrieval-Based Method

We propose a novel method that performs kNN classification in a deep embedding space, where the embedding network is trained using Recall@k as a surrogate loss. At inference time, a query image is mapped to its deep feature representation and assigned the species of its nearest labeled neighbors, so the training objective directly targets the retrieval quality on which the final classification depends.
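The inference step of such a retrieval-based classifier can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: cosine similarity, majority voting, the value of k, and the toy two-dimensional embeddings are all hypothetical choices, and the Recall@k surrogate loss used for training is not shown (only the plain, non-differentiable Recall@k metric is).

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def top_k_neighbours(query_emb, gallery_emb, k):
    """Indices of the k most similar gallery embeddings for every query."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T  # (Q, G)
    return np.argsort(-sims, axis=1)[:, :k]

def knn_classify(query_emb, gallery_emb, gallery_labels, k=3):
    """Majority vote over the labels of the k nearest gallery embeddings.

    Ties are broken in favour of the smallest class id (np.unique sorts ids).
    """
    topk = top_k_neighbours(query_emb, gallery_emb, k)
    preds = []
    for neighbour_labels in gallery_labels[topk]:
        classes, counts = np.unique(neighbour_labels, return_counts=True)
        preds.append(classes[np.argmax(counts)])
    return np.array(preds)

def recall_at_k(query_emb, query_labels, gallery_emb, gallery_labels, k=1):
    """Fraction of queries with at least one same-class item in the top k."""
    topk = top_k_neighbours(query_emb, gallery_emb, k)
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy embeddings standing in for the outputs of a trained network.
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
gallery_labels = np.array([0, 0, 1, 1])
queries = np.array([[0.8, 0.2], [0.2, 0.8]])
query_labels = np.array([0, 1])

predictions = knn_classify(queries, gallery, gallery_labels, k=3)
r_at_1 = recall_at_k(queries, query_labels, gallery, gallery_labels, k=1)
print(predictions, r_at_1)
```

In practice the gallery would hold embeddings of the labeled training images, and an approximate nearest-neighbor index would likely replace the exhaustive similarity matrix for large datasets.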

3.2 Performance-Enhancement Techniques

Class Prior Adaptation: Adjusting the classifier's predictions to account for class imbalance by re-weighting its outputs with the ratio of the expected test-time class priors to the training class priors.
Image Augmentation: Techniques such as rotation, cropping, and color jittering are applied to mitigate overfitting and improve model generalization.
Learning Rate Scheduling: Varying the learning rate over the course of training, rather than keeping it fixed, to improve convergence.
Loss Functions: Experimenting with different loss functions to optimize classification accuracy further.
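Two of the techniques above lend themselves to a short sketch. The first function re-weights predicted class posteriors by the ratio of test to training priors and renormalizes; the second is a cosine learning-rate schedule with optional linear warmup. Both are illustrative assumptions: the text does not state which prior-adaptation formula or which schedule was used, and all function names, priors, and hyperparameter values here are hypothetical.

```python
import math
import numpy as np

def adapt_to_new_priors(posteriors, train_priors, test_priors):
    """Re-weight p(class | image) by test_prior / train_prior, then renormalize.

    posteriors: (N, C) array of classifier outputs that sum to 1 per row.
    train_priors, test_priors: (C,) arrays of class frequencies.
    """
    weighted = posteriors * (test_priors / train_priors)
    return weighted / weighted.sum(axis=1, keepdims=True)

def cosine_lr(step, total_steps, base_lr=0.01, warmup_steps=0):
    """Linear warmup to base_lr, then cosine decay towards zero (assumed schedule)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Hypothetical two-class case: class 0 dominates training (90%),
# but both classes are assumed equally likely at test time.
posteriors = np.array([[0.6, 0.4]])
train_priors = np.array([0.9, 0.1])
test_priors = np.array([0.5, 0.5])
adapted = adapt_to_new_priors(posteriors, train_priors, test_priors)
print(adapted)  # the prediction shifts towards the rare class

schedule = [cosine_lr(s, total_steps=100) for s in (0, 50, 100)]
print(schedule)
```

In the toy example the unadapted classifier favors the class that dominated training, while the adapted posteriors favor the rare class once the priors are equalized.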

3.3 Evaluation Datasets

We employ the following datasets for evaluation:

PlantCLEF 2017: A comprehensive dataset representing diverse plant species.
ExpertLifeCLEF 2018: A real-world dataset designed for biodiversity research.
iNaturalist 2018: A large-scale dataset that includes a vast array of species and user-generated images.

4. Experiments and Results

4.1 Benchmark Evaluation

We benchmark state-of-the-art classifiers, notably the best ViT model (ViT-Large/16) and the best CNN architecture (ResNeSt-269e), on the selected datasets. The ViT-Large/16 model achieved accuracies of 91.15% and 83.54% on the PlantCLEF 2017 and ExpertLifeCLEF 2018 test sets, respectively. Relative to the ResNeSt-269e model, this corresponds to error rate reductions of 22.91% on PlantCLEF 2017 and 28.34% on ExpertLifeCLEF 2018.
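Accuracy and error-rate reduction are easy to conflate, so the arithmetic is sketched below. It assumes the reductions quoted above are relative reductions of the error rate (100% minus accuracy) between the two models being compared; the function names are hypothetical. Any baseline accuracy derived this way is implied by that assumption and is not a figure reported in the text.

```python
def relative_error_reduction(acc_baseline: float, acc_new: float) -> float:
    """Relative reduction of the error rate, in percent.

    Accuracies are given in percent; error rate = 100 - accuracy.
    """
    err_base = 100.0 - acc_baseline
    err_new = 100.0 - acc_new
    return 100.0 * (err_base - err_new) / err_base

def implied_baseline_accuracy(acc_new: float, reduction_pct: float) -> float:
    """Invert the formula: baseline accuracy implied by a relative reduction."""
    err_new = 100.0 - acc_new
    err_base = err_new / (1.0 - reduction_pct / 100.0)
    return 100.0 - err_base

# Accuracy and reduction figures quoted in the text for PlantCLEF 2017.
baseline_2017 = implied_baseline_accuracy(acc_new=91.15, reduction_pct=22.91)
# Round trip: recomputing the reduction from the implied baseline.
check_2017 = relative_error_reduction(baseline_2017, 91.15)
print(round(baseline_2017, 2), round(check_2017, 2))
```

The point of the sketch is that a reduction above 20% in error rate can correspond to a gain of only a few percentage points in accuracy when the baseline is already strong.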

4.2 Performance Gains via Techniques

The study revealed notable performance enhancements:

The use of augmentation techniques resulted in accuracy improvements of up to 4.67% on PlantCLEF 2017, demonstrating the importance of data diversity in training.

4.3 Retrieval-Based Method Performance

Our proposed retrieval approach outperformed the benchmark classifiers on all three datasets, by the following accuracy margins:

ExpertLifeCLEF 2018: +0.28%
PlantCLEF 2017: +4.13%
iNaturalist 2018: +10.25%

5. Discussion

The results underscore the effectiveness of deep embeddings for plant recognition through a kNN retrieval mechanism. The competitive accuracy of state-of-the-art CNNs and ViTs highlights the potential of further research in hybrid models that combine the strengths of both architectures. Moreover, the study demonstrates that fine-tuning strategies significantly enhance model performance, especially in challenging plant recognition tasks.

6. Conclusion

This paper presents a comparative analysis of various machine learning methods for plant species recognition, emphasizing the advantages of our proposed retrieval-based method. The results show that CNNs and Vision Transformers achieve high accuracy as standard classifiers, but that our kNN approach in a deep embedding space outperforms them on every dataset evaluated. The findings highlight opportunities for future work to refine these techniques further and explore their applications in broader ecological contexts.
