Adapting Vision Foundation Models for Plant Phenotyping
| Properties | |
| --- | --- |
| authors | Feng Chen, Mario Valerio Giuffrida, Sotirios A. Tsaftaris |
| year | 2023 |
| url | https://openaccess.thecvf.com/content/ICCV2023W/CVPPA/html/Chen_Adapting_Vision_Foundation_Models_for_Plant_Phenotyping_ICCVW_2023_paper.html |
Abstract
Foundation models are large models pre-trained on a tremendous amount of data. They can typically be adapted to diverse downstream tasks with minimal effort. However, as foundation models are usually pre-trained on images or texts sourced from the Internet, their performance in specialized domains, such as plant phenotyping, comes into question. In addition, fully fine-tuning foundation models is time-consuming and requires high computational power. This paper investigates the efficient adaptation of foundation models for plant phenotyping settings and tasks. We perform extensive experiments on fine-tuning three foundation models, MAE, DINO, and DINOv2, on three essential plant phenotyping tasks: leaf counting, instance segmentation, and disease classification. In particular, the pre-trained backbones are kept frozen, while two distinct fine-tuning methods are evaluated, namely adapter tuning (using LoRA) and decoder tuning. The experimental results show that a foundation model can be efficiently adapted to multiple plant phenotyping tasks, yielding performance similar to the state-of-the-art (SoTA) models specifically designed or trained for each task. Despite exhibiting great transferability over different tasks, the fine-tuned foundation models perform slightly worse than the SoTA task-specific models in some scenarios, which requires further investigation.
Notes
Motivation / Problem
Foundation models struggle with specialized data (e.g., plant phenotyping, cancer prediction)
Research question
Which efficient fine-tuning technique is most promising for adapting foundation models (MAE, DINO, DINOv2) to specialized domains?
Methods
Benchmarked fine-tuning methods: decoder tuning (akin to linear probing: the backbone stays frozen and only a task decoder is trained) and adapter tuning (LoRA adapters inserted into the frozen backbone and trained jointly with the decoder)
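To make the two regimes concrete, here is a minimal PyTorch sketch (not the paper's code) of adapter tuning a frozen DINOv2 backbone with hand-rolled LoRA layers, alongside the trainable task head used in decoder tuning. The `dinov2_vits14` torch.hub entrypoint and the `blocks[i].attn.qkv` attribute path follow the public facebookresearch/dinov2 release; the rank/alpha values and the regression head for leaf counting are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of adapter tuning (LoRA) vs. decoder tuning on a frozen
# vision foundation model. Assumes the public facebookresearch/dinov2
# torch.hub entrypoint; hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update:
    y = W x + (alpha / r) * B (A x)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: identity at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Frozen backbone (downloads weights; requires network access).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
for p in backbone.parameters():
    p.requires_grad = False  # frozen in both fine-tuning regimes

# Adapter tuning: inject LoRA into each attention block's qkv projection.
# The attribute path matches the public DINOv2 ViT implementation.
for block in backbone.blocks:
    block.attn.qkv = LoRALinear(block.attn.qkv, r=8, alpha=16)

# Task decoder: a regression head for leaf counting (hypothetical choice;
# segmentation/classification heads are analogous). In decoder tuning,
# this head is the *only* trainable module.
head = nn.Linear(backbone.embed_dim, 1)

# Only LoRA matrices + head parameters reach the optimizer.
trainable = [p for p in backbone.parameters() if p.requires_grad] + list(head.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(2, 3, 224, 224)   # dummy batch of plant images
count = head(backbone(x))         # hub model's forward returns the CLS embedding
```

Decoder tuning corresponds to training only `head` on top of the frozen backbone; adapter tuning additionally trains the LoRA matrices, which amount to a small fraction of the backbone's parameters.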
Results
- LoRA (adapter tuning) consistently beats decoder tuning
- VFMs with LoRA are often competitive with fully trained/fine-tuned SoTA task-specific models
- No single VFM clearly beats the others; each model (DINO, DINOv2, MAE) has metrics and tasks where it shines
- LoRA can help dampen issues of data scarcity, domain shift, and class imbalance