Adapting Vision Foundation Models for Plant Phenotyping

Properties
authors Feng Chen, Mario Valerio Giuffrida, Sotirios A. Tsaftaris
year 2023
url https://openaccess.thecvf.com/content/ICCV2023W/CVPPA/html/Chen_Adapting_Vision_Foundation_Models_for_Plant_Phenotyping_ICCVW_2023_paper.html

Abstract

Foundation models are large models pre-trained on tremendous amounts of data. They can typically be adapted to diverse downstream tasks with minimal effort. However, as foundation models are usually pre-trained on images or texts sourced from the Internet, their performance in specialized domains, such as plant phenotyping, comes into question. In addition, fully fine-tuning foundation models is time-consuming and requires high computational power. This paper investigates the efficient adaptation of foundation models for plant phenotyping settings and tasks. We perform extensive experiments on fine-tuning three foundation models, MAE, DINO, and DINOv2 on three essential plant phenotyping tasks: leaf counting, instance segmentation, and disease classification. In particular, the pre-trained backbones are kept frozen, while two distinct fine-tuning methods are evaluated, namely adapter tuning (using LoRA) and decoder tuning. The experimental results show that a foundation model can be efficiently adapted to multiple plant phenotyping tasks, yielding similar performance as the state-of-the-art (SoTA) models specifically designed or trained for each task. Despite exhibiting great transferability over different tasks, the fine-tuned foundation models perform slightly worse than the SoTA task-specific models in some scenarios, which requires further investigation.

Notes

Motivation / Problem

Foundation models are pre-trained on general Internet data, so they can struggle with specialized data (e.g., plant phenotyping, cancer prediction).

Research question

Which efficient fine-tuning technique is most promising for adapting foundation models (MAE, DINO, DINOv2) to specialized data?

Methods

Benchmarked fine-tuning methods: decoder tuning (DT; only the task decoder is trained on top of the frozen backbone, akin to linear probing) and adapter tuning (DT plus LoRA adapters inserted into the frozen backbone). A sketch of both strategies follows below.
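A minimal sketch of the two strategies in plain PyTorch, not the paper's actual code: the `LoRALinear` wrapper, the stand-in backbone/head modules, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pre-trained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Decoder tuning (DT): freeze the whole backbone, train only the task decoder/head.
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU())   # stand-in for a frozen ViT encoder
head = nn.Linear(768, 10)                                  # stand-in task decoder
for p in backbone.parameters():
    p.requires_grad = False

# Adapter tuning: additionally wrap selected backbone layers with LoRA.
backbone[0] = LoRALinear(backbone[0])

# Only the LoRA A/B matrices and the head remain trainable.
trainable = [n for n, p in nn.Sequential(backbone, head).named_parameters() if p.requires_grad]
print(trainable)
```

In the paper the frozen backbones are the pre-trained ViTs from MAE, DINO, and DINOv2, with a task-specific decoder trained on top; the same pattern can also be reproduced with an off-the-shelf LoRA implementation such as Hugging Face PEFT.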

Results

  1. Adapter tuning with LoRA consistently beats decoder tuning (DT)
  2. Vision foundation models (VFMs) with LoRA are often competitive with fully trained/fine-tuned SoTA task-specific models
  3. No single VFM clearly beats the others; each model (DINO, DINOv2, MAE) has tasks and metrics where it shines
  4. LoRA can help mitigate data scarcity, domain shift, and class imbalance