Mean Attention Distance

Introduced in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2020).

From What Do Self-Supervised Vision Transformers Learn?

“Attention distance is defined as the average distance between the query tokens and key tokens considering their self-attention weights. Therefore, it conceptually corresponds to the size of the receptive fields in CNNs.” (Park et al., 2023, p. 3)
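Written out, the quoted definition can be formalized as follows (a hedged reading of the quote; the symbols $A$, $\mathbf{p}$, and $N$ are notation introduced here, not taken from the paper):

$$
\bar{d} \;=\; \frac{1}{N}\sum_{q=1}^{N}\sum_{k=1}^{N} A_{qk}\,\lVert \mathbf{p}_q - \mathbf{p}_k \rVert
$$

where $A_{qk}$ is the softmaxed self-attention weight from query token $q$ to key token $k$, $\mathbf{p}_i$ is the 2D spatial position of patch token $i$ (typically in pixels), and $N$ is the number of patch tokens. The quantity is computed per head and per layer.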

Key Observation

Can be used to measure how much local or global information a transformer is using: a small mean attention distance means a head attends mostly to nearby tokens (local features), while a large one means it attends across the whole image (global features). See What Do Self-Supervised Vision Transformers Learn?
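
A minimal NumPy sketch of the computation, assuming a square patch grid, a single head's row-stochastic attention matrix with the CLS token already dropped, and distances measured in pixels (the function name and signature are illustrative, not from either paper):

```python
import numpy as np

def mean_attention_distance(attn, grid_size, patch_size):
    """Mean attention distance for one attention head.

    attn: (N, N) row-stochastic self-attention matrix over the N = grid_size**2
          patch tokens (CLS token excluded); attn[q, k] is the weight that
          query token q assigns to key token k.
    grid_size: patches per image side (e.g. 14 for a 224px image with 16px patches).
    patch_size: patch size in pixels, so the result is in pixels.
    """
    # 2D (row, col) grid coordinates of each patch token
    coords = np.stack(
        np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij"),
        axis=-1,
    ).reshape(-1, 2)
    # Pairwise Euclidean distances between token positions, scaled to pixels
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1) * patch_size
    # Attention-weighted distance per query token, averaged over all queries
    return (attn * dists).sum(axis=-1).mean()

# Usage: a random row-stochastic attention matrix on a 14x14 grid of 16px patches
rng = np.random.default_rng(0)
logits = rng.normal(size=(14 * 14, 14 * 14))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(mean_attention_distance(attn, grid_size=14, patch_size=16))
```

In practice this would be run over the attention matrices of every head and layer of a trained ViT and averaged over a batch of images, which is how the per-layer attention-distance plots in both papers are produced.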