Rate Distortion and Spectral Analysis on Representations

This note gathers papers that apply concepts from information theory and spectral theory to deep learning representations.

Hierarchical Tokenization for images (also relates to the Global Precedence Effect)

Other non-linear tokenizations

Coding Rate

  • [[White-Box Transformers via Sparse Rate Reduction - Compression Is All There Is]] — Frames representation learning as sparse rate reduction toward mixtures of low-dimensional Gaussians, yielding transparent, theoretically grounded transformer layers.
  • Simplifying DINO via Coding Rate Regularization — Shows that adding a coding-rate loss term stabilizes and simplifies DINO, removing most heuristics while improving robustness and accuracy.
    • Both use the coding rate, which is the differential #entropy of the representation under a Gaussian source assumption; since the Gaussian maximizes differential entropy for a fixed covariance, it upper-bounds the true differential entropy of real-valued vectors (see the sketch below).
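
A minimal numpy sketch of the coding-rate formula from this line of work, R(Z, ε) = ½ log det(I + d/(nε²) ZZᵀ) for n feature vectors stacked as columns of Z ∈ ℝ^{d×n}. The function name and example data are mine, not from either paper:

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Coding rate R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z @ Z.T).

    Z: (d, n) matrix whose columns are n representation vectors of dimension d.
    Estimates (in nats) the cost of encoding the columns of Z up to
    precision eps, assuming a Gaussian source.
    """
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    # slogdet is numerically safer than log(det(...)) for large matrices.
    _, logdet = np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)
    return 0.5 * logdet

# Example: random unit-norm features (hypothetical data for illustration).
rng = np.random.default_rng(0)
Z = rng.standard_normal((64, 256))
Z /= np.linalg.norm(Z, axis=0, keepdims=True)
print(coding_rate(Z))
```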

Other

PS: This is a personally curated list (one-sentence summaries by o3 :p).