"Singular Vectors of Attention Heads Align with Features" was accepted at ICML 2026.
In this paper, we study why the singular vectors of attention matrices so often align with the features a model uses. We give both empirical and theoretical evidence for when this alignment happens, and we show how to recognize it in real models.
A short thread about the work is available on X.