Why attention heads attend where they do

Much of my research is about understanding why an attention head attends where it does. When a head attends from one token to another, I want to identify the specific information in the residual stream that caused that choice.

This line of work began with Sparse Attention Decomposition, where we showed that the signals attention heads use to communicate are often sparsely encoded in the singular vectors of their query-key matrices (Franco & Crovella, 2024). We developed that idea into attention-causal communication (ACC): a way to isolate those signals as low-dimensional features with a provable causal link to attention, and to trace circuits from a single forward pass. That paper was published at NeurIPS 2025 (Franco & Crovella, 2025).
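
The linear algebra behind this idea can be sketched in a few lines. The example below is a toy illustration, not the code from the papers or from accpp-tracer. It assumes a single head with random weights, ignores biases, layer normalization, and the softmax scaling, and uses hypothetical names (`W_Q`, `W_K`, `component_contributions`). It shows that a pre-softmax attention score splits exactly into one term per singular-vector pair of the query-key matrix, so a score driven by only a few of those terms is sparsely encoded in that basis.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 16, 4  # toy sizes chosen for illustration only

# Hypothetical query and key projections for one attention head.
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))

# Query-key bilinear form: score(dst, src) = x_dst @ W_QK @ x_src.
# Its rank is at most d_head.
W_QK = W_Q @ W_K.T


def component_contributions(W_QK, x_dst, x_src):
    """Split a pre-softmax score into one term per singular-vector pair.

    With W_QK = U @ diag(S) @ Vt, the score equals
    sum_k S[k] * (x_dst . U[:, k]) * (Vt[k] . x_src).
    """
    U, S, Vt = np.linalg.svd(W_QK)
    return S * (x_dst @ U) * (Vt @ x_src)


# Generic residual vectors: the terms always sum to the full score.
x_dst = rng.normal(size=d_model)
x_src = rng.normal(size=d_model)
contrib = component_contributions(W_QK, x_dst, x_src)
score = x_dst @ W_QK @ x_src

# Residual vectors that carry a "signal" along the top singular pair,
# plus a little noise: one term dominates the score.
U, S, Vt = np.linalg.svd(W_QK)
x_dst_aligned = U[:, 0] + 0.01 * rng.normal(size=d_model)
x_src_aligned = Vt[0] + 0.01 * rng.normal(size=d_model)
aligned = component_contributions(W_QK, x_dst_aligned, x_src_aligned)
top_share = np.abs(aligned[0]) / np.abs(aligned).sum()
```

When the two residual vectors line up with a single singular-vector pair, nearly all of the score comes from that one term (`top_share` is close to 1), which is the kind of sparsity the paragraph above refers to.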

More recently, ACC++ refines the method to extract cleaner, lower-dimensional signals, many of which admit a short natural-language description, and uses them to find interpretable prompt-specific circuits (Franco et al., 2026).

The method is available as a pip-installable library, accpp-tracer. Code for the individual papers lives in sparse-attention-decomposition, pinpointing-attention-causal-communication, and finding-highly-interpretable-circuits.

References

2026

arXiv
Finding Interpretable Prompt-Specific Circuits in Language Models

Gabriel Franco, Lucas M. Tassis, Azalea Rohr, and Mark Crovella

arXiv preprint arXiv:2602.13483, 2026

Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. A crucial part of finding circuits is understanding why each attention head attends where it does. To this end, we introduce ACC++, an improved circuit-tracing method based on the principle of attention-causal communication (ACC), which identifies signals, i.e., contents of low-dimensional subspaces that cause attention on a token pair. ACC++ extracts circuits from a single forward pass, without replacement models or patching. Circuits identified by ACC++ consist of components that are causal for the model’s attention decisions, together with the low-dimensional signals used to communicate between them.
@article{franco2026finding,
  title = {Finding Interpretable Prompt-Specific Circuits in Language Models},
  author = {Franco, Gabriel and Tassis, Lucas M. and Rohr, Azalea and Crovella, Mark},
  journal = {arXiv preprint arXiv:2602.13483},
  year = {2026},
}

2025

NeurIPS
Pinpointing Attention-Causal Communication in Language Models

Gabriel Franco and Mark Crovella

In Advances in Neural Information Processing Systems 38 (NeurIPS), 2025

The attention mechanism plays a central role in the computations performed by transformer-based models, and understanding the reasons why heads attend to specific tokens can aid in interpretability of language models. Although considerable work has shown that models construct low-dimensional feature representations, little work has explicitly tied low-dimensional features to the attention mechanism itself. In this paper we work to bridge this gap by presenting methods for identifying attention-causal communication, meaning low-dimensional features that are written into and read from tokens, and that have a provable causal relationship to attention patterns. We show that by identifying those signals, we can perform prompt-specific circuit discovery in a single forward pass. Further, we show that signals can uncover unexplored mechanisms at work in the model, including a surprising degree of global coordination across attention heads.
@inproceedings{franco2025pinpointing,
  title = {Pinpointing Attention-Causal Communication in Language Models},
  author = {Franco, Gabriel and Crovella, Mark},
  booktitle = {Advances in Neural Information Processing Systems 38 (NeurIPS)},
  year = {2025},
}

2024

arXiv
Sparse Attention Decomposition Applied to Circuit Tracing

Gabriel Franco and Mark Crovella

arXiv preprint arXiv:2410.00340, 2024

Many papers have shown that attention heads work in conjunction with each other to perform complex tasks. It’s frequently assumed that communication between attention heads is via the addition of specific features to token residuals. In this work we seek to isolate and identify the features used to effect communication and coordination among attention heads in GPT-2 small. Our key leverage on the problem is to show that these features are very often sparsely coded in the singular vectors of attention head matrices. We characterize the dimensionality and occurrence of these signals across the attention heads in GPT-2 small when used for the Indirect Object Identification (IOI) task.
@article{franco2024sparse,
  title = {Sparse Attention Decomposition Applied to Circuit Tracing},
  author = {Franco, Gabriel and Crovella, Mark},
  journal = {arXiv preprint arXiv:2410.00340},
  year = {2024},
}