Why attention heads attend where they do

Isolating the low-dimensional signals that cause attention, and using them to trace interpretable circuits from a single forward pass.

Much of my research is about understanding why an attention head attends where it does. When a head attends from one token to another, I want to identify the specific information in the residual stream that caused that choice.

This line of work began with Sparse Attention Decomposition, where we showed that the signals attention heads use to communicate are often sparsely encoded in the singular vectors of their query-key matrices (Franco & Crovella, 2024). We then developed that idea into attention-causal communication (ACC): a way to isolate those signals as low-dimensional features with a provable causal link to attention, and to trace circuits from a single forward pass. The ACC paper was published at NeurIPS 2025 (Franco & Crovella, 2025).
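
To make "sparsely encoded in the singular vectors" concrete, here is a toy NumPy sketch. It is not the code from the papers: the weights are random, the names (W_Q, W_K, x_dst, x_src) are placeholders chosen for illustration, and it ignores layer normalization, biases, score scaling, and positional terms. It only shows that a pre-softmax attention score splits exactly into one term per singular direction of the query-key matrix, and that when the two residual vectors carry a matching signal along one direction, that single term accounts for nearly the whole score.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 64, 8

# Placeholder query/key projections for one head (random, for illustration only).
W_Q = rng.standard_normal((d_model, d_head))
W_K = rng.standard_normal((d_model, d_head))

# Simplified bilinear score: score = x_dst @ W_QK @ x_src.
W_QK = W_Q @ W_K.T  # rank is at most d_head

# SVD of the query-key matrix; keep only the d_head nonzero directions.
U, S, Vt = np.linalg.svd(W_QK)
U, S, Vt = U[:, :d_head], S[:d_head], Vt[:d_head]

# Toy residual vectors: a strong signal along the first singular pair plus noise.
x_dst = 3.0 * U[:, 0] + 0.1 * rng.standard_normal(d_model)
x_src = 3.0 * Vt[0] + 0.1 * rng.standard_normal(d_model)

# One contribution per singular direction: sigma_i * (u_i . x_dst) * (v_i . x_src).
contrib = S * (U.T @ x_dst) * (Vt @ x_src)
score = x_dst @ W_QK @ x_src

# The contributions sum to the full score.
assert np.isclose(contrib.sum(), score)

top = int(np.argmax(np.abs(contrib)))
share = contrib[top] / score
print(f"dominant direction: {top}, share of score: {share:.3f}")
```

In a real model the interesting question is empirical: how many directions are needed to account for a head's score on a given pair of tokens. The sketch only shows the bookkeeping that makes that question well posed.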

More recently, ACC++ refines the method to extract cleaner, lower-dimensional signals, many of which admit a short natural-language description, and uses them to find interpretable prompt-specific circuits (Franco et al., 2026).

The method is available as a pip-installable library, accpp-tracer. Code for the individual papers lives in sparse-attention-decomposition, pinpointing-attention-causal-communication, and finding-highly-interpretable-circuits.

References

2026

  1. arXiv
    Finding Interpretable Prompt-Specific Circuits in Language Models
    Gabriel Franco, Lucas M. Tassis, Azalea Rohr, et al.
    arXiv preprint arXiv:2602.13483, 2026

2025

  1. NeurIPS
    Pinpointing Attention-Causal Communication in Language Models
    Gabriel Franco and Mark Crovella
    In Advances in Neural Information Processing Systems 38 (NeurIPS), 2025

2024

  1. arXiv
    Sparse Attention Decomposition Applied to Circuit Tracing
    Gabriel Franco and Mark Crovella
    arXiv preprint arXiv:2410.00340, 2024