We released a new preprint, "Sparse Attention Decomposition Applied to Circuit Tracing."
The full text is available on arXiv.
For a short explanation of the paper, check out our Twitter post:
Can we identify the key signals moving between attention heads when a language model performs a task? Our paper (https://t.co/2fUtHa7BTF) offers new tools for this question. A key point of leverage is a new phenomenon we expose: sparse attention decomposition. Exploiting this…
— Gabriel Franco (@gvsfranco) October 23, 2024