projects | Gabriel Franco

Why attention heads attend where they do

Isolating the low-dimensional signals that determine where attention heads attend and using them to trace interpretable circuits from a single forward pass.

When and why the singular vectors of attention matrices align with the features a model uses.

Model selection and benchmarking for learning from label proportions, a weakly supervised setting where only bag-level label proportions are known.