Rosetta Neurons Paper Shows Neuron Populations Diverge Across Scales, Complicating Monosemanticity Assumptions
Summary
Researchers posted a new paper, 'Neuron Populations Exhibit Divergent Selectivity with Scale', to r/MachineLearning. It extends the Rosetta Neurons line of work on universal neurons shared across different neural network architectures. The paper examines how these cross-model shared representations relate to scaling laws, specialization, and monosemanticity, and its findings complicate the assumption that features become more interpretable at scale. The authors invited community discussion of the implications for mechanistic interpretability; the post was trending in r/MachineLearning within 1.7 hours of publication.
Originally reported by reddit.com