reddit.com via Reddit

Rosetta Neurons Paper Shows Neuron Populations Diverge Across Scales, Complicating Monosemanticity Assumptions

safety mechanistic-interpretability scaling-laws monosemanticity

Summary

Researchers posted a new paper 'Neuron Populations Exhibit Divergent Selectivity with Scale' to r/MachineLearning, extending the Rosetta Neurons line of work on universal neurons shared across different neural network architectures. The paper examines how these cross-model shared representations relate to scaling laws, specialization, and monosemanticity—with findings that complicate the assumption that features become more interpretable at scale. Authors invited community discussion on implications for mechanistic interpretability; the post was trending in r/MachineLearning within 1.7 hours of publication.