aclanthology.org web signal

MilaNLP: empathy as side task hurts emotion classification

TL;DR

  • MilaNLP at Bocconi submitted to the WASSA 2021 shared task on essay-level emotion classification of reactions to English news stories.
  • Adding empathy as an auxiliary task and demographic attributes as input both gave worse performance than the plain single-task model.
  • The authors conclude that emotion and empathy are not related tasks, at least for the purpose of prediction. The submission was still competitive in the shared task.

A short paper from the WASSA 2021 shared task on emotion classification turned in a quietly useful negative result, the kind that should change a default rather than a leaderboard. The MilaNLP team at Bocconi University in Milan, made up of Tommaso Fornaciari, Federico Bianchi, Debora Nozza and Dirk Hovy, described their submission to Track 2 of the workshop's shared task, which asks systems to predict, at the essay level, the emotion expressed in reactions to English news stories.

The setup was the kind a practitioner would reach for instinctively. They tested multi-task and multi-input frameworks, with empathy added as an auxiliary task and demographic attributes fed in as additional input, on the theory that all of that correlated information should help. It did not. The team reports that both moves produced worse performance than the equivalent single-task model. Their entry was still competitive in the shared task, but the takeaway in the abstract is blunt: emotion and empathy are not related tasks, at least for the purpose of prediction.
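To make the two setups concrete, here is a minimal sketch of what "multi-task" and "multi-input" usually mean in a classifier like this. It is not the authors' implementation, since the abstract gives no architecture details. The assumptions are all ours: empathy is treated as a scalar regression target, the auxiliary loss is added with a hypothetical weight `aux_weight`, and demographic attributes are simply concatenated onto a text feature vector. Every function name is illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(emotion_logits, gold_emotion):
    """Main-task loss: negative log-probability of the gold emotion class."""
    return -math.log(softmax(emotion_logits)[gold_emotion])


def squared_error(empathy_pred, gold_empathy):
    """Auxiliary-task loss, ASSUMING empathy is a scalar regression target."""
    return (empathy_pred - gold_empathy) ** 2


def joint_loss(emotion_logits, gold_emotion, empathy_pred, gold_empathy,
               aux_weight):
    """Multi-task objective: main loss plus a weighted auxiliary loss.

    aux_weight is a hypothetical hyperparameter; setting it to 0.0
    recovers the plain single-task objective.
    """
    main = cross_entropy(emotion_logits, gold_emotion)
    aux = squared_error(empathy_pred, gold_empathy)
    return main + aux_weight * aux


def multi_input_features(text_features, demographic_features):
    """Multi-input variant: append demographic attributes to text features."""
    return list(text_features) + list(demographic_features)


# One toy example: two emotion classes, uninformative logits.
single_task = joint_loss([0.0, 0.0], 0, 3.0, 5.0, aux_weight=0.0)
multi_task = joint_loss([0.0, 0.0], 0, 3.0, 5.0, aux_weight=0.5)
```

With `aux_weight=0.0` the empathy term drops out entirely, which is why the single-task model is the natural baseline: the multi-task variant only earns its keep if the extra gradient signal improves the emotion predictions themselves.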

The honest caveat is that this is a single result on one shared-task dataset, and the abstract does not give you the architecture details, the size of the score gap, or where the team ranked on the final leaderboard. What it does give you is a reason to be skeptical of the default move of bolting plausible-sounding auxiliary signals onto a text classifier. For teams building affective computing systems, the practical implication is to test the more ambitious setup against a clean single-task baseline on the same held-out data before paying for it, because plausibility is not the same thing as transfer.
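One way to run that comparison is sketched below, under our own assumptions rather than anything stated in the paper: both systems are scored with macro-averaged F1 on the same test set, and a paired bootstrap estimates how often the candidate (say, the multi-task model) actually beats the single-task baseline. The function names, the choice of metric, and the resample count are all illustrative.

```python
import random


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def paired_bootstrap(gold, baseline_pred, candidate_pred,
                     n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which the candidate beats the baseline.

    Each resample draws the same example indices for both systems, so the
    comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(gold)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        b = [baseline_pred[i] for i in idx]
        c = [candidate_pred[i] for i in idx]
        if macro_f1(g, c) > macro_f1(g, b):
            wins += 1
    return wins / n_resamples


# Toy labels, invented purely for illustration.
gold = ["joy", "joy", "fear", "fear"]
single_task_pred = ["joy", "fear", "fear", "fear"]
multi_task_pred = ["joy", "fear", "fear", "joy"]

baseline_score = macro_f1(gold, single_task_pred)
candidate_score = macro_f1(gold, multi_task_pred)
win_rate = paired_bootstrap(gold, single_task_pred, multi_task_pred)
```

A win rate near 0.5 means the extra machinery has not demonstrated anything. A win rate well below 0.5, as the MilaNLP result suggests can happen, is a signal to ship the simpler model.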
