
Momennejad and Raileanu Formalize Open-Ended Intelligence

TL;DR

  • Momennejad and Raileanu ground open-ended intelligence in compositional closure L(P,C), generated from a minimal set of primitives P and composition operators C, rather than in behavioral diversity.
  • The framework proposes 'next primitive prediction' training, targeting reuse of algorithmic primitives rather than next-token or latent-state prediction.
  • Three evaluation metrics (PRI, CDG, TaR) are proposed to measure compositional generalization, but the paper presents no empirical benchmark results.

Most AI benchmarks measure how well a system performs on problems similar to those it trained on. A paper from Ida Momennejad and Roberta Raileanu, posted to arXiv, takes a different angle: formalizing what it would mean for a system to handle genuinely novel problems, not just harder versions of familiar ones.

Their definition of open-ended intelligence is the capacity to adapt to novel problems and environments substantially different from training contexts. The proposed mechanism is compositional closure, written as L(P,C), induced by a minimal set of primitives P and composition operators C. The primitives split into two types: representational ones that capture an agent's perception of objects, states, and world relationships, and algorithmic ones covering minimal computational operations like comparison, retrieval, or verification. Composition operators then sequence and recombine these through selection, recursion, and branching.
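To make the idea of a closure induced by primitives and operators concrete, here is a minimal sketch in Python. It is not the paper's formalism: the two integer primitives, the single `sequence` operator, and the depth bound are all assumptions chosen for illustration, and the paper's operators (selection, recursion, branching) are richer than plain sequencing.

```python
from itertools import product

# Hypothetical algorithmic primitives (illustrative only, not from the paper):
# tiny integer-to-integer operations.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
}

def sequence(f, g):
    """Composition operator: run f, then g."""
    return lambda x: g(f(x))

def closure(primitives, max_depth):
    """Enumerate every program built by sequencing 1..max_depth primitives."""
    programs = dict(primitives)   # all depth-1 programs
    frontier = dict(primitives)   # programs at the current depth
    for _ in range(max_depth - 1):
        next_frontier = {}
        for (name_f, f), (name_p, p) in product(frontier.items(), primitives.items()):
            next_frontier[f"{name_f}>{name_p}"] = sequence(f, p)
        programs.update(next_frontier)
        frontier = next_frontier
    return programs

programs = closure(PRIMITIVES, max_depth=3)
print(len(programs))                  # 2 + 4 + 8 = 14 programs
print(programs["inc>double>inc"](3))  # ((3 + 1) * 2) + 1 = 9
```

Even with two primitives and one operator, the number of reachable programs grows geometrically with depth, which is the intuition behind treating a small basis as sufficient for a large space of behaviors.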

The paper introduces a concrete training objective called next primitive prediction (NPP), which trains a system to predict the next primitive-operator pair given prior context, rather than predicting next tokens or latent states. A Minimum Description Length (MDL) parsimony constraint pushes the system toward the smallest reusable primitive basis rather than accumulating brittle, task-specific fragments. The authors also propose three evaluation metrics: a Primitive Reuse Index (PRI) measuring how frequently learned primitives appear across distinct task families, a Compositional Depth Generalization (CDG) test checking success on solution graphs deeper than training examples, and a Transfer-as-Recomposition (TaR) measure that holds primitive libraries fixed while changing environmental constants or compositional requirements.
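As a rough illustration of what two of these metrics might compute, the sketch below scores primitive reuse and depth generalization on toy data. The exact formulas are assumptions, not the paper's definitions: here reuse is the average fraction of task families in which each primitive appears, and depth generalization is accuracy on test items whose solution depth exceeds the deepest training example. The function names, task families, and traces are all hypothetical.

```python
def primitive_reuse_index(solutions):
    """Average fraction of task families in which each primitive appears.

    solutions: dict mapping task family -> list of solution traces,
    where each trace is a list of primitive names. (Assumed format.)
    """
    families_using = {}
    for family, traces in solutions.items():
        used = {p for trace in traces for p in trace}
        for p in used:
            families_using.setdefault(p, set()).add(family)
    n_families = len(solutions)
    if not families_using or n_families == 0:
        return 0.0
    total = sum(len(fams) for fams in families_using.values())
    return total / (len(families_using) * n_families)

def depth_generalization(results, max_train_depth):
    """Accuracy restricted to test items deeper than any training example.

    results: list of (solution_depth, solved) pairs. Returns None if no
    test item is deeper than max_train_depth.
    """
    deeper = [solved for depth, solved in results if depth > max_train_depth]
    return sum(deeper) / len(deeper) if deeper else None

# Toy data, invented for illustration.
solutions = {
    "sorting": [["compare", "swap"]],
    "search": [["compare", "retrieve"]],
    "planning": [["compare", "verify", "retrieve"]],
}
pri = primitive_reuse_index(solutions)

results = [(2, True), (4, True), (5, False), (6, True)]
cdg = depth_generalization(results, max_train_depth=3)

print(round(pri, 3), round(cdg, 3))
```

In this toy case `compare` appears in all three families while `swap` and `verify` appear in one each, so the reuse score lands between the extremes of fully shared and fully task-specific primitives.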

The main caveat is that this is a theoretical proposal. The paper grounds its framework in case studies from physics, evolution, and neuroscience, including Hox genes as an example of minimal algorithmic primitives and the prefrontal cortex as an implementer of compositional operations. But it reports no empirical results showing NPP-trained systems outperforming alternatives, and whether the proposed metrics gain traction in practice remains an open question.

For researchers working on mechanistic interpretability, curriculum learning, or multi-agent systems, the framework offers something concrete regardless: a formal design criterion, a training objective with a clear mathematical form, and evaluation metrics that aim to distinguish genuine compositional generalization from memorization. If those metrics earn empirical validation, they could shift how the field measures progress toward adaptive AI, away from benchmark performance on familiar distributions and toward something closer to what open-ended generalization is actually supposed to mean.
