well, lo, people have built a benchmark using a task called multi-armed bandit, where success depends on iteratively grasping the relative odds of payouts from different choices, and LLMs are (generally) shit at it arxiv.org/pdf/2403.15371 /
tweety fish
Articles & links
this, posted elsewhere in these sprawling threads, is very interesting on the topic of what emotional representations exist in frontier models and what they might mean: www.thetransmitter.org/emotion/what...
as promised, here is the paper comparing LLMs to human results; I haven't gone through it carefully enough to vouch for the methodology, and for various reasons I have my suspicions about how robust it'll be, but it's certainly conceptually interesting: arxiv.org/pdf/2505.09901
quoted sentence paraphrased from today's Matt Levine which links www.bloomberg.com/news/article... and robinhood.com/us/en/newsro...
Recent commentary
Last night it occurred to me to wonder if LLMs were any good at gambling tasks. This is important not because it'd be funny for LLMs to gamble but because gambling tasks get used to measure human decision-making under risk /
listen, I try not to prejudge this stuff too hard, because I think it's important for me, at least, to have my skepticism rooted in up-to-date knowledge, but "Robinhood offers in-app agentic trading to its users" is one hell of an alarming sentence