tweety fish

at different times:
haxx0r in cDc
PhD in using people for ML and vice versa
"theory of mind for autonomous cars" startup guy, would you believe, went kablooie

at present:
newsletter -- buttondown.email/apperceptive &c
music, politics, nonsense

Articles & links

well, lo, people have built a benchmark using the multi-armed bandit task, where success depends on iteratively grasping the relative odds of payouts from different choices, and LLMs are (generally) shit at it: arxiv.org/pdf/2403.15371 /

arxiv.org
6d ago
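For anyone who hasn't met the bandit task, here is a minimal Python sketch of it, assuming Bernoulli (win/lose) payouts and a simple epsilon-greedy learner. The function `run_bandit`, its parameters, and the arm odds are hypothetical illustrations, not taken from the paper. The agent only ever sees its own payouts, so it has to estimate each arm's odds as it goes and balance trying other arms against sticking with the best one so far.

```python
import random


def run_bandit(probs, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit (illustrative sketch).

    probs: hypothetical payout probability for each arm, hidden from the agent.
    Returns (estimated payout rate per arm, pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # how many times each arm has been pulled
    values = [0.0] * n_arms  # running mean payout observed for each arm
    total = 0
    for _ in range(steps):
        # Explore a random arm with probability epsilon, else exploit the best estimate so far.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the arm just pulled.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return values, counts, total


if __name__ == "__main__":
    estimates, pulls, reward = run_bandit([0.2, 0.5, 0.7], steps=5000)
    print("estimated payout rates:", [round(v, 2) for v in estimates])
    print("pulls per arm:", pulls)
    print("total reward:", reward)
```

With these made-up odds the agent ends up pulling the 0.7 arm most of the time. Working out the relative odds from a running payout history is roughly the skill the post says LLMs are bad at.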

this, posted elsewhere in these sprawling threads, is very interesting on the topic of what emotional representations exist in frontier models and what they might mean: www.thetransmitter.org/emotion/what...

thetransmitter.org
4d ago

as promised, here is the paper comparing LLMs to human results. I haven't gone through it carefully enough to vouch for the methodology, and for various reasons I have my suspicions about how robust it'll be, but it's certainly conceptually interesting: arxiv.org/pdf/2505.09901

arxiv.org
4d ago

quoted sentence paraphrased from today's Matt Levine, which links www.bloomberg.com/news/article... and robinhood.com/us/en/newsro...

bloomberg.com
2d ago
tweety fish reposted
@peark.es

Whether you’re a denialist or a booster or just someone trying to be objective, Salesforce is going to give us some pretty clear answers as to whether radically shifting resources from human coders to AI will work.

techloy.com

Recent commentary

Last night it occurred to me to wonder if LLMs were any good at gambling tasks. This is important not because it'd be funny for LLMs to gamble but because gambling tasks get used to measure human decision-making under risk /

6d ago

listen, I try not to prejudge this stuff too hard, because I think it's important for me, at least, to have my skepticism rooted in up-to-date knowledge, but "Robinhood offers in-app agentic trading to its users" is one hell of an alarming sentence

2d ago