The statement 'we do/do not understand how LLMs work' almost invariably confuses two very different things. On the one hand, we absolutely can map and describe, in great detail, every single mathematical operation that they perform to generate an answer. But...
Rich Harang
Using bad guys to catch math since 2010.
Distinguished Security Architect (AI/ML) and AI Red Team at NVIDIA.
He/him. Personal account etc; `from std_disclaimers import *`
AI Security since it was ML Security.