sakana.ai web signal

Sakana Fugu Packages Multi-Agent AI as a Single API Model

TL;DR

  • Fugu routes tasks through a dynamic multi-agent pipeline exposed as a single OpenAI-compatible API, removing orchestration setup from users.
  • The system draws on two ICLR 2026 papers: TRINITY assigns Thinker/Worker/Verifier roles; Conductor uses reinforcement learning to design coordination strategies.
  • Fugu Ultra scored 73.7 on SWE Bench Pro and 93.2 on LiveCodeBench; base Fugu reached 95.5 on GPQA-D, per Sakana's own benchmark reporting.

Most multi-agent frameworks put the orchestration work on you: define the graph, assign the roles, wire the prompts. Sakana AI's Fugu takes a different position, treating the entire coordinated system as a single model accessible through one OpenAI-compatible API endpoint.

The technical foundation draws on two papers accepted to ICLR 2026. The first, TRINITY, uses a lightweight evolved coordinator to orchestrate multiple LLMs over several turns, assigning Thinker, Worker, or Verifier roles. The second, Conductor, is trained via reinforcement learning to discover natural-language coordination strategies, designing agent communication patterns and focused prompts at runtime. Neither approach requires users to prescribe team organization or workflows; the system assembles what each task needs.

The benchmark numbers Sakana reports are notable. Fugu Ultra scored 73.7 on SWE Bench Pro and 93.2 on LiveCodeBench; the base Fugu variant reached 95.5 on GPQA-D. Sakana describes these scores as surpassing publicly accessible frontier models. Take that claim as reported rather than independently verified; benchmark comparisons published by the company releasing the product warrant scrutiny, particularly when the exact comparison set is not spelled out.

Pricing is structured to sidestep a friction point common in multi-model setups. When multiple agents activate, users pay only the highest-tier model rate rather than stacking costs across every call. Subscriptions run from $20 to $200 a month. The system is available globally except the EU/EEA, where GDPR compliance work is still pending.

What Sakana does not detail is which specific frontier models sit in the pool, what Conductor's reinforcement learning signal looks like, or how latency holds up across multi-turn coordination tasks. For teams already using explicit orchestration frameworks, whether a black-box coordinator outweighs the loss of workflow transparency is an open question that independent evaluation, not a benchmark table, will have to answer.

Shared on Bluesky by 2 AI experts