fortune.com web signal

Sail Research Raises $80M to Cut AI Inference Costs for Long-Running Agents

funding, ai-infrastructure, inference

TL;DR

  • Sail Research raised $80M in seed and Series A funding at a $450M valuation, led by Kleiner Perkins with Sequoia, Redpoint, and others.
  • Agentic workflows consume tokens 50 to 500 times faster than simple chat, driving enterprise AI bills to triple despite falling per-token prices.
  • Sail claims 3x to 10x cost improvements by trading latency for throughput, targeting long-horizon agent workloads on existing GPU hardware.

The gap at the center of Sail Research's pitch is one many AI teams have started noticing: per-token prices have fallen, yet enterprise AI bills have tripled anyway. The reason, according to Fortune's reporting, is that agentic workflows consume tokens at rates 50 to 500 times higher than simple chat interactions. Where a chatbot answers a question in seconds, agents can run for hours: Sail customer Detail.dev uses the service for code-review agents that analyze entire codebases over three to four hours.
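The arithmetic behind that gap can be sketched in a few lines. This is a minimal illustration, not data from the reporting: the token count, prices, and task volume below are hypothetical, and only the 50x multiplier comes from the low end of the range Fortune cites.

```python
def monthly_bill(tokens_per_task: float, tasks: float, price_per_million: float) -> float:
    """Bill in dollars: total tokens, expressed in millions, times the price per million."""
    return tokens_per_task * tasks / 1_000_000 * price_per_million

# Hypothetical figures, chosen only to illustrate the effect.
chat_tokens = 2_000      # tokens per chat interaction (assumed)
old_price = 10.0         # dollars per million tokens before the price drop (assumed)
new_price = 2.0          # dollars per million tokens after an assumed 5x price drop
multiplier = 50          # low end of the 50x-500x range cited in the reporting
tasks = 100_000          # tasks per month (assumed)

chat_bill = monthly_bill(chat_tokens, tasks, old_price)
agent_bill = monthly_bill(chat_tokens * multiplier, tasks, new_price)
bill_ratio = agent_bill / chat_bill

print(f"chat bill:  ${chat_bill:,.0f}")
print(f"agent bill: ${agent_bill:,.0f}")
print(f"ratio:      {bill_ratio:.1f}x")
```

Under these assumptions, a 5x price cut is outweighed by a 50x rise in tokens per task, so the bill still grows. The sketch moves every task from chat to agents at once; real bills reflect a mix of workloads that it does not model.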

Sail Research, founded by former Apple engineers Neil Movva and Samin Menon, emerged from stealth with $80 million in seed and Series A funding at a $450 million valuation. Kleiner Perkins led the Series A, with Sequoia, Redpoint, Theory Ventures, Vine Ventures, and CRV also participating. Movva, 28, previously worked at Together AI, which the reporting describes as built for interactive applications, before concluding that long-horizon agents needed infrastructure built with different priorities. Menon, his Stanford classmate and Sail's CTO, also came from Apple's security engineering team.

The company launched its inference service in March and says it is processing trillions of tokens per week. Sail's core design choice is to trade latency for throughput. "We only care about efficiency," Movva told Fortune. "It's quite difficult to build an inference engine for both throughput and latency at the same time." The company claims 3x to 10x cost improvements for customers. Kleiner Perkins partner Aditya Naganath put the firm's bet plainly: "The belief that inference is going to be a 10x—even 100x—bigger market than it is today." Goldman Sachs reportedly projects a 24-fold increase in token consumption by 2030, which, if anywhere near accurate, suggests the economic pressure Sail is addressing will intensify considerably.
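One generic way to see why latency and throughput pull against each other is request batching. The reporting does not describe how Sail's engine works, so the sketch below is a toy model under stated assumptions, not Sail's design: the function `batch_stats` and its cost parameters are hypothetical, and each batched step is assumed to cost a fixed overhead plus a small per-request increment.

```python
def batch_stats(batch_size: int, fixed_ms: float = 20.0, per_request_ms: float = 1.0):
    """Toy cost model (assumed): one batched step = fixed overhead + per-request cost."""
    step_ms = fixed_ms + per_request_ms * batch_size   # every request waits for the whole step
    throughput_rps = batch_size / step_ms * 1000.0     # requests completed per second
    return step_ms, throughput_rps

for batch_size in (1, 8, 64):
    latency_ms, rps = batch_stats(batch_size)
    print(f"batch={batch_size:>2}  latency={latency_ms:.0f} ms  throughput={rps:.0f} req/s")
```

In this model, larger batches spread the fixed overhead across more requests, so throughput climbs while each individual request waits longer. A workload that runs for hours can absorb that wait more easily than an interactive chat can.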

The caveat is that the cost improvement figures come from the company, not from independent benchmarks, and the reporting does not detail Sail's pricing model or what happens when large cloud providers build similar throughput-optimized offerings. Nor does the reporting address the technical moat: whether the advantage rests in proprietary scheduling algorithms or primarily in the founding team's experience. The forward case is relatively clear regardless: if token consumption grows anywhere near the projected pace and long-running agents remain structurally expensive to run, infrastructure built specifically for that workload, and built ahead of the market, has a real window.