docs.x.ai web signal

xAI Cuts Grok API to Single Model at $1.25/M

xai generative ai ai-business

Key insights

  • xAI retired eight Grok model variants on May 15, redirecting all API traffic automatically to the single remaining model, grok-4.3.
  • grok-4.3 is priced at $1.25 per million input tokens and $2.50 per million output tokens, with no tiered alternatives remaining.
  • xAI explicitly warns that auto-redirects may break streaming and function-calling workloads, meaning silent failures are possible for some API consumers.
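
To make the quoted prices concrete, here is a minimal sketch of the arithmetic. The two rates come from this brief; the function name and the monthly token volumes are hypothetical, and the sketch ignores any discounts, caching, or other billing rules xAI may apply.

```python
from decimal import Decimal

# Prices quoted in this brief for grok-4.3, in USD per million tokens.
INPUT_USD_PER_M = Decimal("1.25")
OUTPUT_USD_PER_M = Decimal("2.50")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated charge in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / TOKENS_PER_M


# Hypothetical monthly volume: 40M input tokens and 10M output tokens.
monthly = estimate_cost_usd(40_000_000, 10_000_000)
print(f"${monthly:.2f}")  # 40 * 1.25 + 10 * 2.50 = 75.00
```

Decimal is used instead of float so that per-token prices do not pick up rounding noise when summed over large volumes.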

Why this matters

Developers and enterprises running production workloads on any of the eight retired endpoints face potential silent regressions in streaming or function-calling pipelines, since automatic redirects do not guarantee behavioral parity. The consolidation to a single model removes the cost and latency tradeoffs that specialized variants such as grok-4-1-fast-reasoning offered, forcing teams to re-evaluate their inference architecture under a single pricing tier. More broadly, this mirrors a pattern in which frontier AI labs shrink their product surface to reduce operational complexity as they scale developer tooling, leaving fewer options for builders who relied on specialized model variants for specific task profiles.

Summary

xAI consolidated its entire API lineup to one model on May 15, 2026, retiring eight Grok variants simultaneously and routing all deprecated endpoint traffic to grok-4.3 at $1.25 per million input tokens and $2.50 per million output tokens. The retired models include grok-4-1-fast-reasoning, grok-4-0709, grok-3, and grok-code-fast-1. While xAI has configured automatic redirects for API calls hitting the old endpoints, the company explicitly warns that the redirect is not a guaranteed drop-in replacement, flagging streaming and function-calling workloads as particularly likely to behave differently under the new model. In effect, xAI is compressing a fragmented model catalog into a single flagship, following a pattern OpenAI established with its DALL-E retirement.

  • Eight models were retired simultaneously at 12:00 PM PT on May 15, 2026, with no grace period for migration.
  • Automatic redirects are live but carry a documented caveat for function-calling and streaming configurations.
  • The move aligns with xAI scaling its Grok Build CLI agent to a broader developer base, suggesting the company wants a cleaner surface for tooling to target.

The consolidation signals that xAI is prioritizing developer platform simplicity over model-tier optionality, a bet that works only if grok-4.3 holds up across the diversity of workloads its predecessors served.
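
Because redirects are automatic, a service can keep sending a retired model ID without anyone noticing. The sketch below shows one way a team might audit its own configuration. The four model IDs are the ones named in this brief (the other four retired IDs are not listed here, so the set is knowingly incomplete), and the function name and service mapping are hypothetical.

```python
# Retired model IDs named in this brief. Eight were retired, but only these
# four are named, so this set is incomplete by construction.
NAMED_RETIRED_MODELS = frozenset({
    "grok-4-1-fast-reasoning",
    "grok-4-0709",
    "grok-3",
    "grok-code-fast-1",
})
REPLACEMENT_MODEL = "grok-4.3"


def find_retired_pins(service_models: dict[str, str]) -> dict[str, str]:
    """Return {service: model} for services pinned to a named retired model."""
    return {
        service: model
        for service, model in service_models.items()
        if model in NAMED_RETIRED_MODELS
    }


# Hypothetical service-to-model mapping, e.g. loaded from deployment config.
pins = {
    "support-bot": "grok-3",
    "code-review": "grok-code-fast-1",
    "search": "grok-4.3",
}
for service, model in sorted(find_retired_pins(pins).items()):
    print(f"{service}: {model} -> {REPLACEMENT_MODEL} "
          "(retest streaming and function calling)")
```

Updating the pinned ID explicitly, rather than relying on the redirect, at least makes the model change visible in version control and gives a natural point to rerun streaming and function-calling tests.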

Potential risks and opportunities

Risks

  • Enterprises running function-calling pipelines against retired endpoints could see silent failures or malformed outputs post-redirect, with no fallback model available to isolate the regression.
  • Teams that priced inference budgets around cheaper fast-reasoning variants face immediate cost increases with no migration path to a comparable lower-cost option within the xAI ecosystem.
  • If grok-4.3 experiences an outage or quality regression, xAI's single-model API surface means all API consumers are affected simultaneously, with no secondary model to fail over to within the platform.
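
The "silent failures or malformed outputs" risk above can be turned into a loud failure with a structural check on each tool call before it is executed. This is a minimal sketch under stated assumptions: it assumes a tool call arrives as a dict with a "name" and a JSON-encoded "arguments" string, a shape common in OpenAI-style chat APIs but not confirmed for xAI by this brief. The function name, the get_weather tool, and its arguments are hypothetical.

```python
import json


def check_tool_call(call: dict, expected_name: str,
                    required_args: dict[str, type]) -> list[str]:
    """Return a list of problems in one tool call; an empty list means none found."""
    problems = []
    if call.get("name") != expected_name:
        problems.append(f"unexpected tool name: {call.get('name')!r}")
    try:
        # Assumption: arguments are delivered as a JSON-encoded string.
        args = json.loads(call.get("arguments", ""))
    except (TypeError, json.JSONDecodeError):
        return problems + ["arguments are not valid JSON"]
    if not isinstance(args, dict):
        return problems + ["arguments are not a JSON object"]
    for key, expected_type in required_args.items():
        if key not in args:
            problems.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected_type):
            problems.append(f"wrong type for {key}: {type(args[key]).__name__}")
    return problems


# Hypothetical tool and argument schema for illustration.
schema = {"city": str, "days": int}
good = {"name": "get_weather", "arguments": '{"city": "Oslo", "days": 3}'}
bad = {"name": "get_weather", "arguments": '{"city": "Oslo", "days": "3"}'}
print(check_tool_call(good, "get_weather", schema))  # []
print(check_tool_call(bad, "get_weather", schema))   # ['wrong type for days: str']
```

Running a check like this over recorded prompts before and after the redirect gives a rough signal of behavioral drift, though it only catches structural problems, not changes in answer quality.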

Opportunities

  • Multi-model inference routers and abstraction layers (OpenRouter, LiteLLM, Martian) can position grok-4.3 as one node in a diversified routing strategy for teams burned by the single-model concentration risk.
  • Competing API providers (Anthropic, Mistral, Google) offering tiered model pricing have a direct pitch to xAI customers who depended on fast or low-cost variant tiers that no longer exist.
  • Observability and testing vendors (Braintrust, Langfuse, Arize) can target xAI API consumers needing regression detection tooling to catch behavioral drift introduced by the automatic endpoint redirects.
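
The routing idea in the first opportunity can be illustrated with a provider-agnostic failover loop. This is a sketch of the general pattern, not how OpenRouter, LiteLLM, or Martian implement it. Each provider is modeled as a hypothetical callable that takes a prompt and returns text, and the two stub providers stand in for real API clients.

```python
from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable maps a prompt to text.
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str,
                           providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real API clients.
def primary_stub(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def secondary_stub(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_failover(
    "ping", [("grok-4.3", primary_stub), ("backup", secondary_stub)]
)
print(name, text)  # backup echo: ping
```

Ordered failover is the simplest policy. A production router would typically also add timeouts, retries with backoff, and per-provider prompt or tool-schema adaptation, since models from different vendors are not interchangeable.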

What we don't know yet

  • Whether xAI will publish a behavioral comparison between grok-4.3 and the retired variants to help developers audit regressions, particularly for function-calling schemas that may silently degrade.
  • What the pricing impact is for teams that previously used lower-cost fast-reasoning variants to handle high-volume inference, given no cheaper tier now exists.
  • Whether the Grok Build CLI agent will expose model configuration options or lock developers to grok-4.3 exclusively as the platform scales.