techcrunch.com web signal

Wirestock Raises $23M for Licensed AI Training Data

funding synthetic data copyright ai-business training-data

Key insights

  • Wirestock combines 50M+ licensed images and videos from 700,000 creators, already generating $40M+ in annualized revenue before this raise.
  • The Bartz v. Anthropic settlement accelerated enterprise demand for rights-cleared multimodal data by raising the legal cost of unlicensed scraping.
  • New $23M Series A funding will expand Wirestock's catalog into audio and music, among the most complex licensing categories in training data.

Why this matters

AI labs building frontier models now have a concrete, costly example showing that unlicensed training data can create settlement-level liability, so licensed data procurement is shifting from optional to required infrastructure. Wirestock's $40M+ ARR before a Series A suggests the market is already paying for compliance. Its expansion into audio could also give it an early lead in a modality where music rights have historically been among the most heavily litigated forms of IP. For founders and technical leaders, this signals that training data supply chains will increasingly look like enterprise software contracts, with audit trails, creator consent records, and modality-specific licensing terms baked in.

Summary

Wirestock has closed a $23M Series A led by Nava Ventures, betting that the AI industry's hunger for rights-cleared training data is a durable business, not a passing wave. The company pivoted earlier this year from stock photography to AI training datasets and now holds a library of 50M+ images and videos contributed by 700,000 creators. The timing is deliberate: the Bartz v. Anthropic settlement earlier this year put the industry on notice that relying on unlicensed content carries real legal exposure, and Wirestock is positioning its creator-licensed catalog as the clean alternative. The company already reports $40M+ in annualized revenue, which suggests AI labs are paying, not just browsing. In short, Wirestock, backed by Nava Ventures, is building the licensed-data layer between creators and the AI labs that can no longer afford copyright risk.

  • 700,000 contributing creators form a distributed supply chain that competitors would struggle to replicate quickly.
  • The new funding will extend the catalog into audio and music, where licensing is even more complex than for images.
  • $40M+ ARR before a Series A signals strong product-market fit and suggests Wirestock may be supplying rights-cleared data faster than AI labs can procure it in-house.

The broader shift is that training data is becoming a licensed input, not a free resource, and the companies that built compliant pipelines early stand to capture the most margin from that constraint.

Potential risks and opportunities

Risks

  • If courts and AI labs read the Bartz v. Anthropic settlement narrowly, as a penalty tied to how the source material was acquired rather than to training itself, labs may revert to unlicensed scraping with little perceived exposure, eroding Wirestock's compliance premium.
  • Wirestock's creator supply chain is a single point of failure: a class action by contributing creators challenging revenue-share terms could freeze the catalog and put enterprise data contracts at risk of breach mid-term.
  • Expanding into music and audio exposes Wirestock to the major labels (Universal, Sony, Warner), which have aggressively litigated AI training use cases and could block or heavily tax that modality before the product launches.

Opportunities

  • Competing licensed-data platforms (Shutterstock, Getty Images) could accelerate their own AI training data products now that Wirestock has shown $40M+ ARR is achievable in this segment.
  • Law firms and compliance vendors specializing in AI IP (Crowell & Moring, Wilson Sonsini) could gain leverage selling training-data audit services to labs that lack Wirestock-style provenance documentation.
  • Audio-native AI companies (ElevenLabs, Suno, Udio) face the same licensed-data gap in music that image-model builders faced two years ago, which could make Wirestock an acquisition target or an exclusive audio supplier within the next 12 to 18 months.

What we don't know yet

  • Creator revenue-share terms and payout rates are undisclosed, leaving open whether Wirestock's model is sustainable for the 700,000 contributors at scale.
  • Whether existing AI lab customers (unnamed in reporting) have signed multi-year contracts or are transacting on a per-dataset basis, a distinction that materially affects revenue durability.
  • How Wirestock's licensed catalog competes, as of mid-2026, with direct licensing arrangements between creators or rights holders and AI developers (e.g., Adobe's Firefly data program or Getty's deals with generative AI vendors).