anthropic.com web signal

Anthropic publishes Claude's Constitution under CC0 license

TL;DR

  • Anthropic lists four properties in priority order for Claude: broadly safe, broadly ethical, compliant with Anthropic's guidelines, and genuinely helpful.
  • The document names three principals with different trust levels: Anthropic, operators who access Claude via API, and end users.
  • Amanda Askell is the primary author, with contributions from Joe Carlsmith, Chris Olah, Jared Kaplan, and Holden Karnofsky; the text is released under Creative Commons CC0 1.0.

Anthropic has put the document it uses to shape Claude's values on the public web, under a CC0 license that lets anyone copy and reuse it. Most labs keep this kind of material internal. Anthropic presents its constitution as both the company's stated intentions for Claude's behavior and the authority that guides training, and publishing it at all is unusual.

The substance is worth reading directly, but the bones are these. The document lists four properties in priority order: broadly safe, broadly ethical, compliant with Anthropic's guidelines, and genuinely helpful. It describes a principal hierarchy of Anthropic, operators who access Claude via API, and users, each with different default trust levels. It frames helpfulness as a serious obligation rather than a nice-to-have, warns that unhelpfulness is never trivially safe, and reaches for the analogy of a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor.

The document lists behaviors Claude is supposed to refuse without exception, including significant uplift to biological weapons attacks and the generation of child sexual abuse material. There are also non-negotiables that protect users against operators: never claim to be human, never deceive users about its own limitations, surface emergency-service information when a life is at risk. Honesty is spelled out as a cluster of properties (truthful, calibrated, transparent, forthright, non-deceptive, non-manipulative, autonomy-preserving), and the framing throughout favors cultivating good values and judgment over strict rules.

If you deploy Claude through the API, the practical payoff is that you can now read, in Anthropic's own words, where the model is being trained to push back on you and where it is being trained to defer. Amanda Askell is named as primary author and Joe Carlsmith as the writer of significant portions and lead of a revision, with contributions from Chris Olah, Jared Kaplan, and Holden Karnofsky. The CC0 release lets other labs fork the text.

The honest caveat is that a published document is a statement of intent, not evidence of what training actually produced. Anthropic does not, in this piece, quantify how the priority ordering resolves when values conflict in practice, nor describe how often the document is revised beyond calling it a perpetual work in progress. What it does give is a primary source from a frontier lab that researchers, regulators, and competing labs can argue with on the record. That conversation is the part worth watching.
