Anthropic updates Responsible Scaling Policy as AI risk debate shifts
Version 3.0 separates company commitments from industry recommendations and introduces a Frontier Safety Roadmap and formal Risk Reports.
Anthropic has released Version 3.0 of its Responsible Scaling Policy, revising the voluntary framework it uses to manage catastrophic AI risk. The update restructures how the company defines safety commitments, introduces a new Frontier Safety Roadmap, and formalizes the publication of recurring Risk Reports with potential third-party review.
The Responsible Scaling Policy, first introduced in September 2023, was designed around conditional commitments. If models crossed defined capability thresholds, Anthropic would implement stricter safeguards aligned to specific AI Safety Levels. Earlier levels were detailed, while later levels were intentionally left less defined pending future capability advances.
Two and a half years on, Anthropic says some parts of that approach worked as intended, while others proved more complex in practice.
What worked and where ambiguity emerged
According to the company, the policy functioned internally as a forcing mechanism. To meet its ASL-3 deployment standard, Anthropic says it developed more advanced input and output classifiers to reduce risks linked to chemical and biological misuse. ASL-3 safeguards were activated in May 2025 and have since been refined.
Anthropic also credits the framework with encouraging similar safety standards across the sector. Within months of its initial launch, OpenAI and Google DeepMind adopted comparable approaches. Governments in California, New York, and the European Union have since introduced requirements for frontier AI developers to publish risk management frameworks.
However, the company acknowledges that pre-defined capability thresholds have proven harder to interpret than expected. In some cases, models approached or potentially crossed thresholds, but internal evaluation methods did not provide clear-cut answers. Anthropic says it adopted precautionary safeguards in those cases, though it found that uncertainty weakened the broader public case for coordinated industry or government action.
Biological capability testing is cited as one area where ambiguity remains. Anthropic notes that its models now pass many of its quicker evaluations of biological knowledge, making it difficult to argue that risks are low, yet the available evaluation methods do not conclusively demonstrate high risk either. Longer experimental validation efforts often lag behind model improvements.
The company also states that government action on AI safety has moved more slowly than anticipated, with policy debates shifting toward competitiveness and economic growth. It says this political context, combined with the difficulty of meeting higher-level safeguards unilaterally, created structural challenges for the earlier version of the policy.
Separating company commitments from industry ambitions
Version 3.0 responds by dividing Anthropic’s unilateral commitments from its broader recommendations for the AI industry. The new structure outlines what the company will pursue independently, alongside a map from capability levels to mitigations that it believes the wider industry would need to adopt.
A new Frontier Safety Roadmap will now be developed and published, setting out concrete goals across security, alignment, safeguards, and policy. These goals are described as public targets rather than binding commitments, with progress expected to be graded openly.
Examples include investigating unconventional approaches to high-level information security; automating red-teaming beyond traditional bug bounty participation; implementing systematic measures to align Claude with its constitutional framework; centralizing records of critical AI development activity; and publishing proposals for scalable AI regulation.
Anthropic frames this as an extension of transparency principles it has previously advocated for in frontier AI policy discussions.
Risk Reports and external review
The updated policy also formalizes the production of Risk Reports every three to six months. These reports will assess model capabilities, associated threat models, and active mitigations, and will provide an overall risk evaluation. Public versions will be published online, with limited redactions where required for legal, security, or privacy reasons.
Under certain conditions, Risk Reports will be subject to external review by independent experts with access to minimally redacted versions. Anthropic says reviewers will be selected based on AI safety expertise and the absence of major conflicts of interest.
Although current models do not yet trigger mandatory external review under the new framework, Anthropic says it is piloting the process in preparation for future capability thresholds.
A recalibrated approach to voluntary AI governance
The Responsible Scaling Policy was always described as a living document. Version 3.0 reflects a shift from pre-set escalation thresholds toward more flexible, transparency-driven governance mechanisms.
For EdTech, enterprise AI adopters, and public sector stakeholders, the implications are practical. As AI capabilities accelerate, voluntary governance frameworks are being revised in real time, often under political and competitive pressure. The question for institutions integrating advanced models into learning, research, and workforce systems is no longer whether safeguards exist, but how clearly those safeguards are defined, tested, and independently reviewed.
Anthropic’s latest update suggests that transparency, rather than rigid threshold triggers, is becoming the preferred lever for maintaining credibility in frontier AI development.