Anthropic reveals 31.5% hijack rate for Opus 4.8 browser agent before safeguards

By Editorial Team · Published June 1, 2026 · 2 min read · Source: Crypto Briefing


The AI lab published the only concrete prompt injection metric among frontier labs this spring, and the number should worry anyone running automated crypto workflows.


Point a red-teamer at Anthropic’s newest model while it’s browsing the web, and the attacker hijacks it nearly one time in three. That’s the raw stat: a 31.5% prompt injection success rate for Claude Opus 4.8’s browser agent before defensive safeguards engage.

The transparency gap between labs

Anthropic dropped a 244-page safety report on May 28, covering four distinct agentic surfaces: browsing the web, writing code, coordinating with other AI agents, and interacting with external tools.

OpenAI reported on just one surface: connectors. Google moved the entire subject out of its model card and into a separate safety framework document. Meta didn’t ship a closed-model card at all.

The 31.5% figure is pre-safeguards, meaning it represents the raw model’s susceptibility before Anthropic’s defensive layers kick in. Production deployments add guardrails, monitoring, and filtering that reduce real-world exploit rates. But the baseline vulnerability is exactly the kind of data security architects need to design those guardrails correctly.

What Opus 4.8 actually does differently

False negatives on coding errors, where the model fails to catch its own mistakes, dropped from 19.7% to 3.7%. Opus 4.8 also introduces dynamic multi-agent orchestration at scale, coordinating hundreds of sub-agents simultaneously to manage large software projects.

Why crypto should pay attention

A 31.5% pre-safeguard hijack rate for browser-based agents should make anyone running AI systems in crypto pause. Browser agents are precisely the kind of tool that crypto projects deploy for monitoring dashboards, scraping on-chain data, interacting with DEX frontends, and executing trades through web interfaces.

Prompt injection in a browser agent means a malicious website, a compromised API response, or even a cleverly crafted token name could potentially redirect an AI agent’s behavior. In traditional software, that’s a data breach. In crypto, that’s a drained wallet.
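One common defensive pattern is to keep the final say on wallet-affecting actions outside the model. This is a general mitigation sketch under stated assumptions, not a description of Anthropic’s safeguards. Read-only actions pass; fund-moving actions must target a pre-approved address; and any fund-moving action proposed after the agent has read untrusted web content waits for a human sign-off. The action names, the `ProposedAction` type, the `gate_action` function, and the address allowlist are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: actions a browsing agent may take without extra checks.
READ_ONLY_ACTIONS = {"read_page", "fetch_price", "query_chain"}
# Hypothetical actions that can move funds and therefore need gating.
FUND_MOVING_ACTIONS = {"sign_transaction", "swap", "transfer"}
# Hypothetical allowlist of receiving addresses the operator approved in advance.
APPROVED_DESTINATIONS = {"0xOperatorColdWallet"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str                   # action type the model wants to take
    destination: Optional[str]  # address that would receive funds, if any
    tainted: bool               # True if untrusted web content entered the model's context


def gate_action(action: ProposedAction, human_approved: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny" for a model-proposed action."""
    if action.kind in READ_ONLY_ACTIONS:
        return "allow"
    if action.kind not in FUND_MOVING_ACTIONS:
        return "deny"  # unknown action types are rejected by default
    if action.destination not in APPROVED_DESTINATIONS:
        return "deny"  # injected text cannot introduce a new payee
    if action.tainted and not human_approved:
        return "needs_approval"  # a person confirms before anything is signed
    return "allow"


if __name__ == "__main__":
    print(gate_action(ProposedAction("read_page", None, tainted=True)))
    print(gate_action(ProposedAction("transfer", "0xAttacker", tainted=True)))
    print(gate_action(ProposedAction("swap", "0xOperatorColdWallet", tainted=True)))
```

The point of the design is that a hijacked model can still propose a malicious transfer, but the allowlist check and the approval step sit in ordinary code the attacker’s text never reaches.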

Multi-agent orchestration adds another layer of complexity. When Opus 4.8 coordinates hundreds of sub-agents, a single successful prompt injection could potentially cascade across the entire workflow. In a crypto context, that’s the difference between one compromised transaction and a systemic failure across an entire automated trading operation.
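A deliberately simplified model shows why scale matters. It is not taken from Anthropic’s report. Assume each of k sub-agents reads one attacker-controlled page and is hijacked independently with probability p. Then the chance that at least one is compromised is 1 - (1 - p)^k. Plugging in the 31.5% pre-safeguard figure is purely illustrative: real deployments add safeguards that lower p, and real attacks are rarely independent.

```python
def chance_any_compromised(p: float, k: int) -> float:
    """Probability that at least one of k sub-agents is hijacked.

    Hypothetical model: each sub-agent faces one injection attempt that
    succeeds independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # 0.315 is the article's pre-safeguard rate, used here only for illustration.
    for k in (1, 3, 10, 100):
        print(f"{k:>3} sub-agents: {chance_any_compromised(0.315, k):.1%}")
```

Under these assumptions, three exposed sub-agents already push the odds of at least one breach to about 68%, and ten push them above 97%. The model also understates one risk. A compromised coordinator can pass bad instructions to every sub-agent at once, so failures become correlated rather than independent.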

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

This article was originally published on Crypto Briefing and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].
