INSIGHTS
When the Safety Lab Leaks: What the Anthropic Incident Adds to the McKinsey Wake-Up Call
Silvan Schriber · April 2026
A few days back, I wrote about the McKinsey Lilli breach and what it means for bank boards. Then Anthropic — the company that built its brand on AI safety — leaked its own source code. Twice. In five days. The two incidents are structurally different, but they converge on the same uncomfortable question: if the most security-conscious organisations in the AI ecosystem can't protect their own systems, what does that mean for the rest of us?
A quick recap
In late February, security startup CodeWall pointed an autonomous AI agent at McKinsey's internal AI platform, Lilli. Within two hours — no credentials, no insider knowledge, $20 in compute — the agent had full read-write access to 46.5 million chat messages, 728,000 files, and the system prompts that governed what Lilli told 40,000 consultants. The vulnerability was SQL injection, a bug class that has been documented since 1998.
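Lilli's actual queries are not public, so the following is only a generic sketch of the bug class, not McKinsey's code. It shows why SQL injection is considered elementary: splicing user input directly into a query string lets an attacker rewrite the query, while the parameterised form that has been standard practice for decades treats the same input as inert data.

```python
import sqlite3

# Toy database standing in for a chat-message store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, owner TEXT, body TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'alice', 'confidential note')")
conn.execute("INSERT INTO messages VALUES (2, 'bob', 'hello')")

def fetch_vulnerable(owner: str):
    # Vulnerable: user input is spliced directly into the SQL string.
    query = f"SELECT body FROM messages WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()

def fetch_safe(owner: str):
    # Safe: a parameterised query treats the input as data, never as SQL.
    return conn.execute(
        "SELECT body FROM messages WHERE owner = ?", (owner,)
    ).fetchall()

# The classic injection payload: closes the quote, then makes the
# WHERE clause always true, returning every row in the table.
payload = "nobody' OR '1'='1"
print(fetch_vulnerable(payload))  # returns every message, for every user
print(fetch_safe(payload))        # returns nothing: no user has that name
```

The fix has been known as long as the bug: never build queries by string concatenation. That a flaw this well understood survived in a production AI platform is the point of the story.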
I wrote that the McKinsey breach was a warning to every bank board deploying internal AI tools. The concern was not just data exfiltration but silent corruption — the ability to poison an AI's advice at scale, without detection.
Then, over the past week, Anthropic provided a very different, and equally instructive, case study.

What happened at Anthropic
Two incidents, five days apart, both caused by human error.
On March 26, a misconfiguration in Anthropic's content management system made nearly 3,000 internal files publicly accessible. Among them was a draft blog post describing an unreleased model known internally as "Mythos" or "Capybara". The document was sitting in an unsecured, publicly searchable data store.
On March 31, a routine update to Claude Code — Anthropic's AI-powered coding tool — was published to npm, the public registry that developers use to download software packages. But instead of uploading only the compiled production code, someone included a source map file that pointed to the complete original codebase. Approximately 500,000 lines of unobfuscated code across 1,900 files were exposed: the full client-side architecture, unreleased feature flags, internal model performance data, OAuth flows, permission logic, and telemetry systems.
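Why does one stray file expose an entire codebase? A source map is a JSON file that bundlers generate so developers can debug minified code, and by common default it embeds the complete original source text, not just file names. The toy map below is illustrative (the paths and contents are invented, not Anthropic's); recovering the originals from such a file requires nothing more than a dictionary lookup.

```python
import json

# A toy source map, shaped like the .js.map files bundlers emit. When
# "sourcesContent" is populated (a common default), the map carries the
# complete original source text alongside the original file paths.
source_map = json.loads("""
{
  "version": 3,
  "file": "cli.min.js",
  "sources": ["src/auth/oauth.ts", "src/flags/features.ts"],
  "sourcesContent": [
    "export const OAUTH_SCOPES = ['read', 'write'];",
    "export const UNRELEASED_FLAG = true;"
  ],
  "mappings": "AAAA"
}
""")

# Reconstructing the original files is trivial: pair each path with its text.
recovered = dict(zip(source_map["sources"], source_map["sourcesContent"]))
for path, code in recovered.items():
    print(f"--- {path} ---")
    print(code)
```

No exploit, no decompilation: anyone who downloaded the package got the originals for free. That is why publishing a source map alongside production code is equivalent to publishing the source itself.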
Within hours, the code was downloaded from Anthropic's own cloud storage, mirrored to GitHub, and forked thousands of times. It is now being actively analysed — and, in some cases, weaponised through trojanised repositories — by developers, researchers, competitors, and threat actors worldwide.
Anthropic's response: "No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach."
Technically, perhaps. Strategically, that framing misses the point.
Two failures, one pattern
At first glance, the McKinsey and Anthropic incidents look like different species. One was an external breach of an internal AI platform. The other was an internal operational failure at an AI vendor. One exposed client intelligence. The other exposed product architecture.
But underneath the surface differences, the pattern is the same — and it's the pattern that matters for boards.
Elementary failures, not exotic exploits. At McKinsey, the entry point was SQL injection and a set of API endpoints that required no authentication. At Anthropic, the entry point was a misconfigured content management system and a routine software release that included files never intended for publication. Neither incident required a sophisticated adversary, a zero-day vulnerability, or a nation-state budget. Standard automated tools failed to catch the McKinsey vulnerability. Standard release safeguards failed to prevent the Anthropic leak. In both cases, the organisation believed it was secure. The belief was wrong.
AI systems concentrate risk in new ways. Lilli wasn't just a chatbot — it was an intelligence repository containing the substance of McKinsey's advisory work across thousands of client engagements. Claude Code isn't just a coding tool — it has deep access to developers' codebases, infrastructure, and business logic in enterprises and government agencies. When these systems are compromised, the blast radius is qualitatively different from a traditional software incident. The McKinsey breach threatened to corrupt advisory judgment at scale. The Anthropic leak handed competitors and threat actors a detailed engineering blueprint for a tool embedded in national security operations.
Security culture didn't match security claims. McKinsey derives approximately 40% of its revenue from AI advisory work. It had tested Lilli's security extensively. Anthropic has built its entire corporate identity around responsible AI development. It pioneered Constitutional AI, publishes detailed safety research, and positions itself as the trusted alternative to less cautious competitors. Yet McKinsey left 22 endpoints unauthenticated in production for over two years, and Anthropic leaked its own source code twice in five days. The gap between security posture and security performance is where the real risk lives.
But the Anthropic case adds a new dimension
The McKinsey breach was fundamentally a first-party risk event. McKinsey built Lilli, McKinsey deployed it, McKinsey's clients were exposed. The governance question was: is your own house in order?
The Anthropic incidents introduce a different and, for regulated institutions, arguably more consequential risk category: third-party AI vendor failure.
Every bank, wealth manager, and asset manager that uses Claude — or any AI model from a major provider — is now a downstream stakeholder in that provider's operational security. When Anthropic leaks its own source code, the implications cascade outward: threat actors gain detailed knowledge of the tool's architecture, enabling more targeted attacks against organisations that use it. Malicious forks and trojanised repositories create supply-chain risks for any developer who interacts with the leaked code. The exposed feature roadmap and internal performance data give adversaries a map of capabilities and limitations they can exploit.
A US congressman has already written to Anthropic's CEO raising national security concerns. That letter observed that Claude is embedded in defence and intelligence operations, making Anthropic's operational security a matter of public interest — not just commercial risk.
For bank boards, the lesson is not that Anthropic is uniquely unreliable. The lesson is that no AI vendor is exempt from operational failure, and the current frameworks for managing vendor risk were not designed for this class of dependency.
What this changes for boards
In my earlier article, I proposed a six-point action plan for bank boards responding to the McKinsey breach. The Anthropic incidents sharpen and extend that framework in three specific ways.
First, AI vendor risk assessment may need its own category. Traditional vendor due diligence evaluates financial stability, data handling practices, business continuity, and compliance certifications. These remain necessary but are no longer sufficient. AI vendors introduce risks that are specific to their nature: unreleased model capabilities with national security implications, distribution through public package registries that create supply-chain exposure, and the unique dependency that arises when an AI tool operates inside your codebase and your data environment. Boards should require that every AI vendor relationship be assessed against an AI-specific risk framework, not just the standard outsourcing checklist.
Second, "human error" is not a mitigating factor — it's the risk. Anthropic's framing of both incidents as "human error, not a security breach" is revealing. For a board, the distinction is irrelevant. The question is not whether the failure was intentional or accidental. The question is whether the vendor's processes are designed to prevent elementary errors from cascading into material exposures. At McKinsey, the process failed to enforce authentication on production endpoints. At Anthropic, the process failed to prevent source code from being published to a public registry. In both cases, the controls that should have caught the error before it reached production did not exist or did not function. Boards should ask their AI vendors: what are your release controls, who reviews them, and when did you last test whether they work?
Third, incident response plans must cover AI vendor compromise. Most bank incident response playbooks address scenarios where the bank's own systems are breached or where a traditional IT vendor suffers a data loss. They do not address the scenario where an AI vendor leaks its own architecture, creating downstream supply-chain risks and enabling more sophisticated attacks against the bank's AI-dependent workflows. The Anthropic case makes this scenario real. Boards should commission a tabletop exercise that specifically tests: how would we detect that a tool we depend on has been compromised at its source? What is our exposure if the vendor's codebase, model logic, or configuration data is in adversarial hands? And how quickly can we switch to an alternative?
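The release-control question in the second point is not abstract; it can be reduced to a mechanical gate. Below is a minimal sketch of the kind of pre-publish check that would have stopped the Anthropic npm leak: compare the staged package contents against a blocklist of patterns that must never ship. The blocklist and file names here are hypothetical, chosen for illustration; a real gate would take its policy from the vendor's security team and run in CI before any publish command.

```python
from fnmatch import fnmatch

# Hypothetical blocklist of patterns that should never appear in a
# published package: source maps, secrets, raw source directories.
BLOCKLIST = ["*.map", "*.env", "src/*", "internal/*", "*.pem"]

def release_violations(packaged_files):
    """Return every staged file that matches a blocked pattern."""
    return [f for f in packaged_files
            if any(fnmatch(f, pattern) for pattern in BLOCKLIST)]

# What a packaging step might stage for publication. The .map file is
# exactly the kind of artefact that leaked the Claude Code source.
staged = [
    "dist/cli.min.js",
    "package.json",
    "dist/cli.min.js.map",
    "src/auth/oauth.ts",
]

violations = release_violations(staged)
if violations:
    # In CI, this check would fail the build and block the publish.
    print("Release blocked, offending files:", violations)
```

A board does not need to read the script; it needs to ask whether something like it exists, who owns it, and when it was last proven to fail a bad release.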
The deeper question
Taken together, the McKinsey and Anthropic incidents point to something more fundamental than a pair of security lapses. They point to a structural mismatch between the speed at which AI systems are being deployed and the maturity of the controls surrounding them.
McKinsey deployed Lilli at scale across 40,000 consultants. Anthropic distributes Claude Code through a public package registry to developers worldwide, including those working on classified government systems. In both cases, the technology moved faster than the operational discipline required to secure it.
This is not a criticism specific to McKinsey or Anthropic. Both organisations responded quickly once the failures were identified. But speed of response is not the same as adequacy of prevention. And for boards — whose responsibility is foresight, not reaction — the question is whether their institutions are building AI governance frameworks that match the pace and nature of AI deployment.
The answer, for most, is not yet. But the case studies are no longer hypothetical.
Silvan Schriber is Managing Director at Alvarez & Marsal and a Board Member and Audit & Risk Committee Chair at Zuger Kantonalbank. He advises financial institutions on strategy, transformation, and governance — including ICT risk and cyber resilience.