This month I want to highlight a blog post by Anthropic, Disrupting the first reported AI-orchestrated cyber espionage campaign. Anthropic details a cyberattack, orchestrated using Claude, that discovered and exploited security vulnerabilities at remarkably high speed with minimal human interaction.
Using AI to increase attack effectiveness is expected; it has been predicted for years, and no one reasonably doubted this outcome. As Anthropic covers in the blog post, the same capabilities that enable the attack, such as rapidly searching for vulnerabilities, can also be used to defend against such attacks. Thus, it is not clear that we should simply work to disable the capability, even if that were possible.
My thinking is that this AI-enabled attack methodology is both novel and part of a recurring pattern. Every new piece of technology creates new attack surfaces, so it is unsurprising to learn of an AI-enabled attack. But there are also lessons here for Anthropic and the broader community. This attack serves as a timely reminder that no technology is ever safe from bad actors, and we need to remain creative in our thinking so as not to grow complacent.
https://www.anthropic.com/news/disrupting-AI-espionage
“At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.”