Anthropic, maker of the Claude family of large language models, this week updated its safety-control policy to reflect what it says is the potential for malicious actors to exploit its AI models to automate cyberattacks.
The PDF document, detailing the company's "responsible scaling policy," outlines several procedural changes that it says are needed to monitor the ongoing risks of misuse of AI models. That includes several levels of escalating risk, known as AI Safety Level Standards (ASL), defined as "technical and operational safeguards."
As part of the company’s “routine testing” of AI models for safety — known as a “capability assessment” — Anthropic reports that it has uncovered a capability that “requires significant investigation and may require stronger safeguards.”
That capability is described as a threat within cyber operations: “The ability to significantly enhance or automate sophisticated destructive cyber attacks, including but not limited to discovering novel zero-day exploit chains, developing complex malware, or orchestrating extensive hard-to-detect network intrusions.”
The report describes measures that will be undertaken to look into the matter on an ongoing basis:
“This will involve engaging with experts in cyber operations to assess the potential for frontier models to both enhance and mitigate cyber threats, and considering the implementation of tiered access controls or phased deployments for models with advanced cyber capabilities. We will conduct either pre- or post-deployment testing, including specialized evaluations. We will document any salient results alongside our Capability Reports.”
Currently, all of Anthropic's AI models, it says, must meet ASL Level 2 requirements. That level "requires a security system that can likely thwart most opportunistic attackers and includes vendor and supplier security reviews, physical security measures, and the use of secure-by-design principles," the report states.
The updated policies can be seen as part of an effort by both Anthropic and OpenAI to voluntarily impose curbs on artificial intelligence amid the ongoing debate over what should or should not be done to regulate AI technologies. In August, the two companies reached agreements with the US Artificial Intelligence Safety Institute at the US Department of Commerce's National Institute of Standards and Technology (NIST) to collaborate on research, testing, and evaluation of AI.
The idea of AI automating cyberattacks has been in circulation for some time. Firewall vendor Check Point Software Technologies warned last year that state-backed actors from Russia were trying to compromise OpenAI's ChatGPT to automate phishing attacks.
Endpoint security software vendor CrowdStrike this summer reported that generative AI is vulnerable to a vast array of specially crafted prompts that can break the programs' guardrails.