Anthropic and OpenAI, two leading artificial intelligence (AI) model providers, have drawn close scrutiny in recent weeks as they released their latest models, Opus 4.5 and GPT-5, respectively. The releases have highlighted a significant gap in how the two companies approach security validation: Anthropic's 153-page system card offers a far more detailed and comprehensive account of its model's security evaluations than OpenAI's 60-page system card.
According to Louis Columbus, a leading expert in AI security, the system cards released by Anthropic and OpenAI reveal a fundamental split in how the two labs approach security validation. Anthropic's system card discloses multi-attempt attack success rates drawn from 200-attempt reinforcement learning (RL) campaigns, while OpenAI reports resistance to individual jailbreak attempts. Both metrics are valid, Columbus argues, but neither tells the whole story.
In a recent report, Gray Swan's Shade platform ran adaptive adversarial campaigns against Claude models, measuring the attack success rate (ASR) of Opus 4.5 in coding environments. Opus 4.5 showed a 4.7% ASR at one attempt, 33.6% at ten attempts, and 63.0% at one hundred attempts. In computer-use settings with extended thinking, the figures were 21.9% at one attempt, 44.7% at ten attempts, and 73.5% at one hundred attempts.
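To make those multi-attempt figures concrete, the sketch below shows one common way such a metric can be computed: a target behavior counts as compromised at k attempts if any of its first k adversarial attempts succeeds, and ASR@k is the fraction of behaviors compromised. This is an illustrative reconstruction under that assumption, not Gray Swan's or Anthropic's actual pipeline; the attempt_log data and asr_at_k function are hypothetical.

```python
# Illustrative sketch of a multi-attempt attack-success-rate (ASR@k) metric.
# Assumption: attempt_log maps each targeted behavior to a list of booleans,
# one per adversarial attempt, True meaning that attempt produced a policy
# violation. This is not Gray Swan's or Anthropic's actual methodology.

from typing import Dict, List


def asr_at_k(attempt_log: Dict[str, List[bool]], k: int) -> float:
    """Fraction of behaviors with at least one successful attack in the first k attempts."""
    if not attempt_log:
        return 0.0
    compromised = sum(1 for attempts in attempt_log.values() if any(attempts[:k]))
    return compromised / len(attempt_log)


# Toy example: three target behaviors, attempt histories trimmed for brevity.
attempt_log = {
    "exfiltrate-credentials": [False, False, True],  # broke on attempt 3
    "disable-safety-filter": [False] * 10,            # never broke
    "malicious-code-gen": [True],                     # broke immediately
}

for k in (1, 10, 100):
    print(f"ASR@{k}: {asr_at_k(attempt_log, k):.1%}")
```

Read this way, single-attempt jailbreak resistance and 100-attempt attack success rates are points on the same curve, which is why comparing the two labs' headline numbers directly can mislead.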
Columbus notes that security leaders deploying AI agents for browsing, code execution, and autonomous action need to know what each red team evaluation actually measures and where the blind spots are. "The attack data shows that Opus 4.5 is vulnerable to certain types of attacks, but the extent of this vulnerability is not immediately clear," Columbus said. "This highlights the need for more transparency and detail in the system cards released by AI model providers."
The release of Opus 4.5 and GPT-5 marks a significant milestone in the development of AI models, but it also raises important questions about their security and robustness. As these models become more deeply integrated into daily life, the need for rigorous security measures grows more pressing.
The development of AI models has been driven by the need to process large amounts of data more efficiently and effectively. Reinforcement learning (RL) has emerged as a key technique for training these models, allowing them to learn from their environment and adapt to new situations. That same adaptivity, however, raises concerns that such models may be vulnerable to certain types of attacks.
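For readers unfamiliar with the term, the sketch below illustrates the core RL idea the paragraph alludes to: an agent repeatedly acts, observes a reward from its environment, and updates its behavior accordingly. It is a generic epsilon-greedy bandit toy assumed purely for illustration; it says nothing about how Anthropic or OpenAI actually train or evaluate their models.

```python
# Generic epsilon-greedy bandit: the agent learns from environment feedback.
# All payoff values are made up for illustration only.

import random

TRUE_PAYOFFS = [0.2, 0.5, 0.8]   # hidden reward probability of each action
EPSILON = 0.1                    # exploration rate
estimates = [0.0, 0.0, 0.0]      # agent's learned value estimate per action
counts = [0, 0, 0]

random.seed(0)
for _ in range(5000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < EPSILON:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])
    # The environment returns a reward; the agent updates its estimate.
    reward = 1.0 if random.random() < TRUE_PAYOFFS[action] else 0.0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print([round(e, 2) for e in estimates])  # converges toward the true payoffs
```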
According to experts, the gap in security validation between Anthropic and OpenAI highlights the need for more transparency and detail in the system cards released by AI model providers. "The system cards released by Anthropic and OpenAI provide a glimpse into the security features of their models, but they do not tell the whole story," said Dr. Rachel Kim, a leading expert in AI security. "Security leaders need to be aware of the potential vulnerabilities of AI models and take steps to mitigate them."
AI model development continues at a rapid pace, with new models released regularly, yet robust security evaluation remains a pressing concern. Experts expect the gap in security validation between Anthropic and OpenAI to remain a topic of discussion in the AI community, and the demand for more transparent, detailed system cards to intensify as AI models take on more autonomous roles in daily life.
Share & Engage Share
Share this article