DeepSeek’s AI Fails Security Tests Across the Board

Since OpenAI’s introduction of ChatGPT in late 2022, hackers and security experts have endeavored to unearth vulnerabilities in large language models (LLMs), hoping to prompt them to release harmful content like hate speech and bomb-making guidance. In answer to these challenges, developers of generative AI, including OpenAI, have reinforced their system defenses. Contrarily, DeepSeek’s new, economical R1 reasoning model, swiftly gaining attention, lags in safety protections compared to its industry counterparts.
Complete Test Failure
Security experts from Cisco and the University of Pennsylvania have revealed findings indicating that when faced with 50 harmful prompts designed to provoke toxic responses, DeepSeek’s model failed to recognize or deter even one. Experts labeled this a “100 percent attack success rate.” This raises substantial concerns regarding DeepSeek’s preparedness to match competitors in safety and security measures against attacks.
Exceeded Vulnerability Expectations
Additional findings from Adversa AI, a firm specializing in AI security, reinforce DeepSeek’s extensive exposure to various jailbreak techniques, ranging from straightforward language manipulations to advanced AI-generated prompts. Despite media interest, DeepSeek has refrained from publicly addressing these safety concerns.
The Risks of Generative AI
Generative AI models, akin to any technology, are vulnerable to deficiencies which, if leveraged or inadequately managed, could enable harmful usage. One significant vulnerability in today’s AI systems is indirect prompt injection, where an AI analyzes external data and takes unwarranted actions prompted by this information.
Jailbreaking, a variety of prompt-injection, circumvents safety protocols restricting LLM outputs. Companies aim to prevent misuse, like guide creation for dangerous substances or creating misinformation. However, mitigation of jailbreak techniques is challenging, akin to enduring security threats in software development, as noted by Alex Polyakov, CEO of Adversa AI.
Cisco’s Research Insights
CISCO’s focus on DeepSeek’s R1 involved thorough testing through HarmBench, consisting of prompts typically used to evaluate AI vulnerabilities in multiple areas, including cybercrime and misinformation. Comparisons against other models, such as Meta’s Llama 3.1, demonstrate DeepSeek’s R1’s significant shortcomings, unlike the robust OpenAI’s o1 reasoning model.
Broader Implications
Polyakov remarks on DeepSeek’s reliance on defensive responses, seemingly influenced by OpenAI datasets, yet comprehensive tests suggest DeepSeek’s systems are surprisingly easy to bypass. This alarmingly includes commonly known, long-standing jailbreak methods. Polyakov strongly advocates continuous testing and red-teaming as crucial to maintaining AI security.