DeepSeek AI’s Safety Lapses Surprise Experts
With the introduction of ChatGPT by OpenAI in 2022, the search for vulnerabilities in large language models (LLMs) began in earnest. Hackers and security experts have relentlessly tried to outsmart these systems into producing dangerous content ranging from hate speech to bomb-making instructions. In response, companies like OpenAI have enhanced their models’ defenses. However, DeepSeek, a rising Chinese AI platform with its affordable R1 model, has not kept pace with competitors in terms of security.
DeepSeek’s Security Lapses
Researchers from Cisco and the University of Pennsylvania unveiled that DeepSeek’s R1 model failed in every test when exposed to 50 harmful prompts, highlighting a concerning 100 percent success rate for attackers. These findings contribute to mounting evidence that DeepSeek lacks the safety standards of other LLM developers.
Censorship Easy to Bypass
DeepSeek’s system, which attempts to block topics considered sensitive under Chinese government regulations, proved ineffectual against circumvention efforts. This indicates a gap in platform security investment, as pointed out by DJ Sampath, VP of AI products at Cisco.
Jailbreaking Challenges
Jailbreaks, one type of prompt-injection attack, are designed to bypass the safety protocols of an LLM, yet all AI models remain obliged to protect against disinformation, among other threats. Initial jailbreaks were rudimentary, but as AI systems have advanced, so too have the methods, evolving into more obfuscated techniques often AI-assisted in development, which maintains their continued threat.
Security firm Adversa AI further emphasized the persistence of jailbreaks, equating their permanence to enduring software vulnerabilities like buffer overflow or SQL injection flaws.
Testing Methods and Comparison
Using prompts from HarmBench’s standardized library, Cisco tested 50 prompts across various harm categories on the local versions of DeepSeek’s model. However, DeepSeek remains vulnerable to sophisticated and customized linguistics or code execution tests.
When compared with other models such as Meta’s Llama 3.1, the findings showed similar vulnerabilities. Yet, Sampath from Cisco stresses a direct comparison with OpenAI’s reasoning model, which outperformed all rivals, including DeepSeek.
A Call for Continuous Evolution
Adversa AI’s findings indicate that DeepSeek does block known jailbreak attempts but stresses the need for ongoing red-teaming as missing it could lead to system breaches—suggesting industries must stay vigilant in evolving their AI defenses.