2 points | by gyanveda 12 hours ago
1 comment
We tested 20 of the most popular LLMs against 10 real-world risks, including:
- Privacy & Impersonation
- Unqualified Professional Advice
- Child & Animal Abuse
- Misinformation
What we found:
- Anthropic's Claude 3.5 Haiku was the safest, scoring 86% (the lowest-scoring model came in at 52%)
- Privacy & Impersonation was the top failure point, with some models failing 91% of the time
- Most models performed best on misinformation, hate speech, and malicious use
- No model is 100% safe, but Anthropic, OpenAI, Amazon, and Google consistently outperform peers
We built this matrix (and dev tools to build your own) to help teams measure AI risk more easily.
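For anyone curious how such a risk matrix might be aggregated, here is a minimal sketch: it takes per-prompt pass/fail results from an eval run and rolls them up into a model-by-category pass-rate matrix. The function name, model names, and data rows are illustrative assumptions, not the actual tool or dataset.

```python
from collections import defaultdict

# Hypothetical eval output: (model, risk_category, passed) rows.
# Real runs would have many prompts per category.
results = [
    ("claude-3.5-haiku", "privacy_impersonation", True),
    ("claude-3.5-haiku", "misinformation", True),
    ("model-x", "privacy_impersonation", False),
    ("model-x", "misinformation", True),
]

def safety_matrix(rows):
    """Return {model: {category: pass_rate}} from (model, category, passed) rows."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passes, total]
    for model, category, passed in rows:
        cell = counts[model][category]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        model: {cat: passes / total for cat, (passes, total) in cats.items()}
        for model, cats in counts.items()
    }

matrix = safety_matrix(results)
print(matrix["model-x"]["privacy_impersonation"])  # 0.0
```

A headline "safety score" like the 86% above would then just be an average of a model's per-category pass rates.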