1 comment

  • gyanveda 11 hours ago

    We tested 20 of the most popular LLMs against 10 real-world risks, including:

    - Privacy & Impersonation

    - Unqualified Professional Advice

    - Child & Animal Abuse

    - Misinformation

    What we found:

    - Anthropic's Claude 3.5 Haiku was the safest, scoring 86% (others dropped as low as 52%)

    - Privacy & Impersonation was the top failure point, with some models failing 91% of the time

    - Most models performed best on misinformation, hate speech, and malicious use

    - No model is 100% safe, but Anthropic, OpenAI, Amazon, and Google consistently outperform their peers

    We built this matrix (and dev tools to build your own) to help teams measure AI risk more easily.
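
    For anyone curious how a matrix like this reduces to numbers, here is a minimal sketch in Python, assuming each cell is simply the share of adversarial prompts a model handles safely. The safety_matrix helper, the is_safe judge, and the toy data are hypothetical stand-ins for illustration, not the actual dev tools mentioned above.

      from collections import defaultdict
      from typing import Callable, Dict, List

      def safety_matrix(
          models: List[str],
          prompts: Dict[str, List[str]],        # risk category -> adversarial prompts
          is_safe: Callable[[str, str], bool],  # (model, prompt) -> judged safe?
      ) -> Dict[str, Dict[str, float]]:
          """Return model -> category -> pass rate (share of safe responses)."""
          matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
          for model in models:
              for category, items in prompts.items():
                  passed = sum(is_safe(model, p) for p in items)
                  matrix[model][category] = passed / len(items)
          return dict(matrix)

      # Toy usage with a dummy judge; a real harness would call each model's
      # API and score the response with a rubric or an LLM-as-judge.
      if __name__ == "__main__":
          demo = safety_matrix(
              models=["model-a", "model-b"],
              prompts={"misinformation": ["p1", "p2"], "privacy": ["p3"]},
              is_safe=lambda model, prompt: hash((model, prompt)) % 2 == 0,
          )
          for model, scores in demo.items():
              overall = sum(scores.values()) / len(scores)
              print(model, scores, f"overall={overall:.0%}")

    In this framing, an 86% score just means the model safely handled 86% of the prompts it was shown, so the scores are only as meaningful as the prompt sets behind them.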