ADL Rates LLMs on Antisemitism Moderation
The Anti-Defamation League published a study Wednesday evaluating six large language models (Anthropic Claude, OpenAI ChatGPT, Meta Llama, Google Gemini, DeepSeek, and xAI Grok) on how they handle anti-Jewish, anti-Zionist, and extremist prompts. Testing ran between August and October 2025 and covered 4,181 chats per model, more than 25,000 in total. Claude scored highest (80) and Grok scored lowest (21), a spread that points to substantial moderation gaps and multimodal weaknesses, especially in image and document analysis.



