
ADL Rates LLMs On Antisemitic Moderation

theverge.com

January 28, 2026



The Anti-Defamation League published a study on Wednesday evaluating six large language models (Anthropic's Claude, OpenAI's ChatGPT, Meta's Llama, Google's Gemini, DeepSeek, and xAI's Grok) on how they handle anti-Jewish, anti-Zionist, and extremist prompts. Testing ran between August and October 2025 and covered 4,181 chats per model, more than 25,000 in total. Claude scored highest (80) and Grok scored lowest (21). The results point to substantial moderation gaps and to multimodal weaknesses, especially in image and document analysis.

