Research: llm, creativity evaluation, prompting, temperature control
Large Language Models Reach Average Human Creativity
Relevance Score: 10.0
A large-scale study led by Professor Karim Jerbi, published in Scientific Reports on January 21, 2026, compared leading LLMs (including GPT-4, Claude, and Gemini) with more than 100,000 human participants on divergent creativity tasks. The researchers found that some models now exceed the average human score on the Divergent Association Task and on creative-writing tests, but the top 10% of human creators still outperform every model tested. The models' creative output can also be tuned through the temperature setting and through prompting.
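To illustrate why temperature can act as a dial on output diversity, here is a minimal sketch of standard temperature-scaled softmax sampling. It is not taken from the study: the logit values and the function names `softmax_with_temperature` and `entropy` are hypothetical, chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token scores for four candidate words (assumption).
logits = [2.0, 1.0, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.5)   # sharper: favors the top word
high = softmax_with_temperature(logits, 1.5)  # flatter: rarer words more likely

print("T=0.5:", [round(p, 3) for p in low], "entropy:", round(entropy(low), 3))
print("T=1.5:", [round(p, 3) for p in high], "entropy:", round(entropy(high), 3))
```

A higher temperature flattens the distribution, so less likely words are sampled more often. That is one plausible mechanism behind more varied output, although the sketch says nothing about how the study itself configured its models.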


