Researchers found that AI chatbots such as ChatGPT can identify metaphors in political speech with moderate accuracy but often struggle with the nuanced language of Donald Trump's addresses.
A recent study published in Frontiers in Psychology assesses the strengths and weaknesses of AI systems in grasping metaphors within political discourse. The researchers specifically tested ChatGPT-4, a large language model designed to comprehend and generate human-like language, using speeches by President Donald Trump.
The study examined four of Trump's speeches from mid-2024 to early 2025, totaling over 28,000 words. These selections showcased Trump's distinctive style, which uses metaphors to communicate political themes effectively.
Using a technique known as key metaphor analysis, the researchers prompted ChatGPT-4 to identify metaphors, categorize them, and interpret their emotional or ideological connotations. The model identified metaphors with about 86% accuracy, correctly analyzing 119 of 138 sampled sentences.
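The accuracy figure reported above is simply the share of sampled sentences the model analyzed correctly. A minimal sketch of that calculation, using the counts from the study's summary (the function name is illustrative, not from the paper):

```python
def accuracy(correct: int, total: int) -> float:
    """Return the proportion of correctly analyzed sentences."""
    return correct / total

# 119 of 138 sampled sentences were analyzed correctly.
score = accuracy(119, 138)
print(f"{score:.1%}")  # prints 86.2%
```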
However, closer analysis revealed recurring problems in the model's understanding. ChatGPT-4 often mistook literal expressions for metaphors and tended to overanalyze simple statements. It also struggled with names and other specific terms, misclassifying proper nouns as metaphors.
These errors underline the challenges AI faces in grappling with contextual meaning. Unlike humans, these models lack the personal experience, cultural knowledge, and emotional nuance that typically inform human understanding, a gap especially evident in political discourse.
The study also evaluated ChatGPT-4's capacity to categorize metaphors by common themes and "source domains." While effective with frequent metaphor types, it faltered with less familiar source domains, such as cooking and food.
When the researchers compared the AI's results against standard metaphor analysis tools, they found that ChatGPT was faster and more user-friendly but less consistent in identifying metaphors across diverse categories. They also noted that the model's performance is highly sensitive to prompt wording, with even minor changes affecting the results.
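One way to probe this prompt sensitivity is to send the same sentence to the model under slightly different instruction wordings and compare the outputs. A minimal sketch of generating such variants follows; the templates are illustrative assumptions, not the study's actual prompts:

```python
# Hypothetical instruction wordings that differ only slightly; each variant
# would be sent to the model separately so its answers can be compared.
TEMPLATES = [
    "Identify any metaphors in the following sentence: {text}",
    "List every metaphorical expression in this sentence: {text}",
    "Does the sentence below contain a metaphor? If so, name it: {text}",
]

def prompt_variants(sentence: str) -> list[str]:
    """Return one prompt per template for a single input sentence."""
    return [t.format(text=sentence) for t in TEMPLATES]

for prompt in prompt_variants("We will drain the swamp."):
    print(prompt)
```

Disagreement between the model's answers across variants would indicate the kind of prompt dependence the researchers describe.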
Moreover, the study highlights a broader issue with the training methods of LLMs. These models depend on vast datasets mined from the Internet, which may not adequately represent specific languages or cultural contexts. Consequently, LLMs might inadvertently replicate existing biases related to gender, race, or ideological perspectives.
In conclusion, the researchers assert that large language models like ChatGPT show promise in analyzing metaphors but are no replacement for human insight. Their propensity for misinterpretation in nuanced situations suggests they are better suited to supporting human researchers than to replacing them.
Read the full report at PsyPost.

