Simply put
- Recent research indicates that large language models (LLMs) can closely reproduce consumer purchase intent by mapping their free-text responses onto Likert ratings via semantic similarity.
- Validated against 9,300 real survey responses, the method retained roughly 90% of human test-retest reliability.
- The findings highlight potential issues regarding bias, generalizability, and whether “synthetic consumers” can truly represent real individuals.
Forget traditional focus groups: new research suggests that LLMs can predict consumer purchase intent with striking accuracy, rivaling conventional survey methods.
Researchers from the University of Mannheim and ETH Zurich found that LLMs can effectively simulate human buying intentions, turning open-ended text responses into the structured survey data that marketers rely on.
In a recent paper, the team introduced a technique called "semantic similarity rating" (SSR). The method converts the LLM's open-ended responses into numerical values on a Likert-style scale, the standard format in consumer research.
Instead of asking the models to pick a number from 1 to 5, the researchers let them answer naturally, with responses such as "Maybe when it goes on sale," and then measured how closely each answer aligned with reference statements spanning the scale, from "I'll definitely buy this" to "I won't buy this."
Each response is matched to the closest reference statement in embedding space, effectively converting the LLM's free-text output into a quantitative rating. The authors report that prioritizing semantic similarity over raw numerical answers produced purchase-intent distributions closely matching real human survey data, with LLM-generated responses retaining about 90% of human test-retest reliability while preserving natural variation in opinion.
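The core mapping step can be sketched in a few lines. The sketch below is illustrative only: it uses a toy bag-of-words cosine similarity as a stand-in for a real sentence-embedding model, and the anchor wording is invented for the example rather than taken from the paper.

```python
from collections import Counter
from math import sqrt

# Likert anchors (wording is illustrative, not the paper's exact statements).
ANCHORS = {
    1: "I will definitely not buy this product",
    2: "I will probably not buy this product",
    3: "I might or might not buy this product",
    4: "I will probably buy this product",
    5: "I will definitely buy this product",
}

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would use a
    sentence-embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_similarity_rating(response):
    """Map a free-text response to the Likert point whose anchor
    statement is most similar in the (toy) embedding space."""
    vec = embed(response)
    return max(ANCHORS, key=lambda k: cosine(vec, embed(ANCHORS[k])))
```

In the paper's setting, the similarities can also be kept as a weighted distribution over all five anchors rather than collapsed to a single point, which is what allows the synthetic Likert distributions to be compared against human ones.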
In trials involving 9,300 real human survey responses concerning personal care products, the SSR method produced synthetic respondents whose Likert score distributions closely paralleled the original data. In effect, the models appeared to "think like consumers."
Why it matters
This discovery could shift how companies approach product testing and market research. Traditional consumer research is often costly, time-consuming, and biased. If synthetic respondents can effectively imitate real ones, businesses might be able to evaluate thousands of products and messages at a significantly reduced cost.
Furthermore, the geometry of an LLM's semantic space turns out to encode not just linguistic meaning but attitudinal judgments as well. The study shows that comparisons in embedding space can stand in for human ratings with surprising accuracy.
However, this also raises ethical and methodological concerns. The research focused solely on one product category, leaving uncertainties about its applicability to areas like financial decisions or political matters. Moreover, these synthetic “consumers” might just morph into synthetic targets, where similar modeling techniques could assist in political persuasion, advertising, or behavior nudges.
As the authors suggest, “market-driven optimization pressures can systematically undermine alignment,” a concern that stretches beyond just marketing.
Skeptical note
The researchers admit that their focus on personal care products may not generalize to more consequential or emotionally charged purchases. The SSR mapping also depends on the careful selection of reference statements; minor wording changes could skew outcomes. Additionally, the study treats human survey data as ground truth, even though such data is itself often noisy and culturally biased.
Critics note that the assumption that embedding-based similarity maps neatly onto human attitudes may falter when responses involve complex context or sarcasm. And while retaining 90% of human test-retest consistency sounds strong, it still leaves considerable room for variability. In other words, the method may work well on average, but it remains uncertain whether those averages genuinely capture the diversity of human thought or merely mirror the model's training biases.
Big picture
Interest in synthetic consumer modeling is burgeoning as companies experiment with AI-driven focus groups and predictive polling. Similar investigations by MIT and the University of Cambridge have indicated that LLMs can reliably simulate demographic and psychographic segments, yet until this study, none had shown a close statistical alignment with actual purchase intention data.
For now, the SSR method is still a research prototype, but it suggests a future where LLMs could not only respond to inquiries but also embody the broader public sentiment.
The ongoing debate—whether this represents progress or merely an illusion—remains open.