In brief
- Alibaba has revamped Qwen Deep Research, enabling one-click generation of web pages and podcasts.
- Our tests showed that Qwen and Gemini had equal accuracy, outperforming ChatGPT and Grok.
- Qwen excelled in research depth and shareable web output, while Gemini was better in multimedia quality.
Last week, Qwen, an AI research group from Alibaba, introduced a significant upgrade to its AI chatbot. Users can now easily create detailed research documents on any topic.
These documents can be transformed into visually appealing web pages or multi-speaker podcasts with just a couple of clicks.
Qwen Chat resembles platforms like ChatGPT, DeepSeek, and Claude, offering its services for free globally.
Qwen Deep Research has received a major upgrade. ⚡️
Alongside reports, we can now create live web pages 🌐 and podcasts 🎙️ with Qwen3-Coder, Qwen-Image, and Qwen3-TTS.
Your insights are now both visual and audible. ✨
The new features utilize three open-source models working in sync. Qwen3-Coder manages the web structure, Qwen-Image generates graphics, and Qwen3-TTS delivers dynamic audio narration.
Even though the system uses open-source technology, the entire process, from research execution to web deployment and audio production, is managed by Qwen.
The process starts in Qwen Chat, where a user poses a question. After clarifying the request, the AI searches the web, analyzes public data, and produces a detailed report complete with citations.
Once the report is ready, two new options appear: “Web Dev,” which swiftly deploys and hosts professional-quality live web pages with inline graphics, and “Podcast,” which produces audio conversations featuring multiple speakers, with a variety of voice options.
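For readers who think in code, the flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: every function and field name here is hypothetical, the stubs return placeholder values, and none of this reflects Qwen's actual API or internals. It simply shows one report feeding two independent outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """A research report with its citations (hypothetical structure)."""
    question: str
    body: str
    citations: list[str] = field(default_factory=list)


def run_deep_research(question: str) -> Report:
    # Stand-in for the research step: clarify, search the web, write a cited report.
    return Report(
        question=question,
        body=f"Findings on: {question}",
        citations=["https://example.com/source"],  # placeholder citation
    )


def build_web_page(report: Report) -> dict:
    # Stand-in for "Web Dev": per the announcement, Qwen3-Coder handles the
    # page structure and Qwen-Image the graphics.
    return {
        "structure_by": "Qwen3-Coder",
        "graphics_by": "Qwen-Image",
        "html": f"<h1>{report.question}</h1><p>{report.body}</p>",
    }


def build_podcast(report: Report, voices: list[str]) -> dict:
    # Stand-in for "Podcast": Qwen3-TTS narrates a multi-speaker conversation.
    return {
        "narrated_by": "Qwen3-TTS",
        "speakers": list(voices),
        "script": report.body,
    }


report = run_deep_research("Does God exist?")
page = build_web_page(report)
podcast = build_podcast(report, voices=["Host A", "Host B"])
```

The point of the sketch is that the web page and the podcast are both derived from the same finished report, so neither depends on the other.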
Testing the model
To evaluate how Qwen performs as a research tool, we tested it against Gemini, ChatGPT, and Grok using a complex research query. The goal was to analyze the philosophical and scientific debates regarding the existence of God. Each model produced a comprehensive report that we assessed based on five criteria: accuracy of claims and citations, information quality, clarity of explanation, intellectual depth, and overall quality.
TL;DR: Qwen Deep Research stands out for its depth of analysis, thorough citations, and unique auto-generated web pages, making it ideal for researchers. It serves as a useful, all-in-one free alternative. However, Gemini remains superior in audio and video quality, while ChatGPT and Grok are still quality options for casual use, but lack the range of Qwen.
Here’s a deeper look:
Accuracy: Did the reports accurately express philosophical and scientific claims?
Qwen performed exceptionally well, accurately referencing debates, notably citing Bertrand Russell’s “Why I Am Not a Christian” and discussions between William Lane Craig and Peter Atkins. In contrast to other models, Qwen relied on credible academic sources, which helped its accuracy.
Gemini maintained similar accuracy with numerous citations, although some appeared duplicated across the report.
Both models avoided errors, distinguished between competing interpretations, and characterized popular theism accurately.
ChatGPT oversimplified some concepts, while Grok provided good summaries but lacked precise attribution.
Results: Qwen and Gemini were the most accurate.
Information provided: How thorough was the investigation?
Qwen uniquely included a segment titled “Atheism Criticism,” exploring arguments not addressed by the others. This section distinguished between types of atheism and cited relevant thinkers.
An example from Qwen illustrates this: “The burden of proof was famously illustrated by Bertrand Russell’s teapot analogy, suggesting that a theist can’t prove God’s existence, just as a teapot lost in space can’t be proven to exist.”
No other model delved into the burden of proof as thoroughly. Gemini followed closely, with strong insights on consciousness. ChatGPT included practical discussions of ethical implications, while Grok kept things brief.
Results: Qwen provided the most depth.
Clarity: How clearly was the research conveyed?
Grok organized arguments using tables, which made them easy to follow. Meanwhile, ChatGPT used parentheses to clarify complex ideas, which sometimes made them easier to digest.
Qwen and Gemini adopted a more formal academic style, which, while accurate, felt dense. They are tailored for in-depth research rather than casual reading.
Results: ChatGPT was the clearest, with Grok following closely.
Diversity of information sources: Did the research reflect varied perspectives?
Qwen combined philosophical arguments with contemporary scientific discussions, effectively explaining positions and providing background for its arguments.
Gemini highlighted discussions of consciousness effectively, while ChatGPT examined practical implications and Grok covered key points with less depth.
Results: Qwen and Gemini stood out for their coverage.
Quality: Which reports stand out based on rigor and scholarly value?
Both Qwen and Gemini produced reports that could earn a passing grade in an academic setting. Qwen excelled in balancing theistic and atheistic perspectives, while Gemini integrated scientific debates with philosophical arguments.
ChatGPT provided educational value but was better suited to general understanding. Grok served as a quick reference.
Final scores:
- Qwen: 9/10
- Gemini: 9/10
- ChatGPT: 8/10
- Grok: 6/10
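To make the tie at the top explicit, the final scores can be turned into a ranking with a few lines of Python. This is a minimal sketch: the numbers come straight from the list above, while the helper name `rank_with_ties` and the choice of competition-style ranking (tied entries share a place and the next place is skipped) are our own assumptions for illustration.

```python
# Final scores out of 10, copied from the list above.
scores = {"Qwen": 9, "Gemini": 9, "ChatGPT": 8, "Grok": 6}


def rank_with_ties(scores: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (place, name, score) tuples; equal scores share the same place."""
    # sorted() is stable, so tied models keep their original order.
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    ranking = []
    previous_score = None
    place = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            place = position  # a new score starts a new place
            previous_score = score
        ranking.append((place, name, score))
    return ranking


for place, name, score in rank_with_ties(scores):
    print(f"{place}. {name}: {score}/10")
```

Under this scheme Qwen and Gemini share first place, ChatGPT comes third, and Grok fourth.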
Podcast Battle: Qwen vs Gemini
Qwen’s podcast features compete against Google’s NotebookLM and Gemini, leaders in AI-generated audio summaries.
Unlike Gemini, Qwen offers a selection of host voices and features engaging AI discussions based on research rather than just reading text.
That said, the audio quality of Qwen can be inconsistent. Some voices are more natural, while others come across as robotic. In one instance, a host’s exaggerated reactions prompted confusion and concern from family members.
However, experimenting with different voices can elevate the audio experience.
In contrast, Gemini and NotebookLM produce human-like audio with smoother, more natural interactions, even incorporating humor.
Gemini’s podcasts are engaging and relatable.
Additionally, Gemini offers video generation, which is advantageous for those who favor audiovisual learning.
This feature is unavailable with Qwen, emphasizing Gemini’s superior multimedia capabilities.
Advantages of web pages
Beyond research quality, Qwen’s auto-generated web pages are a standout feature that no other model we tested offers.
Once research is complete, users can instantly publish it as a live website rather than a static format like a PDF. The pages come with organized headers, tables, and citations as hyperlinks.
The interface bears a resemblance to familiar platforms, ensuring it’s visually appealing and shareable.
ChatGPT users have to transfer information manually to their website builders, while Gemini keeps its documents in a unified storage system. Only Qwen offers direct web-ready outputs.
This seamless integration of research output is quite beneficial.


