Chatbots Engage, But Don’t Deliver
Chatbots Engage, But Don’t Deliver calls attention to a widening gap in artificial intelligence development: systems built to attract attention rather than solve problems. As criticism mounts from tech leaders like Kevin Systrom, Elon Musk, and Geoffrey Hinton, concerns are growing that the AI attention economy is misleading users and eroding long-term trust. Despite their interactive appeal, many generative AI chatbots are tuned toward engagement metrics that favor time spent over true utility, prompting a re-evaluation of what meaningful human-AI interaction should look like in high-stakes settings like education, work, and journalism.

Key Takeaways

  • Kevin Systrom argues most AI chatbots offer zero practical utility, despite high user engagement.
  • Engagement metrics like time-on-platform often overshadow real productivity or problem-solving outcomes.
  • Leading experts warn that entertainment-focused AI may misinform users and erode trust in AI systems.
  • Differentiating between engagement-first and utility-first AI design is critical for ethical AI development.

The Engagement Trap: AI as Entertainment

The proliferation of generative AI chatbots, led by products like ChatGPT and Bard, has captivated public interest. With natural-sounding dialogue and broad general knowledge, they give the appearance of intelligence. But this design is largely optimized for one goal: keeping users engaged.

Engagement in this context is quantified by metrics such as:

  • Session length
  • Interaction depth (number of messages exchanged)
  • User return rate
  • Click-through rates on AI-generated suggestions
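
As a rough illustration of how such metrics are typically derived, the sketch below aggregates them from a hypothetical session log. The `Session` structure and field names are illustrative assumptions, not any vendor’s actual schema:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Session:
    """One chat session from a hypothetical interaction log."""
    user_id: str
    duration_s: float         # session length in seconds
    messages: int             # messages exchanged (interaction depth)
    suggestions_shown: int    # AI-generated suggestions displayed
    suggestions_clicked: int  # suggestions the user actually clicked

def engagement_report(sessions: list[Session]) -> dict:
    """Aggregate the four engagement metrics listed above."""
    users = {s.user_id for s in sessions}
    returning = {u for u in users
                 if sum(1 for s in sessions if s.user_id == u) > 1}
    shown = sum(s.suggestions_shown for s in sessions)
    clicked = sum(s.suggestions_clicked for s in sessions)
    return {
        "avg_session_length_s": mean(s.duration_s for s in sessions),
        "avg_interaction_depth": mean(s.messages for s in sessions),
        "user_return_rate": len(returning) / len(users),
        "click_through_rate": clicked / shown if shown else 0.0,
    }

# Example: two sessions from one user, one session from another
log = [
    Session("u1", 420, 12, 5, 2),
    Session("u1", 300, 8, 4, 1),
    Session("u2", 90, 3, 2, 0),
]
print(engagement_report(log))
```

Note that none of these numbers indicate whether the user’s problem was actually solved, which is precisely the gap critics point to.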

This metric-driven design heavily mimics social media platforms, where higher engagement fuels ad revenue and brand stickiness. Kevin Systrom, co-founder of Instagram and now CEO of Artifact, labels this approach fundamentally flawed for information tools. “The utility of these chatbots is zero,” he states, suggesting users may be entertained but walk away misinformed or unproductive.

Kevin Systrom’s Case for Utility-First AI

Artifact, a news recommendation app rooted in AI, served as Systrom’s response to what he saw as the misuse of AI’s potential. Rather than optimizing for clickbait or novelty, Artifact filtered high-quality journalism using ML algorithms aimed at accuracy and relevance. This approach, while receiving positive feedback from users valuing curation over conversation, stood in sharp contrast to the viral success of generative chatbots.

Systrom’s sharp critique joins a broader call among technologists to reprioritize AI design. In his view, real utility (the ability to answer questions accurately, synthesize source-based content, and support user goals) should define success, not addictive dialogue loops.

Expert Warnings: Trust and Misinformation

Concerns about chatbot utility are not new. Geoffrey Hinton, often called the “godfather of AI,” left Google in 2023 amid fears that generative AI would amplify misinformation. Chapman University’s 2023 public trust survey found that 45% of respondents trusted chatbots less than search engines, citing factual errors and vague responses as major concerns.

Elon Musk similarly warned that engagement-focused AI models may “manipulate users” or “reinforce harmful behaviors.” Both Musk and Hinton argue that conversational believability should not be confused with factual accuracy. When chatbots “hallucinate”, fabricating answers in plausible language, they risk misleading even informed users.

This creates a dangerous feedback loop: the more users engage with AI for entertainment, the more these models are algorithmically rewarded for speculative or exaggerated responses. Trust, once eroded, is difficult to rebuild.

Engagement vs. Utility: A Side-by-Side Comparison

To highlight the practical differences between engagement-driven and utility-first AI, consider these two chatbot experiences:

| Feature | Engagement-First Chatbot (e.g., ChatGPT-3.5) | Utility-First Chatbot (e.g., GitHub Copilot, Perplexity AI) |
| --- | --- | --- |
| Response Style | Conversational, often verbose | Concise, task-specific |
| Accuracy Verification | Limited or no citation of sources | Sources cited; verifiable references |
| User Goal Alignment | Optimized to keep chatting | Optimized to complete the task |
| Learning Outcome | Variable and anecdotal | Structured, knowledge-based |

This contrast highlights that while traditional chatbots may impress in casual conversation, they often fall short when applied to domains requiring precision, such as legal research, coding, or financial analysis.

The Business Incentive Dilemma

Why do major tech companies continue building engagement-first chatbots? The answer lies in monetization. AI models integrated with advertising ecosystems benefit directly from prolonged user interaction. Microsoft’s use of generative AI in Bing, for example, increased query sessions per user, which in turn created new ad inventory for partners.

In this landscape, true utility becomes a secondary concern. Solving a user’s problem quickly might actually reduce engagement time, and with it revenue. This misalignment of incentives explains why products like Artifact, designed to prioritize user success over time spent, remain the exception, not the rule.

Can Chatbots Be Both Engaging and Useful?

There is emerging research and product innovation attempting to bridge the divide. A 2024 Stanford HCI study analyzed user satisfaction across 100,000 chatbot-driven tasks. The findings showed that hybrid models, offering both cited information and a conversational UX, yielded 28% higher task success rates than chatbots based purely on a large language model.

Notably, tools like Perplexity AI, which enable on-demand citations and document uploads, are gaining traction among researchers and students for exactly this reason. They demonstrate that AI systems need not sacrifice engagement for utility, but doing both well requires careful design, transparent data sourcing, and aligned business models.

Practical Tips: How to Spot Utility-Driven AI

For professionals, educators, and consumers alike, recognizing genuinely useful AI tools is crucial. Here are some characteristics to evaluate:

  • Source citation: Does the chatbot provide links or references for its claims?
  • Task alignment: Is the output aligned with your actual goal (e.g., solving a problem, completing work)?
  • Reproducibility: Can the information or solution be followed, tested, or verified?
  • Distraction level: Does the chatbot offer entertainment tangents or stay focused?

Selecting AI tools that prioritize user success over screen time can improve productivity and reduce the risk of being steered by engagement-driven reward patterns.
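
One way to make this checklist concrete is a simple scoring rubric. The sketch below is a hypothetical example: the criteria mirror the list above, and the equal weighting is an arbitrary illustration, not an established standard:

```python
# Hypothetical rubric for judging whether a chatbot response is utility-driven.
# Each criterion is answered manually (True/False) after reading the response.
CRITERIA = {
    "cites_sources": "Provides links or references for its claims",
    "matches_goal":  "Output is aligned with the task actually asked for",
    "reproducible":  "Steps or facts can be followed, tested, or verified",
    "stays_on_task": "No entertainment tangents; stays focused",
}

def utility_score(checks: dict[str, bool]) -> float:
    """Return the fraction of criteria satisfied, from 0.0 to 1.0."""
    return sum(checks[name] for name in CRITERIA) / len(CRITERIA)

# Example: an answer that solved the task but cited nothing and rambled
review = {
    "cites_sources": False,
    "matches_goal": True,
    "reproducible": True,
    "stays_on_task": False,
}
print(f"Utility score: {utility_score(review):.2f}")  # Utility score: 0.50
```

A consistently low score for a tool you rely on is a reasonable prompt to look for a more utility-first alternative.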

Conclusion: Reframing AI Benchmarks for the Future

The current state of AI chatbot development reveals a skewed value system. When success is measured by user engagement rather than utility, even impressive systems can become distractions instead of tools. As Kevin Systrom and other leaders argue, it is time to shift toward models that help users do more, not just stay longer. This pivot requires reengineering incentives, rethinking benchmarks, and, above all, placing user outcomes at the center of AI design.
