
Our AI Agent Outages, Hallucinations, and Upsell Traps | The Agents Ep. 1
AI Summary
This podcast episode introduces "Agents with Amelia Larouch," a new series from SaaStr that shares the team's journey with AI agents, including the challenges and lessons along the way. The core meta-takeaway is that while building AI-powered "vibe-coded" apps is becoming increasingly accessible, even for non-technical individuals, maintaining these applications is a significant and often overlooked challenge.
The speakers highlight that many people underestimate the maintenance required, or they delegate it to individuals who lack the technical intuition to troubleshoot complex issues. They emphasize that building an app is just the beginning of a continuous maintenance cycle, much like closing a sale is the start of a customer journey.
Several real-world issues encountered by the SaaStr team illustrate this point:
**1. Database Issues and Maintenance Responsibility:**
A preview instance of their applications suffered database connection problems that left it unavailable for hours. The incident highlighted the ambiguity of whom to turn to for support. Unlike commercial software (such as Salesforce or Squarespace), where there is a clear vendor to contact, with custom vibe-coded apps the responsibility falls internally. The agent itself couldn't resolve the issue, and it became clear that a dedicated person or team is needed to manage these problems, especially in preview and staging environments where development and iteration occur. Production environments, by contrast, remained unaffected. The speakers stressed that without someone actively monitoring and maintaining these systems, issues can go unnoticed for days, leading to downtime and operational problems.
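The kind of oversight the speakers call for can be sketched as a simple periodic health check. Everything in this sketch is hypothetical: `connect_fn` stands in for whatever database driver the app actually uses (e.g. a `psycopg2.connect` wrapper), and `alert` is a placeholder for a real Slack/PagerDuty/email hook.

```python
import time

def check_db(connect_fn, retries=3, delay=0.5):
    """Try to open a database connection, retrying a few times.

    connect_fn is a hypothetical callable that opens a connection and
    raises on failure; swap in your real driver here.
    """
    for attempt in range(retries):
        try:
            conn = connect_fn()
            conn.close()
            return True
        except Exception:
            if attempt < retries - 1:
                time.sleep(delay)
    return False

def alert(env, message):
    # Placeholder: wire this to Slack, PagerDuty, email, etc.
    print(f"[{env}] ALERT: {message}")

def monitor(envs):
    """One monitoring pass over preview/staging/production environments.

    envs maps an environment name to its connect callable; returns a
    dict of environment -> healthy flag and alerts on any failure.
    """
    results = {}
    for env, connect_fn in envs.items():
        ok = check_db(connect_fn)
        results[env] = ok
        if not ok:
            alert(env, "database connection failed")
    return results
```

Run on a cron or scheduler every few minutes; the point is simply that a preview environment failing for hours, as in the story above, would page a human within one cycle.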
**2. Hallucinations and Data Drift:**
Despite advancements in AI models, hallucinations (agents making up information) remain a concern. The team shared examples:
* **AI VP of Marketing (10K):** This agent, designed to analyze years of revenue data, began misinterpreting years for year-over-year comparisons, leading to wildly different performance figures within the same day. It even "made up" data when it couldn't find the correct year.
* **AI Chatbot:** A chatbot built for internal use also started hallucinating when it lacked complete or updated answers, demonstrating that even seemingly simple interactions can be prone to errors.
These "micro-hallucinations" require daily attention and maintenance, consuming about 15 minutes of a human's time. Without this oversight, agents can gradually drift from reality and accurate data. The speakers caution against letting agents run autonomously without close monitoring, as they can go "off the rails" if not properly supervised.
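Part of that daily 15-minute review can be automated with basic sanity checks on the agent's output. This is a hedged sketch, not the team's actual tooling: the `report` row shape and the `known_years` set are assumptions about what a year-over-year revenue summary might look like.

```python
def validate_yoy_report(report, known_years):
    """Flag agent output whose years fall outside the data we actually have.

    report: list of dicts like {"year": 2023, "revenue": 1_200_000}
    known_years: the set of years present in the source data.
    Returns a list of human-readable issues; an empty list means the
    report passed this basic sanity check (it is NOT proof of correctness).
    """
    issues = []
    seen = set()
    for row in report:
        year = row.get("year")
        if year not in known_years:
            # The "made up a year" failure mode described above.
            issues.append(f"made-up year: {year}")
        if year in seen:
            issues.append(f"duplicate year: {year}")
        seen.add(year)
        if row.get("revenue", 0) < 0:
            issues.append(f"negative revenue for {year}")
    return issues
```

Checks like these don't replace human review, but they catch the grossest drift (invented years, impossible figures) before anyone acts on the numbers.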
**3. Model-Based Regressions:**
A significant challenge emerged with their pitch deck analyzer, a complex tool that uses multiple passes through a language model. After model updates (even minor ones), the analyzer started producing anomalous results, with many startups being incorrectly assigned $100,000 in revenue and 500% growth. This occurred without any changes to the application's code, indicating that subtle shifts in the underlying AI models could introduce regressions. This highlights the need for continuous monitoring and adaptation when relying on external AI models, as their behavior can change unexpectedly.
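One way to catch this class of model-based regression is a "golden" test suite: re-run a fixed set of decks after every model update and flag any output outside human-reviewed bounds. The `analyze` callable and the field names below are hypothetical stand-ins for the pitch deck analyzer, chosen only to illustrate the technique.

```python
def check_analysis(result, bounds):
    """Compare one analyzer output against expected ranges.

    result: dict produced by the (hypothetical) pitch deck analyzer.
    bounds: per-field (lo, hi) ranges taken from a human-reviewed run.
    Returns a list of out-of-bounds failures.
    """
    failures = []
    for field, (lo, hi) in bounds.items():
        value = result.get(field)
        if value is None or not (lo <= value <= hi):
            failures.append(f"{field}={value} outside [{lo}, {hi}]")
    return failures

def run_golden_suite(analyze, cases):
    """Run the golden decks through the analyzer after every model update.

    cases maps a case name to (deck, bounds); returns per-case failures,
    so an anomaly like "$100K revenue, 500% growth" for most startups
    shows up immediately instead of reaching users.
    """
    return {name: check_analysis(analyze(deck), bounds)
            for name, (deck, bounds) in cases.items()}
```

Because the application code never changed in the incident above, only a check like this, run on every model version bump, would have caught the shift.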
**4. Clay Agent's Misinformation and Pricing Complexity:**
An incident involving the Clay platform's AI agent, "Sculptor," demonstrated how agents can provide incorrect or misleading information, particularly concerning pricing. The agent initially recommended a significantly more expensive enrichment process and then suggested upgrading to a new plan to cover insufficient credits, even though cheaper alternatives existed. This was exacerbated by Clay's recent pricing changes, which the agent had not been properly trained on. The speakers noted that this situation could be intentional to upsell customers or a result of inadequate training data, emphasizing the importance of keeping AI agents updated with the latest product information. They also pointed out that complex pricing often masks hidden increases.
**5. HubSpot Agent's Ineffectiveness:**
The team experienced difficulty getting intelligent answers from HubSpot's website agent when inquiring about pricing, suggesting that some customer-facing AI agents are not adequately trained or are designed to push users towards human interaction.
**6. The "No Lead Left Behind" Philosophy:**
A key insight from a meeting with a public company CEO was the realization that the success of their AI agents in go-to-market functions stems from a "no lead left behind" approach. Agents ensure that every prospect and customer receives timely and appropriate interaction, covering areas that humans might miss due to bandwidth constraints or a preference for handling more "exciting" leads. This includes answering website questions instantly, setting appointments, and even engaging with prospects who might not initially meet strict budget criteria. The speakers argued that by touching every lead and prospect, regardless of their perceived value, organizations can significantly improve their overall engagement and conversion rates.
**7. Salesforce's Integration of Qualified:**
Salesforce's integration of Qualified onto its website signifies a strategic move to offer a simpler, more accessible go-to-market agentic tool to its vast customer base. While Agentforce offers deeper functionality, Qualified provides a more user-friendly experience for GTM-specific tasks, making it easier for customers to deploy agentic solutions. The quick rollout of Qualified on Salesforce's homepage after the acquisition highlights the speed at which these tools can be integrated.
**8. AI Team Members: QB and 10K:**
The episode also touched on the team's internal AI agents, QB (AI VP of Customer Success) and 10K (AI VP of Marketing).
* **Salesforce Integration for 10K:** Integrating 10K with Salesforce proved more complex than anticipated. The native integration's token expired daily, so the team created a custom Salesforce object to establish a yearly token refresh. The process, guided by Claude and Cowork, took about half an hour, but it showed that even seemingly simple integrations can require technical workarounds, especially on complex platforms like Salesforce.
* **Localization of QB:** Replit was used to localize QB into Spanish and Chinese within 20 minutes. This was a significant achievement, as manual localization would have been far more time-consuming and expensive. While the agent initially translated only the menus, further prompting produced a more comprehensive translation. This demonstrates the power of AI for rapid localization, a capability that even large companies like Shopify have only recently rolled out.
* **QB's Role in Deadline Management:** QB played a crucial role in enforcing deadlines for sponsor graphics for the SaaStr Annual event. It identified and flagged incomplete or placeholder submissions, preventing issues that humans might have overlooked. QB's neutral, objective approach meant sponsors couldn't "hide" from deadlines, providing a more efficient and less confrontational way to manage deliverables.
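The daily-token workaround described for the 10K/Salesforce integration above can be sketched generically as a cached token with early refresh. The `refresh_fn` here is a hypothetical callable standing in for whatever long-lived refresh path the custom Salesforce object exposes; this is not Salesforce's actual API.

```python
import time

class TokenCache:
    """Cache an API token and refresh it shortly before it expires.

    refresh_fn is a hypothetical callable returning
    (token, lifetime_seconds). In the scenario above, it would hit the
    custom object's long-lived refresh path instead of relying on the
    native token that expires every day.
    """
    def __init__(self, refresh_fn, margin=60):
        self.refresh_fn = refresh_fn
        self.margin = margin          # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.time()
        if self._token is None or now >= self._expires_at - self.margin:
            self._token, lifetime = self.refresh_fn()
            self._expires_at = now + lifetime
        return self._token
```

Every caller asks the cache for a token instead of storing one, so a change in token lifetime (daily vs. yearly) is absorbed in one place rather than breaking the integration silently.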
In conclusion, the episode underscores the growing accessibility of building AI applications but strongly emphasizes the critical need for ongoing maintenance, skilled personnel, and robust oversight. The "set and forget" mentality is a fallacy, and organizations must be prepared for the continuous effort required to ensure these powerful tools function effectively and reliably. The "no lead left behind" principle, powered by AI agents, is presented as a key strategy for maximizing coverage and engagement across all stages of the customer journey.