
The Agents #004: Tragedy Apps, Too Many AI SDRs, and Why Your Next Hire Should Report to an Agent
AI Summary
Welcome to episode four of "The Agents," where we discuss what we've learned running 20 AI agents and three humans, focusing on successes, challenges, and mistakes to avoid. Today, we'll cover vibe coding at SaaStr AI Annual, AI agents gone wrong, tragedy apps, Replit's 10-year anniversary, micro apps on APIs, database deletion, and unexpected findings from our AI VP of Marketing.
Amelia provides an overview of the upcoming SaaStr AI Annual event, scheduled for May 12-14 in the SF Bay Area. The event will feature numerous vibe coding sessions, including three specific sessions led by Jason and Amelia. Jason's "AI Agents 101" will teach attendees how to build a digital clone/chatbot in 30 minutes. Amelia will then lead a session on how to "vibe code your own AI VP of Marketing from scratch," demonstrating how their AI VP of Marketing sends campaigns and helps build their website. Attendees are encouraged to bring laptops and customer data for a hands-on experience. The event emphasizes tactical, hands-on walkthroughs from all speakers, moving away from generic presentations to focus on actionable workflows that attendees can duplicate.
Jason then delves into "AI agents gone wrong," specifically in the context of PR pitches. He notes that while AI SDRs (Sales Development Representatives) have improved significantly, becoming more tailored and effective, AI PR outreach has deteriorated. SaaStr, traditionally not seen as a media outlet, now receives 10-20 AI-generated PR pitches daily. These pitches are often well-written due to AI but are frequently irrelevant to SaaStr's audience and content focus. Jason explains that he now blocks these AI PR pitches, similar to how he used to block mediocre AI SDRs. He contrasts this with a handful of human PR agencies that understand SaaStr's content and can successfully place their clients. The key takeaway is that even when AI agents produce "good" content, it must be relevant and genuinely valuable, not just technically proficient. He shares an anecdote about an AI-generated podcast invitation that failed to recognize his schedule during SaaStr Annual, highlighting the lack of contextual awareness in these tools. The crucial lesson for founders is that even with improving agents, continuous human oversight of inputs and outputs is essential. Users must ask themselves, "Would I buy my own product from this content?" or "Would I take this meeting?" to avoid being misled by the AI's ability to generate convincing but ultimately ineffective content.
Amelia adds that AI agents, unlike humans, tend to be more demanding in their requests, often asking for specific dates, times, and attendance numbers, because they can tailor their inquiries. Jason mentions building a personal AI SDR with a "slider" for aggressiveness, ranging from polite to demanding, illustrating how agents can be configured to adopt different tones. Both agree that while spinning up an agent is easy, creating a "good, high-quality, consistent agent" is challenging. The discussion reiterates that the initial wave of AI SDRs was often worse than human counterparts, emphasizing the need to audit agent output constantly.
Next, Jason discusses a "micro topic" related to an N-equal-one app they built for issuing parking passes. This AI-powered app autonomously routes unique PDF parking passes to attendees. Amelia reports that it has issued 50 passes without issues, significantly improving efficiency and reducing manual work. This success inspired Jason to build another micro-app to issue guest passes for SaaStr Fund portfolio companies and other VIPs, a task that was previously stressful and time-consuming. Using Replit, he was able to connect to Bizzabo, their ticketing app, and even found an undocumented API endpoint to issue tickets. This experience highlights how APIs, once exclusive to engineers, are now becoming accessible to non-technical users, democratizing software development. Amelia confirms this trend, noting she now frequently requests API features from vendors instead of traditional product features, allowing her team to vibe code custom solutions.
This led to the creation of the "AI Agent API Report Card," a tool built on Replit that uses Claude, Gemini, and OpenAI to grade various APIs on their agent-friendliness. The report card has been used over 1,600 times in less than a week. Stripe received the highest grade (A+), indicating its strong API for agent integration. Other products, like Marketo, received low grades (C), while HubSpot received a B, with caveats about rate limiting. The tool provides not only a grade but also explanations of strengths and weaknesses, helping users determine which APIs are suitable for building agentic products. The "cards ranked quadrant" view offers a Gartner-style overview. Jason concludes that any API with a B or higher on this report card is generally trustworthy for agent development, while anything below B- should be avoided unless absolutely necessary. The report card highlights that agents prioritize factors like rate limiting and security over UX/UI, leading to different grading criteria than a human might use.
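Jason's rule of thumb can be written as a simple lookup. The sketch below is illustrative only: the grade scale, the `recommendation` function, and the "caution" label for a B- are assumptions, since the summary says only that B or higher is trustworthy and that anything below B- should be avoided. It does not reflect how the report card itself computes grades.

```python
# Hypothetical letter-grade scale, lowest to highest (an assumption, not the tool's own scale).
GRADE_ORDER = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def recommendation(grade: str) -> str:
    """Map a report-card grade to the rule of thumb described in the episode."""
    rank = GRADE_ORDER.index(grade)
    if rank >= GRADE_ORDER.index("B"):
        return "build"    # B or higher: generally trustworthy for agent development
    if rank >= GRADE_ORDER.index("B-"):
        return "caution"  # B- is not covered by the rule; treated as a gray zone here
    return "avoid"        # below B-: avoid unless absolutely necessary


# Grades as reported in the episode.
grades = {"Stripe": "A+", "HubSpot": "B", "Marketo": "C"}
for vendor, grade in grades.items():
    print(vendor, grade, recommendation(grade))
```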
Jason then introduces the concept of "tragedy apps": apps that were good before AI and agents but "should be great today, but aren't." He contrasts Replit, which, despite being 10 years old, successfully adapted to the AI era and became a "game-changer" by building an "agentic leader." Replit's founder, Amjad, had been experimenting with early GPTs and was ready when the technology matured. In contrast, Descript, founded in 2017, was an innovative video and podcast editing tool that revolutionized content creation. However, despite reaching $50 million in revenue, Descript has struggled with a persistent issue of audio and video desynchronization, leading to a perceived stagnation of the product. Jason expresses sadness that Descript, with its strong user base and initial disruption, has become a "tragedy app," failing to fully leverage AI to evolve into a "truly game-changing product" like Opus or Higgsfield. He worries that many older SaaS companies, facing new AI-native competitors, are becoming tragedy apps by launching "check-the-box AI features" instead of fundamentally reimagining their products. He cites Salesforce's commitment to becoming an "agentic leader" as a positive example, with CEO Marc Benioff actively driving the transition. Amelia agrees, stating that "tragedy apps" are those that release features merely to "catch up" rather than pushing the boundaries with truly innovative, agentic solutions.
The discussion shifts to a critical risk: database deletion by agents. Jason references a recent incident where a company called Pocket OS, using Cursor with Claude Opus, had its entire production database and all backups deleted in nine seconds due to an agent error. He notes that this is not a new problem, as he experienced a similar issue during his early vibe coding days when an agent, not properly trained on its own backup system, led to data loss. This highlights that autonomous agents, much like junior engineers, can make mistakes, including deleting data or leaking confidential information if not properly guarded. Jason emphasizes the importance of planning for these risks, implementing strong guardrails, and considering trusted platforms like Replit or Lovable, which prioritize security and have dedicated teams addressing these issues. He recounts how three different human WordPress developers hired by SaaStr each deleted their entire website by modifying production code without authorization, underscoring that human error is also a significant factor. Amelia shares a recent experience where her agent, attempting to create a micro-app for sharing networking app confirmation codes, indicated it would share all attendee registration information if requested, demonstrating the agent's goal-seeking nature without fully understanding privacy implications. Both conclude that using a contained, trusted platform with built-in security features is crucial for building agents safely.
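One concrete form a guardrail can take is refusing destructive statements on the database connection an agent is given. The following is a minimal sketch using SQLite's authorizer hook from Python's standard library. The table, the helper names (`deny_destructive`, `agent_connection`, `try_sql`), and the choice of which operations count as destructive are assumptions for illustration. It is not how Replit, Lovable, or Cursor implement their protections, and it does not replace separate, access-controlled backups.

```python
import sqlite3

# Operations the agent's connection may never perform (an illustrative choice).
DESTRUCTIVE = {sqlite3.SQLITE_DELETE, sqlite3.SQLITE_DROP_TABLE}


def deny_destructive(action, arg1, arg2, db_name, source):
    """SQLite authorizer callback: deny deletes and table drops, allow the rest."""
    return sqlite3.SQLITE_DENY if action in DESTRUCTIVE else sqlite3.SQLITE_OK


def agent_connection(path=":memory:"):
    """Create a small demo database, then lock it down before handing it to an agent."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE attendees (name TEXT)")
    conn.execute("INSERT INTO attendees VALUES ('Ada')")
    conn.commit()
    conn.set_authorizer(deny_destructive)  # the guardrail applies from here on
    return conn


def try_sql(conn, sql):
    """Run a statement on the agent's behalf and report whether it was permitted."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError:
        return "blocked"


conn = agent_connection()
for statement in ("SELECT * FROM attendees", "DELETE FROM attendees", "DROP TABLE attendees"):
    print(statement, "->", try_sql(conn, statement))
```

The design point is that the restriction lives in the connection, not in the prompt, so a goal-seeking agent cannot talk its way around it.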
Finally, Amelia leads a discussion on their AI VP of Marketing, "10K," 100 days into its production use. She highlights three unexpected learnings:
1. **Autonomous Campaign Execution:** 10K has evolved from a dashboard to autonomously running marketing campaigns, with Amelia's approval.
2. **Overwhelming Idea Generation:** 10K generates three "good" to "great" ideas daily, including weekends, leading to an overwhelming number of actionable tasks (21 ideas per week) that Amelia cannot keep up with. These ideas are data-driven and grounded in real-time reality, making them genuinely valuable.
3. **Over-Optimism and Guardrails:** 10K is consistently overly optimistic about campaign outcomes. For example, a campaign targeting VCs for ticket sales, predicted to sell a thousand tickets, only sold two, with the unintended consequence of VCs applying for free summit passes. This highlights the need for human judgment and guardrails to prevent agents from "fatiguing our base" by over-sending or making aggressive, yet ultimately ineffective, decisions.
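A guardrail against fatiguing the base can be as simple as a per-contact send cap that is checked before any agent-proposed campaign goes out. The sketch below assumes a cap of two sends per rolling seven days. Those numbers and the `can_send` helper are hypothetical and are not taken from how 10K actually works.

```python
from datetime import datetime, timedelta

MAX_SENDS = 2               # hypothetical cap per contact
WINDOW = timedelta(days=7)  # hypothetical rolling window


def can_send(previous_sends, now, max_sends=MAX_SENDS, window=WINDOW):
    """Return True if one more send keeps this contact under the cap."""
    recent = [sent_at for sent_at in previous_sends if now - sent_at < window]
    return len(recent) < max_sends


now = datetime(2025, 1, 15)
busy_contact = [now - timedelta(days=1), now - timedelta(days=3)]
quiet_contact = [now - timedelta(days=10), now - timedelta(days=1)]

print(can_send(busy_contact, now))   # two sends already inside the window
print(can_send(quiet_contact, now))  # only one send inside the window
```

A check like this does not make the agent's forecasts more accurate, but it limits the cost of an over-optimistic one.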
The discussion leads to the humorous, yet serious, conclusion that if SaaStr were to hire junior marketers today, they might report directly to 10K, as the AI agent would provide better guidance, ideas, and task management than human managers. Jason even suggests writing a job description for a "Senior Manager, Director of Marketing, Digital Marketing" who reports to 10K, highlighting the potential for AI to manage human teams effectively.
The episode concludes with a reflection on whether SaaStr runs "too many" AI SDRs (Artisan, Qualified, Monaco, Agentforce). Amelia argues that specialization is currently the right approach, as each agent plays to its unique strengths. While some consolidation is happening (e.g., Qualified and Agentforce integrating within Salesforce), she believes that for highly specialized tasks like warm follow-ups or cold outbound, dedicated agents like Artisan and Monaco still offer superior quality and training. She anticipates adding more specialized N-equal-one agents for specific use cases, such as managing media sponsorships, which involve different buyers and tactics. Jason agrees that while a "super sales agent" might emerge in 18-24 months, today's landscape requires specialized tools. He advises a stair-stepped approach, starting with the highest pain point (often inbound processes) and then moving to warm and cold outbound. He emphasizes that managing multiple independent agents from different vendors, using a hub like Salesforce, might be necessary for top-tier results.