
Prime is (mostly) right about AI
The speaker discusses shifts in the AI economy, particularly focusing on the economics of LLM inference and compute availability, in response to a video by Primagen. While agreeing with Primagen's core observation that the AI economy is changing, the speaker aims to add nuance and correct perceived misunderstandings, especially concerning major tech companies like Microsoft and Google.
The speaker begins by highlighting Anthropic's "painted door test" (or "fake door test") as an attempt to gauge user willingness to pay more for Claude Code. Primagen interprets this as Anthropic trying to push users from a $20 tier to higher $100 or $200 tiers to recoup compute costs. However, the speaker argues this is less about profit and more about Anthropic needing to conserve compute for its lucrative enterprise customers. This is framed within a broader history of the AI economy's strains.
The speaker traces the breakdown of the initial AI economy model back to July of the previous year, citing Cursor as an early example. Cursor, which pays labs like OpenAI and Anthropic per inference, initially priced its service by message count. This proved unsustainable because per-message costs varied drastically, from cents to dollars. Unlike larger companies, Cursor couldn't absorb these losses and shifted to pricing that reflected actual compute cost per request. GitHub Copilot also currently uses a message-based pricing model, which is scheduled to change.
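The pricing problem Cursor ran into can be sketched with hypothetical numbers: under a flat per-message price, a handful of expensive requests erases the margin earned on many cheap ones. All figures below are illustrative assumptions, not numbers from the discussion.

```python
# Illustrative sketch: why flat per-message pricing breaks down when the
# underlying inference cost per message varies widely (cents to dollars).

FLAT_PRICE_PER_MESSAGE = 0.04  # assumed flat charge per message, in dollars

# Assumed per-message inference costs paid to the lab.
inference_costs = [0.01, 0.02, 0.03, 0.80, 1.50, 0.02, 2.40, 0.01]

revenue = FLAT_PRICE_PER_MESSAGE * len(inference_costs)
cost = sum(inference_costs)
margin = revenue - cost

print(f"revenue: ${revenue:.2f}, inference cost: ${cost:.2f}, margin: ${margin:.2f}")
```

Three expensive requests out of eight push the whole batch deep into the red, which is why pricing that tracks actual compute per request is the stable endpoint for a company that cannot absorb the losses.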
Another significant shift highlighted is Anthropic's March announcement of adjusted session limits during peak hours. This followed an earlier experiment to encourage off-peak usage by offering double compute. When this failed to alleviate demand, Anthropic resorted to reducing peak hour usage, demonstrating their struggle to manage compute resources. The speaker reiterates that companies like Cursor, which do not own vast compute infrastructure, are more sensitive to these economic realities. Cursor's internal auditing of compute per dollar, for instance, reveals that a $200 Claude Code subscription can yield up to $5,000 in inference value due to Anthropic's aggressive subsidization.
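The subsidy figure quoted above can be restated as simple arithmetic: a $200/month subscription reportedly yielding up to $5,000 of inference value implies Anthropic is delivering up to 25x what a heavy user pays.

```python
# The subsidy ratio implied by Cursor's reported audit figures.
subscription_price = 200   # $/month for the Claude Code tier discussed
inference_value = 5000     # reported upper bound of inference value consumed

subsidy_ratio = inference_value / subscription_price
worst_case_loss = inference_value - subscription_price

print(f"up to {subsidy_ratio:.0f}x the subscription price delivered in compute")
print(f"up to ${worst_case_loss} of subsidized inference per heavy user per month")
```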
The speaker emphasizes that Anthropic's primary revenue comes from enterprise clients, making subscription revenue a secondary concern, largely a marketing play. Their aggressive subsidization is a strategy to gain visibility and users, but it becomes problematic when compute capacity is limited. The core issue for Anthropic, and other AI companies, is not just making a profit on each API call, but recouping the massive investment in training models and securing sufficient compute. The speaker uses a personal anecdote about their electricity bill to illustrate the high cost of running GPUs, noting that even cheap electricity rates can lead to hundreds of dollars per month for a single high-end GPU running 24/7.
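The electricity anecdote can be roughly checked; the wall-power draw and rates below are assumptions for illustration, not figures from the talk, and the result depends heavily on both.

```python
# Rough monthly electricity cost for one high-end GPU box running 24/7.
# Assumed: ~700 W GPU plus CPU, fans, and PSU overhead at the wall.
power_kw = 1.2
hours_per_month = 24 * 30
kwh_per_month = power_kw * hours_per_month  # ~864 kWh

for label, rate in [("cheap", 0.10), ("typical", 0.25)]:
    print(f"{label} rate (${rate}/kWh): ${kwh_per_month * rate:.0f}/month")
```

Even at a cheap rate this lands near $100/month, and at more typical rates it runs to a few hundred dollars, for a single machine.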
The discussion then delves into the economics of model training and inference. The speaker differentiates between pre-training (the expensive process of baking knowledge into a model) and post-training (refining its behavior). While pre-training is a massive, multi-billion-dollar undertaking, post-training, particularly with reinforcement learning techniques, can be far more cost-effective while still yielding significant performance improvements, as seen with models like Cursor's Composer 2, a fine-tuned version of an older open-weight model. The speaker suggests that Opus 4.5 likely represented a new, expensive pre-training run, while subsequent iterations like Opus 4.6 and 4.7 were primarily post-training, and thus far less costly.
Regarding OpenAI, the speaker notes their significant investment and substantial monthly burn rate, indicating a need to improve revenue generation. This leads to a discussion of Microsoft's GitHub Copilot. Primagen criticizes Copilot's message-based pricing, drawing an analogy to Walmart pricing by the number of items rather than their value. The speaker agrees that message-based pricing is flawed when models have vastly different costs: GPT-5.5, for example, can be 7.5 times more costly per message than GPT-5.4.
However, the speaker refutes Primagen's assertion that Microsoft's pricing changes are solely about squeezing more money from users. Instead, the core problem for Microsoft, and indeed for Anthropic and Google, is a compute bottleneck. Microsoft's decision to pause Copilot signups is not a revenue-grabbing tactic but a necessity due to a lack of available compute capacity. This compute is crucial for their enterprise Azure clients. The speaker argues that the narrative should shift from "subsidy economy" to "compute restriction."
Comparing Microsoft and Anthropic, the speaker points out that Microsoft, unlike Anthropic, is a profitable company that can afford to subsidize for longer. However, even they are hitting compute limits. The speaker dismisses the idea that Anthropic's Claude Code tier changes are solely to upsell users; rather, it's to manage compute demand. Both the Claude Code tier adjustments and the Copilot signup pause are seen as direct responses to compute scarcity, not a desperate attempt to increase revenue from individual users.
The speaker then addresses Google, arguing that the perception of Google as a company that doesn't need to restrict services is a misinterpretation. Google is indeed providing immense free compute through features like AI Overviews in Search, even for signed-out users. Initially, Google's subsidization was so aggressive that they had to implement significant restrictions, even banning users who created plugins to track their usage. The speaker suggests that Google's own models are perceived as less capable, leading to less discussion about their subsidization efforts. Google's ability to make their own compute hardware (TPUs) doesn't exempt them from compute limitations; rumors suggest they even use CPUs due to shortages.
The speaker strongly refutes the notion that the cost of AI is not decreasing. Using data from the Artificial Intelligence Index, the speaker demonstrates that while frontier models like GPT-5.5 may have higher per-token costs, their greater efficiency and intelligence can lower the overall cost of completing tasks. GPT-5.5 Medium, for instance, performs comparably to GPT-5.4 X High at a significantly lower cost, and GPT-5.5 Low offers even greater savings. This efficiency gain is attributed to labs focusing on optimizing inference, leading to a rapid decrease in the cost of completing real-world tasks, particularly with non-Anthropic models.
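The distinction between price per token and cost per completed task can be sketched with hypothetical numbers: a model with a higher per-token price can still be cheaper per solved task if it uses fewer tokens and fails less often. The prices, token counts, and success rates below are assumptions, not benchmark figures.

```python
# Cost per SOLVED task, not per token: expected spend until one success.

def cost_per_solved_task(price_per_mtok, tokens_per_attempt, success_rate):
    """Expected dollars spent to get one successful task completion."""
    cost_per_attempt = price_per_mtok * tokens_per_attempt / 1_000_000
    expected_attempts = 1 / success_rate  # geometric-distribution mean
    return cost_per_attempt * expected_attempts

# Assumed: older model has cheaper tokens but is verbose and fails more.
older = cost_per_solved_task(price_per_mtok=5.0, tokens_per_attempt=200_000, success_rate=0.5)
newer = cost_per_solved_task(price_per_mtok=10.0, tokens_per_attempt=60_000, success_rate=0.9)

print(f"older model: ${older:.2f} per solved task")
print(f"newer model: ${newer:.2f} per solved task")
```

Under these assumptions the newer model's tokens cost twice as much, yet each solved task costs roughly a third as much, which is the shape of the efficiency argument above.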
The speaker then returns to Microsoft's Copilot pricing, explaining that the 7.5x multiplier for GPT-5.5 and 15x for Opus are not directly tied to inference costs but to compute availability and demand. Microsoft is prioritizing enterprise clients who pay significantly more, and these multipliers are designed to deter individual users from consuming valuable compute. This is not about greed but about managing scarce compute for higher-paying customers.
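The throttling effect of such multipliers on an individual plan can be illustrated with assumed numbers: the monthly credit pool below is hypothetical, while the 7.5x and 15x multipliers echo the ones in the discussion above.

```python
# How per-message multipliers shrink an individual's effective allowance
# for expensive models. Credit pool and model labels are assumptions.
monthly_credits = 300
multipliers = {"standard model": 1.0, "frontier model": 7.5, "flagship model": 15.0}

for model, mult in multipliers.items():
    allowance = monthly_credits / mult
    print(f"{model:>15}: {allowance:.0f} effective messages/month")
```

The same credit pool stretches 15x further on a standard model than on the flagship, which is exactly the demand-shaping behavior the speaker describes: steering scarce compute away from heavy individual use of the most expensive models.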
Finally, the speaker disagrees with Primagen's framing of Google's actions, arguing that both Google's and Microsoft's problems stem from compute limitations, not from an inability to keep subsidizing. Microsoft's caution is attributed to protecting its reputation, while Google's quicker, more aggressive changes reflect less concern for public image and the perceived inferiority of its own AI products and developer tools. The speaker concludes that while the era of subsidized compute is indeed closing due to compute scarcity, the cost of AI for actual work is falling rapidly thanks to efficiency gains. The key takeaway is that the AI economy's evolution is driven by compute availability, not just profit margins on individual users.