When evaluating early-stage AI companies, most investors rely on the wrong signals. Web traffic looks impressive. User counts sound exciting. Growth charts trend upward. But none of these metrics answer the critical question: What are companies actually willing to pay for?
Andreessen Horowitz just released data that cuts through the noise. Working with Mercury, they analyzed over 200,000 startup transactions across three months to identify the top 50 AI companies that startups are genuinely spending money on—not just experimenting with, but writing checks for repeatedly.
This isn't survey data or usage statistics. It's behavioral evidence of what creates value in the AI economy. And for investors trying to separate signal from noise, the findings reveal patterns that traditional metrics completely miss.
Why Spending Data Matters More Than Ever
The venture industry has spent the past two years drowning in AI hype. Founders claim explosive ARR growth. Products tout millions of users. Pitch decks showcase viral adoption curves. But as we explored in our recent article on ARR manipulation, these metrics have become increasingly disconnected from reality.
Spending data is different. When a startup allocates budget to a tool month after month, they're making a real economic decision. They're betting that this product creates enough value to justify the cost. They're choosing this solution over alternatives—or over doing nothing at all.
This is exactly the kind of signal-based evidence that reveals sustainable business models versus temporary hype.
The Three Major Shifts Reshaping AI Applications
1. Category Boundaries Are Collapsing
Traditional software had clear vertical boundaries. Creative tools were for marketing teams. Code editors were for engineers. Financial analysis was for finance departments.
AI is demolishing these silos.
The a16z data shows creative tools as the largest single category, with ten companies making the top 50. But here's what matters: these aren't just being purchased by design departments anymore. Everyone is using Midjourney, ElevenLabs, and Canva. The same pattern holds for vibe coding platforms—Replit, Cursor, Lovable, and Emergent all made the list, but they're not just serving engineering teams.
What This Means for Investors:
When evaluating AI companies, traditional category definitions become misleading. A "developer tool" that's being adopted horizontally across organizations has fundamentally different growth dynamics than a traditional IDE. A "creative platform" used by finance teams to generate presentations operates in a different strategic space than Adobe targeting designers.
The companies that understand this shift are building horizontal platforms disguised as vertical tools. They're capturing budget from departments that never historically spent on their category.
2. The LLM Assistant Market Is Still Wide Open
Despite billions in funding and massive market attention, the general LLM assistant category shows no clear winner. OpenAI ranks #1, Anthropic #2, Perplexity #12, and multiple workspace-integrated solutions (Notion, Manus) also made the list.
Users are switching between different interfaces and models depending on their needs. This isn't typical winner-takes-all software dynamics—it's a market where companies are maintaining multiple subscriptions simultaneously.
What This Signals:
Either we haven't seen the true category winner emerge, or the LLM assistant space will remain fragmented with users maintaining multiple tools. For investors, this means:
- Market timing risk remains high for companies positioning as "the" AI assistant
- Differentiation must go beyond model access to justify separate subscriptions
- Integration capabilities may matter more than standalone excellence
- Switching costs remain low, creating ongoing retention challenges
The spending data suggests companies aren't yet locked into any single platform—a critical risk factor when evaluating valuations based on "inevitable" market dominance.
3. Consumer-to-Enterprise Is Accelerating Beyond Historical Patterns
Nearly 70% of companies on the spending list started as individual/consumer products and evolved to offer team or enterprise functionality. Twelve companies appear on both a16z's consumer traffic rankings and this enterprise spending list.
Several are still generating majority-consumer revenue even while commanding significant enterprise spend. Cluely (#26) and Midjourney (#28) both maintain consumer-first business models while capturing startup budgets.
Why This Pattern Matters:
Traditional enterprise software required 18-36 months to move upmarket. AI products are doing it in under 12 months. This creates:
- Compressed product-market fit timelines that make traditional SaaS milestones obsolete
- Blurred revenue composition where consumer and enterprise mix unpredictably
- Valuation complexity when comparing to pure B2B or B2C benchmarks
- New competitive threats as consumer products move into the enterprise faster than incumbents can respond
For investors, this means re-evaluating what "enterprise-ready" looks like and understanding that consumer traction might be the fastest path to enterprise revenue.
The Replit vs. Lovable Case Study: What Revenue Really Measures
The most revealing insight in the entire report comes from comparing two vibe coding platforms: Replit and Lovable.
On consumer web traffic rankings, Lovable performs strongly, placing in the top quarter. Replit ranks much lower on pure traffic metrics. But when you examine actual startup spending? Replit generates approximately 15x more revenue than Lovable from Mercury customers.
This isn't a small discrepancy—it's a fundamental difference in value creation.
Why the Gap Exists:
Lovable excels at rapid UI and component generation with low barriers to entry. It's perfect for quick prototypes and consumer experimentation. But Replit offers something different: enterprise-grade functionality including autonomous agents that run for hours, built-in cloud services, databases, authentication, and secure publishing—all within the platform.
Lovable helps you start fast. Replit helps you build completely.
The Investor Lesson:
Web traffic tells you what people try. Revenue tells you what people value. Spending patterns tell you what companies believe creates lasting competitive advantage.
A founder experimenting with Lovable on a weekend project doesn't represent the same signal as a startup allocating monthly budget to Replit for production infrastructure. The first is exploration. The second is commitment.
This is precisely why DueCap's signal-based approach focuses on behavioral evidence over vanity metrics. Usage statistics are interesting. Payment behavior is predictive.
Vertical AI: Augmentation vs. Replacement
Of the 17 vertical AI applications on the list, only five position themselves as "AI employees" aiming to replace human roles entirely:
- Crosby Legal (agentic law firm)
- Cognition (AI engineer)
- 11x (automated GTM employees)
- Serval (AI IT service desk)
- Alma (AI-powered immigration law)
The remaining twelve focus on supercharging existing human employees—reducing repetitive tasks so people can focus on higher-value work.
This 30/70 Split Reveals Market Reality:
Despite all the rhetoric about AI replacing jobs, the spending patterns show companies are primarily investing in augmentation tools. This likely reflects several factors:
- Risk mitigation: Augmentation tools carry less operational risk than full replacement
- Incremental adoption: Companies prefer enhancing existing teams before restructuring entirely
- Quality control: Most domains still require human judgment for edge cases
- Organizational readiness: Few companies have adapted processes for AI-employee models
Investment Implications:
Augmentation tools may have broader near-term markets but potentially lower long-term pricing power. Replacement tools face higher adoption friction but could capture significantly more value if they work. The ratio of companies in each category provides real-time market feedback on what buyers actually believe is ready for production.
What Traditional Metrics Miss: The Meeting Support Category
Meeting support tools represent a fascinating microcosm of the AI market dynamics. Five companies made the list: Fyxer (#7), Happyscribe (#36), Plaude (#38), Otter AI (#41), and Read AI (#49).
These are primarily meeting notetakers—a relatively commoditized function. Yet multiple providers command meaningful startup spend. Why hasn't this market consolidated to a single winner?
Two theories:
Theory 1: Quality differentiation isn't clear. If all the tools work "well enough," switching costs are low and companies maintain multiple options or switch frequently based on specific features.
Theory 2: Integration matters more than core functionality. The winning product might not be a standalone tool but rather meeting intelligence built directly into calendar, CRM, or communication platforms.
Either way, the presence of five competitors in this narrow category suggests the market is less mature than many assume.
For Investors:
When multiple companies serve the same narrow function and all attract meaningful spend, you're likely looking at:
- Early-stage market definition where leaders haven't emerged
- Commoditized functionality where differentiation is weak
- Potential acquisition targets for platform players
- High risk for standalone venture returns unless a company can break out significantly
This is where signal-based due diligence becomes critical—understanding not just that a company has revenue, but why customers choose them and how defensible that choice is over time.
The Infrastructure Gap: What's Missing From the List
The a16z methodology specifically excluded cloud services, GPU providers, and infrastructure tools to focus on application-layer products. But this exclusion itself reveals something important.
Startups are spending heavily on infrastructure; it simply didn't make the application list. This creates a critical dynamic:
Application companies capture the relationship with end customers, but infrastructure companies capture the majority of the revenue.
For many AI applications, infrastructure costs (compute, models, storage) consume most or all of the gross margin. This means:
- Unit economics remain challenging for many application-layer companies
- Infrastructure providers have pricing power that application companies lack
- Gross margins are compressed compared to traditional SaaS benchmarks
- Scale may hurt profitability rather than improve it
When evaluating AI applications, understanding their infrastructure dependency is essential. A company with impressive revenue but 80% infrastructure costs has fundamentally different economics than one with 20% COGS.
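To make that arithmetic concrete, here is a minimal sketch comparing the two cost structures. The revenue figure and cost shares are illustrative assumptions, not numbers from the report:

```python
# Illustrative only: compare gross margins for two hypothetical AI
# application companies with identical revenue but different
# infrastructure cost shares (80% vs. 20% of revenue as COGS).

def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue

revenue = 1_000_000  # hypothetical annual revenue for both companies

infra_heavy = gross_margin(revenue, cogs=0.80 * revenue)  # 80% infrastructure costs
asset_light = gross_margin(revenue, cogs=0.20 * revenue)  # 20% COGS

print(f"Infra-heavy gross margin: {infra_heavy:.0%}")  # → 20%
print(f"Asset-light gross margin: {asset_light:.0%}")  # → 80%
```

The same top-line revenue supports four times the gross profit in the asset-light case, which is why infrastructure dependency belongs in any valuation comparison.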
What Spending Patterns Reveal About Product Stickiness
One underexplored aspect of the a16z data: these are companies that startups pay repeatedly over a three-month period. This isn't one-time purchase data—it's recurring spend.
That recurring pattern reveals several critical signals:
High Retention Indicators:
- Products solving ongoing problems, not one-time needs
- Tools integrated into regular workflows
- Solutions where switching costs have emerged
- Value creation that continues beyond initial deployment
Market Validation:
- Startups notoriously cut expenses quickly when tools don't deliver
- Maintaining spend through multiple months suggests real ROI
- Competitive budget allocation means these tools outperform alternatives
- Repeat payment indicates the product survives scrutiny from founders/CFOs
For investors, this spending persistence is arguably more valuable than total market size estimates or projected growth rates. It's actual behavioral evidence of value creation.
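As a rough illustration of the kind of analysis behind this signal, a diligence analyst might reduce a transaction log to per-vendor spend persistence: how many customers paid a vendor in multiple distinct months. Everything below (the data shape, the tuple fields, the sample records) is invented for illustration and is not the a16z/Mercury methodology:

```python
from collections import defaultdict

# Hypothetical transaction log: (customer, vendor, month) tuples.
# Names and data are invented for illustration.
transactions = [
    ("startup_a", "vendor_x", "2025-01"),
    ("startup_a", "vendor_x", "2025-02"),
    ("startup_a", "vendor_x", "2025-03"),
    ("startup_a", "vendor_y", "2025-01"),
    ("startup_b", "vendor_x", "2025-02"),
    ("startup_b", "vendor_x", "2025-03"),
]

def months_paid(txns):
    """Map (customer, vendor) -> set of months with at least one payment."""
    paid = defaultdict(set)
    for customer, vendor, month in txns:
        paid[(customer, vendor)].add(month)
    return paid

def recurring_customers(txns, min_months=3):
    """Count customers per vendor who paid in at least `min_months` distinct months."""
    counts = defaultdict(int)
    for (customer, vendor), months in months_paid(txns).items():
        if len(months) >= min_months:
            counts[vendor] += 1
    return dict(counts)

print(recurring_customers(transactions, min_months=2))  # → {'vendor_x': 2}
```

In this toy data, vendor_y attracts a one-off payment and drops out of the recurring view, while vendor_x retains both customers across months: exactly the distinction between experimentation and commitment the article draws.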
The Enterprise Control Paradox
Here's a subtle but important finding: many of the top-spending companies started consumer-first and are still generating majority-consumer revenue even while capturing enterprise budgets.
This creates a unique dynamic:
Startups are willing to use tools without traditional enterprise features (SSO, admin controls, compliance certifications) if the product is valuable enough. But this also means these companies are entering through unconventional channels—individual adoption rather than top-down sales.
What This Means:
- Bottom-up adoption is becoming default even for enterprise spending
- Traditional enterprise sales playbooks may be slower than product-led growth
- Security/compliance requirements are less of a blocker than historically assumed
- Individual value proposition must be strong enough to overcome organizational friction
For investors evaluating AI companies with consumer origins, the question isn't "when will they add enterprise features?" It's "is individual value strong enough that enterprises will adopt despite missing features?"
Implications for Due Diligence and Investment Strategy
The a16z spending report validates several core principles of signal-based investment analysis:
1. Behavioral Evidence Trumps Declared Intentions
Survey data about what companies "plan" to spend is far less valuable than actual spending patterns. Usage statistics about what people "try" matter less than payment data about what they commit to.
When conducting due diligence, prioritize:
- Actual payment retention rates over user counts
- Revenue per customer over total users
- Customer budget allocation over reported satisfaction scores
- Repeat purchase behavior over initial conversion rates
2. Category Leadership Metrics Are Context-Dependent
Being #1 in web traffic doesn't predict being #1 in revenue. The Replit vs. Lovable comparison demonstrates this starkly. When evaluating competitive positioning:
- Understand what metrics actually correlate with revenue in each category
- Distinguish between experimentation metrics and commitment metrics
- Recognize that "market leader" depends entirely on how you define the market
- Question any analysis that relies on a single success metric
3. Market Maturity Indicators Require Multiple Signals
The presence of multiple meeting notetakers, multiple LLM assistants, and multiple creative tools—all commanding meaningful spend—indicates these markets are at an earlier stage than the prevailing narrative suggests.
True market maturity shows up as:
- Consolidation to 1-2 dominant players
- Standardization of feature expectations
- Pricing compression and margin pressure
- Clear differentiation between premium and commodity offerings
When multiple players all attract significant spend without clear differentiation, the market is still forming.
4. Revenue Quality Matters as Much as Revenue Quantity
Not all revenue is equal. Understanding the composition of revenue—consumer vs. enterprise, recurring vs. one-time, retained vs. churned—provides essential context for valuation and risk assessment.
The companies generating enterprise spend from consumer-first products face different retention dynamics than pure B2B plays. Those with heavy infrastructure costs have different margin profiles than asset-light alternatives.
Due diligence must go beyond "how much revenue?" to "what kind of revenue, from whom, and how defensible?"
What's Missing: The Signals Spending Data Can't Capture
While spending patterns provide valuable behavioral evidence, they also have limitations:
Time Horizon Bias: Three months of spending shows adoption but not long-term retention. Some products might show strong initial spend but weak renewal rates.
Customer Quality Variation: Not all startup customers have equal value. A seed-stage company's $500/month spend means something different than a Series B company's $5,000/month spend.
Competitive Dynamics: Spending data shows current behavior but doesn't predict how competitive pressure might change pricing or adoption.
Product Evolution: Today's spending patterns reflect today's products. Rapid product development in AI means yesterday's winners might be tomorrow's also-rans.
Market Conditions: Startup spending during a funding boom looks different than spending during capital scarcity. These patterns reflect both product value and market conditions.
This is why comprehensive due diligence requires multiple signal types:
- Spending patterns (what this report provides)
- Founder execution capability (behavioral analysis)
- Strategic positioning (competitive dynamics)
- Operational foundation (scalability assessment)
- Team resilience (performance under pressure)
No single data source tells the complete story. But each provides valuable signal that, combined with others, builds a complete picture.
Looking Forward: What to Watch
Several questions emerge from this spending analysis that will shape the AI application landscape:
Will vibe coding consolidate or fragment? The presence of multiple successful players suggests the market might support specialized platforms for different use cases rather than one dominant winner.
How quickly will "AI employees" gain share? The 30/70 split between replacement and augmentation tools will shift as products improve and adoption accelerates. Watching this ratio change provides real-time market sentiment.
What happens to infrastructure costs? If compute costs don't decrease significantly, application-layer margins may remain compressed indefinitely—changing the entire valuation framework for AI companies.
Will consumer-first remain the fastest enterprise path? Or will we see purpose-built enterprise AI products start to outpace bottom-up adoption?
How do retention rates evolve? The critical question isn't just what startups spend today, but what they'll spend in 12-24 months as products mature and alternatives emerge.
The Bottom Line: Signal Clarity in a Noisy Market
The AI market is awash in misleading metrics. ARR gets manipulated. Web traffic measures curiosity, not commitment. User counts confuse free experimentation with genuine value creation.
Spending data cuts through this noise. When startups repeatedly allocate precious budget to specific tools, they're revealing what actually creates value. Not what sounds impressive in pitch decks, but what delivers measurable returns.
For investors, this means shifting focus from what companies claim to what customers do. From growth projections to actual behavior. From category narratives to spending evidence.
At DueCap, this is exactly the kind of signal-based analysis that informs our due diligence and oversight work. We look for behavioral evidence that predicts outcomes, not vanity metrics that predict nothing.
The a16z spending report provides a valuable snapshot of current AI application reality. But it's just one signal among many. The investors who combine spending patterns with founder behavior analysis, strategic positioning assessment, and operational capability evaluation will consistently outperform those relying on any single metric—no matter how compelling.
Because in the end, the companies that win aren't just the ones that raise the most money or generate the most hype. They're the ones that solve real problems well enough that customers keep paying for them, month after month, even when cheaper alternatives emerge.
That's the signal that matters most.
Want to understand the signals that actually predict AI company success? DueCap provides signal-based due diligence and oversight that goes beyond traditional metrics to reveal what really drives value creation. Learn more at duecap.com.