GenAI Foundation Models: The LLM Race Has Only Just Begun

Raphaëlle d'Ornano
10 min read · Oct 1, 2024


Credit: Ben Wicks

The generative AI boom continues and shows no signs of slowing down. According to IDC, enterprise spending on generative AI will surge from $16 billion in 2023 to $143 billion by 2027.

So far, the most significant investments have gone to the companies building the foundation models that enable the new technology. The sums have grown so large, and usage so widespread, that on the surface the LLM market looks mature and dominated by just a handful of names: OpenAI, Anthropic, Meta, Google, Mistral, and Cohere. Yet just today, Nvidia released a family of powerful open-source models able to compete with the leaders.

One year after writing about the market for GenAI foundation models, I wanted to revisit the topic and understand what has changed. Note that my analysis excludes Nvidia's just-released NVLM 1.0 family of large multimodal language models.

My key takeaways: Winners are emerging in the foundation model race, though innovation continues at full speed. Understanding those potential investment prospects and valuing them correctly requires rethinking the analysis of companies building foundation models.

To get an idea of just how dynamic the market for LLMs remains, glance at Stanford's Holistic Evaluation of Language Models (HELM), which is considered the gold standard for evaluating and ranking foundation models. The current leaderboard highlights the number of models vying for technical supremacy — and just how much more powerful subsequent versions have become. Two small models are in the Top 6, including Mistral's 7B model, while Meta dominates the ranking with its 70B and 65B models.

Chatbot Arena, an open-source platform for evaluating AI created by researchers at UC Berkeley SkyLab and LMSYS, confirms this same evolution but with different results. Chatbot Arena evaluates human preference and shows that OpenAI dominates the race, but some lesser-known names are still in the top rankings. Though it has gotten less public attention, xAI’s Grok-2 (normal + mini) is in the top 10.

Among the LLM leaders, the market is evolving and diversifying at an accelerated pace. OpenAI caused a stir in September with the release of its o1-preview model, code-named "Strawberry." Pitchbook noted that this represents a new category of LLMs with more powerful reasoning capabilities that simulate human thinking, and a new avenue for applications and investment. As General Catalyst principal Chris Kauffman told Pitchbook: "There's this whole new room for competition."

This LLM technical arms race doesn't even include models that have yet to appear but whose companies have already received huge amounts of funding. That includes Safe Superintelligence, the company founded by former OpenAI Chief Scientist Ilya Sutskever, which just raised $1 billion to build AI systems that are both safe and more powerful.

Such rapid developments remind us that GenAI is still in its infancy as a technology. Some LLM companies have already closed their doors, and others have been effectively absorbed (Character.ai's team by Google, and Inflection's by Microsoft), but new ones are emerging.

Amid this technical upheaval, investors have a few choices. They could sit on the sidelines and wait for safer, more predictable opportunities to emerge (if they ever do!). Or they could treat LLMs like an index and bet on all (or most) of them, hoping a winner-take-all market eventually offsets the losing bets. Fortunately, there is a third option: much more financial data is now available, allowing investors to dig in, make informed decisions, and cherry-pick the companies with the highest potential to win. This option requires thorough due diligence, an understanding of each company's unique fundamentals, and strategic investments grounded in comprehensive analysis of the market and of individual companies.

To understand how to frame that analysis, I want to focus on the names noted above that are at the forefront of this foundation model race (for now!): OpenAI, Anthropic, Meta, Google, Mistral, xAI, and Cohere. These are the heavyweights vying for dominance in the broader business application market, where deep-pocketed enterprises represent the most significant revenue opportunity. I’m excluding companies that are developing their models specifically for internal use.

The real potential for foundation models lies in business applications. While consumers may gravitate toward one dominant platform — OpenAI’s ChatGPT, for instance — the enterprise sector is a much more complex and significant battleground. Enterprises with larger budgets and complex needs are the driving force behind the evolution and diversification of this market, and understanding their role is crucial for investors.

The race to capture this enterprise market is crucial. Business applications are where foundation models can generate sustainable profits, especially since consumer-facing products tend to operate in a winner-takes-all landscape. Though less shiny, this is where profitability will be ultimately found.

I’ll use our Advanced Growth Intelligence (AGI) methodology to explain how investors can better understand the LLM market. The AGI methodology is a comprehensive approach to analyzing companies’ growth potential and resilience in the genAI sector.

Quality of Revenue

Foundation models operate on intricate revenue models, primarily driven by API access rather than subscriptions. Companies like OpenAI and Anthropic offer consumer subscriptions for chatbots (e.g., ChatGPT Plus or Claude Pro), but these represent only a fraction of the revenue potential, and there is little room for differentiation.

The bulk of income comes from API access, both for access to the model and for its fine-tuning, where businesses pay for tokens based on model complexity. For example, OpenAI charges $0.03 per 1,000 input tokens for GPT-4 (8K context), compared to $0.002 for GPT-3.5 Turbo. This token-based pricing structure has significant implications for valuations because it introduces uncertainty into revenue projections. It also adds complexity to assessing annual recurring revenue (ARR), which is critical for valuation models: the amount at stake is unknown. To address the difficulty of establishing a trustworthy revenue baseline, the key is to build the right approach to determine API Access ARR (similar to transaction-based ARR). This allows for rationalizing foundation model valuations. (See our previous analysis of Mistral AI.)
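To make the idea concrete, here is a minimal sketch of how an "API Access ARR" might be derived from metered token usage, by analogy with transaction-based ARR. The model names, prices, and volumes below are hypothetical illustrations, not any provider's actual rate card or methodology.

```python
# Illustrative per-1K-token prices (USD) for two hypothetical models.
PRICE_PER_1K_TOKENS = {"model-large": 0.006, "model-small": 0.002}

def monthly_api_revenue(usage_tokens: dict) -> float:
    """usage_tokens maps model name -> tokens consumed in the month."""
    return sum(tokens / 1000 * PRICE_PER_1K_TOKENS[model]
               for model, tokens in usage_tokens.items())

def api_access_arr(trailing_monthly_revenues: list) -> float:
    """Annualize a trailing average to smooth month-to-month volatility."""
    return 12 * sum(trailing_monthly_revenues) / len(trailing_monthly_revenues)

# One customer consuming 500M large-model and 2B small-model tokens a month.
month = monthly_api_revenue({"model-large": 500_000_000,
                             "model-small": 2_000_000_000})
print(round(month, 2))                       # monthly metered revenue
print(round(api_access_arr([month] * 3), 2)) # annualized run-rate
```

The design choice worth noting: annualizing a trailing average, rather than the latest month, dampens the usage volatility that makes token revenue harder to treat as "recurring."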

Even so, a word of caution: applications are being built to optimize token costs, which is both a tailwind (foundation model usage increases as costs fall) and a headwind (revenue compresses as applications require fewer tokens).

Foundation models can enhance the quality of their revenue by layering high-margin professional services fees, in addition to token revenue, onto enterprise use cases. Though this is not "technological" per se, and thus commands a lower valuation than license revenue, it could become a critical revenue stream if token prices start to become commoditized.

Quality of Growth

The immediate goal for foundation models is to transform business applications from objects of curiosity into business-critical applications. The ability to catalyze business apps that become vital to customers will determine the winners of the foundation model race.

So far, enterprises have been experimenting with many models. In only a few cases has the technological choice of a foundation model been made definitive.

Several factors seem to drive decisions for businesses and developers:

  • Ease of use: This favors OpenAI for the moment. Per CB Insights, many businesses are sticking with OpenAI's models — rather than switching to the ones developed by Google — because they are easier to use. Pricing does not appear to be the number one factor in decision-making. Iconiq's recent State of AI report also found that: "Enterprises generally prefer to utilize proprietary models like GPT-4 over open-source models like Llama with on average ~60% of workloads being built with proprietary models."
  • Sensitivity of data: Sectors with sensitive data choose small models over large LLMs, as companies refuse to have their data hosted by the model providers.
  • Nature of use cases: smaller models allow for better performance across healthcare, law, and finance.
  • Professional services: The support customers get along their GenAI journey can sway many decision-makers as they develop and implement the solution.
  • Workflow capabilities: Anthropic’s latest version of Claude claimed technical leadership across several benchmarks.

Gross Retention Rate ("GRR") is a critical indicator that confirms this choice, as it measures a company's ability to retain token revenue. However, assessing GRR is complex because revenue (as noted above) is token-based rather than subscription-based. While GRR analysis should be performed on overall revenue, investors can take a deep dive into both use cases and products to assess Quality of Growth at the cohort level.
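A minimal sketch of the underlying calculation, under the standard definition of GRR: retained revenue is capped at each customer's prior-period level, so expansion cannot mask churn. The customer names and figures are purely illustrative.

```python
def gross_retention_rate(prev: dict, curr: dict) -> float:
    """prev/curr map customer -> token revenue in consecutive periods.

    Caps each customer at its prior-period revenue, so only churn and
    contraction reduce the ratio; expansion is excluded by design.
    """
    base = sum(prev.values())
    retained = sum(min(prev[c], curr.get(c, 0.0)) for c in prev)
    return retained / base

prev = {"acme": 100.0, "globex": 50.0, "initech": 50.0}
curr = {"acme": 120.0, "globex": 30.0}  # initech churned; acme expanded
print(round(gross_retention_rate(prev, curr), 3))  # 0.65
```

Note that acme's expansion (100 → 120) contributes nothing above its prior base; with token-metered revenue, applying this cap period over period is what separates genuine retention from usage spikes.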

Use cases first: Which use cases do a company's different models support, and how sticky is each model? For example, in the case of customer chatbots, are customers sticking with GPT-3's text-davinci-002 or moving to other LLMs? This will provide better insight into where the spending is going and help establish the sustainability of the company's growth.

As model costs increase and bottlenecks appear, investors also need comfort with growth prospects on a like-for-like product basis. Growth trends should be evaluated on the current models sold to enterprise customers, with improving technology and performance creating new avenues of growth, product after product, rather than substituting for existing ones.

That leads to the other critical factor for assessing Quality of Growth: cost of customer acquisition (CAC). Beyond the initial virality that tempts many to try foundation models, these companies eventually compete for enterprise IT budgets. As such, the ability to build an effective go-to-market strategy (at scale) is critical. However, assessing CAC becomes complex when foundation model companies need to target developers and customers, as in the case of open-source models like the Mistral AI example.

Quality of Margins

Foundation models are trying to optimize compute costs. OpenAI’s o1-mini is a good example. While some argue that LLMs are becoming commoditized, OpenAI’s o1 costs as much as 20x more per token than its mini model.

That will help offset margin pressures to a degree. However, foundation model companies still face a big challenge: staying ahead of potential commoditization by rolling out new, more powerful versions, and computing costs make each new product more expensive to develop than the last.

Those computing costs have the potential to sink LLM companies. Google's Gemini Ultra model reportedly cost $191 million to train, and OpenAI's GPT-4 cost $78 million. Researchers are working at full speed to make training more compute-efficient, but it will remain a struggle for now.

The most significant component of computing costs is GPUs — whether through direct hardware purchases or as part of computing services. We have moved away from the potential for GenAI to be a black swan event, but the viability of these costs remains central. We had highlighted this as a key risk — in addition to commoditization — in our analysis of Mistral's economics, but it applies to all LLMs.

When evaluating Margin quality for foundation models, investors need to recognize a fundamental accounting distinction that has major impacts on P&Ls: Opex vs. Capex.

Here's why. GPU costs for developing and training can potentially be treated as either. The criteria may well depend on the actual use of the GPU. These companies are evolving far from public scrutiny, so there will be a great temptation for accounting manipulation. The accounting rules are complex, and investors must understand how potential investment targets are applying them.
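A toy example shows why the distinction matters so much to a P&L. Assuming straight-line depreciation over a hypothetical four-year useful life, the same GPU spend produces very different year-one operating expenses; all figures are illustrative.

```python
def year_one_expense(gpu_spend: float, capitalized: bool,
                     useful_life_years: int = 4) -> float:
    """Year-one P&L impact of GPU spend, in the same units as gpu_spend.

    Expensed (Opex): the full amount hits the P&L immediately.
    Capitalized (Capex): only one year of straight-line depreciation does.
    """
    if capitalized:
        return gpu_spend / useful_life_years
    return gpu_spend

spend = 120.0  # hypothetical GPU spend, $M
print(year_one_expense(spend, capitalized=False))  # 120.0 -> full Opex hit
print(year_one_expense(spend, capitalized=True))   # 30.0  -> depreciation only
```

In this sketch, the capitalization choice alone moves $90M of expense out of year one, which is exactly why investors should scrutinize how a target classifies GPU spend.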

This gives an advantage to smaller models that work on smaller data sets. But in our view, the real win goes to applications, which are undifferentiated at the model layer and can pass these costs on to customers.

We should also note other costs, such as data labeling, security, and electricity and cooling. But the one that looms largest is R&D.

Within R&D, investors should pay specific attention to alignment costs, a fundamental factor ensuring AI systems act in human interest.

Single points of failure

Foundation model development has a clear bottleneck: it requires staggering amounts of power and digital infrastructure. Transformer-based models demand far more computing energy than previous technological innovations.

Of course, this is a huge opportunity for private market investors. BlackRock CEO Larry Fink recently characterized this as a “multi-trillion long-term investment opportunity” when announcing a new $30 billion AI energy fund with Microsoft.

Despite the opportunity, foundation model growth is clearly limited in the short term. For example, Meta expects Llama 4 to require almost 10x the compute used to train Llama 3. This is part of the gap between AI revenues and the growing cost of infrastructure needed to support this growth, which Sequoia's David Cahn dubbed "AI's $600 billion question."

Last, today’s LLMs have token limits. As some models become larger, many research firms are trying to go beyond the transformer paradigm and develop new variants.

A Marathon, Not a Sprint

It’s too early to declare an LLM winner as things stand today.

OpenAI has a clear lead, and Anthropic and Mistral appear to be serious challengers, thanks to their traction at the enterprise level. However, only the ability to sustain long-term growth and resilience will determine the winner.

Adoption, monetization, and margin optimization, along with the criteria I've outlined, will largely determine this. OpenAI's leadership may be challenged if Mistral or Anthropic are able to navigate the complex landscape of business applications, manage costs, and drive sustainable growth. It could also be challenged by bold moves from existing or new actors in the LLM space.

In the meantime, the ability to attract funding is a strong determinant of success, given the slower adoption and monetization of business applications versus the cost of building LLMs. Investors have incredible sway in determining who will win — perhaps more so than we’ve ever seen in Tech.

The LLM competition has just started. Investors need the right tools to get into the game.


Raphaëlle d'Ornano

Managing Partner + Founder D’Ornano + Co. A pioneer in Hybrid Growth Diligence. Paris - NY. Young Leader French American Foundation 2022. Marathon runner.