Small vs Large AI Models & Open vs Closed Approaches

The AI world is having a debate about model size and openness. Should we build massive models or smaller, efficient ones? Should models be open source or proprietary? The answer isn’t black and white.

The Size Question

Clara Shih put it well: “The future of AI will be a combination of both large and small models because of climate impact, as well as for cost and performance reasons.”

Large models like GPT-4 or Claude are impressive. They can handle complex reasoning, write code, and tackle problems across domains. But they’re expensive to run and energy-intensive.

Small models are getting surprisingly good. Models like Llama 2 7B or Mistral 7B can handle many tasks that used to require much larger models. They run faster, cost less, and use less energy. And note, running faster also means lower latency - this is a critical point for many applications, for example in customer service.

The Training vs Inference Energy Problem

Arthur Mensch, CEO of Mistral AI, explains something important: “Most of the compute and energy resources required to run these systems are used at inference time rather than at training time. So you train for a couple of months, and when the models are deployed on many, many GPUs, then the large energy consumption is more linked to the usage than to the training itself.”

This changes how we think about efficiency. Everyone talks about the massive energy cost of training GPT-4 or Claude. But once trained, these models run millions of queries daily. The real energy consumption happens during inference, not training.

Mensch continues: “There are trade-offs between the amount you spend on training and the compression that you can achieve. If you invest more in training, you can make smaller models, achieving the same performance as a larger model with less compute. These smaller models consume less energy to deploy at inference time.”

So the strategy becomes: spend more upfront on training to get smaller, more efficient models that save energy long-term.

Open vs Closed Models

The openness question is separate from size, but equally important. Both approaches have clear advantages.

Open Model Advantages

Open source boosts innovation and drives collaboration. Open models also allow give more transparency, also regarding potential biases - less black box.

Open models let researchers study how AI actually works. They enable smaller companies to build AI products without training models from scratch. They democratize access to AI capabilities.

Closed Model Advantages

Closed models have business and security reasons:

Competitive advantage: Companies remain competitive by protecting their algorithms and unique methodologies. If you spend $100 million training a model, you probably don’t want to give it away for free.

Business sustainability: Closed models support subscription models, which fund continued R&D investments. OpenAI’s revenue from ChatGPT Plus helps fund GPT-5 development.

Security concerns: Open models can be misused by malicious actors. A terrorist organization can’t easily modify GPT-4 to help with dangerous activities, but they could potentially fine-tune an open model for harmful purposes.

What’s Actually Happening

In practice, we’re seeing both approaches work:

Large closed models like GPT-4, Claude, and Gemini are pushing the boundaries of what’s possible. They’re the research frontier.

Smaller open models like Llama 2, Mistral, and Code Llama are making AI accessible to everyone. Startups can build products on top of these without massive infrastructure costs.

Specialized small models are emerging for specific tasks. Instead of using GPT-4 for everything, companies are using smaller models fine-tuned for their specific use case.

To pick up the example I gave earlier, in customer services, particularly voice bots, latency is critical. It must feel “human-like”. Choosing a big model is not necessarily great therefore, although a big model will likely give you more accurate results. So this is trade off.

The Practical Reality

Most businesses will probably use a mix. You might use a large model for complex reasoning tasks and smaller models for simpler tasks like classification or extraction.

The energy efficiency argument favors smaller models for most use cases. The innovation argument favors keeping some models open. The business argument favors keeping some models closed.