Slowing Progress on OpenAI's "Orion" Model Challenges AI Scaling Laws

Nov 11, 2024

OpenAI's next flagship model, code-named "Orion," is reportedly improving more slowly over its predecessor than earlier generations did, challenging the long-held "scaling laws" of AI. The observation has prompted the industry to shift its focus toward post-training improvements.

As AI products like ChatGPT continue to gain users, development of the underlying technology, large language models (LLMs), appears to be decelerating. After completing 20% of its training, Orion was reportedly already performing at a level similar to the current GPT-4, yet the overall improvement is not expected to match the leap seen between previous generations.

Reports indicate that while Orion shows enhanced capabilities in language tasks, it may not outperform previous models in areas like coding. Running Orion in OpenAI's data centers may also cost more than running recent models.

Historically, LLMs have been pre-trained on publicly available text from websites and books, but that supply of high-quality data is nearing exhaustion, limiting further quality gains. To ease the data scarcity, Orion's training incorporates AI-generated data from models like GPT-4, which introduces its own challenge: the new model may end up resembling the older models in some respects.

Other AI companies face similar hurdles. Both Meta founder Mark Zuckerberg and Databricks co-founder Ion Stoica have noted that while advances continue in areas like coding and complex problem-solving, improvements in commonsense reasoning and general task capabilities have slowed.

Orion's slowdown challenges traditional AI scaling laws, which predict substantial performance improvements as data and computational resources increase. The industry is now exploring improvements made after initial training (post-training) in the hope of establishing new scaling paradigms.
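For context, here is a minimal sketch of what such scaling laws look like; this formulation comes from the published literature, not from the article. The widely cited Chinchilla fit of Hoffmann et al. (2022) models training loss as a power law in model size and data volume.

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022), shown here only as an illustration.
% N = number of model parameters, D = number of training tokens;
% E, A, B, \alpha, \beta are empirically fitted constants (none of these values come from this article).
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Fits like this imply that adding parameters and training tokens should keep lowering loss at a predictable rate; the slowdown reported for Orion suggests those returns may be flattening sooner than expected.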

The dwindling supply of high-quality training data and rising computational costs have pushed OpenAI researchers to explore alternative ways to improve model performance. Efforts include building in stronger code-writing capabilities and developing software that automates tasks such as web browsing.

OpenAI has formed a team led by Nick Ryder, previously responsible for pre-training, to optimize training data and refine its scaling approach. The team is working on improving the model's problem-solving capabilities through reinforcement learning and human evaluation feedback.
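In practice, "reinforcement learning and human evaluation feedback" usually refers to reinforcement learning from human feedback (RLHF), in which a reward model is first trained on human preference comparisons. The snippet below is a minimal, generic sketch of that reward-model objective, not OpenAI's code; `reward_model` is a hypothetical callable that scores a prompt-response pair.

```python
import torch.nn.functional as F

# Minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train a
# reward model from human preference data in RLHF-style post-training.
# `reward_model` is a hypothetical module mapping (prompt, response) to a scalar score;
# this illustrates the general technique, not OpenAI's implementation.
def preference_loss(reward_model, prompts, chosen, rejected):
    r_chosen = reward_model(prompts, chosen)      # scores for the human-preferred responses
    r_rejected = reward_model(prompts, rejected)  # scores for the dispreferred responses
    # Push the preferred response to be ranked higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The language model is then fine-tuned, for example with a policy-gradient method such as PPO, to produce responses that score highly under this learned reward.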

The financial burden of high computational costs is significant: inference for the o1 model reportedly costs six times as much as for standard models. Even so, OpenAI and others continue to invest heavily in data centers, betting that more computing power can still extract gains from pre-trained models.

At the TEDAI conference, OpenAI researcher Noam Brown warned about the financial implications of developing more advanced models, questioning the feasibility of spending billions on training.

Going forward, AI companies like OpenAI will have to balance the trade-offs between training data and computational resources to improve model performance without incurring prohibitive costs.

Disclosures

I/We may personally own shares in some of the companies mentioned above. However, those positions are not material to either the company or to my/our portfolios.