Cost-efficient AI: How Stanford built a low-cost open-source rival to OpenAI’s o1

AI keeps getting cheaper with every passing day!

Just a few weeks back, we had the DeepSeek V3 model pushing NVIDIA’s stock into a downward spiral. Well, today we have another cost-effective model. At this rate of innovation, I am thinking of selling off my NVIDIA stock lol.

Developed by researchers at Stanford and the University of Washington, the s1 AI model was trained for a mere $50 in cloud compute.

Yes – only $50.

This further challenges the dominance of multi-million-dollar models like OpenAI’s o1, DeepSeek’s R1, and others.

This breakthrough highlights how innovation in AI no longer requires massive budgets, potentially democratizing access to advanced reasoning capabilities.

Below, we explore s1’s development, advantages, and implications for the AI engineering industry.

Here’s the original paper for your reference – s1: Simple test-time scaling

How s1 was built: Breaking down the methodology

It is very interesting to learn how researchers across the world are optimizing with limited resources to bring down costs. And these efforts are working too.

I have tried to keep it simple and jargon-free so that it is easy to understand. Read on!

Knowledge distillation: The secret sauce

The s1 model uses a technique called knowledge distillation.

Here, a smaller AI model mimics the reasoning processes of a larger, more sophisticated one.

Researchers trained s1 using outputs from Google’s Gemini 2.0 Flash Thinking Experimental, a reasoning-focused model available via Google AI Studio. The team avoided resource-heavy techniques like reinforcement learning. They used supervised fine-tuning (SFT) on a dataset of just 1,000 curated questions. These questions were paired with Gemini’s answers and step-by-step reasoning.
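To make this concrete, here is a minimal sketch of how one distilled training example could be assembled from a question, the teacher's reasoning, and the teacher's answer. The class name, the field names, and the `<think>` delimiters are my own hypothetical choices for illustration; the actual s1 data format may differ.

```python
from dataclasses import dataclass


@dataclass
class DistilledExample:
    question: str
    reasoning: str  # step-by-step trace produced by the teacher model
    answer: str     # final answer produced by the teacher model


# Hypothetical delimiters, chosen for illustration only.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def to_sft_pair(ex: DistilledExample) -> tuple[str, str]:
    """Return (prompt, target). The student learns to produce target given prompt."""
    prompt = f"Question: {ex.question.strip()}\n"
    target = (
        f"{THINK_OPEN}\n{ex.reasoning.strip()}\n{THINK_CLOSE}\n"
        f"Answer: {ex.answer.strip()}"
    )
    return prompt, target


# A made-up example, not taken from the s1 dataset.
example = DistilledExample(
    question="What is 12 * 13?",
    reasoning="12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
    answer="156",
)
prompt, target = to_sft_pair(example)
print(prompt + target)
```

The point is that the target contains the reasoning as well as the answer, so the student model is trained to imitate how the teacher thinks, not just what it concludes.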

What is supervised fine-tuning (SFT)?

Supervised Fine-Tuning (SFT) is a machine learning technique. It is used to adapt a pre-trained Large Language Model (LLM) to a specific task. For this process, it uses labeled data, where each data point is labeled with the correct output.
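If you are curious what "learning from labeled data" means numerically, here is a small sketch of a typical SFT objective: the average negative log-likelihood of the correct tokens, counted only over the response part of each example. This is a common way to set it up, not necessarily the exact recipe the s1 team used, and the function name and toy probabilities are my own.

```python
import math


def sft_loss(token_probs, is_target):
    """Mean negative log-likelihood over target (response) tokens only.

    token_probs[i] is the probability the model gave to the correct i-th token.
    is_target[i] is True for response tokens and False for prompt tokens.
    """
    if len(token_probs) != len(is_target):
        raise ValueError("token_probs and is_target must have the same length")
    losses = [-math.log(p) for p, keep in zip(token_probs, is_target) if keep]
    if not losses:
        raise ValueError("no target tokens to train on")
    return sum(losses) / len(losses)


# Toy numbers: two prompt tokens (ignored), then three response tokens.
probs = [0.10, 0.20, 0.50, 0.25, 1.00]
mask = [False, False, True, True, True]
loss = sft_loss(probs, mask)
print(f"loss = {loss:.4f}")
```

Training nudges the model's weights so that this number goes down, which means the model assigns higher probability to the labeled output.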

Fine-tuning on a specific task has several benefits:

Enhances a model’s performance on specific tasks
Improves data efficiency
Saves resources compared to training from scratch
Allows for customization
Improves a model’s ability to handle edge cases and makes its behavior easier to control

This approach allowed s1 to replicate Gemini’s problem-solving strategies at a fraction of the cost. For comparison, DeepSeek’s R1 model, designed to rival OpenAI’s o1, reportedly required expensive reinforcement learning pipelines.

Cost and compute efficiency

Training s1 took under 30 minutes using 16 NVIDIA H100 GPUs. This cost researchers roughly $20–$50 in cloud compute credits!
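Here is the back-of-the-envelope arithmetic behind that range. The GPU count and the 30-minute upper bound come from the figures above; the hourly rental prices are my own assumptions, picked only to show how the $20 to $50 range can arise.

```python
NUM_GPUS = 16         # from the article
TRAINING_HOURS = 0.5  # "under 30 minutes", so this is an upper bound


def training_cost(num_gpus, hours, usd_per_gpu_hour):
    """Total rental cost = GPUs x hours x price per GPU-hour."""
    return num_gpus * hours * usd_per_gpu_hour


gpu_hours = NUM_GPUS * TRAINING_HOURS  # 8.0 GPU-hours in total

# Hypothetical H100 rental prices in USD per GPU-hour (assumptions).
for rate in (2.50, 6.00):
    cost = training_cost(NUM_GPUS, TRAINING_HOURS, rate)
    print(f"${rate:.2f}/GPU-hour -> ${cost:.2f}")
# prints:
# $2.50/GPU-hour -> $20.00
# $6.00/GPU-hour -> $48.00
```

So the whole run is about 8 GPU-hours, and the final bill depends mostly on what you pay per GPU-hour.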

By contrast, OpenAI’s o1 and similar models are multi-million-dollar efforts, as noted above. The base model for s1 was an off-the-shelf model from Alibaba’s Qwen family, freely available on GitHub.

Here are the major factors behind this cost efficiency:

Low-cost training: The s1 model achieved remarkable results with less than $50 in cloud computing credits! Niklas Muennighoff is a Stanford researcher involved in the project. He estimated that the required compute power could be easily rented for around $20. This showcases the project’s incredible affordability and accessibility.
Minimal Resources: The team used an off-the-shelf base model. They fine-tuned it through distillation. They extracted reasoning abilities from Google’s Gemini 2.0 Flash Thinking Experimental.
Small Dataset: The s1 model was trained using a small dataset of just 1,000 curated questions and answers. It included the reasoning behind each answer from Google’s Gemini 2.0.
Quick Training Time: The model was trained in less than 30 minutes using 16 NVIDIA H100 GPUs.
Ablation Experiments: The low cost allowed researchers to run many ablation experiments. They made small variations in configuration to find out what works best. For example, they measured whether the model should use ‘Wait’ rather than ‘Hmm’.
Accessibility: The development of s1 offers an alternative to high-cost AI models like OpenAI’s o1. This advancement brings the potential for powerful reasoning models to a broader audience. The code, data, and model are available on GitHub.

These factors challenge the notion that massive investment is always necessary for creating capable AI models. They democratize AI development, enabling smaller teams with limited resources to achieve significant results.

The ‘Wait’ Trick

A clever innovation in s1’s design involves adding the word “wait” during its reasoning process.

This simple prompt extension forces the model to pause and double-check its answers, improving accuracy without extra training.

The ‘Wait’ Trick is an example of how careful prompt engineering can significantly improve AI model performance. This improvement does not rely solely on increasing model size or training data.
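Here is a toy sketch of the idea: when the model tries to finish its reasoning, we drop the end-of-thinking marker, append "Wait", and let it continue. Everything in the snippet is hypothetical, including the `</think>` marker, the function names, and `toy_generate`, which is a hard-coded stand-in for a real model call. It only illustrates the control flow, not s1's actual implementation.

```python
END_OF_THINKING = "</think>"  # hypothetical end-of-reasoning marker


def budget_force(generate, prompt, extra_rounds=1, wait_word="Wait"):
    """Suppress the end-of-thinking marker `extra_rounds` times.

    Each time the model tries to stop, the marker is removed and
    `wait_word` is appended so the model keeps reasoning.
    """
    trace = ""
    for round_idx in range(extra_rounds + 1):
        chunk = generate(prompt + trace)
        # Keep only the text before the marker, if the model emitted one.
        chunk = chunk.split(END_OF_THINKING)[0].rstrip()
        trace += chunk
        if round_idx < extra_rounds:
            trace += f"\n{wait_word},"
    return trace + "\n" + END_OF_THINKING


def toy_generate(context):
    """Stand-in for a model: the first draft is wrong, the recheck fixes it."""
    if "Wait" in context:
        return " let me recount. 2 + 2 = 4. </think>"
    return "2 + 2 = 5. </think>"


question = "What is 2 + 2?\n"
plain = budget_force(toy_generate, question, extra_rounds=0)
forced = budget_force(toy_generate, question, extra_rounds=1)
print(plain)
print("---")
print(forced)
```

In the toy run, the forced version gets a second pass and corrects its own mistake. A real model gives no such guarantee, but this is the behavior the trick is trying to encourage.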

Learn more about writing prompts – Why Structuring or Formatting Is Crucial In Prompt Engineering?

Advantages of s1 over industry-leading AI models

Let’s understand why this development is important for the AI engineering industry:

1. Cost accessibility

OpenAI, Google, and Meta invest billions in AI infrastructure. However, s1 proves that high-performance reasoning models can be built with minimal resources.

For example:

OpenAI’s o1: Developed using proprietary methods and costly compute.
DeepSeek’s R1: Relied on large-scale reinforcement learning.
s1: Achieved comparable results for under $50 using distillation and SFT.

2. Open-source transparency

s1’s code, training data, and model weights are publicly available on GitHub, unlike closed-source models like o1 or Claude. This transparency fosters community collaboration and makes independent audits possible.

3. Performance on benchmarks

In tests measuring mathematical problem-solving and coding tasks, s1 matched the performance of leading models like o1. It also neared the performance of R1. For example:

The s1 model outperformed OpenAI’s o1-preview by up to 27% on competition math questions from the MATH and AIME24 datasets.
GSM8K (math reasoning): s1 scored within 5% of o1.
HumanEval (coding): s1 achieved ~70% accuracy, comparable to R1.
A key feature of s1 is its use of test-time scaling, which improves its accuracy beyond initial capabilities. For example, it increased from 50% to 57% on AIME24 problems using this technique.

s1 doesn’t surpass GPT-4 or Claude-v1 in raw ability. These models excel in specialized domains like clinical oncology.

While distillation methods can replicate existing models, some experts note that they might not lead to breakthrough advancements in AI performance.

Still, its cost-to-performance ratio is unmatched!

s1 is challenging the status quo

What does the development of s1 mean for the world?

Commoditization of AI Models

s1’s success raises existential questions for AI giants.

If a small team can replicate cutting-edge reasoning for $50, what distinguishes a $100 million model? This threatens the “moat” of proprietary AI systems, pushing companies to innovate beyond distillation.

Legal and ethical concerns

OpenAI has previously accused rivals like DeepSeek of improperly harvesting data via API calls. However, s1 sidesteps this issue by using Google’s Gemini 2.0 within its terms of service, which permit non-commercial research.

Shifting power dynamics

s1 exemplifies the “democratization of AI”, enabling startups and researchers to compete with tech giants. Projects like Meta’s LLaMA (which requires costly fine-tuning) now face pressure from cheaper, purpose-built alternatives.

The limitations of the s1 model and future directions in AI engineering

s1 is not perfect yet, and it would be unfair to expect perfection given its limited resources. Here are the s1 model limitations you should know before adopting it:

Scope of Reasoning

s1 excels in tasks with clear step-by-step logic (e.g., math problems) but struggles with open-ended creativity or nuanced context. This mirrors limitations seen in models like LLaMA and PaLM 2.

Dependency on parent models

As a distilled model, s1’s capabilities are inherently bounded by Gemini 2.0’s knowledge. It cannot surpass the original model’s reasoning, unlike OpenAI’s o1, which was trained from scratch.

Scalability questions

While s1 demonstrates “test-time scaling” (extending its reasoning steps), true innovation—like GPT-4’s leap over GPT-3.5—still requires massive compute budgets.

What comes next?

The s1 experiment underscores two key trends:

Distillation is democratizing AI: Small teams can now replicate high-end capabilities!

The value shift: Future competition may center on data quality and unique architectures, not just compute scale.

Meta, Google, and Microsoft are investing over $100 billion in AI infrastructure. Open-source projects like s1 could force a rebalancing. This change would allow innovation to thrive at both the grassroots and corporate levels.

s1 isn’t a replacement for industry-leading models, but it’s a wake-up call.

By slashing costs and opening access, it challenges the AI ecosystem to prioritize efficiency and inclusivity.

Whether this leads to a wave of low-cost rivals or tighter restrictions from tech giants remains to be seen. One thing is clear: the era of “bigger is better” in AI is being redefined.

Have you tried the s1 model?

The world is moving fast with AI engineering advancements – and this is now a matter of days, not months.

I will keep covering the latest AI models for you all to try. It is worth learning the optimizations teams make to reduce costs or innovate. This is a truly interesting space that I enjoy writing about.

If there is any issue, correction, or doubt, please comment. I would be happy to fix it or clear any doubt you have.

At Applied AI Tools, we want to make learning accessible. You can discover how to use the many AI tools available for your personal and professional use. If you have any questions, email content@merrative.com and we will cover them in our guides and blogs.
