
DeepSeek: A Groundbreaking Leap in AI Development or China's Dark Horse?

While AI skeptics continue to argue that AI will never become truly conscious and can merely follow patterns set by its human creators, a Chinese startup called DeepSeek has appeared seemingly out of nowhere and taken the tech world by storm with its revolutionary new AI model. Its DeepSeek-R1 not only rivals leading models from US companies such as Google and OpenAI but, on some benchmarks, even surpasses them.

DeepSeek first made headlines only a week ago, and within that short time the DeepSeek app has overtaken ChatGPT in App Store downloads in Australia, Canada, China, Singapore, the US, and the UK. As the app is completely free for now, it will continue to attract more and more users. The model has also quickly climbed to the top of the download charts on the Hugging Face developer platform as developers rush to test it and understand what this release can offer their AI projects. Now, it seems, everyone is talking about DeepSeek.

So where did DeepSeek come from, who is behind this startup, what makes it special, and why has it shaken the global tech stock markets? We will try to answer these and other questions in this blog post.

Where did DeepSeek come from?

The story begins not in Silicon Valley, but in China, in the world of finance. Chinese hedge fund High-Flyer, founded in 2015 by entrepreneur Liang Wenfeng, made its fortune by using AI and algorithms to identify patterns that could affect stock prices. Liang bought Nvidia GPUs before the US banned their export to China and used them to analyze financial data. Experts believe this was a key factor in the launch of DeepSeek, as the company used these chips in combination with cheaper and lower-quality ones that could still be imported.

In the wake of the rise of ChatGPT, a chatbot from the US company OpenAI, Liang, who also holds a master's degree in computer science, decided to put this computing capacity, which sat idle much of the time, to more interesting use. He invested his fund's resources in a new company called DeepSeek, which he established in December 2023, with the aim of building his own advanced models and the ultimate goal of creating AGI (Artificial General Intelligence).

As Liang later told the Chinese press, he founded DeepSeek out of scientific curiosity, not profit. He made DeepSeek a strictly local company with no international hires, which aligns with Chinese government policy and has contributed to the startup's popularity in China. He built a team of recent PhDs from top Chinese universities rather than experienced engineers, believing that students may be better suited for high-investment, low-return research, and focused on young talent eager to prove themselves. According to rumors, they often worked 18-hour shifts. The funding model was equally bold: the venture relied solely on High-Flyer's profits to pay top AI salaries. Overall, this strategy reveals a deep belief in the transformative potential of AI and a long-term, research-first approach rather than a focus on commercial gain.

And finally, on January 10, 2025, DeepSeek released an AI assistant based on the DeepSeek-V3 model, a transformer-based language model in the vein of GPT-4, and on January 20, 2025, it released the DeepSeek-R1 model, a more efficient, scalable reasoning model built on a Mixture-of-Experts (MoE) architecture that rivals, and on some benchmarks even surpasses, the leading models from US companies such as Google and OpenAI. So, how did they manage to do this?

Why Is the DeepSeek-R1 AI Model So Effective?

One of the most surprising innovations DeepSeek presents is a deliberate departure from the traditional supervised fine-tuning (SFT) process widely used to train LLMs. SFT is a standard approach in AI development that involves training models on curated datasets to teach them step-by-step reasoning, often called chain of thought (CoT). DeepSeek, however, challenged this assumption. Instead of relying on massive teacher-labeled datasets, it skipped SFT and relied on reinforcement learning (RL) to train DeepSeek-R1 and, as a result, achieved remarkable breakthroughs using pure RL.

The model autonomously developed complex reasoning, generated chains of thought, checked its own work, and allocated more computing time to harder problems. Researchers witnessed a so-called "aha" moment when the model spontaneously revised its reasoning mid-process upon facing uncertainty, flagging issues and restarting with a new approach. All this suggests that CoT-like reasoning emerged naturally through RL rather than being explicitly programmed or fine-tuned. DeepSeek also tackled the issue of language consistency in reasoning models: by rewarding coherent outputs during RL, it avoided language mixing and incoherence, achieving more readable results with only a minor performance trade-off.
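To make the idea of reward-driven training more concrete, here is a minimal sketch of a rule-based reward of the kind described above. It is purely illustrative and not DeepSeek's actual reward code: the answer-tag format, the weighting, and the ASCII-based consistency check are all assumptions made for the example.

```python
# Illustrative sketch only: combine an accuracy reward with a language-consistency
# reward, as a stand-in for the kind of rule-based signals used during RL training.
import re

def accuracy_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the final tagged answer matches the reference, else 0.0 (hypothetical format)."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == reference_answer.strip() else 0.0

def language_consistency_reward(completion: str) -> float:
    """Fraction of alphabetic characters that are ASCII: a crude proxy that penalizes
    language mixing when the prompt is in English (purely illustrative)."""
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isascii() for c in letters) / len(letters)

def total_reward(completion: str, reference_answer: str, lam: float = 0.1) -> float:
    # Correctness dominates; coherence acts as a smaller shaping term.
    return accuracy_reward(completion, reference_answer) + lam * language_consistency_reward(completion)

print(total_reward("<think>4 + 4 = 8</think><answer>8</answer>", "8"))  # 1.1
```

During RL, completions that score higher under such a reward are reinforced, which is how coherent, correct reasoning can emerge without step-by-step supervised examples.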

As a result, DeepSeek-R1 achieves high accuracy and efficiency in performance benchmarks, outperforming many open-source models and achieving results comparable to leading closed-source models such as OpenAI's o1:

  • Reasoning Tasks: At AIME 2024, one of the most challenging math competitions for high school students, R1 achieved 79.8% accuracy, slightly surpassing OpenAI-o1-1217. 

  • Coding: Reaches the 96.3rd percentile on Codeforces (competitive programming), almost matching OpenAI-o1-1217.

  • Math: Scores 97.3% on MATH-500.

  • General Knowledge: Achieves 90.8% on MMLU but slightly underperforms OpenAI-o1-1217.

Perhaps most impressive is how DeepSeek has distilled these capabilities into much smaller models. DeepSeek-R1's reasoning abilities have been transferred to models ranging from 1.5B to 70B parameters. Their 14B distilled model even outperforms the much larger Qwen-based QwQ-32B-Preview, suggesting that reasoning ability depends not only on the number of parameters but also on how the model is trained to process information. Their 32B and 70B distilled models set new records for reasoning tasks in their respective size categories.
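Conceptually, this kind of distillation is just supervised fine-tuning of a small "student" model on reasoning traces generated by the large "teacher". The sketch below illustrates the data flow with Hugging Face transformers; the student model name, the tiny toy dataset, and the training settings are placeholders, not DeepSeek's actual pipeline.

```python
# Minimal sketch of reasoning distillation: fine-tune a small student model on
# chain-of-thought traces produced by a large teacher. Toy data and a hypothetical
# student checkpoint only; the real pipeline uses hundreds of thousands of samples.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

student_name = "Qwen/Qwen2.5-1.5B"   # assumption: any small causal LM works for the sketch

# 1. Reasoning traces generated by the teacher (hard-coded here for brevity).
traces = [
    {"text": "Q: What is 17 * 6?\n<think>17 * 6 = 102.</think>\nA: 102"},
    {"text": "Q: Is 91 prime?\n<think>91 = 7 * 13, so it is not prime.</think>\nA: No"},
]

tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = Dataset.from_list(traces).map(tokenize, batched=True, remove_columns=["text"])

# 2. Plain supervised fine-tuning of the student on the teacher's traces.
trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="r1-distill-sketch",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```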

On top of all of this, DeepSeek shocked everyone with its API prices. The OpenAI o1-2024-12-17 API costs $15 per 1M input tokens ($7.50 per 1M cached input tokens) and $60 per 1M output tokens. The DeepSeek Reasoner API, based on the R1 model, costs $0.55 per 1M input tokens ($0.14 per 1M cached input tokens) and $2.19 per 1M output tokens. While achieving accuracy similar to OpenAI o1, DeepSeek is significantly cheaper for developers. On top of that, DeepSeek made its API compatible with OpenAI's: the methods, parameters, and response formats are the same, which means integrating it into an existing product can be as simple as pointing the client at a different base URL.
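As a rough illustration, here is what that switch might look like with the official openai Python SDK. The base URL and model name below reflect DeepSeek's public documentation at the time of writing and should be treated as assumptions to verify before use.

```python
# Because the API is OpenAI-compatible, an existing integration mostly needs a new
# base URL, API key, and model name (endpoint and model name assumed from DeepSeek docs).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # key issued by the DeepSeek platform
    base_url="https://api.deepseek.com",   # instead of the default OpenAI endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",             # the R1-based reasoning model
    messages=[{"role": "user", "content": "How many prime numbers are there below 50?"}],
)

print(response.choices[0].message.content)
```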

And while users quickly tested DeepSeek with politically sensitive questions about Tiananmen Square and Taiwan and found censorship, which is expected from a Chinese AI lab, this is unlikely to deter most developers and users eager for cutting-edge, open-source AI.

And while US AI heavyweights such as Google and OpenAI were contemplating moral implications, biased results, and ways of securing IP rights for their AI models, DeepSeek made a bold move by releasing its R1 model as completely open source under the MIT license, with extremely detailed technical reports explaining how the models work and code that anyone can inspect and try to reproduce. It is a blow to the reputation of OpenAI, which is notoriously secretive about how its models actually work at a low level.

DeepSeek's particular focus on research makes it a dangerous competitor to OpenAI, Meta, and Google because, at least for now, DeepSeek is willing to share its discoveries rather than protect them for commercial gain. OpenAI may feel that this is unfair, but there is little it can do about it; for now, it has announced that o3-mini will be included in the free tier, with plans to cut the price of its Plus subscription from $20 to $10 per month to withstand the competition.

How much did DeepSeek-R1 cost to develop?

According to the DeepSeek-V3 technical report, DeepSeek trained its base model, V3, on a budget of $5.58 million over two months. The company stated that DeepSeek-V3 required 2.788M H800 GPU-hours for its full training, which suggests it used around 2,000 GPUs over that period. This came as another major shock to the market, as it represents a tiny fraction of the computing power and expenditure that US AI companies typically pour into their models, often reaching hundreds of millions or even billions of dollars.
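A quick back-of-the-envelope check shows how those figures fit together, assuming roughly 60 days of round-the-clock training and the approximately $2-per-GPU-hour rental rate used in DeepSeek's report (both assumptions, not disclosed specifics beyond the report):

```python
# Sanity-check the reported training figures under the stated assumptions.
gpu_hours = 2_788_000                 # total H800 GPU-hours reported for V3
wall_clock_hours = 60 * 24            # ~two months of continuous training
print(gpu_hours / wall_clock_hours)   # ~1,936 GPUs, i.e. roughly 2,000
print(gpu_hours * 2 / 1e6)            # ~$5.58M at ~$2 per GPU-hour
```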

As for the cost of developing and training DeepSeek-R1, it remains uncertain. Alexandr Wang, the billionaire CEO of Scale AI, recently told CNBC that he estimates DeepSeek currently has about 50,000 NVIDIA H100 chips, though the company does not publicly disclose this, likely due to US export restrictions on advanced chips. Estimating the total cost of training DeepSeek-R1 is difficult: a cluster of 50,000 NVIDIA GPUs could potentially cost hundreds of millions of dollars, so the exact figures remain speculative. Still, even if this estimate is accurate, DeepSeek's computing power is significantly smaller than that of AI leaders like OpenAI, Google, and Anthropic, each of which reportedly has over 500,000 GPUs.

Besides, while DeepSeek has not raised any outside funding and has not yet taken significant steps to monetize its models, it is not known for certain whether the Chinese government is involved in financing the company. It is far from impossible that the Chinese government indirectly helped the company, but there is no public evidence confirming it so far.

Why Has DeepSeek-R1 Shocked Silicon Valley?

DeepSeek is in many ways breaking the business model of OpenAI and other Western companies working on their own closed AI models. It has developed an AI language model comparable to OpenAI's ChatGPT at a fraction of the cost and without relying on the most advanced chips from companies like Nvidia. This announcement, by a small startup claiming to have created a powerful AI model at minimal cost, has made experts wonder whether it is really necessary to invest tens of billions of dollars in large clusters for training neural networks.

But what's more important, they managed to train DeepSeek-V3 with a surprisingly modest hardware footprint. The efficiency of DeepSeek's training process challenges established beliefs within the industry about how much hardware AI development truly requires, as well as the widespread assumption that the future of AI will demand ever-increasing amounts of computing power and energy.

The market still doesn't know for sure whether AI development will actually require less computing power in the future, but with investors rethinking the impact on US competitors and their suppliers, it is already reacting nervously: shares of Nvidia and other suppliers of components for AI data centers have fallen. Nvidia, the US giant behind the high-end chips that dominate many AI investments, had seen its share price surge over the last two years on growing demand and took the hardest hit from the DeepSeek announcement. Its share price dropped by roughly 17% on Monday, wiping roughly $600bn off its market value. However, the stock rebounded by approximately 8.8% on Tuesday, suggesting that the initial sell-off may have been an overreaction.

The stock drop also affected major companies like Meta, Alphabet (Google), Nvidia competitors (Marvell, Broadcom, Micron, TSMC), and data center firms (Oracle, Vertiv, Constellation, NuScale). The same thing happened in Europe: shares of ASML, a Dutch maker of chip manufacturing equipment, fell more than 10%, while Siemens Energy, which makes hardware related to AI, fell 21%. Japanese chip-related companies that work closely with American developers, such as Advantest, Tokyo Electron, SoftBank Group, Furukawa, and Fujikura, were also among the first to react.

The assumption that AI development may require less computing power also calls into question the feasibility of the Stargate project, an initiative in which OpenAI, Oracle and SoftBank promise to build next-generation AI data centers in the US, on which they are allegedly ready to spend up to $500 billion.

Is DeepSeek China's Dark Horse?

Likely not. While DeepSeek's innovations are groundbreaking, they have by no means given the Chinese AI lab market leadership. Because DeepSeek has published its research, other AI model companies will learn from it and adapt. Meta and Mistral, a French open-source model company, may be a little behind, but it will likely take them only a few months to catch up with DeepSeek. As Meta's chief AI scientist Yann LeCun put it, "The idea is that everyone profits from everyone else's ideas. No one 'outpaces' anyone and no country 'loses' to another. No one has a monopoly on good ideas."

One more aspect to consider when dealing with Chinese companies is data security. For many, it remains an open question, since the hosted DeepSeek service most likely stores user data on servers in China. As a precaution, it is possible to try the model on Hugging Face in a sandboxed space, or even run one of the smaller distilled versions locally on a PC with suitable hardware. In such cases the model may not be fully functional, but this removes the issue of transferring data to Chinese servers.
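For those who want to experiment offline, here is a minimal sketch of running one of the distilled checkpoints locally with Hugging Face transformers. The model name, prompt, and generation settings are assumptions for illustration; larger distilled variants need correspondingly more memory, and device_map="auto" requires the accelerate package.

```python
# Illustrative local inference with a distilled R1 model (assumed checkpoint name).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device_map="auto",   # use a GPU if available, otherwise fall back to CPU
)

messages = [{"role": "user", "content": "Is 1013 a prime number? Think step by step."}]
output = generator(messages, max_new_tokens=512)
print(output[0]["generated_text"][-1]["content"])   # the model's reply, reasoning included
```

Running locally like this keeps all prompts and outputs on your own machine, which is precisely what addresses the data-transfer concern above.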

What’s Further Raising Concerns in the US?

DeepSeek-R1 has shocked the market for more than one reason. It is not only the novel approach to training the model, but also the fact that this is the first time a Chinese AI model has gained such widespread popularity in the West, undermining the belief that the US is the undisputed leader in artificial intelligence.

DeepSeek's success has also called into question the effectiveness of US export restrictions on advanced chips and AI technologies to China, which were aimed at slowing China's progress in the AI race. DeepSeek's results suggest that Chinese AI engineers have either found ways around the US export controls on advanced computing hardware or developed improved algorithms that work well even with fewer, older, or less powerful chips. While it remains unconfirmed how much advanced AI-training hardware DeepSeek has had access to, the company has demonstrated enough to suggest the trade restrictions have not been entirely effective in slowing China's progress.

Final Thoughts

DeepSeek marks a significant milestone in AI's potential to "think" and challenges the idea that only the biggest budgets win in AI. Its open-source models will likely continue to drive down costs across the market, benefiting consumers, startups, and businesses interested in AI. If DeepSeek-R1 really achieved its performance with fewer resources, it could reshape industry economics and challenge firms like OpenAI and Google, which have invested billions in infrastructure. On the other hand, DeepSeek's ability to scale remains uncertain due to US export restrictions on advanced AI chips. While constraints may have pushed it to innovate, a lack of computing power could prevent expansion, giving US competitors time to catch up. Moreover, AI leadership depends not just on hardware but also on talent, software, and ecosystem advantages, areas where American firms still hold a strong lead. Although DeepSeek is a major step forward, only time will tell whether Chinese AI companies can seriously compete with Western tech giants.


Maryna Kharchenko

01/29/2025

AI