With Sora, is China’s AI falling further behind?

Technology expert Yin Ruizhi takes us through the latest development in generative AI technology: Sora, OpenAI’s new video creation software. How will this change the landscape, and how far will China fall behind in the AI race?
This photo taken on 2 February 2024 shows the head of Product Management and Operations of Wantalk, an artificial intelligence chatbot created by Chinese tech company Baidu, scrolling through virtual character profiles on her phone, at the Baidu headquarters in Beijing, China. (Jade Gao/AFP)

In the early hours of 16 February, OpenAI, which has been making waves in technology, unveiled its latest creation: a large-scale generative video model dubbed “Sora”.

In 2023, we witnessed rapid advances in text-to-image synthesis, while human creativity continued to dominate the video realm, as if it were the last sacred ground that artificial intelligence (AI) had yet to take over.

Yet, as we ushered in 2024, OpenAI dropped the cutting-edge text-to-video model Sora, which can generate a smooth video up to a minute long from a simple prompt, putting OpenAI far ahead of its competitors.

With other models generating videos that average only about four seconds in length, Sora’s emergence has undoubtedly reset people’s expectations of what AI video generators can do.

Chinese tech companies falling further behind

With this technological wave, Chinese tech companies — which have long been trying to keep up with American tech companies — seem to be lagging even further behind.

Over 15 Chinese companies have introduced AI video generators, including six tech giants such as Baidu, Alibaba, Tencent and ByteDance, as well as nine tech startups such as AIsphere (爱诗科技), ShengShu (生数科技) and HiDream.ai (智象未来). ByteDance stands out among the large tech companies and Morph Studio among the startups, with their products demonstrating excellent stability and image quality.

In this photo illustration, a video created by Open AI's newly released text-to-video Sora tool plays on a monitor in Washington, DC, US, on 16 February 2024. (Drew Angerer/AFP)

With most similar products still in the pilot phase, users often face problems such as temporary service downtime, long waiting times and the lack of an independent service provider website. Generating a two- to four-second video takes three to five minutes, or even longer, which is inefficient.

In terms of motion rendering, most similar products are limited to simple panning or other basic camera movement effects. Complex human gestures and animal movements remain difficult to render, and large models also struggle to understand non-realistic scenes.

But beneath these surface issues, Chinese tech giants face deeper ones. First, a fact less well known than the hot topic of OpenAI itself is that large AI models are closely tied to chips: the development of large models such as OpenAI’s relies heavily on Nvidia’s chips. This means that if the chip supply is disrupted and there is a need to migrate to a new hardware platform, it would require a massive software rewrite, which would consume a lot of time and resources.

What China has to do

In an effort to block China’s rapid technological advancement, the US government has already imposed restrictions on China’s access to AI chips, which are crucial to AI development. Given the current state of China-US relations, China will inevitably face the US’s chip blockade for a long time.

Thus, China’s AI development must not only focus on the training of large models but also on overcoming the threshold of independently developing high-performance AI chips, while continuing with training and research and development efforts on this basis. It must not pour extensive research and development into Sora-like technologies under the US chip framework merely because they are trending in the market.

An AI (Artificial Intelligence) sign is seen at the World Artificial Intelligence Conference (WAIC) in Shanghai, China, on 6 July 2023. (Aly Song/Reuters)

Currently, China’s core technology companies are adopting the strategy of utilising existing AI chips for technology accumulation on the one hand while accelerating basic research and development of domestically produced AI chips on the other. To outside observers, this has contributed to China’s relatively slow pace of catching up in AI development.

Why China does not try catching up with Sora

There is also a commercial reason why Chinese tech companies are not aggressively publicising their progress in catching up with video generation tools like Sora. As an emerging AI company actively raising a massive amount of capital, OpenAI needs to keep publishing its latest AI achievements to attract eyeballs and, in turn, capital. Short video giant ByteDance, which dominates China’s video generation field, on the other hand lacks not capital but a more concrete way to apply technologies like Sora to its platforms.

Industry practitioners assess that among Chinese tech companies, ByteDance makes the most extensive use of large model technologies like OpenAI’s in its internal projects. One of its basic strategies is to improve the technological competitiveness of its existing platforms by exploring cutting-edge AI application strategies in the market and grafting them onto those platforms.

People walk past the Bytedance headquarters building in Beijing, China, on 3 August 2020. (Carlos Garcia Rawlins/Reuters)

At the same time, as ByteDance is not a listed company, its technology releases will not lift its stock price. Thus, tech giants like ByteDance are not too motivated to show the market their progress in the field of video generation, and are more interested in maintaining their industry position as short video giants.

Given the two reasons above, it is easy to understand ByteDance’s behaviour. As of 2 March, ByteDance had released only one AI video generation tool, under its video editing app CapCut on a separate homepage of the app’s overseas version, and the tool has since gone offline. Currently, it is unclear whether it was based on the MagicVideo-V2 model that ByteDance released in January.
