How China’s open-source LLMs could empower Southeast Asia
China has led the way in providing “open weight” large language models (LLMs) such as DeepSeek-R1. This means that governments and companies in Southeast Asia can access them more easily and cheaply. But there are challenges in localisation and achieving data autonomy.
The emergence of the DeepSeek-R1 large language model (LLM) in January 2025 marked an important milestone from a technological, commercial, and political perspective. DeepSeek demonstrated that LLMs could be trained using 90% less computational power than more established platforms such as ChatGPT, making them cheaper to develop. The emergence of DeepSeek-R1 also underscores another thing: what Silicon Valley can do, Hangzhou, on the other side of the Pacific, can do as well, if not better.
While significant in light of Sino-US technology rivalry, DeepSeek’s moment also holds a number of important implications for Southeast Asia and other emerging regions. The emergence of low-cost LLMs increases access for businesses, governments and researchers, and provides an opportunity for Southeast Asian nations to assert their data autonomy.
Open-source and open-weight
The first important point is that DeepSeek’s LLMs build largely on open-source technology, in which the source code, training data, and methodology are available for modification and improvement. Strictly speaking, however, models like DeepSeek-R1 are released on an “open-weight” basis: the trained model parameters can be freely copied and adjusted, but the data and methods used to train the models are not fully revealed.
Being open-source or open-weight impacts both the technological and commercial trajectory of LLMs.
Chinese technology companies such as Baidu, Alibaba, and Tencent have been active in developing open-source artificial intelligence (AI) models for many years. Their strategy, supported by Chinese universities and the government, can be seen as applying an open innovation model to accelerate technological development and leapfrog past the US.
However, Chinese companies are not the only ones investing in open AI. Meta and Google have also released open-weight LLMs (Llama and Gemma, respectively) as part of a commercial strategy aimed at lowering LLM development costs, attracting talent, and competing more effectively with proprietary LLMs.
A common competitive strategy for technology businesses is to “commoditise the complement”: making supplementary products or services cheap and readily available to boost demand for the core offering. For businesses built on OpenAI’s closed ChatGPT, investing in open-source alternatives can be a smart move, since free alternatives pressure OpenAI to lower prices and reduce overall LLM service costs. Oracle used a similar strategy, supporting the open-source Linux operating system to counter Microsoft’s proprietary Windows.
Lower costs in Southeast Asia
The availability of high-quality, open-weight LLMs provides easier and cheaper access for governments and companies in Southeast Asia. This allows governments to operate their own LLMs, maintaining data autonomy and avoiding the transfer of sensitive information abroad, echoing Vietnam’s 2022 data localisation mandate for social media platforms.
The rapid fall in AI model training costs is necessary for data and AI localisation to be economically viable. As a point of comparison, Singapore reportedly spent US$52 million developing its SEA-LION LLMs, eight times more than DeepSeek claims to have spent.
Looking beyond the public sector, open-source LLMs also level the commercial playing field with start-ups in Southeast Asia now having access to the same core LLMs as start-ups in China and the US. Locally developed AI models can boost productivity growth across the wider economy, while ensuring value is captured by local businesses instead of foreign firms.
Different perspectives
Yet the emergence of Chinese AI has also highlighted a different, cultural problem. Chinese LLMs are known to be trained to echo the Chinese Communist Party’s (CCP) version of history and its political views, conforming to China’s censorship system. Likewise, models trained primarily on English texts reflect a predominantly Western worldview.
Especially in Southeast Asia, with its great cultural, religious, and linguistic diversity, Western or Chinese LLMs may be insensitive to local social hierarchies, customs, and expressions. Poorly trained models built on unsuitable source material can pose significant societal risks. Just as Facebook allegedly contributed to interethnic violence in Myanmar, new AI models could exacerbate existing social tensions.
Fortunately, LLMs can be retrained with relative ease. The training data for open-weight Chinese LLMs is more aligned with Chinese perspectives. But R1 1776, an open-source project based on the DeepSeek-R1 model, has shown that DeepSeek-R1 can be post-trained to remove these perceived biases.
Localisation important
For Southeast Asian countries, this highlights the importance of developing sufficient domestic capacity to localise and post-train LLMs for local conditions. Some of this capacity is already evident in the region, as demonstrated by Singapore’s SEA-LION LLMs.
Founded in 2015, Indonesia’s Kata.ai has developed leading natural language processing technology tailored to the Indonesian language, outperforming foreign competitors in this area. Vietnam’s VinAI, established in 2019 to develop manufacturing AI applications, recently sold its generative AI division to US semiconductor producer Qualcomm, highlighting the region’s world-leading AI R&D capabilities.
Can Southeast Asia truly benefit?
In short, the open-source turn in LLM development means that Southeast Asian countries now have an opportunity to exert much greater autonomy in using and applying such models.
First, countries should take advantage of the smaller size of new LLMs, which makes them much cheaper to deploy, use, and retrain locally without relying on foreign technology providers.
Second, countries should develop the capacity to retrain LLMs, making them more useful for local languages and more sensitive to local cultures. Investments in LLM retraining could be treated as a public good and anchored at local universities, thus nurturing local talent and advancing R&D.
Third, countries should host their own models so that they can collect their own data. Because some regional languages generate only limited digital content, assembling and curating a sufficiently large corpus of text is essential for improving local LLMs. Rather than handing large amounts of information to foreign firms, such data should ideally be stored and used by local organisations.
With its current generation of open-source LLMs, China seems to have opened a door for Southeast Asia to catch up with the technology leaders. Yet, for Southeast Asia to walk through it requires not just an investment in local infrastructure and capabilities but also a clear assertion of data and AI autonomy.
Local universities and policymakers must ensure that they maintain a broad, society-centric view of AI, rather than a narrow commercial-technical perspective. This is essential to both achieve the full economic potential of AI and to ensure that the technology empowers, instead of harms, Southeast Asian societies.
This article was first published in Fulcrum, ISEAS – Yusof Ishak Institute’s blogsite.