Thai government to launch local-language LLM this September
By Si Ying Thian
ThaiLLM will make it easier for government agencies to build local chatbots to enhance the delivery of public services.
Thailand's Big Data Institute (BDI), a public organisation, and its academic and industry partners are building an LLM for the Thai language, known as ThaiLLM. Image: Canva
Last year, researchers at Georgia Tech found that more than half of the health queries asked through English-centric chatbots by non-English speakers received irrelevant or contradictory answers.
The queries were tested on widely used chatbots such as ChatGPT in Spanish, Chinese and Hindi, which are the world’s most spoken languages after English.
If such chatbots were used in the healthcare sector of a non-English-speaking country, this could lead to misdiagnoses and poor care. The findings highlighted an urgent need for large language models (LLMs) to be trained in and optimised for local languages and cultural contexts.
This is why Thailand’s Big Data Institute (BDI) and its partners are building an LLM for the Thai language called ThaiLLM.
“As a government agency, our job is to provide a basic infrastructure for the rest of society and economy to build their AI initiatives on top of it,” says BDI’s Director, Dr Tiranee Achalakul.
Speaking to GovInsider, Dr Achalakul shares that the first foundation model is expected to launch in September 2025.
Aside from the model, BDI and its partners will also release the corpus, the open-source license, and toolkits to build LLM pipelines.
ThaiLLM has been in the works since last November, when BDI released a data repository where the public can deposit and share Thai-language data sets, she explains.
ThaiLLM’s impact
Dr Achalakul shares that ThaiLLM will make it easier for other government agencies to build chatbots that improve service delivery for citizens.
It will also enable Thai startups and small and medium-sized enterprises (SMEs) to leverage AI to better serve local consumers.
According to the Bangkok Post, the initial phase of this release will also focus on building specialised foundation models tailored to the healthcare and environmental sectors.
As most chatbot development focuses on fine-tuning LLMs for specific uses, she highlights the government’s key role in developing a Thai foundation language model to support this fine-tuning process for different sectors.
Several Thai-language models have already been launched by private companies. Hence, BDI and its partners are integrating private and public sector data into a corpus, a collection of data used to train the LLM.
“We are not starting from scratch in terms of gathering the text [to train ThaiLLM] as the private sector in Thailand is already doing it.
“For the model itself, there are many open-source models out there. So, we just need to have the budget to buy and rent enough GPU power to train this model,” she explains.
Multistakeholder collaboration the ‘right way for Thailand’
ThaiLLM is a multistakeholder collaboration because the AI community around it needs to be sustained even after the government steps out of the picture, says Dr Achalakul.
There are currently seven teams involved in creating ThaiLLM: two government agencies, BDI and the National Electronics and Computer Technology Center (NECTEC); three universities, Vidyasirimedhi Institute of Science and Technology (VISTEC), Mahidol University and Chulalongkorn University; and two AI-related industry associations.
BDI has been tasked with integrating industry, academia and government data into the corpus.
Discussions are underway to create a consortium that will be managed by a private sector partner.
To sustain the open-source model, the consortium will provide engineering expertise to maintain it and ensure the longevity of related programmes, she adds.
“Making these collaborations solid and sustained is one thing we are trying hard to design and it’s the right way to do that in Thailand,” she explains.
Moving forward, BDI also plans to collaborate with the team behind ASEAN’s first LLM, developed by AI Singapore.
Thai-language newspaper MGR Online reported last September that the budget for ThaiLLM was cut by 20 per cent to approximately 90 million baht (S$3.6 million), and that the work plan was adjusted to focus on outsourcing the processing of the data needed to train the model.
National big data platform
BDI is also tackling the problem of siloed data in the government through the national big data platform that is set to launch this year as well.
The platform will leverage four existing sector-specific platforms, namely the Health Link, Environment Link (Envi Link), Travel Link and Smart Data Analytics platforms.
“[The platform] is decentralised, meaning that we don’t put all the data into one bucket. We just build links to connect them. The platform is more for analytics,” she explains.
As there is no single law governing data sharing across government agencies, she shares that BDI is currently working with the Ministry of Digital Economy and Society (DES) to draft a Data Sharing Act to smooth this process.
Citizen data privacy and security are more important considerations for the platform than for ThaiLLM, which focuses on training with Thai text, she notes.
At the recent ASEAN Digital Ministers' Meeting in mid-January, the Expanded ASEAN Guide on AI Governance and Ethics - Generative AI was released.
The expanded guide recommends a range of policy actions that ASEAN states can take in fostering responsible AI adoption. ThaiLLM is among the few use cases that have taken steps to implement practices aligned with AI governance and ethics.