Singapore to channel US$52 million into building capacities for SEA’s first regional LLM
By Si Ying Thian and Yogesh Hirdaramani
The new National Multimodal LLM Programme will accelerate efforts to develop Southeast Asia’s first regional LLM, and is targeted at building skilled AI talent and fostering industry collaborations.
Singapore's Deputy Prime Minister and Minister for Finance Lawrence Wong at the Singapore Conference on AI for the Global Good on 4 December 2023. Image: Smart Nation Singapore.
When Rest of World reporters tested ChatGPT’s ability to respond to prompts in Kurdish, Bengali, and Tamil, they found that the popular AI chatbot running on GPT-3.5, a large language model (LLM) developed by OpenAI, frequently made grammatical errors, garbled phrases and even made words up.
This was because such languages are less represented online, and widely available models may not have enough training data from the Internet to produce good responses in these languages.
This is one reason why Singapore’s Infocomm Media Development Authority (IMDA), alongside research institutions such as AI Singapore (AISG) and the Agency for Science, Technology and Research (A*STAR), announced the launch of the National Multimodal LLM Programme (NMLP) on 4 December 2023.
The initiative aims to develop a base model that accounts for the region’s unique multilingual environment, the context and values related to the region’s languages, and context-switching between languages.
The NMLP is a two-year, S$70 million (US$52 million) initiative funded by the National Research Foundation, and aims to support national-level strategies in AI and R&D.
Something’s already been brewing…
The initiative builds upon existing efforts by AISG, which recently launched the SEA-LION (Southeast Asian Languages in One Network) model, an open-sourced LLM with a focus on Southeast Asian languages.
The language model has been trained on 11 key languages of the region so far: Indonesian, Thai, Vietnamese, Filipino, Burmese, Khmer, English, Chinese, Malay, Tamil, and Lao.
“Language is an essential enabler for collaboration. By investing in talent and investing in large language AI models for regional languages, we want to foster industry collaboration across borders and drive the next wave of AI innovation in Southeast Asia,” said Dr Ong Chen Hui, Assistant Chief Executive, Biztech Group, IMDA in a press release.
If successful, such a base model could be used to power language-specific applications such as chatbots, language translation tools, content creation tools, and Internet search engines.
On SEA-LION’s potential use by organisations outside Singapore, the press release stated: “It represents a relatively inexpensive and efficient option for organisations, especially the many cost-sensitive and throughput-constrained enterprises in Southeast Asia, to incorporate AI into their workflows.”
According to the press release, the project will expand the SEA-LION model to 30-50 billion parameters in size. Currently, SEA-LION is available as a 3 billion parameter model and a 7 billion parameter model.
The number of parameters is a rough indicator of an LLM’s capacity to understand and generate human language. For example, GPT-4, OpenAI’s newest model, is estimated to have approximately 1,700 billion parameters.
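To give a rough sense of what these parameter counts mean in practice, the sketch below estimates the memory needed just to hold a model's weights. It assumes 2 bytes per parameter (16-bit precision); that assumption and the function name are illustrative and do not come from the press release. The figures exclude the extra memory needed to actually run or train a model.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory (in GB, i.e. 10^9 bytes) needed to store the weights only.

    Assumption: every parameter is stored at the same precision
    (2 bytes = 16-bit by default). Runtime overhead is not included.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Model sizes mentioned in the article: 3B and 7B today, 30-50B planned.
model_sizes = {
    "SEA-LION 3B": 3e9,
    "SEA-LION 7B": 7e9,
    "Planned lower bound (30B)": 30e9,
    "Planned upper bound (50B)": 50e9,
}

for name, params in model_sizes.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

Under this assumption, weight storage grows linearly with parameter count, which is one reason smaller models can be a more practical option for cost-sensitive organisations.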
While popular LLMs tend to display a strong bias towards Western languages like English and French, SEA-LION currently has a vocabulary of 256,000 tokens spanning different Southeast Asian languages.
In addition, the NMLP will expand SEA-LION into a multimodal speech-text model by drawing on A*STAR’s work in speech and language research. Notably, A*STAR’s multimodal speech-text model can also identify non-verbal cues to offer a closer read of user intent.
Closing the talent gap and driving industry collaborations
Beyond building the regional LLM, the programme aims to develop Singapore’s research and engineering capabilities in multimodal LLMs by nurturing skilled AI professionals.
The programme will provide local researchers and engineers with funding and access to high-performance computing (HPC) resources through the National Supercomputing Centre.
HPC has been key to driving Singapore’s ambitions towards having the world’s largest fully automated port, GovInsider previously reported.
Funding will also be channelled to fostering collaborations with industry partners, in an effort to provide a conducive environment for developing novel AI solutions.
GovInsider earlier covered IMDA’s new initiatives to close the gap between R&D and industry.
A refreshed national AI strategy
The NMLP initiative will support Singapore’s recently announced National AI Strategy 2.0, which outlines the country’s goal to become “a global leader in choice AI areas that are economically impactful and serve the public good”.
The new strategy will take a systems approach to achieving these goals, unlike the 2019 strategy, which focused on developing five key national AI projects. The new strategy will instead focus on enhancing three AI innovation systems, with 15 actions currently planned.
First, the strategy aims to drive AI activity by deepening technical capabilities and encouraging efforts around high-impact use cases in the economy.
Within the public sector, the government aims to develop AI strategies to meet Smart Nation priorities and use AI to enhance whole-of-government operations, like finance, HR, and service delivery. Such strategies will include providing funding, AI training, and access to HPC.
Next, the strategy aims to build AI knowledge communities through attracting AI talent, increasing the AI talent pool through training, and improving the general public’s confidence in using AI. The strategy has set a goal of creating an AI talent pool of 15,000 people.
Finally, the strategy aims to create a conducive environment for AI, such as by ensuring access to compute resources and strengthening AI safety frameworks.