South Korea leverages open government data for AI development
By Si Ying Thian
The government has launched an AI training initiative that uses open data to spur private sector innovation, and is exploring synthetic and unstructured data to increase the value of such data sets.
Dongyub Baek, Principal Manager at South Korea's National Information Society Agency (NIA), presenting the country's open data strategy at the International Conference on Data and Digital Governance in Deqing, China, on October 21. NIA is the government agency that manages the open data project and promotes the data economy in the country. Image: United Nations Department of Economic and Social Affairs (UN DESA)
In South Korea, open government data is powering artificial intelligence (AI) innovations in the private sector.
Take the case of TTCare, which may be the world’s first mobile application to analyse eye and skin disease symptoms in pets.
The AI model was trained on about one million pieces of data – half of the data coming from the government-led AI Hub and the rest collected by the firm itself, according to the Korean newspaper Donga.
AI Hub is an integrated platform set up by the government to support the country’s AI infrastructure.
TTCare’s CEO Heo underlined the importance of government-led AI training data in improving the model’s ability to diagnose symptoms. The firm’s training data is currently accessible through AI Hub, and any Korean citizen can download or use it.
Pushing the boundaries of open data
Over the years, South Korea has consistently ranked at the top of the OECD’s Open, Useful, and Re-usable data (OURdata) Index.
The government has been pushing the boundaries of what it can do with open data – beyond just making data usable by providing APIs. Application Programming Interfaces, or APIs, make it easier for users to tap on open government data to power their apps and services.
Public sector agencies are now showing growing interest in using such data to train AI models, although this is still at an early stage, said Dongyub Baek, Principal Manager at South Korea’s National Information Society Agency (NIA).
Baek sits in NIA’s open data department, which handles policies, infrastructure such as the National Open Data Portal, and impact assessments of government initiatives.
The portal was set up in 2013 as a one-stop hub for citizens to access public data sets in different formats, request data not yet disclosed, and better understand the cases and examples where open data can be used.
As of this year, the portal hosts 87,000 public data sets, 11,000 open APIs, and 61 million use cases leveraging open APIs, said Baek.
He was speaking at the International Conference on Data and Digital Governance organised by the United Nations Department of Economic and Social Affairs (UN DESA) on October 19 and 21.
Building private sector capacities
At the session, Baek highlighted that the government’s aim in providing open data is to stimulate private sector innovation.
“[Open data] serves a purpose of supporting innovations which is outlined in the ninth sustainable development goal of supporting innovations,” he explained.
To build private sector capacity to use open data for AI innovation, the government has launched an AI training initiative through AI Hub, he said.
Private players, like TTCare, can tap on AI Hub to access training data, APIs, as well as cloud and computing resources, to develop AI offerings.
South Korea’s experience of using open data to train AI dates back to 2021, according to The Korea Herald, when the data was selectively used in eight key areas, including healthcare and autonomous vehicles.
The government also organises regular forums to involve civil society in discussing issues pertaining to open data, he added.
Synthetic data generation
Another way public sector agencies are using AI is to generate synthetic data – artificially generated data that mirrors real-world data while safeguarding individual privacy.
“Synthetic data is able to protect privacy while maintaining the statistical characteristics [of the studied population] that can enhance the value of open government data,” Baek explained.
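To make the idea in Baek's remark concrete, here is a minimal sketch of one very simple way to produce synthetic records that keep the statistical characteristics of a data set without copying any individual row. It assumes two numeric columns and a plain Gaussian model; the function names and the toy figures are hypothetical, and this is not the method used by NIA or the City of Seoul. It also offers no formal privacy guarantee on its own.

```python
import math
import random


def fit_gaussian(rows):
    """Summarise two numeric columns by mean, standard deviation and correlation."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample (n - 1) variance and covariance.
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sd_x,
        "sd_y": sd_y,
        "corr": cov / (sd_x * sd_y),
    }


def sample_synthetic(params, n, seed=0):
    """Draw n artificial rows from the fitted model; no real row is reused."""
    rng = random.Random(seed)
    r = params["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - r * r))
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = params["mean_x"] + params["sd_x"] * z1
        # Mixing z1 into y reproduces the correlation between the columns.
        y = params["mean_y"] + params["sd_y"] * (r * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Made-up (income, spending) pairs, purely for illustration.
toy_rows = [(30, 20), (45, 28), (52, 35), (61, 33), (75, 50), (88, 49)]
synthetic_rows = sample_synthetic(fit_gaussian(toy_rows), n=1000, seed=42)
```

A researcher would then analyse `synthetic_rows` instead of the original records. Real-world synthetic data generators are considerably more sophisticated, but the trade-off is the same: keep the aggregate patterns, drop the individuals.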
Earlier this year, Seoul became the first local government in South Korea to provide synthetic data on the typical Seoul citizen lifestyle, analysing the consumption patterns and financial conditions of 7.4 million citizens.
“The local government explained that the new synthetic data could be only used for certain purposes with limited access, enabling a wider range of policy research and applications.”
“The data is expected to complement statistics covering sensitive survey items such as Statistics Korea’s Household Financial and Welfare Survey, assisting with precise analysis of financial conditions,” reported the City of Seoul's official website.
In May, the government also announced guardrails for synthetic data generation, releasing five types of reference models aimed at helping researchers and companies generate and use synthetic data for AI and machine learning development.
Not just opening data and APIs: Beyond “window dressing”
Most open data tend to come in the form of structured datasets – numbers in rows and columns – which may not be the most user-friendly.
But there is now growing interest in exploring how the government can tap into unstructured data such as images and audio, said Baek.
“Moving on, we want to tackle the issue of window dressing. When we are just opening the data, it’s not that useful to people. It’s about releasing data that meets the users’ real needs and enhances the data quality,” he explained.
To tackle the quality of data, Baek’s department also implemented a diagnostic service on the portal that provides a rating on data usability to help public sector agencies evaluate their data before they upload it.
“The service will assign a score based on five criteria, namely understandability, ease of processing, linkability, compliance and importance. With the score, they can check how useful the data are,” he explained.
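As a rough illustration of how a score built from five criteria could work, the sketch below combines per-criterion ratings into a single number. The article does not say how the portal weighs or scales the criteria, so the 0 to 100 scale, the equal default weights and the function name are all assumptions made for this example.

```python
# The five criteria named by Baek; everything else here is hypothetical.
CRITERIA = (
    "understandability",
    "ease_of_processing",
    "linkability",
    "compliance",
    "importance",
)


def usability_score(ratings, weights=None):
    """Return the weighted average of the five ratings (assumed 0-100 scale)."""
    if weights is None:
        # Assumption: every criterion counts equally unless told otherwise.
        weights = {name: 1.0 for name in CRITERIA}

    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")

    for name in CRITERIA:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")

    total_weight = sum(weights[name] for name in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(ratings[name] * weights[name] for name in CRITERIA)
    return weighted_sum / total_weight


# Example ratings for an imaginary data set.
example_ratings = {
    "understandability": 90,
    "ease_of_processing": 70,
    "linkability": 50,
    "compliance": 100,
    "importance": 40,
}
example_score = usability_score(example_ratings)
```

An agency could run a check like this before uploading, then look at the lowest-rated criterion to decide what to fix first.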
Proactive collaborations needed in open data
He also stressed the importance of proactive communication with the private sector and citizens, who are the data users.
To facilitate communications, the government has set up an open data strategy council, co-chaired by Prime Minister Han Duck-Soo and a private sector data expert, to deliberate, coordinate, monitor and evaluate the government’s open data policies and implementation.
Half of its members come from the private sector, including established and startup companies, said Baek.
For governments looking to kickstart open data initiatives, he emphasised the importance of a robust legal framework to guide agencies in implementing open data policies.
“Like many other countries, we have established several key laws, including the first Information Law, the second Open Data law [in 2013], and lastly the data-driven administration law enacted quite recently in 2020. And there is a clear distinction among these laws,” Baek noted.
The South Korean government is also currently supporting Laos’ Ministry of Technology and Communications (MTC) on open data and Mongolia’s National Statistics Office (NSO) on capacity building and data quality management, according to representatives of the respective governments.
South Korea is also ramping up its international collaborations around the data economy. Its most recent partnership, formed a few days ago with the European Union, promotes cooperation on global data governance and trade.