Singapore has a vision for its government to be “data-driven to the core”, said Quek Su Lynn, Director of the Government Data Office.
Being data-driven requires that issues of data quality, trust, and timeliness be resolved holistically, says Stu Garrow, Senior Vice President and General Manager, Asia Pacific, at Talend.
Tackling them piecemeal will only create more governance silos, he adds. Garrow outlines three fundamental ingredients for getting it right.
1. Balance quality and timeliness
Before Covid-19, some government agencies in the region had a culture of not releasing data until it was “near perfect”, Garrow says. “But what they were finding was people were just clamouring for data early on.” These users then turned to secondary sources of data, which may not be as reliable, he adds.
These agencies focused too much on quality and not enough on timeliness, he says. There’s a need for “a much bigger balance”, because data loses value as it ages, Garrow adds.
If data is delivered too late, people either make decisions based on “gut feel” or inaccurate data, or they put off the decision altogether, he says. “Any decision left too late, even if it’s based on perfect data, may not be a valuable decision.”
The Talend Trust Score can help organisations decide when to release data. The metric measures a combination of data quality, data popularity, and user-defined ratings – and a minimum score can be set to determine when data should be released into the hands of decision makers.
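To make the threshold idea concrete, here is a minimal Python sketch of score-gated release. The article does not give the Trust Score formula, so the weighted average, the weights, the 0-to-1 scales and the names `DatasetSignals`, `trust_score` and `ready_to_release` are all hypothetical; this is an illustration of the concept, not Talend's implementation.

```python
from dataclasses import dataclass

# Hypothetical weights: the real Trust Score formula is not described here,
# so these values are illustrative only. They sum to 1.0.
WEIGHTS = {"quality": 0.6, "popularity": 0.2, "user_rating": 0.2}


@dataclass
class DatasetSignals:
    quality: float      # assumed: share of records passing quality checks, 0.0-1.0
    popularity: float   # assumed: normalised usage signal, 0.0-1.0
    user_rating: float  # assumed: average user rating rescaled to 0.0-1.0


def trust_score(s: DatasetSignals) -> float:
    """Weighted average of the three signals, on a 0.0-1.0 scale."""
    return (WEIGHTS["quality"] * s.quality
            + WEIGHTS["popularity"] * s.popularity
            + WEIGHTS["user_rating"] * s.user_rating)


def ready_to_release(s: DatasetSignals, minimum: float = 0.7) -> bool:
    """Release once the score clears the agency-chosen minimum, rather than waiting for perfection."""
    return trust_score(s) >= minimum
```

With these assumed weights, a dataset with quality 0.8, popularity 0.5 and user rating 0.75 scores about 0.73 and clears the 0.7 minimum, so it would reach decision makers without waiting for flawless quality. Raising `minimum` trades timeliness for quality.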
“We all want great data, we all want perfection – but not if it comes too slowly. It has to be delivered fast enough to deliver great experiences and make great decisions,” Garrow says.
2. Trace it back to the source
Citizen data today is stored in many agencies and systems – in the private, government, or public cloud, or on premises, says Garrow.
Decision-making requires aggregating data from many different sources, he adds. There is no longer a “one-to-one correlation” between the final product and where it came from.
Agencies need to be able to solve the “complex problem” of undoing all the transformations and tracing each dataset back to its source, Garrow says. Data engineers can then verify whether the source is reliable, for example whether it came from a high-trust system or a social media feed, and ensure the final product is accurate, he adds.
“Without a data fabric to manage data pipelines from end-to-end, there’ll be a lot of these air gaps, where you can’t actually undo those transformations to understand where data comes from, and where it’s being consumed,” he says.
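Tracing data back through its transformations amounts to walking a lineage graph upstream. The sketch below assumes lineage is recorded as a mapping from each dataset to the datasets it was built from; every dataset name, trust label and function name in it is hypothetical, and it illustrates the idea rather than how Talend's Data Fabric stores lineage.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it was derived from.
# Datasets with no upstream entries are treated as original sources.
LINEAGE = {
    "citizen_dashboard": ["household_summary", "sentiment_index"],
    "household_summary": ["census_extract", "tax_records"],
    "sentiment_index": ["social_media_feed"],
}

# Hypothetical trust labels for the original sources.
SOURCE_TRUST = {
    "census_extract": "high",
    "tax_records": "high",
    "social_media_feed": "low",
}


def trace_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk upstream through every transformation; return the original sources."""
    sources: set[str] = set()
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            sources.add(current)  # nothing upstream: treat as an original source
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


def untrusted_sources(dataset: str) -> set[str]:
    """Original sources behind `dataset` that are not labelled high trust."""
    return {s for s in trace_sources(dataset, LINEAGE) if SOURCE_TRUST.get(s) != "high"}
```

Here `untrusted_sources("citizen_dashboard")` flags only `social_media_feed`. If a pipeline step is never recorded in `LINEAGE`, the traversal stops early and treats an intermediate dataset as an original source, which is the kind of gap Garrow warns about.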
Talend’s Data Fabric acts as a centralised platform to help data engineers understand where data comes from, and easily manage multiple sources of data.
3. Establish quality rules
Integrating data from hundreds of different systems is similar to piecing together “different parts of the same puzzle,” Garrow says. Each system may have different formats for recording data – some data fields may even be empty.
Every dataset must be processed with the same quality rules, he adds. “Where we’ve seen it go wrong is agencies having different technologies that don’t have consistent quality rules.”
A data fabric with built-in quality rules can automate this cleaning process, according to Garrow.
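As a rough illustration of applying the same rules everywhere, the sketch below runs one shared rule set over records arriving from systems with different date formats and occasional empty fields. The field names, accepted date formats and function names are hypothetical, and this is a plain rule-based check, not the machine learning approach the article attributes to Talend.

```python
from datetime import datetime
from typing import Optional

# Hypothetical shared rule set, applied identically to records from every source system.
REQUIRED_FIELDS = ("citizen_id", "postcode", "date_of_birth")
# Date formats assumed to appear across different systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def normalise_date(value: str) -> Optional[str]:
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD), or return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def apply_quality_rules(record: dict[str, Optional[str]]) -> tuple[dict[str, str], list[str]]:
    """Return a cleaned copy of the record plus a list of rule violations."""
    # Trim whitespace and drop fields that are entirely absent (None).
    cleaned = {k: v.strip() for k, v in record.items() if v is not None}
    # Empty or missing required fields are flagged rather than silently accepted.
    issues = [f"missing {field}" for field in REQUIRED_FIELDS if not cleaned.get(field)]
    dob = cleaned.get("date_of_birth")
    if dob:
        iso = normalise_date(dob)
        if iso is None:
            issues.append("unrecognised date_of_birth format")
        else:
            cleaned["date_of_birth"] = iso
    return cleaned, issues
```

A record with date of birth `05/03/1990` comes back as `1990-03-05`, while one with a blank postcode is returned with the issue `missing postcode`, so every source system is judged by identical criteria.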
Machine learning algorithms within Talend’s Data Fabric learn from data analysts to apply corrections and quality checks to millions of datasets.
Data engineers can then focus on “higher value tasks associated with data, as opposed to doing continuous or repetitive tasks that can be automated,” Garrow says.
Data holds a wealth of insights that agencies can use to create better citizen services. The right technology and culture will produce trusted, timely data on which governments can base their decisions.