Three steps for trusted data at scale

By Talend

David Talaga, Talend’s Director of Product Marketing for Data Governance and Data Quality, discusses how public-sector organisations can build trust in their data.

Nearly 16,000 Covid-19 cases were left out of the UK’s official daily figures in October 2020 because of a row limit in Microsoft Excel. As a result, people who had come into contact with these individuals were not informed in time, a potentially deadly mistake.

The consequences of inaccurate data go beyond faulty services: “the trust that you put into the service is also broken,” says David Talaga, Talend’s Director of Product Marketing for Data Governance and Data Quality.

“As a government body, providing the best citizen experience is just the right thing to do. And the truth is that you cannot have a good citizen experience without good data,” he adds. Talaga shares three steps for trusted data.

1. Put the right controls in place

Data quality problems can start with simple mistakes but have far-reaching consequences. “If a health agency uses the wrong social security number for the reimbursement of medical expenses, or an energy supplier bills you for your neighbour's electricity, how would you react?” Talaga asks.

These errors are more common than we think, he adds. Data is spread across multiple sources in different agencies, and safeguards for personal data are often not applied systematically.

There is also a common misconception that “the more data you have, the better it is,” says Talaga. That belief has encouraged people and public services to share more data. “But what good is it if the data is not accurate or ready to use?” Sharing inaccurate data only amplifies existing errors.

Organisations need to enforce the right quality indicators from the beginning, he says. “When you start integrating too many fragmented sources of data, you will end up with data that is not under your control.”

Setting data quality standards first requires a “common understanding of existing data quality”. From there, agencies can put supporting organisational structures in place, such as designated data roles (e.g., Chief Data Officers) and data stewards who apply the relevant data policies.

2. Democratise trusted data

“If you want to deliver trusted data, you need to democratise the way data quality is shared,” says Talaga.

To do this, agencies need two essential things. First, a collaborative data platform that allows stakeholders to work on, rate, tag, endorse, and certify the right datasets. Second, clear, understandable and explainable quality indicators to track improvements. Together, these allow employees without high levels of data literacy to collaborate with data engineers and contribute to the value of data, he says.

The Talend Trust Score™ is a unified indicator that measures the health of data across the enterprise. It is a combination of data quality, data popularity, and user-defined ratings that reveals how healthy data is, he adds. “This indicator can be communicated to non-technical people because it's easy to use, understand, and share.”

To embody trust, the score must reflect not only the intrinsic quality of the data but also the confidence that experts have in it. For example, a dataset that employees have not endorsed, or one that integrates an inaccurate or unnecessary data source, will have a lower score.
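The article does not say how the Trust Score is actually calculated. To illustrate the general idea of folding several signals into one number, here is a minimal Python sketch that assumes a weighted average of a quality indicator (here, simple completeness), normalised popularity, the average user rating and an endorsement flag. The weights, the 0-5 scale and every name in it (`trust_score`, `DatasetSignals`, `completeness`) are hypothetical; this is not Talend's method.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the article does not say how Talend weights each signal.
WEIGHTS = {"quality": 0.5, "popularity": 0.2, "rating": 0.2, "endorsement": 0.1}


def completeness(values: list) -> float:
    """One simple quality indicator: the share of values that are not missing."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v not in (None, ""))
    return filled / len(values)


@dataclass
class DatasetSignals:
    quality: float          # 0.0 to 1.0, e.g. from completeness() or other checks
    usage_count: int        # how often the dataset is used (popularity)
    max_usage_count: int    # usage of the most popular dataset, for normalising
    ratings: list[float] = field(default_factory=list)  # user ratings, 1-5 stars
    endorsed: bool = False  # endorsed or certified by a data expert


def trust_score(s: DatasetSignals) -> float:
    """Weighted average of normalised signals, scaled to 0-5 (illustrative only)."""
    popularity = s.usage_count / s.max_usage_count if s.max_usage_count else 0.0
    # Map the average 1-5 star rating onto 0-1; unrated datasets contribute 0.
    rating = (sum(s.ratings) / len(s.ratings) - 1) / 4 if s.ratings else 0.0
    endorsement = 1.0 if s.endorsed else 0.0
    combined = (
        WEIGHTS["quality"] * s.quality
        + WEIGHTS["popularity"] * popularity
        + WEIGHTS["rating"] * rating
        + WEIGHTS["endorsement"] * endorsement
    )
    return 5 * combined


# Usage: the same (hypothetical) dataset, with and without an expert's endorsement.
quality = completeness(["A12", "B07", None, "C33"])  # 3 of 4 values present: 0.75
endorsed = DatasetSignals(quality, 40, 80, [4, 5], endorsed=True)
unendorsed = DatasetSignals(quality, 40, 80, [4, 5], endorsed=False)
print(round(trust_score(endorsed), 2), round(trust_score(unendorsed), 2))  # 3.75 3.25
```

The unendorsed copy scores lower only because it lacks an expert's endorsement, mirroring the example in the text. Keeping the weights in one visible table is also one way to keep such a score explainable.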

“But a score in itself is useless if we don't understand how it’s calculated and how to improve it,” he says. “The Talend Data Fabric allows anyone to understand the problems in the data and correct, control, and federate the data in question,” Talaga adds.

Aeroporti di Roma (ADR), the operator of Rome’s airports, selected Humanativa Group SpA as its partner, together with Talend, to collect, connect, cleanse, and govern trusted data and transform its passengers’ airport experience. ADR had been using Excel spreadsheets and a lot of manpower to analyse its data, a time-consuming process that required the commitment of different data owners, and its predictions were not always accurate.

With Talend, ADR moved to a self-service model, which has helped the airport understand and respond to unexpected issues quickly without having to wait for external contributors to become available.

3. Scale human expertise

Once the people with the right expertise are identified, organisations must maximise their efficiency so that they do not spend “too much of their time” verifying data, says Talaga. Success relies on an organisation’s ability to engage the people with the right expertise on the right issue at the right time. This requires validation tools that are easy to use, as well as the automation of stewardship tasks.

Machine learning algorithms within Talend’s Data Fabric learn from the corrections data analysts make and apply those corrections and quality checks across millions of data sets. “Our systems will recognise the correction that you're doing, and suggest applying these corrections to other data sets,” he says.
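The article does not describe how these algorithms work internally. As a much simpler stand-in for the idea of "learn a fix once, suggest it elsewhere", the Python sketch below memorises value-level corrections made by an analyst and proposes them for matching values in other data sets once a fix has been seen often enough. It is plain rule memorisation, not machine learning, and the class name, `min_support` threshold and sample values are all hypothetical.

```python
from collections import Counter, defaultdict


class CorrectionSuggester:
    """Remembers analyst corrections and proposes them for matching values elsewhere.

    A deliberately simple stand-in for the machine learning described in the
    article: it memorises value-level fixes rather than training a model.
    """

    def __init__(self, min_support: int = 2):
        # How many times the same fix must be seen before it is suggested.
        self.min_support = min_support
        self._fixes: dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _key(value: str) -> str:
        # Normalise case and surrounding whitespace so "londn " and "LONDN" match.
        return value.strip().lower()

    def record(self, original: str, corrected: str) -> None:
        """Log one correction made by a data analyst."""
        self._fixes[self._key(original)][corrected] += 1

    def suggest(self, records: list[str]) -> list[tuple[int, str, str]]:
        """Return (index, current value, proposed value) for each fixable record."""
        suggestions = []
        for i, value in enumerate(records):
            counts = self._fixes.get(self._key(value))  # .get avoids adding keys
            if not counts:
                continue
            proposed, support = counts.most_common(1)[0]
            if support >= self.min_support and proposed != value:
                suggestions.append((i, value, proposed))
        return suggestions


# Usage with made-up values: an analyst fixes "Londn" twice and "Pariss" once.
suggester = CorrectionSuggester(min_support=2)
suggester.record("Londn", "London")
suggester.record("londn ", "London")
suggester.record("Pariss", "Paris")  # seen only once, so not suggested yet
print(suggester.suggest(["London", "LONDN", "Pariss"]))  # [(1, 'LONDN', 'London')]
```

Returning suggestions rather than rewriting records keeps the analyst in the loop, which matches the article's point that the system suggests corrections instead of applying them silently.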

To drive a data culture within an organisation, leaders need to “start small with a team of committed and motivated people”, says Talaga. This team then needs to identify where data problems cause the organisation the most pain, and apply a data platform to that specific use case, he adds.

“Once you have validated that, you can build your success on your first success,” he says.

Trust will be the bedrock of citizen services as data continues to power decision-making and predictions. Building data quality standards and creating the right culture will be key.

Find out how your agency can ensure the quality and accuracy of data by downloading Talend’s guide here.