Cloud migration, artificial intelligence, machine learning and data analytics are buzzwords that frequently accompany the digital transformation that governments are currently undergoing. As the fundamental building block of these capabilities, data is the key to unlocking their full potential.

Governments are starting to rely on data to deliver better citizen services, inform policymaking, and manage interagency collaborations. However, just as a Jenga tower wobbles when a block is missing, a gap in cybersecurity threatens a data-centric system.

The Covid-19 pandemic has increased such vulnerabilities. Last year, the overall number of data breaches went up more than 68 per cent from the previous year – 23 per cent higher than the all-time high set in 2017, according to a report by the non-profit Identity Theft Resource Center. Such incidents reduce confidence in embracing the shift towards digitalisation in government and among individuals.

So how can governments accelerate data flows while keeping cyberattacks at bay? James Rice, Vice President of Solution Engineering, Customer Success & Customer Support at Protegrity, explains how governments can strengthen their data protection and privacy while still securely tapping into the benefits of new cloud and data initiatives.

Securing data

“No one is sharing data just to store and set it somewhere – they’re sharing data to use it,” Rice says, adding that the value of data lies in its use to improve government functions that translate into better outcomes for citizens.

Data sharing across public agencies is critical for governments to establish common ground and coordinate whole-of-government solutions. For example, during the Covid-19 pandemic, Singapore hospitals have needed to share clinical data in the National Electronic Health Record, a centralised repository, so that the country’s health ministry can allocate resources such as vaccines and personal protective equipment effectively.

However, if data repositories or channels for data sharing lack field-level data protection, governments are at risk of leaking data that may include confidential information on citizens or public services.

It is no longer enough to secure data by relying on “coarse-grained” control, meaning broad protections such as encrypting an entire database file or securing the communication protocols through which data is transferred, Rice says. Coarse-grained control protects data only at a macro level. It does not protect individual field-level information or data in use at the time of consumption.

Furthermore, coarse-grained control doesn’t provide foolproof cybersecurity. It acts like a light switch: data is protected while not in use but completely unprotected while being accessed, Rice says. When organisations encrypt an entire database, the data becomes unreadable to cyber attackers attempting to exfiltrate the database file. But when an authorised user logs into the database or data is accessed from a connected application, encryption is bypassed. The database administrator or business user will have full access to the data, regardless of whether it is relevant to their duties.

The “all-or-none” nature of database protection increases data security risks. If cyber attackers manage to obtain login details or internal bad actors misuse their access, they can bypass encryption instantly and gain access to a full range of data.

Tightening control

Conversely, fine-grained control is a more precise method of data protection. Instead of encrypting databases wholesale, it allows organisations to grant access to specific portions of data to different groups of users depending on their roles. This means that governments have more control over which data is shared and who it is shared with, Rice says.

Imagine a scenario in which you have a row of information about a citizen, including their name, identification number, email address, and credit card number. The end user, who might be a customer service representative at a bank, does not require all that information to do their job.

Fine-grained control ensures that the customer service representative has access only to the data required for them to fulfil their duties. Protegrity’s “vaultless tokenisation” technology achieves this by converting irrelevant data into random strings of characters. This then allows for just-in-time data masking so that, for instance, a customer service representative sees only the last four digits of the credit card number, which is all that’s required for them to validate a customer’s identity, Rice says.
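The role-based view described above can be illustrated with a minimal Python sketch. It is not Protegrity's implementation: the record, the role names, the policy table and the helper functions are all hypothetical, chosen only to show how field-level rules might decide what each user sees.

```python
# Illustrative only: the record, roles and policy table are hypothetical.
RECORD = {
    "name": "Jane Tan",
    "id_number": "S1234567A",
    "email": "jane.tan@example.com",
    "card_number": "4111111111111111",
}

# Which fields each role may see, and in what form.
POLICY = {
    "customer_service": {"name": "clear", "card_number": "last4"},
    "fraud_analyst": {"name": "clear", "id_number": "clear", "card_number": "clear"},
}


def mask_last4(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * (len(value) - 4) + value[-4:]


def view_for(role: str, record: dict) -> dict:
    """Return only the fields the role is entitled to, masked as the policy requires."""
    rules = POLICY.get(role, {})
    view = {}
    for field, value in record.items():
        rule = rules.get(field)
        if rule == "clear":
            view[field] = value
        elif rule == "last4":
            view[field] = mask_last4(value)
        # Fields without a rule are withheld entirely.
    return view


print(view_for("customer_service", RECORD))
```

In this sketch an unknown role receives an empty view, so access is denied by default rather than granted by default.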

Accelerating cloud migration

Many governments, such as that of the Philippines, have adopted “cloud-first” strategies, migrating their databases to the cloud for greater efficiency and speed. This is a transformative shift away from traditional on-premises infrastructure such as bulky servers.

But governments are hesitant about making that shift too quickly. When they move data away from on-premises platforms, traditional security measures such as firewalls and intrusion detection systems do not follow the data to the cloud, Rice explains.

Furthermore, under a shared responsibility model, cloud providers are responsible only for securing the cloud, while organisations remain responsible for securing their data. As guardians of sensitive information such as citizen data, governments must ensure that this data will be secure before they can confidently migrate to the cloud.

Protegrity’s data protection and privacy platform provides cloud-native capabilities that help governments move to the cloud with confidence. It automatically tokenises or encrypts sensitive data in real time as it is sent to the cloud, ensuring that the data is constantly protected, regardless of whether it’s on the move or sitting in the cloud.

This alleviates concerns that internal security and compliance teams or external regulators may have about data security, giving public agencies a green light to make their first forays into cloud operations, Rice says.

Protegrity also streamlines data protection across different cloud providers, such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform, as well as cloud repositories such as Snowflake, Databricks and many others. Data protection on the cloud is typically vendor-specific; a cloud provider’s encryption will not work on-premises or in the other cloud platforms that an agency might be using. Governments will have the hassle of “managing security vendor by vendor, system by system,” Rice says.

However, Protegrity’s data protection services safeguard organisations’ data, regardless of which cloud platform it’s stored on, allowing them to leverage the unique analytic functions and software each platform provides.

Beefing up machine learning

Another benefit of Protegrity’s vaultless tokenisation technology is that it maintains the format, length and type of the original data. A person’s date of birth will still retain the same day-month-year format after being protected, although the numbers have been randomised.

Typical encryption, conversely, breaks the format. For instance, it might turn the date into a string of letters or symbols, possibly rendering the encrypted data incompatible with existing systems. An administration system that expects input in a day-month-year format might flag such an entry as an error, Rice says.
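To make the idea of format preservation concrete, here is a deliberately simplistic Python sketch. It is not Protegrity's vaultless tokenisation and it is not secure: it shifts a date by a secret-derived number of days, which is far too weak for real use. The secret, the supported date range and the function names are assumptions made for illustration only.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret and date range, chosen only for this sketch.
SECRET = b"demo-key-not-for-production"
START = date(1900, 1, 1)
DAYS = (date(2100, 1, 1) - START).days  # number of dates in the supported range


def _offset() -> int:
    """Derive a fixed day offset from the secret."""
    digest = hmac.new(SECRET, b"dob-offset", hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % DAYS


def tokenise(dob: str) -> str:
    """Map a DD-MM-YYYY date (1900 to 2099) to another date in the same format."""
    d, m, y = (int(part) for part in dob.split("-"))
    index = (date(y, m, d) - START).days
    shifted = (index + _offset()) % DAYS
    return (START + timedelta(days=shifted)).strftime("%d-%m-%Y")


def detokenise(token: str) -> str:
    """Reverse tokenise() for holders of the secret."""
    d, m, y = (int(part) for part in token.split("-"))
    index = (date(y, m, d) - START).days
    original = (index - _offset()) % DAYS
    return (START + timedelta(days=original)).strftime("%d-%m-%Y")


token = tokenise("14-03-1977")
print(token, detokenise(token))
```

Because the token is itself a valid day-month-year date, a system that validates the field's format would accept it, whereas ciphertext from a general-purpose cipher usually would not pass the same check.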

Synthetic data is another way to retain the format of data while ensuring privacy for sensitive information. It refers to artificial data, generated by an AI tool, that mimics real-life data. Even though the generated values are different, they will still fall within a range similar to that of the original data.

Not only does synthetic data help to anonymise sensitive information, but it can also train machine-learning models. Protegrity has helped organisations improve their machine-learning models by feeding in millions of realistic records based on original data.
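As a rough illustration of values that "fall into a similar range", the Python sketch below draws synthetic ages from a normal distribution fitted to a small sample. This is plain statistical sampling standing in for the AI tooling mentioned above, not Protegrity's method; the sample ages, the function name and the choice of distribution are all assumptions.

```python
import random
import statistics


def synthesise(values: list[int], n: int, seed: int = 0) -> list[int]:
    """Draw n synthetic integers that stay within the observed range of values."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # needs at least two input values
    low, high = min(values), max(values)
    synthetic = []
    while len(synthetic) < n:
        candidate = round(rng.gauss(mu, sigma))
        if low <= candidate <= high:  # reject draws outside the original range
            synthetic.append(candidate)
    return synthetic


real_ages = [23, 35, 41, 45, 52, 60, 38, 29]  # hypothetical sample
fake_ages = synthesise(real_ages, 1000)
print(min(fake_ages), max(fake_ages), round(statistics.mean(fake_ages), 1))
```

None of the generated records corresponds to a real person, yet a model trained on them sees a similar spread of ages.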

Anonymisation is another technique Protegrity offers for ensuring data privacy. Anonymised data generalises sensitive information for security, for instance by replacing an exact age with an age range, while still maintaining the accuracy of the data. If an organisation merely encrypts or tokenises data, the values might change completely, rendering the data meaningless to the machine-learning model. For example, tokenisation might turn a 45-year-old man into a 21-year-old woman in the database. This would break the machine-learning model.
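A minimal Python sketch of generalisation, one common form of anonymisation, is shown below. The record layout, the ten-year band width and the function names are assumptions for illustration and do not describe Protegrity's product.

```python
def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the band that contains it, e.g. 45 -> '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record: dict) -> dict:
    """Drop the direct identifier and coarsen the age, keeping other fields as they are."""
    return {
        "age_band": generalise_age(record["age"]),
        "gender": record["gender"],
    }


person = {"name": "John Lim", "age": 45, "gender": "M"}  # hypothetical record
print(anonymise(person))
```

Unlike a randomised token, the generalised record stays truthful: the 45-year-old man is still recorded as a man in his forties, so a model trained on the data learns from accurate, if coarser, values.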

“You have to make the data available to the right people and the right machines at the right time. It’s that balance – that mix of security and usability,” Rice says.