How GovTech Singapore’s StackOps enhances observability in public sector IT
By Elastic
For IT leaders, security teams, and developers responsible for citizen-facing services, this solution empowers them with a single, streamlined observability platform that ensures reliable government services.
With StackOps, government agencies can search, access, and replicate observability data across different systems to build more resilient applications. Image: Canva
Major national-level events often require seamless collaboration among multiple agencies and systems. Ensuring systems work together effectively is critical in avoiding disruptions that could inconvenience the public.
A prime example is a country’s general election, where multiple systems handle a large volume of voter transactions across various touchpoints.
In such high-pressure situations, observability is crucial for monitoring digital systems, especially to accommodate surges in user traffic and transactions.
Observability refers to the collection and analysis of data from a wide variety of sources to provide detailed insight into the behaviour of applications running in an organisation’s environments.
This capability allows issues to be proactively diagnosed, analysed, and traced back to their origins.
Observability on a national scale
“In large-scale scenarios like the general elections, strong observability is critical,” says Hsiao Ming Chia, Director of Core Engineering Products at the Government Technology Agency of Singapore (GovTech Singapore).
“Monitoring must go beyond infrastructure and include the four golden signals: latency, traffic, errors, and saturation,” he adds.
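As a rough illustration of the four golden signals Chia lists, the sketch below summarises them for one time window of request records. It is not StackOps code: the `Request` fields, the `golden_signals` function, and the choice of a nearest-rank 95th percentile for latency are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code returned


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarise latency, traffic, errors and saturation for one window.

    window_s  -- length of the observation window in seconds
    in_flight -- requests currently being served (hypothetical gauge)
    capacity  -- maximum concurrent requests the service can hold
    """
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    if n:
        # Nearest-rank percentile: rank = ceil(0.95 * n), in integer maths.
        rank = (95 * n + 99) // 100
        latency_p95 = durations[rank - 1]
    else:
        latency_p95 = 0.0
    # Count only server-side failures (5xx) as errors in this sketch.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": latency_p95,
        "traffic_rps": n / window_s,
        "error_rate": errors / n if n else 0.0,
        "saturation": in_flight / capacity,
    }
```

In practice, these values would come from an observability platform rather than hand-rolled code; the point is only that all four signals are measured at the application level, not from host metrics alone.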
The ability to share observability data across teams is key to quickly identifying and addressing potential faults and to root cause analysis, especially as applications and systems become increasingly interdependent.
That is what GovTech Singapore’s StackOps brings to the table. StackOps is a monitoring tool tailored for Whole-of-Government (WOG) deployment.
It improves information technology efficiency and developer productivity with enhanced observability, data aggregation, improved logging standards, and visualisation across cloud applications and infrastructure.
During the General Elections 2025 (GE2025), StackOps enabled agencies to collaborate on monitoring data and exchange insights for faster fault-isolation and root cause analysis.
“For GE2025, we managed the monitoring dashboards in the command centre throughout the event to track voter numbers, human traffic and the health of applications and infrastructure involved in the voting process. It was relatively uneventful, but many were closely watching the rising voter registration numbers,” shares Chia.
“We cheered when the dashboards showed one million voters without any major incidents or hitches, and finally relaxed once 2.4 million voters had gone through registration, which meant that most voters had cast their votes and the systems held up.”
Accelerating detection with insights from data
Prior to StackOps, fragmented monitoring tools and a lack of monitoring data standards across government systems caused several challenges, says Chia.
Without a comprehensive platform for observability, teams had difficulty accessing monitoring data, which increased the cognitive load on engineers and operational staff who had to juggle multiple monitoring tools during incidents, he notes.
Similarly, without monitoring data standards, teams spent precious time trying to understand and search large volumes of telemetry across different datasets to find the root cause of an incident.
By relying on basic internal infrastructure metrics like CPU, memory, or storage, without application-level and user-level insights, teams could miss issues entirely and fail to understand why a component failed unexpectedly.
“This often resulted in the ‘watermelon effect’ where internal metrics look green, but the underlying applications were in red with actual user issues,” says Chia.
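The “watermelon effect” can be made concrete with a small check that compares infrastructure metrics with user-facing ones. This is a minimal sketch under stated assumptions: the function name, the dictionary keys, and every threshold value are hypothetical and chosen only for illustration.

```python
def is_watermelon(infra, user,
                  cpu_limit=0.80, mem_limit=0.80,
                  error_limit=0.01, latency_limit_ms=1000):
    """Return True when infrastructure looks healthy but users are affected.

    infra -- dict with "cpu" and "memory" utilisation as fractions (0 to 1)
    user  -- dict with "error_rate" (fraction) and "latency_p95_ms"
    All limits are illustrative defaults, not recommended values.
    """
    # "Green on the outside": host-level metrics are within limits.
    infra_green = infra["cpu"] < cpu_limit and infra["memory"] < mem_limit
    # "Red on the inside": users see errors or slow responses.
    user_red = (user["error_rate"] > error_limit
                or user["latency_p95_ms"] > latency_limit_ms)
    return infra_green and user_red
```

A service with 35 per cent CPU utilisation but a 12 per cent user error rate would be flagged by this check, which a dashboard built only on CPU, memory, and storage would miss.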
Since implementing StackOps, central platforms within GovTech Singapore have seen better Mean-Time-To-Detect (MTTD), faster triage, and more accurate impact analysis when incidents occur, thanks to access to comprehensive monitoring data, Chia shares.
“The improved visibility through StackOps Service-Level Objectives (SLO) and Service-Level Indicators (SLI) alerts and dashboards also provided concrete evidence of system health for the engineering, operations teams and management, which increased their confidence in system reliability,” adds Chia.
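To show how an SLI and an SLO relate, the sketch below computes an availability SLI from request counts and the share of the error budget still unspent. It illustrates the general concept only and is not how StackOps or Elastic implement it; the function names and the example target are assumptions.

```python
def availability_sli(good, total):
    """SLI: fraction of requests that were served successfully."""
    return good / total if total else 1.0


def error_budget_remaining(good, total, slo_target):
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target -- for example 0.999 for a "three nines" availability SLO
    """
    allowed_bad = (1 - slo_target) * total  # failures the SLO tolerates
    actual_bad = total - good               # failures actually observed
    if allowed_bad == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if actual_bad else 1.0
    return 1 - actual_bad / allowed_bad
```

For instance, with one million requests, a 0.999 target, and 500 failures, roughly half of the error budget remains. An alert could fire when the remaining budget falls below a chosen threshold, giving teams and management the same evidence of system health.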
“As long as we have the necessary data, monitoring dashboards can also be used to track other business metrics, besides usual observability metrics.”
Chia also highlights that some government agencies have begun using StackOps to enhance their observability capabilities.
StackOps also supports observability for SG60 initiatives like the SG Culture Pass, alongside GovTech Singapore central platform products, like the APEX API gateway, GovEntry, and the Government on Commercial Cloud (GCC).
A single dashboard to observe, then act
In the world of digital government services, ignorance is not bliss: you cannot protect what you don’t see.
StackOps uses Elastic Cloud on Amazon Web Services (AWS) and Microsoft Azure to ingest data and distill insights, improving observability and making it easier for IT teams to effectively monitor their critical infrastructure and applications.
According to Elastic's Regional Vice President, Singapore, James Leong, end-to-end observability starts with telemetry data, like logs, metrics, and traces.
He notes that Elastic Observability helps IT teams implement AIOps, which uses artificial intelligence (AI), machine learning (ML), and big data analytics to scan petabytes of telemetry data, identify anomalies, and suggest proactive solutions in seconds instead of minutes, before services are affected.
“This visibility allows application and operations teams to understand the internal state of systems, diagnose issues effectively, and ensure high operational performance,” Leong says.
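The idea of flagging anomalies in telemetry can be illustrated with a deliberately simple rolling z-score check. This is a stand-in technique chosen for brevity, not the machine learning that Elastic Observability uses; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the recent baseline.

    Each point is compared with the mean and sample standard deviation
    of the `window` points immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

Given a series of response times such as `[100, 102, 98, 101, 99, 100, 500, 101]`, only the spike at index 6 is flagged. Production systems typically account for seasonality and trends, which this sketch does not.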
Additionally, the Elasticsearch Platform ingests and analyses vast amounts of unstructured and structured data in real time, which enables teams to quickly retrieve relevant answers from exponentially growing datasets, notes Leong.
“With Elastic, agencies can aggregate data from across their IT infrastructure on centralised dashboards for faster troubleshooting, while maintaining full control over their data through approval workflows that ensure privacy, security, and compliance standards are met,” Leong adds.
Collaborative data monitoring
Sharing information and best practices between and within organisations helps to avoid unplanned downtime. With StackOps, government agencies can search, access, and replicate observability data across different systems to build more resilient applications.
The benefit of being able to search and learn from each other’s monitoring experiences, Leong notes, is that agencies can use this data to identify how the performance of one service affects another.
“With greater visibility, systems have increased continuity, with backups and disaster recovery safeguards in place.
“Agencies can create shared dashboards for a comprehensive view of their digital health, monitoring events, incidents, and dependencies spanning multiple agencies for collaborative incident response and problem-solving,” says Leong.
According to Chia: “StackOps helps to break down silos within and across IT teams and agencies through sharing of monitoring insights, while ensuring access controls and compliance. With this data on hand, teams are better equipped to build resilient, reliable, and scalable applications.
“This aligns with DevOps principle of enhanced collaboration and continuous improvement based on feedback. When systems rely on each other, sharing monitoring insights also builds accountability and transparency across teams and agencies. Of course, tools are just one part - having the right mindset and culture are equally important,” Chia concludes.