Incident Date/Time (AEST):
26/08/25 1:42 pm
Services Impacted:
Visualcare Web App (vCore)
Duration:
2 hrs 30 min
Customer Impact:
Customers experienced severe slowdowns and difficulty logging into the Visualcare Application.
Summary
Between 1:42 pm and 4:12 pm AEST, customers were unable to reliably access the Visualcare Web Application (vCore). The issue was traced to a deadlock condition in the API tier that rendered multiple servers unresponsive after a sudden surge in web traffic.
Root Cause
  • At 1:42:10 pm, a sudden spike in inbound web traffic occurred.
  • This surge triggered multiple simultaneous queries against several customer audit databases within the same second, demanding 16 times the available virtual CPU capacity.
  • These concurrent requests created a deadlock condition on the API servers, where processes were waiting on each other indefinitely.
  • As a result, all active API requests stalled and the blocked threads quickly consumed available memory, compounding the slowdown and preventing new requests from being served.
  • At 1:42:20 pm, multiple API servers became unresponsive due to memory exhaustion and maximum CPU usage.
[Figure: Global RAM spike across API infrastructure. Chart times are in UTC+9:30, 30 minutes behind AEST.]
[Figure: Single process spike across API infrastructure. Chart times are in UTC+9:30, 30 minutes behind AEST.]
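For readers less familiar with deadlocks, the sketch below illustrates the general failure mode described above and one common way to avoid it. It is not the vCore implementation: the lock names, the `write_audit` helper, and the 2-second timeout are hypothetical, chosen only for illustration. Two requests that take the same pair of locks in opposite orders can each hold one lock while waiting indefinitely for the other; acquiring locks in a single agreed order removes that cycle, and an acquisition timeout acts as a second safeguard.

```python
import threading

# Hypothetical per-customer audit locks; the names are illustrative only.
audit_locks = {name: threading.Lock() for name in ("customer_a", "customer_b")}


def write_audit(first: str, second: str, results: list) -> None:
    """Touch two audit stores, always locking them in sorted order.

    Sorting the names gives every request the same acquisition order,
    so two requests can never each hold the lock the other is waiting on.
    """
    held = []
    try:
        for name in sorted({first, second}):
            lock = audit_locks[name]
            # Second safeguard: give up after 2 seconds instead of blocking forever.
            if not lock.acquire(timeout=2.0):
                results.append("timed_out")
                return
            held.append(lock)
        results.append("ok")
    finally:
        # Release in reverse order of acquisition.
        for lock in reversed(held):
            lock.release()


def run_concurrent_requests(n: int = 50) -> list:
    """Start n threads that request the two locks in alternating orders."""
    results: list = []
    threads = []
    for i in range(n):
        pair = ("customer_a", "customer_b") if i % 2 == 0 else ("customer_b", "customer_a")
        threads.append(threading.Thread(target=write_audit, args=(*pair, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    outcomes = run_concurrent_requests()
    print(f"{outcomes.count('ok')} of {len(outcomes)} requests completed")
```

Without the `sorted(...)` call, the alternating request order in `run_concurrent_requests` is the kind of pattern that can leave threads waiting on each other.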
Corrective Actions Taken & Timeline
  • 1:42 PM – Traffic surge and deadlock triggered.
  • 1:43 PM – Primary API infrastructure entered ANR (Application Not Responding) status.
  • 1:44 PM – Traffic automatically diverted to secondary API infrastructure, which was unable to cope with the load.
  • 1:50 PM – First reports of application slowdowns and unresponsive behaviour.
  • 1:55 PM – Escalation to Engineering & CTO on application slowdown.
  • 2:05 PM – P1 declared; Engineering initiated Incident Response Plan (IRP) process.
  • 2:15 PM – Primary API infrastructure identified as the source of the fault; manual corrective actions were attempted to avoid an application restart.
  • 2:20 PM – Primary API infrastructure refused internal connections; restart initiated.
  • 2:28 PM – Primary API infrastructure restart complete.
  • 2:31 PM – Application returned to normal behaviour; 15-minute cooldown (stability observation period) initiated.
  • 2:46 PM – New reports of performance issues; Engineering resumed Incident Response Plan (IRP).
  • 2:57 PM – Secondary API infrastructure identified as the source of the fault; manual corrective actions were attempted to avoid an application restart.
  • 3:10 PM – Primary database cleared of rogue processes.
  • 3:27 PM – Secondary API infrastructure refused internal connections; restart initiated.
  • 3:38 PM – Secondary API infrastructure restart complete.
  • 3:43 PM – Application returned to normal behaviour; 15-minute cooldown (stability observation period) initiated.
  • 3:58 PM – Application performance remained inconsistent under high traffic, though both API infrastructures remained stable.
  • 4:12 PM – Application performance returned to an acceptable range as traffic normalised.
  • 4:20 PM – P1 declared resolved; Engineering began root cause analysis (RCA).
Preventative Measures
  • Increased Resources: We have scheduled an increase to available resources on the primary audit database infrastructure to better absorb future traffic surges. This change will be applied during the next database maintenance window.
  • Audit Functionality Improvements: We have reviewed our audit functionality and prepared changes to reduce conditions that could cause locking or deadlocks under high concurrency. These improvements will be deployed during the next product release window.
  • Logical Timeouts: We are actively reviewing logical timeout configurations on both the primary and secondary API infrastructures to prevent long-running requests from blocking traffic.
  • Infrastructure Capacity: We are evaluating the capacity of both primary and secondary API infrastructure to determine if scaling adjustments are required.
  • Traffic Management: We are reviewing request throttling and back-pressure mechanisms in the API load balancer to reduce server saturation during unexpected traffic spikes.
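To show how the logical timeout and back-pressure measures above work together, here is a minimal sketch. It is an illustration under stated assumptions, not the configuration under review: the `BoundedHandler` class, the `Overloaded` exception, and the limits used (2 concurrent requests, 50 ms timeout) are all hypothetical. The handler rejects new work once a concurrency limit is reached rather than queueing it without bound, and cancels any request that exceeds its time budget so it cannot block others.

```python
import asyncio


class Overloaded(Exception):
    """Raised when the handler is already at its concurrency limit."""


class BoundedHandler:
    """Hypothetical request handler with a concurrency cap and a timeout."""

    def __init__(self, max_in_flight: int, timeout_s: float) -> None:
        self._max_in_flight = max_in_flight
        self._timeout_s = timeout_s
        self._in_flight = 0

    async def handle(self, work):
        # Back-pressure: shed load immediately instead of queueing without bound.
        if self._in_flight >= self._max_in_flight:
            raise Overloaded("too many requests in flight")
        self._in_flight += 1
        try:
            # Logical timeout: cancel work that exceeds its time budget.
            return await asyncio.wait_for(work(), timeout=self._timeout_s)
        finally:
            self._in_flight -= 1


async def demo() -> list:
    handler = BoundedHandler(max_in_flight=2, timeout_s=0.05)

    async def fast():
        await asyncio.sleep(0.01)
        return "done"

    async def slow():
        await asyncio.sleep(1.0)
        return "never returned"

    outcomes = await asyncio.gather(
        handler.handle(fast),
        handler.handle(slow),
        handler.handle(fast),
        return_exceptions=True,
    )
    # Report either the result string or the name of the exception raised.
    return [o if isinstance(o, str) else type(o).__name__ for o in outcomes]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the first request completes, the second is cancelled when it exceeds the timeout, and the third is rejected because two requests are already in flight. In production, a rejected request would typically be returned to the caller as a "try again later" response so that clients can back off.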