Incident Date/Time (AEST):
05/10/2025, 2:18 PM
Services Impacted:
Visualcare Worker Mobile App (vWorker)
Duration:
3 hrs 48 min
Customer Impact:
Customers were unable to connect to or synchronise data with the Visualcare Worker Mobile App during the outage. All inbound traffic to the Mobile API was rejected until service was restored.
Summary
Between 2:18 PM and 6:06 PM AEST, the Visualcare Mobile API experienced a full-service interruption. During this period, users of the Visualcare Worker Mobile App were unable to connect or synchronise due to a fault in one of the mobile infrastructure’s internal routing components.
The disruption followed an automated system update within Visualcare’s private cloud environment, which affected the connection between internal systems responsible for routing encrypted traffic. Because all communication inside Visualcare’s platform enforces strict encryption and validation, the affected traffic was safely rejected by policy rather than being exposed or processed insecurely.
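This fail-closed behaviour can be illustrated with a minimal Python sketch: when strict certificate validation is enforced, a connection that fails validation raises an error and is dropped, never downgraded to plaintext. The host name below is a placeholder for illustration, not a Visualcare endpoint.

```python
import socket
import ssl

# Placeholder endpoint for illustration only; not a Visualcare host.
HOST, PORT = "internal-api.example.invalid", 443

# The default context enforces certificate validation and host name
# checking, analogous to the strict internal TLS policy described above.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

try:
    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        # The TLS handshake happens here; a certificate that fails
        # validation raises before any application data is exchanged.
        with context.wrap_socket(sock, server_hostname=HOST) as tls:
            print("TLS established:", tls.version())
except ssl.SSLCertVerificationError as exc:
    print("Rejected by validation policy:", exc)  # fail closed
except OSError as exc:
    print("Connection failed:", exc)  # network-level failure, also fail closed
```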
The issue was resolved through the deployment of a new Application Load Balancer (ALB) built on Visualcare’s current hardened infrastructure standard. The replacement restored normal Mobile API traffic while maintaining the same high-security posture and encryption enforcement policies.
Root Cause
An automated infrastructure update applied to a legacy routing component within Visualcare’s private cloud introduced a mismatch in how secure internal connections were validated.
As part of Visualcare’s defence-in-depth architecture, all traffic - external and internal - is required to meet strict encryption and certificate validation standards.
Following the update, internal traffic between the legacy load balancer and the primary API tier was incorrectly classified as non-secure. In accordance with Visualcare’s security enforcement policies, these connections were automatically rejected, resulting in a complete interruption of inbound Mobile API traffic.
The affected component resided within a legacy configuration pending decommission and did not include the modern validation logic applied to current load balancer standards.
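Illustratively, the classification decision behaves like an all-or-nothing policy check: any failed attribute marks the connection non-secure and it is rejected. The sketch below models this; the attribute names are invented for the example, and the real enforcement logic is internal to Visualcare’s platform.

```python
from dataclasses import dataclass

@dataclass
class InternalConnection:
    # Hypothetical attributes; Visualcare's actual checks are not public.
    tls_version: str
    cert_chain_valid: bool
    hostname_matches: bool

def meets_security_policy(conn: InternalConnection) -> bool:
    """Strict validation: every check must pass, otherwise reject."""
    return (
        conn.tls_version in {"TLSv1.2", "TLSv1.3"}
        and conn.cert_chain_valid
        and conn.hostname_matches
    )

# After the automated update, hops through the legacy load balancer no
# longer satisfied validation and were classified non-secure:
legacy_hop = InternalConnection("TLSv1.2", cert_chain_valid=True,
                                hostname_matches=False)
print("accepted" if meets_security_policy(legacy_hop) else "rejected")  # rejected
```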
Contributing Factors:
  • Legacy routing layer within the private cloud pending full migration
  • Automated configuration update altering internal TLS validation behaviour
  • Enforcement of strict encryption requirements rejecting internally misclassified traffic
Network Traffic (bytes per sec) – Primary API Load Balancer
The graph below shows the network activity for the primary Mobile API load balancer.
  • Point 1 – Automated Update: The automated system update was applied to the legacy load balancer, after which internal traffic between private-cloud components dropped to zero as non-compliant (non-secure) connections were automatically rejected.
  • Point 2 – Validation & Testing: A new Application Load Balancer (ALB) was provisioned and validated within the private cloud. Limited internal testing traffic is visible during this window.
  • Point 3 – Production Deployment: The new ALB was promoted to production, restoring normal internal network throughput and completing the incident resolution.
[Graph: Network traffic (bytes per sec) for the primary Mobile API load balancer, annotated with Points 1–3]
Disk Latency (by device) & Disk I/O Activity
The combined disk latency and I/O graphs show a correlated spike at the time of the automated update. This behaviour reflects short-lived internal retries as the legacy load balancer attempted to establish new sessions with the primary API infrastructure after its connections were rejected. Because all traffic operated within the private cloud and remained encrypted, these retries generated transient increases in read/write activity and latency.
Performance stabilised immediately following the deployment of the new ALB, returning both latency and I/O to normal baseline levels.
[Graph: Disk latency and I/O spike from the automated update and high rejection rate]
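The retry pattern behind this spike can be sketched generically; the loop below is not Visualcare’s session code, just a bounded retry-with-backoff example showing how repeated rejected attempts produce a short burst of work and then stop.

```python
import time

def try_handshake() -> bool:
    # Placeholder for a TLS session attempt that policy rejects.
    return False

def establish_session(attempts: int = 5, base_delay: float = 0.5) -> bool:
    """Bounded retry with exponential backoff (illustrative only)."""
    for attempt in range(attempts):
        if try_handshake():
            return True  # session established
        # Each rejected attempt still consumes CPU and I/O, which is what
        # shows up as the transient latency/IO spike in the graphs above.
        time.sleep(base_delay * (2 ** attempt))
    return False  # give up after bounded retries; fail closed

establish_session()
```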
Corrective Actions Taken & Timeline
- 2:18 PM - Automated system update applied; primary Mobile API endpoint stops accepting secure traffic.
- 2:37 PM - First customer issue reported.
- 3:18 PM - P0 incident declared; Engineering initiated the Incident Response Plan (IRP).
- 3:18 – 3:45 PM - Primary Mobile API server identified as the fault source; services and host restarted without success.
- 3:45 – 4:30 PM - Automated update confirmed as root cause; rollback attempted.
- 4:30 – 4:45 PM - Rollback completed but the issue persisted; replacement of the ALB authorised.
- 6:00 PM - New ALB built, tested, and promoted to production.
- 6:06 PM - Full traffic restoration confirmed; incident declared resolved.
Preventative Measures
- Legacy Infrastructure Decommissioned - The affected legacy load balancing stack has been fully decommissioned and replaced with the standardised, security-validated ALB configuration, provisioned through our automated infrastructure tooling.
- Full Application Load Balancer Review - A comprehensive review of all Application Load Balancer configurations has been completed across environments to ensure alignment with Visualcare’s current security and routing standards.
- Monitoring Improvements - New monitoring rules have been implemented to detect and alert on unusual patterns of internal connection rejections within our private cloud infrastructure. These rules specifically track connections between the load balancer and the primary API tier that fail security validation, enabling earlier detection of similar anomalies (a sketch of such a rule follows below).
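A minimal sketch of such a rule, assuming an AWS-style Application Load Balancer and boto3; the alarm name, threshold, load balancer identifier, and SNS topic are placeholders, not Visualcare’s actual configuration. TargetTLSNegotiationErrorCount counts TLS sessions the load balancer failed to establish with its targets, which matches the failure mode seen in this incident.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder identifiers; substitute real resource names in practice.
ALB_DIMENSION = "app/mobile-api-alb/0123456789abcdef"              # hypothetical
SNS_TOPIC_ARN = "arn:aws:sns:ap-southeast-2:111122223333:oncall"   # hypothetical

cloudwatch.put_metric_alarm(
    AlarmName="mobile-api-target-tls-negotiation-errors",
    AlarmDescription="LB-to-API TLS sessions failing security validation",
    Namespace="AWS/ApplicationELB",
    MetricName="TargetTLSNegotiationErrorCount",
    Dimensions=[{"Name": "LoadBalancer", "Value": ALB_DIMENSION}],
    Statistic="Sum",
    Period=60,               # evaluate each minute
    EvaluationPeriods=3,     # alert after 3 consecutive bad minutes
    Threshold=5,             # >5 failed negotiations/min treated as anomalous
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[SNS_TOPIC_ARN],
)
```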
Uptime Metrics
The list below shows the total outage time and uptime for each primary service in 2025.
Mobile Services
  • Total Outage Time: 436 minutes (7.3 hours)
  • Annual Uptime %: 99.92%
Web Application
  • Total Outage Time: 305 minutes (5.1 hours)
  • Annual Uptime %: 99.94%
Overall Platform Availability
  • Total Outage Time: -
  • Annual Uptime %: 99.89%
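As a quick cross-check of how these figures relate, assuming a full calendar year (525,600 minutes) as the denominator:

```python
# Annual uptime % = (1 - outage_minutes / minutes_in_year) * 100
MINUTES_IN_YEAR = 365 * 24 * 60  # 525,600

for service, outage_min in (("Mobile Services", 436), ("Web Application", 305)):
    uptime = (1 - outage_min / MINUTES_IN_YEAR) * 100
    print(f"{service}: {uptime:.2f}%")  # 99.92% and 99.94%, matching the list
```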