Summary and Impact to Clients
From 09:34 until 19:49 on 15 December 2022, the SYNAQ Cloud Mail platform experienced an incident that slowed incoming mail delivery and webmail access for a subset of clients.
Affected users experienced intermittently slow access to webmail, as well as intermittent delays in the delivery of incoming mail to their inboxes.
Root Cause and Solution
The root cause of the incident was a failed read/write cache in one of the redundant RAID array controllers in the Cloud Mail storage network. The failure created a disk I/O bottleneck on the mail store servers attached to that RAID array. The bottleneck also caused mail delivery attempts to queue, which delayed incoming mail for some users.
During the incident, and before the final fix was implemented, SYNAQ engineers rebalanced certain disks onto alternative I/O access channels as an interim measure to mitigate the degraded performance.
To resolve the issue fully, the faulty cache component was replaced on the Monday following the incident (19 December 2022). This restored the affected RAID storage array to full performance and removed the need for the interim measure.
Remediation Actions
Short Term Actions
SYNAQ will keep RAID cache components in stock so that failed components can be replaced more quickly (due end of January 2023).
Medium Term Actions
SYNAQ has already begun moving mail store data from its current storage technology to next-generation storage, which offers much higher I/O capacity and fault tolerance. This project is ongoing, and we estimate completion within the next six months.