ClayHR customers in the North America and South America regions experienced an application outage, during which the application returned an increased number of 504 Gateway Timeout errors.
The ClayHR team identified that a sudden spike in incoming requests had overloaded the existing instances behind the Application Load Balancer, causing high response times, and that auto scaling had failed to replace these unhealthy instances automatically.
The unhealthy instances were manually rebooted.
Selected customers in the North America and South America regions experienced a total downtime of 39 minutes.