Last Monday, September 14, the e-conomic application (including the SOAP API) was down or experienced severe slowdowns for approx. 1.5 hours during the evening. We have now concluded our investigation of the incident, and you can see our findings below.
What happened?
Between 7:00 and 7:30 PM, we saw long blocking queues on the session state servers. At the same time, our monitoring services also reported performance issues. From around 7:30, users started to feel the effects of the issue as they were either unable to log in or got very slow responses, if any, when using the application.
We immediately started to investigate the issue and tried switching the session server to the secondary server, but with no positive effect on performance.
Next, we shut down the API zone in the load balancer in an attempt to ease the load on the session state servers. Shortly after, we shut down all traffic from the load balancer. This reduced the load on the session state servers from approx. 50% to 10%.
Finally, at around 8:45, we were able to restart our app servers, and shortly before 9:00 the application state was back to normal.
Root cause
As was the case during August, our session state setup was the main cause of the issue. The session state database deadlocked because it couldn’t keep up with the number of requests. This in turn caused new requests to pile up while waiting for the deadlocked requests to finish.
Eventually, every server was occupied with requests waiting for a lock to be released, leaving no capacity to process new requests.
Normally, these locks would be released automatically, but in this case new locks were created so fast that the cleanup job could not keep up.
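To illustrate why a cleanup job can fall behind, here is a minimal sketch in Python. It is not taken from our production setup: the function name, the per-tick rates, and the simplified tick model are all hypothetical and chosen only to show the effect. As long as locks are released at least as fast as they are created, the backlog stays at zero. Once creation outpaces cleanup, the backlog grows with every tick.

```python
def simulate_lock_backlog(new_locks_per_tick, released_per_tick, ticks):
    """Return the number of outstanding locks after each tick.

    Hypothetical model: each tick, a fixed number of locks is created
    and the cleanup job releases up to a fixed number of them.
    """
    outstanding = 0
    history = []
    for _ in range(ticks):
        outstanding += new_locks_per_tick
        # The cleanup job cannot release more locks than currently exist.
        outstanding -= min(outstanding, released_per_tick)
        history.append(outstanding)
    return history


# Cleanup keeps up: the backlog never builds.
healthy = simulate_lock_backlog(new_locks_per_tick=80, released_per_tick=100, ticks=5)

# Locks are created faster than they are released: the backlog grows each tick.
overloaded = simulate_lock_backlog(new_locks_per_tick=120, released_per_tick=100, ticks=5)

print(healthy)     # [0, 0, 0, 0, 0]
print(overloaded)  # [20, 40, 60, 80, 100]
```

In the overloaded case, the backlog grows by the difference between the two rates on every tick, so it keeps growing until the incoming load drops or cleanup capacity increases.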
Future actions
We will continue with our ongoing task of improving our session state service – more details on our plans for session state here. As part of this, we will upgrade the session state database to a faster version, which requires us to shut down the application.
This upgrade will take place from September 26 at 6 PM until September 27 at 6 AM, which means that the e-conomic application will be unavailable during this time.