As detailed in this introductory post, we are starting to post monthly reviews of the performance of the e-conomic application.
We will also include updates on the technical improvements we're working on to enhance performance and stability.
Uptime and incidents in June
In June, we reached an average uptime of 99.98%, putting us firmly above our stated target of 99.9% uptime.
In total, we had only 2 minutes of downtime during the month, compared to 4 minutes in May.
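To put the 99.9% target in perspective, the short calculation below converts it into a monthly downtime budget. It is a minimal sketch that assumes uptime is measured over the full calendar month (30 days for June); the helper name `downtime_budget` is hypothetical, and this is not necessarily how the reported 99.98% average is calculated.

```python
# Hypothetical helper: downtime allowed by a monthly uptime target.
# Assumes uptime is measured over the full calendar month (June = 30 days).
MINUTES_IN_JUNE = 30 * 24 * 60  # 43,200 minutes


def downtime_budget(target_uptime: float, total_minutes: int) -> float:
    """Return the maximum minutes of downtime that still meet target_uptime."""
    return total_minutes * (1 - target_uptime)


budget = downtime_budget(0.999, MINUTES_IN_JUNE)
actual_downtime = 2  # minutes of downtime reported for June

print(f"A 99.9% target allows {budget:.1f} minutes of downtime")
print(f"June used {actual_downtime} of those minutes")
```

For a 30-day month, a 99.9% target leaves room for about 43.2 minutes of downtime, so 2 minutes is well within that budget.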
But while uptime – the time when the e-conomic application is actually up and running – is of course essential, it doesn’t tell the whole story.
During the month, we experienced 13 incidents of perceptible slowdowns or functionality being temporarily unavailable. We are working to bring this number down while continuing to meet our uptime target.
The incidents are linked to three types of issues:
1. Queued emails
On multiple occasions, emails were queued on the server and not sent to recipients.
Each time, we resolved the issue by flipping the job servers, after which the emails were sent within 5–15 minutes.
The underlying cause of this issue lies in our email-sending code. We have set up a project to fix and improve this code, and we will share more in a later post once we see the results.
2. Issues related to deployments
Two of our deployments in June caused issues that forced us to roll back to the previous version.
In the most serious case, the code changes meant that a server error occurred when you tried to use the import functionality. The deployment was of course rolled back as soon as we discovered this.
We take deployment issues very seriously. Each development team has dedicated QA testers who test new features and check for regressions before new code reaches production. Additionally, automated QA procedures help us identify issues.
This way, we catch the vast majority of issues before they are introduced in production.
Some issues, however, are very difficult to uncover before they actually hit production, such as the two deployment issues from the past month.
In these cases, we do a post mortem to identify what we could have done to avoid the issue and establish how this knowledge can help us prevent similar issues in the future.
3. General speed and stability issues
This issue is not specific to June; it is a general performance issue. Particularly during peak hours, we have seen long response times and slowdowns in the application.
One major cause lies in our infrastructure and back-end. We have a number of components and technologies that we want to upgrade to improve performance.
Our first step on this route is to address the way we handle session state. We have seen a number of stability and performance issues with the service and implementation we have used so far.
Until now, we have relied on a third-party provider for our session state. We are now implementing our own service, which will not only improve our stability and flexibility but also allow us to upgrade and modernize our back-end in the time to come.
This week, we will be turning on our new session system, which may result in minor slowdowns as we switch systems – but we will reap the benefits later on.
After that, we will implement other minor adjustments and fixes over the coming weeks and months to make sure our new session setup is fully in place.
You can learn about the basics of session state here.
What’s coming next?
By the end of July, we hope our new session setup will be far enough along for us to start work on other components that we would like to upgrade. More on this next month!