We have been asked this question many times over the past 12 months, whenever we have experienced slowdowns with the application. It often comes with a follow-up remark: why don’t you just get some bigger boxes (scale up)?
Well, it would be great if buying new and bigger hardware could solve every performance challenge. In fact, it would solve only a few of the challenges we face in creating a stable, high-performance application for our customers.
Picture the following scenario:
A big new store with many nice items on sale and with large discounts has just opened up near where you live. The store is the size of a dozen football fields – but the architect has been told to add only one entrance for security reasons. This entrance is only 1 meter wide.
As you can imagine, when people see the ads for the new store, they all head for it, and within a few minutes there is a big queue outside waiting to get in. So even though the store is the size of many football fields, the customers cannot benefit from its size until they are inside. Once inside, they can quickly locate the items they like and proceed to checkout. But then they have to carry all their items out through that same single door, while the people outside are trying to get in.
In short, new and bigger boxes don’t do it alone!
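For readers who prefer numbers to football fields, the store can be sketched in a few lines of Python. This is only an illustration of the analogy, not a model of our actual infrastructure: the stage names and the customers-per-minute figures are made up, and the `throughput` helper is hypothetical.

```python
def throughput(stages):
    """Sustained throughput is capped by the slowest stage (the bottleneck)."""
    return min(stages.values())


# Customers per minute each part of the store can handle (made-up numbers).
store = {"door": 30, "floor": 500, "checkout": 200}

# "Bigger box": ten times the floor space, but still the same 1 meter door.
bigger_store = dict(store, floor=5000)

# Same floor space as the bigger store, but with a much wider entrance.
wider_door = dict(bigger_store, door=300)

print(throughput(store))         # 30
print(throughput(bigger_store))  # 30, the door is still the limit
print(throughput(wider_door))    # 200, now checkout is the limit
```

Making the floor ten times larger changes nothing as long as the door stays the same, and once the door is widened the next narrowest point takes over as the limit.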
Another question we often hear: if social media sites can handle millions of users, why don’t you do what they do?
We could in fact do that, but neither our customers nor the regulators would be happy if invoices entered into e-conomic disappeared for some reason. In other words, we work under rules and regulations where data integrity is as important as speed. This results in some challenges…
Then what are we doing – nothing?
We are adding more and bigger boxes to our setup. Some of the boxes we can add as we go along without disturbing our customers. For other boxes, we have to create very detailed plans for how to implement them in our infrastructure. These other boxes often relate to core services in our infrastructure. Planning takes time.
When we add new core servers to our service, we want to make sure that our customers are inconvenienced as little as possible in terms of downtime.
Over the past couple of months, we have been working on improving our application infrastructure, moving from a single data center to a setup distributed across two data centers, with both online at the same time. These are all efforts to scale out.
If one data center loses power for some reason, or its internet connection is interrupted, we can continue to deliver our service to you with as little downtime as possible. We are talking minutes instead of hours.
What remains for us to do now is to move our core servers. Typically, moving core servers into a distributed setup would mean up to 48 hours of downtime before everything is in sync, given the amount of data that has to be synchronized. As that is a very long period for our service to be offline, we have decided instead to replace and upgrade these core servers with faster, more powerful CPUs and more memory, and to optimize the operating system for the new hardware.
Before the end of this year, we want to be up and running in a dual data center setup. The setup’s infrastructure is designed to support our service by allowing us to scale both out and up.