This is the third and final installment of our DevOps series. It covers the DevOps process from monitoring to taking action, some of the things we’ve learned along the way, and where we’re headed next. If you haven’t done so already, check out the first part and the second part of the series here.
Monitor & Measure
Traditionally, once deployment has been completed, the development department’s job is done, and the operations team must handle any production issues. At e-conomic, however, monitoring the production environment is an active task shared by both operations and development. They communicate and collaborate on monitoring and evaluating data and, if necessary, taking action based on the data.
We monitor and measure a variety of parameters to get a complete picture of how our system is performing, so that we can fix any issues as quickly and appropriately as possible. Our monitoring is performed on three levels:
- Business metrics, e.g. signups, current users, invoices sent, entries created
- Service level, e.g. availability, response time, slowdowns, utilization
- Technical metrics, e.g. CPU load, connections, app servers, failing tests, errors in production
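As an illustrative sketch (not e-conomic’s actual tooling), the three levels can be thought of as named metrics flowing into a single collector; the names and the `MetricsCollector` class below are hypothetical:

```python
import time
from collections import defaultdict

class MetricsCollector:
    """Minimal in-memory collector for business, service-level and
    technical metrics (a sketch; real setups use a time-series backend)."""

    def __init__(self):
        # metric name -> list of (timestamp, value) samples
        self.samples = defaultdict(list)

    def record(self, name, value):
        self.samples[name].append((time.time(), value))

    def latest(self, name):
        """Most recent value for a metric, or None if never recorded."""
        return self.samples[name][-1][1] if self.samples[name] else None

# One example metric from each monitoring level:
metrics = MetricsCollector()
metrics.record("business.invoices_sent", 1)        # business metric
metrics.record("service.response_time_ms", 240)    # service level
metrics.record("tech.cpu_load_percent", 62)        # technical metric
```

A dedicated monitoring system would add retention, aggregation and alert thresholds on top of this, but the core idea is the same: every level feeds into one place where both development and operations can read it.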
The complementary part of monitoring and measuring is, of course, taking action based on the collected data. Depending on which metric is out of bounds, we have procedures in place for handling the issue. For specific code/build issues, we can fix the broken build, flip off a problematic feature or apply a patch. If the issues are more system-wide, we may choose to roll back the latest deployment and restore an older version. And if the issues are performance-related, we can provision more resources in the form of CPU, storage or servers.
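Feature flipping is the lightest of these responses: a problematic feature is switched off at runtime, with no redeployment. A minimal sketch of the idea (the flag store and function names here are hypothetical, not our actual implementation):

```python
# In-memory flag store; a real system would back this with a database
# or config service so operations can flip flags without a deploy.
FLAGS = {"new_invoice_editor": True}

def feature_enabled(name):
    """Unknown flags default to off, so a typo fails safe."""
    return FLAGS.get(name, False)

def flip_off(name):
    FLAGS[name] = False

# Request-handling code branches on the flag:
def render_invoice_editor():
    if feature_enabled("new_invoice_editor"):
        return "new editor"
    return "classic editor"

# When monitoring shows the new editor misbehaving, it is flipped off
# and users immediately fall back to the classic version:
flip_off("new_invoice_editor")
```

The key design point is that the fallback path (the classic editor) stays in the codebase until the new feature has proven itself in production.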
All of these actions are performed through collaboration between development and operations. Additionally, we have a small task-force made up of members of our development, operations and support departments that handles both build-related and performance-related issues and also solves bugs reported by customers. Again, this means that people from all parts of the organization are close to our customers and are experienced in addressing their needs.
Monitoring metrics at e-conomic
What we’ve learned along the way
Our transition to the DevOps mindset has, of course, not been quite as smooth as it may appear here. Below are some of the main lessons we learned along the way, which may be useful for others to be aware of before they start their own DevOps adoption:
- Get the version control system and branching structure in place early
- Test-driven development (TDD) is not enough. We also need to test from the outside: code that merely compiles and passes unit tests can still leave us with faulty behavior and slow servers in production
- Keep the check-in build fast
- Enable small independent releases and use feature flipping wherever possible
- Make sure the safety net is in place – prepare for how to handle errors in production
- Failing automated tests erode faith in the system
- Do in-sprint QA
- Schedule frequent stakeholder reviews to align expectations between development and the rest of the organization
- When things start to work, keep an eye on infrastructure (e.g. server load etc.)
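To make the “test from the outside” point concrete, here is a sketch of an outside-in smoke test that treats the system as a black box: it only checks what a user would see, namely whether a page loads and how fast. The URL, names and the 2-second budget are illustrative assumptions:

```python
import time
import urllib.request

def within_budget(status, elapsed_ms, max_ms=2000):
    """True when the response was successful and fast enough."""
    return status == 200 and elapsed_ms <= max_ms

def smoke_test(url, timeout=5, max_ms=2000):
    """Hit a real URL from the outside and check status + response time.

    This catches exactly the failures that a green compile or unit-test
    run can miss: broken wiring between components and slow servers.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status = resp.status
    elapsed_ms = (time.monotonic() - start) * 1000
    return within_budget(status, elapsed_ms, max_ms)
```

Run against the production (or staging) environment after every deployment, a handful of such checks forms part of the safety net mentioned above.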
Going forward

With continuous integration now an essential part of the way we work, we’d like to move towards a continuous deployment process. As mentioned above, this would mean that features are deployed as soon as they are completed, enabling an even shorter time-frame between when features are developed and when they are made available to our users.
We also want to move towards circles of deployment, allowing us to roll out features to certain users first in a standardized way. This way, we will be able to test new features and get user feedback before deciding whether to make them available to all users. This ties in well with the overall movement in our products towards more user-defined setups, with options to add e.g. additional modules, extra tabs and integration apps.
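One common way to implement such circles of deployment is to assign each user deterministically to a ring via a stable hash, then roll a feature out ring by ring. The ring names and percentages below are hypothetical, purely to illustrate the mechanism:

```python
import hashlib

# Rings ordered from earliest to latest rollout stage.
RINGS = ["internal", "beta", "all"]

def user_ring(user_id):
    """Stable assignment: the same user always lands in the same ring,
    because the bucket is derived from a hash of the user id."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < 5:
        return "internal"   # first ~5% of users
    if bucket < 20:
        return "beta"       # next ~15%
    return "all"            # everyone else

def feature_visible(user_id, rollout_ring):
    """A feature rolled out to `rollout_ring` is visible to that ring
    and every earlier ring."""
    return RINGS.index(user_ring(user_id)) <= RINGS.index(rollout_ring)
```

Because the assignment is deterministic, a user’s experience stays consistent across sessions while the rollout advances, and a problematic feature can be held at an early ring until the feedback from those users is in.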