Moving towards a CI/CD workflow means that we’re adding automation to our process, which is great, but we need to make sure we keep the quality of the code intact. In other words, we need to know where we should add some breakpoints to check the automated process, and make sure everything is working as it should.
To do so, we need to understand where these breakpoints should be. We need to define the critical steps along our development lifecycle, understand the current process and identify where we might need an extra layer of testing.
But before we jump into the practical side, let’s take a step back to the theory and see why we need more than just an automated process.
Deploying Code, Faster Than Before
A CI/CD workflow includes two processes: Continuous Integration (CI) and Continuous Deployment (CD). Continuous Integration allows teams to frequently merge changes and new code into the main development branch, while Continuous Deployment ships them off to the next internal environment, or directly to production.
To outrun the competition, companies need to ship new features and fixes more frequently than before. To do so, developers, DevOps engineers and SREs use numerous tools to compile, test and deploy code at every step of the process. On top of that, monitoring tools watch the application, servers and overall product and alert as soon as something goes wrong. The main goal is to minimize the impact of a failure, if and when one occurs.
A common misconception is that the CI/CD workflow ends when new code is deployed to production, and that the automation stops once the code is out in the wild. That is simply wrong.
Monitoring is an inseparable part of the CI/CD cycle, and automated deployments require smarter monitoring. You want to know when a release introduces new errors without relying on user reports, and have all the information you need to fix it.
How do companies do that?
To Each Their Own (Strategy)
The way code is promoted to production differs from one company to another, and each has its own game plan and tools. Some teams deploy code automatically after every approved build, while others wait until a larger batch of changes has accumulated before promoting it to production.
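The difference between these two strategies comes down to the trigger for promotion. The sketch below assumes a single hypothetical knob, `batch_size`, where a value of 1 models "promote after every approved build" and a larger value models waiting for a batch of changes. The names `PromotionPolicy` and `should_promote` are illustrative and do not come from any specific CI/CD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionPolicy:
    # Hypothetical knob: how many approved changes must be waiting
    # before we promote. 1 means "promote after every approved build".
    batch_size: int = 1


def should_promote(pending_approved: list[str], policy: PromotionPolicy) -> bool:
    """Return True once enough approved changes have accumulated."""
    if policy.batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return len(pending_approved) >= policy.batch_size


if __name__ == "__main__":
    pending = ["fix-login-timeout", "add-export-button"]
    # Deploy-every-build team: two approved changes are more than enough.
    print(should_promote(pending, PromotionPolicy(batch_size=1)))  # True
    # Batching team: waits until five changes are queued.
    print(should_promote(pending, PromotionPolicy(batch_size=5)))  # False
```

In practice the trigger is usually configured in the CI server itself. The point here is only that both strategies share the same mechanism and differ in one threshold.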
We need to understand that there isn’t one golden solution that fits all. We need to build a strategy that spans every stage of the software release cycle, from building through testing and deployment to monitoring. As part of this strategy, we need to make sure everything works at every level of the application and, hopefully, reduce the number of issues that reach production.
Atlassian, for example, points out in a blog post that they have a strict rule never to push builds directly to production. Instead, the binaries must first go through QA on their staging servers, and only after they’re tested and approved are they pushed to production.
Netflix, which built its own Continuous Delivery platform to follow the code’s lifecycle, has a similar process. Before any line of code can reach production, Netflix teams must test it locally, commit the changes to the central git repository, and then have it built, tested and packaged for deployment. Unlike Atlassian, however, once Netflix’s code reaches the final step of “baking” the builds into Amazon Machine Images, it is promoted and deployed to production automatically.
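Both flows can be modeled as an ordered list of stages, where the only structural difference is whether a stage needs a human sign-off before the build can move on. The sketch below is a simplified illustration under that assumption. The stage names, the `passes` placeholder check and the `run_pipeline` function are all hypothetical, and they do not reflect either company’s actual tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    check: Callable[[str], bool]   # returns True if the build passes this stage
    manual_approval: bool = False  # True = a human must sign off before moving on


def run_pipeline(build_id: str, stages: list[Stage], approvals: set[str]) -> tuple[str, list[str]]:
    """Move a build through the stages in order.

    Returns a status ("failed", "awaiting_approval" or "deployed") and the
    names of the stages the build fully cleared.
    """
    cleared: list[str] = []
    for stage in stages:
        if not stage.check(build_id):
            return "failed", cleared
        if stage.manual_approval and stage.name not in approvals:
            return "awaiting_approval", cleared
        cleared.append(stage.name)
    return "deployed", cleared


def passes(build_id: str) -> bool:
    # Placeholder for real compile/test/QA checks.
    return True


# Gated flow: QA on staging must approve before production.
gated = [
    Stage("build", passes),
    Stage("test", passes),
    Stage("staging-qa", passes, manual_approval=True),
    Stage("production", passes),
]

# Automatic flow: once the image is baked, production follows without a human step.
automatic = [
    Stage("build", passes),
    Stage("test", passes),
    Stage("bake-image", passes),
    Stage("production", passes),
]

if __name__ == "__main__":
    print(run_pipeline("build-42", gated, approvals=set()))
    # ('awaiting_approval', ['build', 'test'])
    print(run_pipeline("build-42", gated, approvals={"staging-qa"}))
    # ('deployed', ['build', 'test', 'staging-qa', 'production'])
    print(run_pipeline("build-42", automatic, approvals=set()))
    # ('deployed', ['build', 'test', 'bake-image', 'production'])
```

Keeping the approval as a flag on the stage, rather than as a separate code path, makes it easy to move a team from one model to the other by changing configuration instead of pipeline logic.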
No matter which path we (or our company) take, one thing stays the same: Dev, DevOps and SRE teams are all working to increase their velocity so they can keep innovating while maintaining the reliability of the product. That’s why the most important thing to focus on in our cycle is Continuous Reliability.
Promoting Code Reliability to Production
Code quality gates and contextual feedback loops are the CI/CD building blocks that define the emerging practice of Continuous Reliability. They can be the difference between hoping your code will work in production and knowing it will.
For our teams and product to stay ahead of the competition, we need to be aware of the limitations of the data we currently collect and work towards making it contextual and code-aware from top to bottom.
We need access to more detailed data about our applications’ performance and reliability. We need to see what the JVM sees. With this data, we can set up more advanced quality gates that block problematic code from passing to the next stage, and feedback loops that inform more comprehensive testing scenarios.
Simply put, we need to analyze each issue, exception and error in the context of other events, applications and releases, so we can create new metadata for each of these events.
That was one of the reasons we created the OverOps Platform, which enables OverOps to deduplicate events based on where in the overall application the code is being executed.
This code-aware deduplication allows DevOps and SRE teams to investigate the overall quality of an application and determine when it is safe to promote code within a fast-paced continuous integration/continuous delivery (CI/CD) workflow.
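To make the idea of deduplicating by code location concrete, here is a deliberately simplified sketch. It assumes each error event carries an exception type, a free-text message and a list of stack frames, and it keys events on the type plus the top few frames while ignoring the message. This is a stand-in technique for illustration, not the way OverOps actually fingerprints events. `ErrorEvent`, `fingerprint`, `deduplicate`, the `depth` parameter and the frame format are all hypothetical.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    exception_type: str
    message: str            # often contains per-request values (IDs, timestamps)
    stack: tuple[str, ...]  # frames, innermost first, e.g. "Cart.checkout:88"


def fingerprint(event: ErrorEvent, depth: int = 3) -> str:
    """Key an event by exception type plus the code location that threw it.

    The message is ignored on purpose, so the same bug hit with different
    user IDs collapses into a single entry.
    """
    location = "|".join(event.stack[:depth])
    raw = f"{event.exception_type}@{location}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


def deduplicate(events: list[ErrorEvent]) -> Counter:
    """Count occurrences per unique code location instead of storing every event."""
    return Counter(fingerprint(e) for e in events)


if __name__ == "__main__":
    frames = ("Cart.checkout:88", "CartController.post:41")
    events = [
        ErrorEvent("NullPointerException", "user 1017 has no cart", frames),
        ErrorEvent("NullPointerException", "user 2203 has no cart", frames),
        ErrorEvent("TimeoutException", "payment gateway timed out", ("Payment.charge:12",)),
    ]
    counts = deduplicate(events)
    print(len(events), "events ->", len(counts), "unique issues")  # 3 events -> 2 unique issues
```

Keeping one counter per location, rather than every raw event, is also what lets this kind of approach cut down on storage.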
OverOps gives an organization insight into the number of new and reintroduced errors, by type, for every release. When an issue is found, embedded tinylinks take developers straight to the relevant data, so they can understand and fix the issue and the code can be promoted.
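A quality gate built on this kind of per-release data might look roughly like the sketch below. It assumes each error has already been reduced to a fingerprint string in a hypothetical `ExceptionType@Location` format, that we know every fingerprint seen in earlier releases, and that the gate blocks promotion when the number of new or reintroduced errors exceeds a threshold (zero by default, which is also an assumption). This is not the OverOps API. `evaluate_release`, `GateResult` and `count_by_type` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GateResult:
    new: set[str]           # never seen in any earlier release
    reintroduced: set[str]  # seen before, absent from the last release, back now
    passed: bool


def evaluate_release(
    current: set[str],
    previous_release: set[str],
    seen_before: set[str],
    max_new: int = 0,
    max_reintroduced: int = 0,
) -> GateResult:
    """Classify a release's error fingerprints and decide whether it may be promoted.

    `seen_before` is every fingerprint from all earlier releases (not the current one).
    """
    new = current - seen_before
    reintroduced = (current & seen_before) - previous_release
    passed = len(new) <= max_new and len(reintroduced) <= max_reintroduced
    return GateResult(new, reintroduced, passed)


def count_by_type(fingerprints: set[str]) -> Counter:
    # Assumes the hypothetical "ExceptionType@Location" fingerprint format.
    return Counter(fp.split("@", 1)[0] for fp in fingerprints)


if __name__ == "__main__":
    seen_before = {
        "NullPointerException@Cart.checkout",
        "TimeoutException@Payment.charge",
        "IllegalStateException@Order.ship",
    }
    previous_release = {
        "NullPointerException@Cart.checkout",
        "TimeoutException@Payment.charge",
    }
    current = {
        "NullPointerException@Cart.checkout",  # known, still present
        "IllegalStateException@Order.ship",    # came back: reintroduced
        "ArithmeticException@Invoice.total",   # brand new
    }
    result = evaluate_release(current, previous_release, seen_before)
    print("new by type:", dict(count_by_type(result.new)))  # {'ArithmeticException': 1}
    print("reintroduced by type:", dict(count_by_type(result.reintroduced)))  # {'IllegalStateException': 1}
    print("promote?", result.passed)  # promote? False
```

Treating reintroduced errors separately from new ones matters because a regression of a previously fixed bug often points to a different root cause, such as a bad merge, than a brand-new error does.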
This deduplication approach has two positive side effects: it improves overall performance and reduces the amount of information that has to be stored, making the entire platform more efficient. To learn more about how to safely promote code within your CI/CD pipeline, join our webinar: Code-Aware DevOps: Introducing OverOps Platform.
Each company and team has its own workflow for promoting and deploying code to production. Automation helps us along the way, making sure our code works in a variety of environments. But to gain full visibility and be sure our code is production-ready, we need to make sure it’s reliable.
Learn more about Quality Gates and Feedback Loops in your CI/CD workflow.