Technology is always evolving, and it pays to keep up with the latest trends. I turn to education platforms to keep my web development skills sharp and learn new languages. One aspect of these sites I particularly appreciate is their gamification of the learning process.
What do I mean by gamification? The platform gives you a progress bar when you start a lesson to show you exactly how far you've come in a course (see screenshot below). Not only can I see all the lessons I've completed, but I can look at how much I've moved the meter. It inspires me to go the extra mile in each session because I know exactly how much I have left before I get my hands on the proverbial trophy.
How to Measure DevOps Success
The goal of the DevOps model is to produce higher quality software faster and rapidly respond to changing requirements and technologies to keep the product on the leading edge. If your team achieves this vision, you are enjoying the full benefits of the DevOps methodology.
How do you know if you've reached this point? Happy customers and end users are a good sign. But even if you are satisfying your stakeholders, you may be missing other considerations — such as application performance — that may not be causing immediate issues but could in the future. We need a way to measure our key performance indicators (KPIs) to determine if we're meeting goals and identify opportunities for improvement.
Whether you are meeting your goals or still adopting a DevOps model, metrics help us uncover hidden issues and confirm the areas we're excelling in. DevOps metrics also allow us to spot declines in our KPIs early so that we can troubleshoot before customer satisfaction is impacted. The best DevOps teams receive feedback from stakeholders and quickly implement changes guided by expertise and proper planning.
What are DevOps metrics?
DevOps metrics are statistics and data points that reflect the performance of a team's DevOps model. They measure the efficiency of the process and reveal areas of friction between the phases of the pipeline. Teams use them to track progress toward the overarching goals they have set.
Now that we understand DevOps metrics, let's drill down on the eight most relevant metrics for evaluating the performance and success of your DevOps pipeline.
Important DevOps Metrics
- Lead Time for Changes
- Change Failure Rate
- Deployment Frequency
- Mean Time to Recovery
- Customer Ticket Volume
- Defect Escape Rate
- Application Performance
- Mean Time to Detection
The first four metrics in our list have been selected by the DevOps Research and Assessment (DORA) team at Google as data points of critical importance. We'll examine these and four additional metrics that will provide even greater insight into the performance of your pipeline.
1. Lead Time for Changes
Lead time for changes is the time between when new code is committed and when it's in a compiled and deployed state. In the DevOps model, rapid updates are critical to maintaining excellence and momentum, so streamlining the process for testing and merging is essential.
Teams will implement DevOps automation practices to achieve new efficiencies in the testing and compiling processes. Lead time for changes allows the team to see exactly how long it takes for code changes to enter production.
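To make the measurement concrete, here is a minimal sketch that assumes you can export a commit timestamp and a deployment timestamp for each change. The function name and sample data are hypothetical, and reporting the median rather than the mean is our own choice to soften the effect of one unusually slow change.

```python
from datetime import datetime
from statistics import median

def lead_times_hours(changes):
    """Return commit-to-deploy durations in hours for (committed_at, deployed_at) pairs."""
    return [
        (deployed_at - committed_at).total_seconds() / 3600
        for committed_at, deployed_at in changes
    ]

# Hypothetical sample data: three changes and when each reached production.
changes = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 15, 0)),     # 6 hours
    (datetime(2024, 1, 9, 10, 0), datetime(2024, 1, 10, 10, 0)),   # 24 hours
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 16, 0)),  # 2 hours
]

median_lead_time_hours = median(lead_times_hours(changes))
print(median_lead_time_hours)  # 6.0
```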
2. Change Failure Rate
Change failure rate records the percentage of updates that require immediate fixes or attention after they are deployed in production. Because this metric does not account for issues caught before deployment, teams can treat it as a reflection of their testing system's efficacy.
In other words, a high change failure rate may point to a gap in the DevOps testing process, on top of other potential problems occurring in production. Change failures are costly because they mean bad code is reaching customers and team resources are tied up in urgent fixes. This makes the change failure rate a crucial data point for evaluating the effectiveness of the team's testing practices and code quality.
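The arithmetic is simple enough to sketch. The example below assumes you already count total deployments and those that needed an immediate fix; the function name and the numbers are made up for illustration.

```python
def change_failure_rate(total_deployments, failed_deployments):
    """Return the percentage of deployments that needed an immediate fix."""
    if total_deployments == 0:
        return 0.0  # No deployments yet, so there is nothing to report.
    return 100.0 * failed_deployments / total_deployments

# Hypothetical month: 40 deployments, 6 of which needed urgent attention.
print(change_failure_rate(40, 6))  # 15.0
```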
3. Deployment Frequency
Deployment frequency relates to the ultimate goal of DevOps: fast creation of high-quality software. Measuring how often the team pushes new code into production shows how the pipeline is performing toward this objective.
However, it's important to weigh this metric against the change failure rate discussed above. If your team is deploying code frequently but that code often contains bugs, then the process is not as efficient as the deployment frequency alone would suggest. Flawed deployments will also lead to lower customer satisfaction — especially if the issues impact user experience.
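One illustrative way to weigh the two metrics together is to discount the raw deployment frequency by the change failure rate. This is a sketch of our own, not a standard DORA formula, and the function names, dates, and failure rate below are hypothetical.

```python
from datetime import date

def deployments_per_week(deploy_dates, period_start, period_end):
    """Return the average number of deployments per week within an inclusive period."""
    days = (period_end - period_start).days + 1
    in_period = [d for d in deploy_dates if period_start <= d <= period_end]
    return len(in_period) / (days / 7)

def clean_deployments_per_week(frequency, change_failure_rate_pct):
    """Discount the raw frequency by the share of deployments that failed."""
    return frequency * (1 - change_failure_rate_pct / 100)

# Hypothetical data: 12 deployments across a four-week period.
deploy_dates = [date(2024, 1, d) for d in (2, 3, 5, 8, 9, 11, 15, 16, 18, 22, 24, 26)]
frequency = deployments_per_week(deploy_dates, date(2024, 1, 1), date(2024, 1, 28))

print(frequency)                                    # 3.0
print(clean_deployments_per_week(frequency, 25.0))  # 2.25
```

A team that ships three times a week with a 25% change failure rate is effectively delivering closer to two trouble-free releases a week.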
4. Mean Time to Recovery
Mean time to recovery (MTTR) in the DevOps model is a measure of how long it takes for a deployed application to recover from failure and return to normal operations. MTTR does not distinguish whether the service interruption results from a code deployment or a larger system failure.
An application monitoring strategy is essential for decreasing MTTR. Monitoring and logging tools will speed time to detection and may also identify the cause of the outage. One of DevOps' strengths is the collaboration between the development and operations teams, which allows for faster identification of the issue and remediation.
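As a rough sketch, MTTR can be computed from a log of outages, assuming each record holds the time the failure began and the time normal operations resumed. The function name and sample outages are hypothetical.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(outages):
    """Return the average duration of (failed_at, restored_at) outage records."""
    if not outages:
        return timedelta(0)
    total = sum((restored_at - failed_at for failed_at, restored_at in outages), timedelta(0))
    return total / len(outages)

# Hypothetical outages lasting 30, 90, and 60 minutes.
outages = [
    (datetime(2024, 2, 1, 8, 0), datetime(2024, 2, 1, 8, 30)),
    (datetime(2024, 2, 9, 13, 0), datetime(2024, 2, 9, 14, 30)),
    (datetime(2024, 2, 20, 22, 0), datetime(2024, 2, 20, 23, 0)),
]

print(mean_time_to_recovery(outages))  # 1:00:00
```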
5. Customer Ticket Volume
Though not selected by DORA, customer ticket volume is a critical metric for measuring how well your software is meeting the needs of your end users. Feedback is essential to the DevOps model's emphasis on constant improvement, and customer tickets are one data source that helps you focus your efforts on recurring issues.
A high volume of customer tickets suggests that your application is not meeting your customers' central needs. Being able to quickly catalog tickets gives you a high-level picture of issues and their severity. In any scenario, lower numbers of customer issues show that your application is meeting expectations and performing as needed. Higher volumes indicate that you need to step back and reevaluate your approach.
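Cataloging tickets can start as simply as counting them by category and severity. The sketch below assumes each ticket is exported as a record with `category` and `severity` fields; those field names, the function name, and the sample tickets are hypothetical.

```python
from collections import Counter

def catalog_tickets(tickets):
    """Count tickets by category and by severity."""
    by_category = Counter(ticket["category"] for ticket in tickets)
    by_severity = Counter(ticket["severity"] for ticket in tickets)
    return by_category, by_severity

# Hypothetical ticket export.
tickets = [
    {"category": "login", "severity": "high"},
    {"category": "billing", "severity": "low"},
    {"category": "login", "severity": "high"},
    {"category": "login", "severity": "medium"},
    {"category": "billing", "severity": "low"},
    {"category": "login", "severity": "high"},
]

by_category, by_severity = catalog_tickets(tickets)
top_category, top_count = by_category.most_common(1)[0]
print(top_category, top_count)  # login 4
print(by_severity["high"])      # 3
```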
6. Defect Escape Rate
Defect escape rate examines the rate at which code containing bugs or other flaws is pushed into production. Like change failure rate, it is another measure of the testing process's effectiveness and the quality assurance (QA) strategy overall.
However, defect escape rate focuses exclusively on the percentage of production pushes that contain bad code, whereas change failure rate records how many of these pushes require immediate remediation. Taken together, these metrics show where the QA process can improve to catch errors earlier in the pipeline, and how much impact uncaught bugs are having.
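The difference between the two rates is easier to see side by side. This sketch follows the definitions used in this post and assumes each production push is tagged with whether it contained a defect and whether it needed an immediate fix. The field names and sample pushes are hypothetical.

```python
def percentage(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

# Hypothetical production pushes.
pushes = [
    {"id": "r1", "has_defect": False, "needed_immediate_fix": False},
    {"id": "r2", "has_defect": True, "needed_immediate_fix": False},
    {"id": "r3", "has_defect": True, "needed_immediate_fix": True},
    {"id": "r4", "has_defect": False, "needed_immediate_fix": False},
    {"id": "r5", "has_defect": False, "needed_immediate_fix": False},
]

# Share of pushes that shipped with any defect.
escape_rate = percentage(sum(p["has_defect"] for p in pushes), len(pushes))
# Share of pushes that required urgent remediation.
failure_rate = percentage(sum(p["needed_immediate_fix"] for p in pushes), len(pushes))

print(escape_rate)   # 40.0
print(failure_rate)  # 20.0
```

Here two of five pushes shipped with a defect, but only one of them was severe enough to demand an immediate fix.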
7. Application Performance
Application performance inspects how the application stands up to different resource demands and meets users' needs during normal and peak operating parameters. Ideally, the application has a robust foundation to respond to large requests without impacting load times for users.
Application performance testing measures how the application responds to high-demand scenarios. These tests are conducted prior to deployment to confirm the software will meet defined customer requirements. After deployment, DevOps teams will continue to monitor performance to ensure the application operates as expected in production. Decreases in application performance flag potential issues to proactively assess and correct.
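One common way to check a performance requirement is to compare a high percentile of response times against a target. The sketch below uses the nearest-rank percentile method; the 300 ms target, the function name, and the sample latencies are assumptions for illustration.

```python
def percentile(samples, pct):
    """Return the nearest-rank percentile (pct is an integer from 1 to 100)."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # Integer ceiling of pct/100 * n.
    return ordered[max(rank, 1) - 1]

# Hypothetical response times in milliseconds from a load test.
latencies_ms = [
    120, 130, 125, 140, 135, 128, 132, 138, 126, 131,
    129, 133, 127, 136, 134, 137, 139, 450, 141, 600,
]

P95_TARGET_MS = 300  # Assumed customer requirement.

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50)                   # 133
print(p95)                   # 450
print(p95 <= P95_TARGET_MS)  # False
```

The median looks healthy here, but the 95th percentile misses the target, which is the kind of tail behavior that peak-load testing is meant to surface.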
8. Mean Time to Detection
Mean time to detection (MTTD) is a measure of how long it takes to identify and flag an issue once it appears in production. Application performance monitoring is the primary way to spot problems, though customer tickets are another source of feedback.
The sooner an error is detected, the sooner the team can investigate and introduce a fix. A short MTTD shows that your team has an effective monitoring strategy and is responsive to issues as they occur. When combined with a short MTTR, your team identifies and solves problems at scale — ideally before the customer has even noticed an issue with the application.
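To see where detection is fast or slow, MTTD can be broken down by how each issue was found. This sketch assumes each incident records when the issue appeared, when it was flagged, and the detection source. The source labels, function name, and sample incidents are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mttd_minutes_by_source(incidents):
    """Return the mean detection delay in minutes, grouped by detection source."""
    delays = defaultdict(list)
    for occurred_at, detected_at, source in incidents:
        delays[source].append((detected_at - occurred_at).total_seconds() / 60)
    return {source: sum(values) / len(values) for source, values in delays.items()}

# Hypothetical incidents: (occurred_at, detected_at, detection source).
incidents = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 4), "monitoring"),
    (datetime(2024, 3, 11, 16, 0), datetime(2024, 3, 11, 16, 6), "monitoring"),
    (datetime(2024, 3, 18, 9, 0), datetime(2024, 3, 18, 9, 45), "customer_ticket"),
]

print(mttd_minutes_by_source(incidents))
# {'monitoring': 5.0, 'customer_ticket': 45.0}
```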
DevOps Metrics Dashboards
You may find additional metrics that are valuable to your team's unique DevOps model beyond the eight we've covered here. So how can we organize this information into a digestible format with graphs and visual aids instead of trying to sort through spreadsheets?
In the example from earlier, the definition of success is clear and broken down into actionable steps. Progress is reflected in the course dashboard by showing the completion bar and finished lessons.
DevOps teams can take advantage of dashboards to chart their progress and better understand complicated data.
Dashboards also make it easy to share those statistics with a broader audience. Providing dashboards to your team allows them to participate in tracking goals and will inspire a greater sense of accountability. You can also use dashboards to show company leadership that your DevOps model is delivering on their goals.
Below, we'll go over three types of dashboards you can implement.
1. Agile Project Management Dashboards
Agile project management dashboards focus on measuring and visualizing the workflows specific to moving the application forward. These dashboards will show how the team is performing against goals like deadlines and task completion. They also provide better insight for setting expectations, such as the average turnaround time for creating a particular software component.
2. Application Monitoring Dashboards
Application monitoring dashboards are dedicated to recording the performance of the application in production. This data helps establish benchmarks for normal operations, which then helps flag below-average performance for further inspection.
Application monitoring dashboards can also aggregate customer tickets to reveal patterns, record the type and frequency of errors, and track MTTD and MTTR to provide a big-picture view of the health of the application.
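The benchmarking idea can be sketched in a few lines: derive a baseline from historical response times, then flag recent samples that exceed it by some margin. The 1.5x tolerance, the function name, and the sample values are assumptions, and a real monitoring tool would likely use something more robust than a plain mean.

```python
from statistics import mean

def flag_slow_samples(baseline_ms, recent_ms, tolerance=1.5):
    """Return recent samples slower than the baseline mean times the tolerance."""
    threshold = mean(baseline_ms) * tolerance
    return [value for value in recent_ms if value > threshold]

# Hypothetical response times in milliseconds.
baseline_ms = [100, 110, 90, 100]  # Normal operations, mean of 100 ms.
recent_ms = [105, 160, 98, 240]

print(flag_slow_samples(baseline_ms, recent_ms))  # [160, 240]
```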
3. Platform Observation Dashboards
Platform observation dashboards provide another view of the application's health, but these dashboards focus on the entire technology stack. This scope includes the application's containers, network, and storage to measure if the underlying infrastructure is supporting the software's performance.
Insights from these dashboards help the operations team better configure the environment to enable fast server responses to customer requests and other desired behavior.
Measure twice, cut once.
I know, DevOps isn't the same as lumber, but hear me out. The adage "measure twice, cut once" alludes to a deeper approach to data: the more information we collect, the better informed our decisions will be. In practice, this can be as simple as measuring a second time to make sure you marked the correct length on your 2x4 before cutting it.
Obviously, the DevOps model is far more complex than using a saw. But like our theoretical carpenter, your goal is to gather as much data from as many sources as possible to make decisions that account for the full context of your team and pipeline. By prioritizing the DevOps metrics we've covered in this post, you'll be able to reinforce your judgment calls with hard numbers and set measurable goals to drive your team forward.