Performance Engineering — The Reliability Edition

Feb 27, 2022

To better understand and define reliability through the use of performance methodologies and principles

Question

Can we improve the reliability of a system by applying performance engineering techniques at different stages of the development process?

This is a look at how a solid Performance Engineering strategy can use Reliability principles and DevOps ideals to complement and strengthen current or proposed performance initiatives.
These approaches aim for better business cohesion, reliability and delivery velocity. To get there, we can apply Performance Engineering methodologies through Shift Left and Move Right approaches that extend traditional Performance Testing techniques.

At Its Core, to Understand an Application’s Performance We Need

A mechanism to run load against an application or system
A way of measuring how it performed
A way of comparing the results against what we believe is the ideal state
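
The three pieces above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a replacement for a real load tool: `fake_endpoint` is a hypothetical stand-in for the system under test, and the 95th percentile threshold is an assumed "ideal state".

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, requests=50, workers=5):
    """Mechanism: call `target` from a thread pool, return latencies in seconds."""
    def timed_call(_):
        start = time.perf_counter()
        target()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(requests)))


def summarise(latencies):
    """Measurement: reduce raw latencies to a few summary statistics."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def meets_ideal(summary, ideal_p95_seconds):
    """Comparison: is the measured p95 within the assumed ideal state?"""
    return summary["p95"] <= ideal_p95_seconds


def fake_endpoint():
    """Hypothetical stand-in for a real request to the application."""
    time.sleep(0.001)


if __name__ == "__main__":
    summary = summarise(run_load(fake_endpoint))
    print(summary, meets_ideal(summary, ideal_p95_seconds=0.5))
```

In practice the load mechanism would be a dedicated tool and the measurements would come from monitoring, but the load, measure and compare shape stays the same.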

Each area of performance within the DevOps model has its part to play. That is, they all relate in some shape or form to the principles around building, defining and maintaining a reliable system

In a Nutshell

Each piece of performance execution and analysis should be guided by the Engineering Efficiency, DevOps and Reliability principles that apply to software development.

Reliability Engineering (RE) attempts to predict and prevent the risk of failure, whether in a single component or an entire system of services
Performance Engineering (PE) states we should start earlier in the SDLC to get faster feedback, and also extends into Operations and Support, using real-world data to build and update the performance models (scripts and analysis)
Performance Testing (PT) is all about determining what the performance of an application is (baselining), or comparing it with how you believe it should be (delta analysis), under various conditions and situations in the ‘test’ environment
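
To make the baselining and delta analysis idea concrete, here is a small sketch. The metric names, the sample values and the 10% tolerance are assumptions chosen for illustration, and it assumes lower values are better and baseline values are non-zero.

```python
def delta_analysis(baseline, current, tolerance_pct=10.0):
    """Compare current metrics to a baseline and flag regressions.

    Assumes lower is better (e.g. response times) and non-zero baselines.
    """
    report = {}
    for name, base_value in baseline.items():
        if name not in current:
            continue  # metric was not captured in this run
        change_pct = (current[name] - base_value) / base_value * 100.0
        report[name] = {
            "baseline": base_value,
            "current": current[name],
            "change_pct": round(change_pct, 1),
            "regressed": change_pct > tolerance_pct,
        }
    return report


# Hypothetical numbers: a stored baseline run and the latest test run.
baseline = {"login_p95_ms": 200.0, "search_p95_ms": 400.0}
current = {"login_p95_ms": 210.0, "search_p95_ms": 500.0}

if __name__ == "__main__":
    for metric, result in delta_analysis(baseline, current).items():
        print(metric, result)
```

With these sample values, login is 5% slower and stays within tolerance, while search is 25% slower and is flagged as a regression.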

A Look at Performance Engineering

PE looks to incorporate ‘Agile’ methodologies and use them in conjunction with ‘DevOps’ ideals to provide an improved approach that adds value rather than one that hinders delivery velocity.

We can do this by adopting a shift left / move right approach built on cloud-first performance automation. This can lead to a shorter feedback cycle (velocity increase) and to bottlenecks and bugs being caught early on (reliability increase).

The Performance Engineering Model

PE is all about applying processes and strategies at each step of the SDLC. The following are example actions/options that can be applied within each vertical

The idea is that performance is a consideration at each step in the software lifecycle. Metrics are captured from Dev, Test, Deploy and Operations and used to refine the next cycle of performance work.

Traditional Performance Testing

This is quite often done within the “test phase” and entails a big bang approach in which many pods/VMs generate load against an application/system.

Performance/Reliability Options to Improve Efficiencies, Engagement and Observability

Shift Left Approach

A “Shift Left” approach shortens the SDLC feedback loop so that potential system and environment issues are uncovered and rectified early.
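
One way to shift left is to give individual functions a timing budget that runs alongside ordinary unit tests. The sketch below is only an illustration: `time_within_budget` is a hypothetical helper, and the budget values are assumptions that would need tuning for real code and real build agents.

```python
import time


def time_within_budget(func, budget_seconds, repeats=5):
    """Run `func` several times and return the best observed time.

    Raises AssertionError when even the best run exceeds the budget.
    Taking the best of several runs reduces noise from a busy machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    if best > budget_seconds:
        raise AssertionError(
            f"best run took {best:.4f}s, budget is {budget_seconds:.4f}s"
        )
    return best


def build_report():
    """Hypothetical unit of work that a developer wants to keep fast."""
    return sum(range(1000))


if __name__ == "__main__":
    print(time_within_budget(build_report, budget_seconds=1.0))
```

A check like this can fail a build in the Dev stage, long before a full load test would have surfaced the same slowdown.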

Shift Left Benefits

Move Right Approach

A “Move Right” approach extends testing out to include user feedback and metrics from your production environment. This feedback can then be used to update the performance model.
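
As one example of feeding production data back into the performance model, the sketch below derives a workload mix from observed production requests and scales it to a test run. The endpoint names and request counts are made up for illustration; a real version would read them from access logs or monitoring.

```python
from collections import Counter


def workload_mix(production_requests):
    """Return the share of traffic per endpoint seen in production."""
    counts = Counter(production_requests)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {endpoint: count / total for endpoint, count in counts.items()}


def plan_test_requests(mix, total_requests):
    """Scale the production mix to the size of the next test run."""
    return {
        endpoint: round(share * total_requests)
        for endpoint, share in mix.items()
    }


# Hypothetical sample of production traffic.
observed = ["/search"] * 6 + ["/login"] * 3 + ["/checkout"] * 1

if __name__ == "__main__":
    print(plan_test_requests(workload_mix(observed), total_requests=1000))
```

The test scripts then exercise endpoints in roughly the same proportions that real users do, rather than in proportions guessed up front.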

Move Right Benefits

Measurements and Observability

Performance metrics from each environment (Dev/Test/Prod) are used to determine whether the system is within its SLO limits.

The idea is that we can understand and easily record local (component) and integrated (end-to-end) metrics to provide better performance transparency. These would then be compared to the ideal state.

These SLOs can be tracked through SLIs (SLI specifications and SLI implementations) and compared against our error budget to measure how much tolerance remains.

The aim is to obtain a current-state view of our application’s performance in each environment and at each stage of the SDLC. These views are then compared against the business performance expectations defined in the SLO and measured by the SLI.
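
The relationship between an SLI, an SLO and an error budget can be shown with a little arithmetic. This sketch assumes a latency-based SLI (the fraction of requests at or under a threshold); the 300 ms threshold, the 98% target and the sample data are illustrative assumptions.

```python
def latency_sli(latencies_ms, threshold_ms):
    """SLI: fraction of requests that completed within the threshold."""
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / len(latencies_ms)


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget still unspent (negative when overspent).

    The budget is the share of requests the SLO allows to be 'bad'.
    """
    allowed_bad = 1.0 - slo_target
    if allowed_bad <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    actual_bad = 1.0 - sli
    return 1.0 - actual_bad / allowed_bad


# Hypothetical run: 990 fast requests and 10 slow ones.
sample_ms = [100] * 990 + [900] * 10

if __name__ == "__main__":
    sli = latency_sli(sample_ms, threshold_ms=300)
    print(sli, error_budget_remaining(sli, slo_target=0.98))
```

Here 99% of requests are within the threshold against a 98% target, so about half of the error budget remains.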

Performance SLI Implementations Could Include:

API / UI response times
DB transaction times
Pod / VM scaling events
CPU use / Network activity / Memory usage

These could all be defined and compared using SLIs.
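
The measurements listed above could be declared as data and evaluated the same way in every environment. The sketch below is one possible shape: the `SliCheck` type, the metric names and the limits are all hypothetical, and a missing measurement is treated as a breach so that gaps in observability are not silently ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliCheck:
    """One measurable indicator and the upper limit it must stay under."""
    name: str
    limit: float
    unit: str


def evaluate(checks, measurements):
    """Return the names of checks that are breached or were not measured."""
    breaches = []
    for check in checks:
        value = measurements.get(check.name)
        if value is None or value > check.limit:
            breaches.append(check.name)
    return breaches


# Hypothetical limits covering the kinds of metrics listed above.
checks = [
    SliCheck("api_response_p95_ms", 300, "ms"),
    SliCheck("db_transaction_p95_ms", 50, "ms"),
    SliCheck("scaling_events_per_hour", 4, "events"),
    SliCheck("cpu_utilisation_pct", 80, "%"),
]

if __name__ == "__main__":
    test_env = {
        "api_response_p95_ms": 250,
        "db_transaction_p95_ms": 70,
        "scaling_events_per_hour": 2,
        "cpu_utilisation_pct": 65,
    }
    print(evaluate(checks, test_env))
```

Running the same checks against Dev, Test and Prod measurements gives a comparable view of each environment.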

A subset of the performance suite can be used to poke test (performance smoke test) the application after deployment. A degraded performance run could then trigger a rollback.
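
A post-deployment gate along these lines could look like the sketch below. The `run_smoke_test` and `rollback` callables are hypothetical hooks into whatever test runner and deployment tooling is in use, and the 20% allowed degradation is an assumed threshold.

```python
def post_deploy_check(run_smoke_test, rollback, baseline_p95_ms,
                      max_degradation_pct=20.0):
    """Run the smoke test and roll back if p95 degrades past the limit.

    Returns True when the deployment is kept, False when it is rolled back.
    """
    smoke_p95_ms = run_smoke_test()
    limit_ms = baseline_p95_ms * (1 + max_degradation_pct / 100.0)
    if smoke_p95_ms > limit_ms:
        rollback()
        return False
    return True


if __name__ == "__main__":
    # Hypothetical hooks: a smoke run reporting 260 ms and a logging rollback.
    kept = post_deploy_check(
        run_smoke_test=lambda: 260.0,
        rollback=lambda: print("rolling back to previous release"),
        baseline_p95_ms=200.0,
    )
    print("deployment kept:", kept)
```

With a 200 ms baseline the limit is 240 ms, so the 260 ms smoke run in this example triggers the rollback.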

Summary

A balanced performance strategy that is applied at each stage of the SDLC and guided by RE principles provides a more well-rounded verification process. In turn, it can lead to a culture of empathy, encourage collaboration, reduce delivery cycle duration and mitigate the chance of deploying underperforming software.