Steve Oyugi – Software Engineering Blog

I recently received a query about API performance testing and thought it would be a good idea to document the answer for future reference. The API to be tested is a public-facing API built on Laravel, but you can use the examples to test any kind of web-based API.

I will cover why we should consider performance testing and what to keep in mind to get good results, and then run through practical examples to see everything in action.

My tool of choice for these tests is k6, an open-source load testing tool from Grafana Labs, the company behind popular visualization and observability tools such as Grafana. I prefer k6 because of its simplicity and developer focus. You can also stream your results to Grafana or Prometheus.

There are other popular tools such as JMeter, but it is not the easiest tool to set up and operate.

What is Performance Testing? Why Do We Need to Run Performance Tests?

API performance testing is the process of putting our API resources under simulated stress to determine whether they will continue to perform well under that stress.

These tests can be run periodically, as was common in the past, or a selected subset can be run whenever something changes, such as after a code update. These days, I would recommend running your tests sooner rather than later.

Performance tests can help us determine whether we have non-performant code, infrastructure limitations, bugs, memory leaks, or scaling limits.

These are the tests we will be running:

Smoke Test – a simple test used to catch obvious bugs and regressions. It runs under minimal load for a very short period of time, so it can easily be run with every commit. We will create a script called smoke-test.js for this test.
Load Test – this test ramps up load to what we would consider normal operating conditions. From a systems architecture perspective, you should already have a good idea of what your typical traffic looks like on any given day; we use that as the goal and test against it. We will create a script called load-test.js for this test.
Stress Test – this test takes things up a notch by pushing the system to find its limits. The results help you understand what kind of infrastructure setup you need to handle abnormal load. We will create a script called stress-test.js for this test.
Spike Test – this test is similar to the stress test, but instead of ramping up slowly, we put an excessive demand on the API within a very short period of time. This shows how your API behaves when it suddenly receives a flurry of requests, whether from an abusive scraper or a marketing drive. We will create a script called spike-test.js for this test.
Soak Test – this last test runs over a much longer period of time to uncover underlying issues such as memory leaks and less obvious bugs. We will create a script called soak-test.js for this test.

Getting Started

You need to install k6 first. My test rig is a simple machine running Ubuntu, and my API is running on a Raspberry Pi with Ubuntu 22.04.2 LTS. I expect to see some failures because of this, but I am open to surprises!

To set up k6, run these commands in a terminal:

Debian/Ubuntu

sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

Head over to the k6 installation page if you have a different test environment.

Running Tests

Let us begin by creating a folder called tests; that is where we will write and run all our scripts from.

mkdir tests && cd tests

Basic Test

Let us start by running a basic example to see if everything works out of the box! Create a JavaScript file called basic-run.js in your tests folder and copy this code into it:

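import http from 'k6/http';
import { sleep } from 'k6';

// Each virtual user runs this function in a loop.
export default function () {
    // Request the posts endpoint on the Pi.
    http.get('http://192.168.1.86/api/posts');
    // Pause for 1 second to simulate a user's think time.
    sleep(1);
}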

As you can probably tell, we include a 1-second pause between requests to the endpoint to simulate normal human behavior.

Type this in your terminal and press enter:

k6 run basic-run.js

After a few seconds, you will get this kind of output displayed in the terminal:

Looks like the Pi is holding up.

Here is a quick breakdown of what this means:

We are running 1 virtual user or executing 1 thread, denoted by “1 max VUs”
http_req_duration: the response time of the API
http_reqs: the total number of HTTP requests k6 made. You get a value of 1 because, with no options set, k6 runs a single iteration with one virtual user
Most of the metrics include a breakdown into average, min, median, max, 90th and 95th percentile, which gives you a clearer picture of how the API is performing.
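To make those columns less abstract, here is a rough sketch in plain JavaScript (not a k6 script) of how such a summary can be computed from a list of response times. The linear-interpolation percentile is an assumption for illustration; k6's own calculation may differ slightly, and the sample timings are made up rather than taken from the Pi.

```javascript
// Percentile by linear interpolation between the two closest ranks.
// This method is an assumption; k6 may use a slightly different one.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const rank = (p / 100) * (sortedValues.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  const weight = rank - lower;
  return sortedValues[lower] + (sortedValues[upper] - sortedValues[lower]) * weight;
}

// Summarise response times (ms) into the same columns k6 prints:
// avg, min, med, max, p(90) and p(95).
function summarize(durationsMs) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  return {
    avg: sum / sorted.length,
    min: sorted[0],
    med: percentile(sorted, 50),
    max: sorted[sorted.length - 1],
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
  };
}

// Made-up sample timings with one slow outlier (not results from the Pi).
const sample = [120, 135, 150, 142, 160, 900, 130, 128, 145, 138];
console.log(summarize(sample));
```

The single 900ms outlier barely moves the median (140ms) but drags the average above 200ms, which is why the percentile columns are usually more informative than the average alone.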

Smoke Test

It should already be apparent that we will be duplicating some code in the next set of scripts. To reduce this, we can create a simple config.js file to hold the repeated details. Copy this code into the file and edit it to suit your needs by specifying your own endpoint.

const API_BASE_URL = 'http://192.168.1.86/api/';
const API_QUERY_URL = API_BASE_URL + 'posts?';
export { API_QUERY_URL };

The next thing is to create the smoke-test.js file and copy this code to it:

import http from 'k6/http';
import { sleep } from 'k6';
import * as config from './config.js';

export const options = {
    vus: 1,          // a single virtual user
    duration: '1m',  // run for one minute

    thresholds: {
        // 95% of requests must complete in under 1000 ms
        http_req_duration: ['p(95)<1000']
    },
};

export default function () {
    http.get(config.API_QUERY_URL);
    sleep(1);
}

This script includes the options constant, in which we define the number of VUs, the test duration and a single threshold for http_req_duration. Thresholds let you specify the pass/fail criteria for a test; in this case, we are checking that at least 95% of requests get a response in less than 1 second.
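The threshold string 'p(95)<1000' boils down to a simple pass/fail check. The sketch below mirrors that idea in plain JavaScript; the helper names percentile and passesThreshold are hypothetical, and this is not how k6 parses or evaluates thresholds internally.

```javascript
// Hypothetical helper: percentile by linear interpolation (an assumption;
// k6's exact method may differ).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Hypothetical helper: true when the p-th percentile is below the limit,
// which is the idea behind a threshold such as 'p(95)<1000'.
function passesThreshold(durationsMs, p, limitMs) {
  return percentile(durationsMs, p) < limitMs;
}

// 19 fast responses plus one slow one, then 18 fast plus two slow ones.
const fast = Array(19).fill(150);
console.log(passesThreshold([...fast, 5000], 95, 1000));
console.log(passesThreshold([...fast.slice(1), 5000, 5000], 95, 1000));
```

With twenty requests, one slow response still leaves the interpolated p(95) under 1000ms, so the first check passes; a second slow response pushes p(95) to 5000ms and the check fails. This is why a p(95) threshold tolerates the occasional outlier but not a trend.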

To run the test type this in terminal:

k6 run smoke-test.js

Looks like the Pi is still holding up. The http_req_duration threshold is met, with the 95th percentile coming in at 151.47ms. That is decent for our small API.

Load Test

Things change now, with more pressure on the API. If you already have assumptions about how many users you expect in a given week, you can use them.

In my case I will go with a small target of 5 users and then ramp up to 20 users.

Create the file load-test.js in your tests folder and copy this code to it:

import http from 'k6/http';
import { sleep } from 'k6';
import * as config from './config.js';

export const options = {
    stages: [
        { duration: '5s', target: 5 },
        { duration: '30s', target: 5 },
        { duration: '5s', target: 20 },
        { duration: '30s', target: 20 },
        { duration: '5s', target: 5 },
        { duration: '30s', target: 5 },
        { duration: '5s', target: 0 },
    ],
    thresholds: {
        http_req_duration: ['p(95)<600'],
    },
};

export default () => {
    http.get(config.API_QUERY_URL);
    sleep(1);
};

Explanation of this test:

Within 5 seconds we will ramp up to 5 users, and stay at 5 users for 30 seconds
We will then scale this up to 20 users within 5 seconds and hold this number of users for 30 seconds.
We will then scale down to 5 users over a period of 5 seconds and hold that for 30 seconds.
Eventually, we will scale down to 0 users over 5 seconds.

There is a caveat with this test. In a real scenario, you would want each stage to last at least 1 minute and the total test time to be between 30 and 60 minutes. Here we are only demonstrating the capability.
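To see how far the demo profile is from that rule of thumb, here is a small plain-JavaScript sketch that totals a stages array and flags short stages. parseDuration and reviewStages are hypothetical helpers; parseDuration only understands simple values such as '5s', '1m' and '1h', whereas k6 itself accepts richer formats such as '1m30s'.

```javascript
// Hypothetical helper: converts simple k6-style durations ('5s', '1m', '1h')
// into seconds. Compound values such as '1m30s' are not handled here.
function parseDuration(text) {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  const multiplier = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * multiplier;
}

// Hypothetical helper: totals a stages array and checks it against the
// rule of thumb (every stage >= 60 s, whole run between 30 and 60 minutes).
function reviewStages(stages) {
  const seconds = stages.map((stage) => parseDuration(stage.duration));
  const totalSeconds = seconds.reduce((acc, s) => acc + s, 0);
  return {
    totalSeconds,
    peakVus: Math.max(...stages.map((stage) => stage.target)),
    shortStages: seconds.filter((s) => s < 60).length,
    withinRecommendedLength: totalSeconds >= 30 * 60 && totalSeconds <= 60 * 60,
  };
}

// The stages from load-test.js in this post.
const loadStages = [
  { duration: '5s', target: 5 },
  { duration: '30s', target: 5 },
  { duration: '5s', target: 20 },
  { duration: '30s', target: 20 },
  { duration: '5s', target: 5 },
  { duration: '30s', target: 5 },
  { duration: '5s', target: 0 },
];
console.log(reviewStages(loadStages));
```

For load-test.js this reports 110 seconds in total, a peak of 20 VUs, and all seven stages shorter than a minute, which is exactly the caveat described above.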

The test still passes: the 95th percentile comes in at 267.10ms, comfortably within the 600ms threshold!

How about we add some more stress then?

Stress Test

We now want to bring this API to its knees, if we can. One way to think about this is to consider how your API service will behave if you release new marketing material on your website and everybody suddenly wants to take a look at a product you are promoting.

It is also important to acknowledge that finding the point at which your API breaks can take a bit of work. We will start by increasing the load while keeping the same 600ms threshold as the load test.

Create the stress-test.js script and run it:

import http from 'k6/http';
import { sleep } from 'k6';
import * as config from './config.js';

export const options = {
    stages: [
        { duration: '5s', target: 5 },
        { duration: '30s', target: 5 },
        { duration: '5s', target: 20 },
        { duration: '30s', target: 20 },
        { duration: '5s', target: 100 },
        { duration: '30s', target: 100 },
        { duration: '5s', target: 200 },
        { duration: '30s', target: 200 },
        { duration: '5s', target: 0 },
    ],
    thresholds: {
        http_req_duration: ['p(95)<600'],
    },
};

export default () => {
    http.get(config.API_QUERY_URL);
    sleep(1);
};

Notice that we go all the way up to 200 users at maximum load.

Our API starts to fail. The 95th percentile is at 16.95 seconds now. Looks like we have a performance problem to resolve!
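Why does latency explode like this? Each VU loops through a request followed by sleep(1), so it can only start a new request once the previous one has returned. The back-of-the-envelope sketch below estimates throughput from that loop. It assumes iteration time is simply response time plus the sleep, ignores connection setup and k6 overhead, and uses the 16.95 second p(95) as a deliberately pessimistic latency. requestsPerSecond is a hypothetical helper.

```javascript
// Closed-model estimate: each VU loops "request, then sleep(1)",
// so one iteration takes roughly (response time + sleep time).
// Connection setup and k6 overhead are ignored (an assumption).
function requestsPerSecond(vus, responseSeconds, sleepSeconds = 1) {
  return vus / (responseSeconds + sleepSeconds);
}

// Best case: 200 VUs against an API that responds instantly.
const ideal = requestsPerSecond(200, 0);
// Pessimistic case: every response takes as long as the stress-run p(95).
const degraded = requestsPerSecond(200, 16.95);
console.log(ideal.toFixed(1), degraded.toFixed(1));
```

At 200 VUs the best case is 200 requests per second, but at 16.95 seconds per response it drops to roughly 11, so once the Pi saturates, extra VUs mostly add queued requests rather than throughput.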

Spike Test

What about a sudden influx of traffic from someone trying to get data from the API in an uncontrolled fashion, like an abusive scraper?

At this point, we want to determine how our API will respond if it is immediately hit by this stress.

Create the spike-test.js file and copy this code to it:

import http from 'k6/http';
import { sleep } from 'k6';
import * as config from './config.js';

export const options = {
    stages: [
        { duration: '5s', target: 200 },
        { duration: '1m', target: 200 },
        { duration: '5s', target: 0 },
    ],
    thresholds: {
        http_req_duration: ['p(95)<600'],
    },
};

export default () => {
    http.get(config.API_QUERY_URL);
    sleep(1);
};
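In this profile the first stage takes the Pi from idle to 200 VUs in five seconds, which is about 40 new VUs every second. The plain-JavaScript sketch below approximates the VU count at any moment by interpolating linearly between stage targets. vusAt and parseSeconds are hypothetical helpers, and starting from 0 VUs is an assumption, since k6's starting VU count is configurable.

```javascript
// Hypothetical helper: simple k6-style durations ('5s', '1m', '1h') to seconds.
function parseSeconds(text) {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * { s: 1, m: 60, h: 3600 }[match[2]];
}

// Hypothetical helper: approximate VU count at time t (seconds), assuming a
// linear ramp from the previous target to each stage's target.
// startVus = 0 is an assumption; adjust it to match your k6 setup.
function vusAt(stages, t, startVus = 0) {
  if (t <= 0) return startVus;
  let elapsed = 0;
  let from = startVus;
  for (const stage of stages) {
    const length = parseSeconds(stage.duration);
    if (t <= elapsed + length) {
      const progress = length === 0 ? 1 : (t - elapsed) / length;
      return Math.round(from + (stage.target - from) * progress);
    }
    elapsed += length;
    from = stage.target;
  }
  return from; // after the last stage
}

// The stages from spike-test.js in this post.
const spikeStages = [
  { duration: '5s', target: 200 },
  { duration: '1m', target: 200 },
  { duration: '5s', target: 0 },
];
for (const t of [0, 1, 2.5, 5, 30, 67.5, 70]) {
  console.log(`t=${t}s -> ${vusAt(spikeStages, t)} VUs`);
}
```

Compared with the stress test, which takes minutes to reach 200 VUs, the spike profile reaches half of that load within the first 2.5 seconds, leaving the API no time to warm up.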

What happens when we run the test?

Another failure from the API: it is unable to meet the set threshold. In fact, we also get some timeout warnings.

Soak Test

The last test runs over a long period of time. I will provide the code for the test so that you can run it in your own free time.

import http from 'k6/http';
import { sleep } from 'k6';
import * as config from './config.js';

export const options = {
    stages: [
        { duration: '10m', target: 20 },
        { duration: '1h', target: 20 },
        { duration: '5m', target: 5 },
        { duration: '1m', target: 0 },
    ],
    thresholds: {
        http_req_duration: ['p(95)<600'],
    },
};

export default () => {
    http.get(config.API_QUERY_URL);
    sleep(1);
};

We are ramping up to 20 users over 10 minutes, then we stay there for an hour. After this we ramp down to 5 users over 5 minutes, then down to 0 over 1 minute.
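Before committing over an hour to this run, it can help to estimate its size. The sketch below (plain JavaScript, hypothetical helper names) adds up the stage durations and integrates the VU curve, assuming linear ramps that start from 0 VUs. Because every iteration lasts at least the one-second sleep, VU-seconds divided by one second gives a rough upper bound on the number of requests.

```javascript
// Hypothetical helper: simple k6-style durations ('5s', '1m', '1h') to seconds.
function parseSeconds(text) {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * { s: 1, m: 60, h: 3600 }[match[2]];
}

// Hypothetical helper: total length and a rough request upper bound.
// Assumes linear ramps starting at startVus and iterations of at least
// minIterationSeconds (the sleep(1) in the script).
function soakEstimate(stages, startVus = 0, minIterationSeconds = 1) {
  let from = startVus;
  let totalSeconds = 0;
  let vuSeconds = 0;
  for (const stage of stages) {
    const length = parseSeconds(stage.duration);
    // Area of the trapezoid under a linear ramp from `from` to `target`.
    vuSeconds += ((from + stage.target) / 2) * length;
    totalSeconds += length;
    from = stage.target;
  }
  return {
    totalMinutes: totalSeconds / 60,
    vuSeconds,
    maxRequests: Math.floor(vuSeconds / minIterationSeconds),
  };
}

// The stages from soak-test.js in this post.
const soakStages = [
  { duration: '10m', target: 20 },
  { duration: '1h', target: 20 },
  { duration: '5m', target: 5 },
  { duration: '1m', target: 0 },
];
console.log(soakEstimate(soakStages));
```

For the soak profile this works out to 76 minutes and at most about 81,900 requests; since each real iteration also includes the response time, the actual count will be lower.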

Conclusion

We now understand that our API, in its current state, will be unable to meet our needs in case of sudden stress or spikes. We cannot be sure that this will ever happen, but if it does, we know we will have to figure out how to meet that demand.

These tests should not be performed in local environments like ours. You want to run them on a production machine with a workload that represents realistic scenarios.

I will be tuning the API later and may follow this up with a separate post on that experience!

The last few months of Covid have been eye-opening for parents, especially parents who only have children in the house during the holidays. We sympathize with you, but we also think it is a great opportunity to rediscover family and guidance. For those who are yet to catch the drift, that was a quick-fire teacher appreciation introduction. Teachers rock and children are a mess, but that is a good mess.

However, my training today has nothing to do with children. Today is all about corporate training sessions. Like many other companies around the globe, Reelanalytics has seen a remarkable shift to remote working. Most of my time now goes to ad-hoc training sessions for multiple teams and clients at a time. Sometimes an entire day is about others and not about me and code.

Training people has been an important part of my new remote life. In this post, I would like to spend some time convincing you that it is the most remarkable skill you can take up, by running through the following short points.

Teaching is not about you, but it is also about you. I have found that teaching allows you to pass on knowledge to others, and that is very important in this age. I have also learnt that in the process of teaching, you also learn from others. You may be an expert in a specific subject, but what you will learn from others is that there are ways of looking at things that you would never have thought of. You also realize that you do not know everything, and this has been a very humbling experience for me.

Teaching also gives you the chance to appreciate diversity and people. You get to meet different people and pick up small, interesting cultural nuances that help you build empathy. The world was not painted in one color. Once you embrace diversity, you learn to share more willingly because you understand that the world does not belong to you alone.

The software development world is full of engineers who build products that fulfill their own needs and fail to address the needs of others. Training has shown me first-hand that you will sometimes build a feature to be used in a specific way, only for a new user to surprise you with a different process. These discoveries can be genuinely impressive, especially when your users have come up with a quirky solution to a challenge you didn't even know you had.

Training also teaches you to take feedback, especially when you push a feature that proves unpopular with some of your users. Woe unto you if that happens to be the majority of your users. It is easy for us to think we are always the brightest bulbs in the room, and there are places where engineers are considered unquestionable rock stars! Sometimes we also build shit products. When you do, and it somehow makes it through your QC process, prepare to take some tough feedback.

The value of patience. Sometimes you build features that take people a long time to understand. Corporate software can be complicated! Why that is could be the subject of a thesis. If you build a product that is a bit complex, please understand that it will take a few more sessions before your team and clients understand how to use it. If you cannot make it any simpler, then you should be prepared to sit through numerous sessions showing people how to go about it.

Be grateful. There is no worse feeling than calling a training session and having nobody show up. So if you invite people to a training program and they come in droves, take some time to thank them for coming to your meeting or session, even if just one person shows up. Once you understand how difficult it is to set aside time for anything, including training, you will see that the people who come to your training deserve your gratitude. Thank them for the opportunity to teach them and to learn from them too.

Having learnt this, I am encouraged to seek out every new opportunity to set up training sessions for my teams and our clients. I would encourage you to start as well.