We have recently added a new feature to Gravity that lets us see the coverage of an E2E test suite compared to users’ real sessions.
When talking about test coverage, most testers and developers think of code coverage (or possibly requirements coverage). Let’s see how code coverage works and how it differs from user session coverage.
Testing User Sessions: Understanding the Importance of Coverage
Code coverage
Usually, testers and developers compute test coverage over lines of code, which makes it a fairly straightforward process (at least for basic coverage). We will stick to this basic form here to avoid going too deep in this direction. Consider the following code:
```typescript
function displayIfOdd(n: number) {
  if (n % 2 === 0) {
    console.log(`${n} is even`)
  } else {
    console.log(`${n} is odd`)
  }
}
```
There are three instructions in this code:
- Testing whether n % 2 === 0
- Displaying “number is even”
- Displaying “number is odd”
Now, if a single test calls this function with an odd number, two of the three instructions (about 67% of the code) are covered. With two tests (one using an odd number and another using an even number), we cover 100% of the code.
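To make the arithmetic concrete, here is a minimal sketch of instruction coverage for the function above. The instrumentation (a hypothetical `executed` set that each instruction writes its own ID into) stands in for what a real coverage tool does behind the scenes; the instruction IDs are made up for illustration.

```typescript
// Hypothetical instrumentation: each instruction records its ID when it runs.
const ALL_INSTRUCTIONS = ["test n % 2 === 0", "log even", "log odd"] as const;
const executed = new Set<string>();

function displayIfOdd(n: number): void {
  executed.add("test n % 2 === 0");
  if (n % 2 === 0) {
    executed.add("log even");
    console.log(`${n} is even`);
  } else {
    executed.add("log odd");
    console.log(`${n} is odd`);
  }
}

// Coverage = instructions executed at least once / total instructions.
function coverage(): number {
  return executed.size / ALL_INSTRUCTIONS.length;
}

displayIfOdd(3); // first test: an odd number
const afterOneTest = coverage(); // 2 / 3

displayIfOdd(4); // second test: an even number
const afterTwoTests = coverage(); // 3 / 3
```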
In Gravity, we do not work with lines of code. Instead, we compare sessions, which come either from users interacting with an application or from E2E test traces. Let’s look at how these items differ from lines of code and how this affects the way coverage is computed.
Lines of code vs user interactions
In Gravity, we express the sessions and the tests as being a list of user interactions. Those interactions can be of three types:
- user action: for example, clicking on an element of the web page, filling an input, selecting an option, etc.
- transition: the user opens or leaves a page
- request: a request made to the backend
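The three kinds of interactions above map naturally onto a discriminated union. The sketch below is a hypothetical model: the field names (`kind`, `action`, `direction`, `path`, `url`) and the sample selectors are assumptions for illustration, not Gravity's actual data format.

```typescript
// Hypothetical model of the three interaction kinds (not Gravity's real schema).
type Interaction =
  | { kind: "action"; action: "click" | "fill" | "select" | "keypress"; target: string }
  | { kind: "transition"; direction: "open" | "leave"; path: string }
  | { kind: "request"; method: string; url: string };

// A session, like a test trace, is an ordered list of interactions.
type Trace = Interaction[];

// Short, human-readable label for one interaction.
function label(i: Interaction): string {
  if (i.kind === "action") return `${i.action} ${i.target}`;
  if (i.kind === "transition") return `${i.direction} ${i.path}`;
  return `${i.method} ${i.url}`; // narrowed to the "request" variant
}

const loginSession: Trace = [
  { kind: "transition", direction: "open", path: "/login" },
  { kind: "action", action: "fill", target: "#username" },
  { kind: "action", action: "fill", target: "#password" },
  { kind: "action", action: "click", target: "#login" },
  { kind: "request", method: "POST", url: "/api/session" },
];

const labels = loginSession.map(label);
```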
There are three main differences between these interactions and lines of code:
- the importance of the order
- the validity of each item
- the variability of activation
Order
In a block of code, the instructions always execute in the same order. In the code sample at the beginning of this article, the output is always produced after checking whether the number is even.
In a user session, this is not the case. When filling out a login form, the classic flow of events is:
- fill the “username” field
- fill the “password” field
- click on the “login” button
In practice, we could swap the first and second steps and the result would be the same.
Now, consider a test where the login form is filled in the classic order, and a session where the user fills the password field first. Should the session be considered covered or not?
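To make the question concrete, here is a small sketch comparing the two login traces in two ways: strictly (same interactions in the same order) and ignoring order. The interaction labels are hypothetical strings chosen for illustration.

```typescript
// Strict comparison: same interactions, in exactly the same order.
function sameSequence(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((x, i) => x === b[i]);
}

// Order-insensitive comparison: same interactions, each the same number of times.
function sameMultiset(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<string, number>();
  for (const x of a) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of b) {
    const c = counts.get(x) ?? 0;
    if (c === 0) return false; // b has an interaction a lacks (or has it more often)
    counts.set(x, c - 1);
  }
  return true;
}

const testOrder = ["fill username", "fill password", "click login"];
const sessionOrder = ["fill password", "fill username", "click login"];

const strictlyEqual = sameSequence(testOrder, sessionOrder);
const equalIgnoringOrder = sameMultiset(testOrder, sessionOrder);
```

Neither answer is satisfying on its own: the strict version rejects a perfectly valid session, while the order-insensitive one would also accept a session that clicks “login” before filling the form.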
Noise
In a block of code, every instruction has a meaning (let’s assume there is no dead code, for the sake of argument). This means that every instruction should be taken into account when computing coverage.
In a user session, many events are meaningless. For example, when I am reading an article online, I tend to double-click and select a word in the paragraph I am reading. Sometimes, I also copy the selected word to look up its meaning or translation. Those events appear in my session as user actions, but they say nothing about what I actually did on the website (reading an article).
The tests can also contain noise. In our Gravity E2E tests, one of the first steps is to click “Accept cookies” in the cookie banner. Why? Depending on the screen resolution, the “Login” button is hidden behind the banner and the test fails because it cannot log in (whereas a real user would simply scroll past the banner).
Does this mean that a session where the user did not click on “Accept cookies” is not covered?
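One naive way to handle noise is to drop interactions that appear on a hand-written blocklist before comparing traces. The sketch below assumes such a list (`NOISE`) and string labels for interactions. It illustrates the problem and is not how Gravity deals with noise. Maintaining the list by hand quickly becomes the hard part.

```typescript
// Hypothetical blocklist of interactions considered meaningless.
const NOISE = new Set(["dblclick word", "copy selection", "click #accept-cookies"]);

// Keeps only the interactions that are not on the blocklist.
function withoutNoise(trace: string[]): string[] {
  return trace.filter((interaction) => !NOISE.has(interaction));
}

const testTrace = ["click #accept-cookies", "fill #username", "fill #password", "click #login"];
const sessionTrace = ["fill #username", "dblclick word", "copy selection", "fill #password", "click #login"];

const rawEqual = JSON.stringify(testTrace) === JSON.stringify(sessionTrace);
const equalAfterFiltering =
  JSON.stringify(withoutNoise(testTrace)) === JSON.stringify(withoutNoise(sessionTrace));
```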
The variability of activation (for lack of a better wording)
In code, the activation of an instruction is boolean: either a test has triggered the instruction or it has not. It’s as simple as that.
On a website, many activities are similar but not identical. Take the example of a webshop: in the test, the user opens the page displaying item #42 and adds it to the basket. In a real session, a user opens the page displaying item #12 and adds it to the basket.
The activities are different (they are not executed on the same page) but they have the same business meaning. Should we consider that the session is covered by the test or not?
Now, suppose the page displaying an item has two buttons to add it to the basket (one at the top of the page and one after the item description). If the test clicks the top button and the user clicks the bottom one, is that the same session?
We now have a lot of questions about what coverage means. Let’s see how we decided to answer them in Gravity: using business activities.
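One way to make “similar but not identical” interactions compare equal is to normalize them into canonical forms before comparing. The rules, selectors (`#add-to-basket-top`, `#add-to-basket-bottom`) and labels below are hypothetical. The sketch only shows the idea of mapping several raw interactions onto one business-level interaction.

```typescript
// Hypothetical normalization rules: raw interaction label -> canonical label.
const RULES: Array<[RegExp, string]> = [
  // Any product page is "the product page", whatever the item ID.
  [/^open \/items\/\d+$/, "open /items/<any>"],
  // Both "add to basket" buttons (top and bottom of the page) count as one action.
  [/^click #add-to-basket-(top|bottom)$/, "click add-to-basket"],
];

// Returns the canonical form of an interaction, or the interaction itself if no rule applies.
function normalize(interaction: string): string {
  for (const [pattern, canonical] of RULES) {
    if (pattern.test(interaction)) return canonical;
  }
  return interaction;
}

const testItems = ["open /items/42", "click #add-to-basket-top"];
const sessionItems = ["open /items/12", "click #add-to-basket-bottom"];

const sameAfterNormalizing =
  testItems.map(normalize).join("|") === sessionItems.map(normalize).join("|");
```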
Computing the sessions coverage
In Gravity, we have an artifact called Activities. Each activity maps a list of consecutive user interactions to a business activity. For a webshop, the list of activities would be something like this:
- register
- login
- search
- view item
- add item to cart
- checkout
- sign out
Each of these activities contains one or more user interactions (for example, the “view item” activity can be expressed as “visiting the page /items/<any>”, and the “search” activity as “filling the search field and pressing the Enter key”).
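The two example activities can be modeled as ordered lists of predicates, one per user interaction. The `Interaction`, `Matcher` and `Activity` types, the `#search` selector and the `/items/` regular expression are assumptions for illustration, not Gravity's configuration format.

```typescript
// Hypothetical, simplified interaction record.
interface Interaction {
  kind: "action" | "transition" | "request";
  name: string; // e.g. "fill", "keypress", "open"
  target: string; // e.g. a CSS selector, a key name or a page path
}

// A matcher decides whether one interaction fits one step of an activity.
type Matcher = (i: Interaction) => boolean;

interface Activity {
  name: string;
  steps: Matcher[];
}

// "visiting the page /items/<any>"
const viewItem: Activity = {
  name: "view item",
  steps: [(i) => i.kind === "transition" && i.name === "open" && /^\/items\/[^/]+$/.test(i.target)],
};

// "filling the search field and pressing the Enter key"
const searchActivity: Activity = {
  name: "search",
  steps: [
    (i) => i.kind === "action" && i.name === "fill" && i.target === "#search",
    (i) => i.kind === "action" && i.name === "keypress" && i.target === "Enter",
  ],
};

// True when the interactions satisfy the activity's steps one by one, in order.
function matchesExactly(activity: Activity, interactions: Interaction[]): boolean {
  return (
    interactions.length === activity.steps.length &&
    activity.steps.every((step, idx) => step(interactions[idx]))
  );
}
```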
With those activities, we can express a session as being a succession of activities (or a Flow, as we call it).
Expressing sessions and tests as flows
Let’s say we have three activities in our Gravity domain:
- Activity A: the user interactions are A, then B, then C
- Activity B: the user interactions are C, then D, then E
- Activity C: the user interaction is F
Let’s consider a session performing the following user interactions: G, A, B, D, C, F. This session realizes two of the activities: Activity A (A, B and C occur in that order, with D in between) and Activity C (F). Activity B is not realized, since E never occurs.
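The example above can be reproduced with a short sketch that scans a session and emits an activity each time its interactions appear in order. Two assumptions are inferred from the example rather than stated: interactions belonging to no activity (G, D) are dropped, and an activity's interactions need not be strictly adjacent (D sits between B and C). Activities are tried in declaration order, and scanning resumes after the last interaction of a matched activity. This is not Gravity's actual matching algorithm.

```typescript
interface ActivityDef {
  name: string;
  steps: string[]; // ordered interactions that make up the activity
}

// The three activities from the example.
const ACTIVITIES: ActivityDef[] = [
  { name: "Activity A", steps: ["A", "B", "C"] },
  { name: "Activity B", steps: ["C", "D", "E"] },
  { name: "Activity C", steps: ["F"] },
];

// session[start] already equals steps[0]; look for the remaining steps in order,
// skipping unrelated interactions. Returns the index of the last matched
// interaction, or -1 if the activity cannot be completed.
function matchFrom(session: string[], start: number, steps: string[]): number {
  let pos = start;
  for (let s = 1; s < steps.length; s++) {
    pos = session.indexOf(steps[s], pos + 1);
    if (pos === -1) return -1;
  }
  return pos;
}

// Translates a session (list of interactions) into a flow (list of activities).
function toFlow(session: string[]): string[] {
  const flow: string[] = [];
  let i = 0;
  while (i < session.length) {
    let next = i + 1; // by default, the interaction is dropped
    for (const activity of ACTIVITIES) {
      if (activity.steps[0] !== session[i]) continue;
      const end = matchFrom(session, i, activity.steps);
      if (end !== -1) {
        flow.push(activity.name);
        next = end + 1; // resume after the activity's last interaction
        break;
      }
    }
    i = next;
  }
  return flow;
}

const flow = toFlow(["G", "A", "B", "D", "C", "F"]);
```

Allowing unbounded gaps keeps the sketch short. A real matcher would probably limit how far apart an activity's interactions may be.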
With the session expressed as a Flow, we can get rid of the user interactions that are not relevant from a business point of view. We can also add some flexibility to the activities, for example by saying that clicking the “Add to basket” button is the same activity regardless of which product you are adding to the basket.
This also means that if the domain does not define any activities, Gravity cannot translate user sessions into flows. This is why Gravity includes a built-in Activity discovery tool, which helps you identify the activities users perform most often.
Now that we can express the sessions (and the tests) as flows, how do we use that to compute the coverage of a session by a test?
Comparing flows to compute the coverage
Let’s consider the following test and session flows on a webshop:
- Session A flow is: login – search – view item – add to cart
- Session B flow is: view item – add to cart – login – checkout
- Session C flow is: login – view item – add to cart – view item – add to cart – checkout
- Test flow is: login – view item – add to cart – checkout
We compute the coverage of each session by counting the activities in the session that follow the order of activities in the test, then dividing that number by the total number of activities realized by the session.
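One way to formalize this rule, consistent with the three results listed next, is as follows. The notation is introduced here for illustration. Let s = (s_1, ..., s_n) be the session flow, t = (t_1, ..., t_m) the test flow, and k(s, t) the length of the longest prefix of t whose activities appear in s in the same order, not necessarily next to each other:

```latex
k(s, t) = \max\bigl\{\, j \in \{0, \dots, m\} : \exists\, i_1 < i_2 < \dots < i_j \text{ with } s_{i_r} = t_r \text{ for } r = 1, \dots, j \,\bigr\}

\operatorname{cov}(s, t) = \frac{k(s, t)}{n}
```

For Session A, k = 3 (login, view item, add to cart) and n = 4, so the coverage is 3/4.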
In our case, this means:
- Session A covers 75% – the test does not include the “search” activity but performs the other three activities in the same order. Thus, three out of four activities are covered.
- Session B covers 25% – only the “login” activity is covered by the test. In the test, “login” is followed by “view item”, but in this session “view item” never occurs after “login”. Although the session and the test contain the same activities, their order matters: we need a test ensuring that the basket still holds the items the user selected before logging in.
- Session C covers two thirds (about 67%) – four of its six activities match the test (login, view item, add to cart and the final checkout), but we expected the user to perform the “checkout” activity, not add another item. Hence, the second “view item – add to cart” pair is disregarded.
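The rule can be sketched as a single pass over the session flow that counts a session activity only when it is the next one the test expects. This greedy, in-order reading is an assumption, and it reproduces the three percentages above. Greedy matching always finds the longest prefix of the test flow that appears in order in the session. Activity names are plain strings for simplicity. This is not Gravity's actual implementation.

```typescript
// Fraction of the session's activities that follow the test flow's order.
function coverage(session: string[], testFlow: string[]): number {
  let expected = 0; // index of the next test activity we are waiting for
  for (const activity of session) {
    if (expected < testFlow.length && activity === testFlow[expected]) expected++;
  }
  return session.length === 0 ? 0 : expected / session.length;
}

const testFlow = ["login", "view item", "add to cart", "checkout"];

const covA = coverage(["login", "search", "view item", "add to cart"], testFlow); // 3 / 4
const covB = coverage(["view item", "add to cart", "login", "checkout"], testFlow); // 1 / 4
const covC = coverage(
  ["login", "view item", "add to cart", "view item", "add to cart", "checkout"],
  testFlow,
); // 4 / 6
```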
If we average those coverages, we can say that the test covers a little over 55% of the sessions.
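With the exact value for Session C (two thirds), the average works out as:

```latex
\frac{0.75 + 0.25 + \tfrac{2}{3}}{3} = \frac{5/3}{3} = \frac{5}{9} \approx 0.556
```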
Conclusion
This is our first attempt at expressing test coverage for user sessions. It gives a pretty good indication of how well the test suite matches what users really do. That being said, many questions remain open: should we ignore some activities? How should we handle repeated activities?