Test timeout and runtime

The new LTP release will include changes that introduce the concept of a maximal test runtime, so let me briefly explain what exactly that is. To begin with, let's make an observation about LTP test duration. Most LTP tests fall into one of two categories as far as duration is concerned. The first type is fast, generally finishing in under a second or two and most of the time even in a fraction of that. These tests mostly prepare a simple environment, call a syscall or two, clean up, and are done. The second type runs for longer and its duration is usually counted in minutes. These tests include I/O stress tests, various regression tests that loop in order to hit a race, timer precision tests that have to sample time intervals, and so on.

Historically in LTP the test duration was limited by a single value called timeout, which defaulted to a compromise of 5 minutes. That is the worst value for both classes of tests, because it's clearly too long for short-running tests and at the same time too short for a significant fraction of the long-running ones. This was apparent just from checking the tests that actually adjusted the default: quite a few short-running tests that were prone to deadlocks decreased the timeout to a much shorter interval, while quite a few long-running tests increased it.

But back to how the test duration was handled in the long-running tests. Their duration is usually bounded by a time limit as well as a limit on the number of iterations, and the test exits on whichever is hit first. In order to finish before the timeout, these tests watched the elapsed runtime and exited the main loop once the runtime got close enough to the test timeout. The problem was that "close enough" was loosely defined and implemented differently in each test, which obviously leads to different problems. For instance, if a test looped until 10 seconds were left before the timeout and the test cleanup took more than 10 seconds on slower hardware, there was no way to avoid triggering the timeout, which resulted in a test failure. Increasing the test timeout did not help either; the test simply ran for longer and hit the timeout at the end anyway. At the same time, if a test instead reserved a proportion of the timeout for its cleanup, things didn't work out when the timeout was scaled down in order to shorten the test duration.
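To make the failure mode concrete, here is a minimal sketch of the old pattern, written as a current-style LTP test for readability. The helper tst_timeout_remaining() is the one old tests used to watch the clock as far as I recall, and do_one_iteration() is a hypothetical stand-in for the real loop body:

    #include <unistd.h>
    #include "tst_test.h"

    /* Hypothetical workload standing in for the real loop body. */
    static void do_one_iteration(void)
    {
            usleep(100000);
    }

    static void run(void)
    {
            /*
             * Old pattern: iterate until we are "close enough" to
             * the timeout. The 10s margin is a per-test guess; if
             * cleanup takes longer than that on slow hardware, the
             * test hits the timeout and fails anyway.
             */
            while (tst_timeout_remaining() > 10)
                    do_one_iteration();

            tst_res(TPASS, "finished before the timeout");
    }

    static struct tst_test test = {
            .test_all = run,
    };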

After careful analysis it became clear that the test duration has to be bound by two distinct values. The new values are called timeout and max_runtime, and the test duration is bound by the sum of the two. The idea behind this should be clear to the reader at this point: max_runtime limits the active part of the test, that is the part where the actual test loop is executed, while the timeout covers the test setup and cleanup and all inaccuracies in the accounting. Each of them can be scaled separately, which gives us enough flexibility to scale from small embedded boards all the way up to supercomputers. This change also allowed us to lower the default test timeout to 30 seconds. And if you are asking yourself how max_runtime is set for short-running tests, the answer is simple: it's set to zero, since the default timeout is more than enough to cope with these.
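Under the new scheme a long-running test declares its runtime up front and asks the library how much of it is left, instead of hand-rolling a margin against the timeout. Here is a minimal sketch; the .max_runtime field and the tst_remaining_runtime() helper are named as I remember the API, so treat the exact spelling as illustrative:

    #include <unistd.h>
    #include "tst_test.h"

    static void run(void)
    {
            /*
             * Loop until the runtime budget is used up. Cleanup no
             * longer eats into this budget; it is covered by the
             * separate (now 30s by default) timeout.
             */
            while (tst_remaining_runtime())
                    usleep(100000); /* stand-in for the real workload */

            tst_res(TPASS, "used up the runtime");
    }

    static struct tst_test test = {
            .test_all = run,
            .max_runtime = 60, /* cap on the active part, in seconds */
    };

If memory serves, the two budgets are scaled independently via the LTP_TIMEOUT_MUL and LTP_RUNTIME_MUL environment variables, which is what makes the embedded-board-to-supercomputer scaling mentioned above practical.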

All of this also helps to kill misbehaving tests much faster, since we now have a much better estimate of the expected test duration. And yes, this is a big deal when you are running thousands of testcases; it may speed up the testrun quite significantly even with just a few deadlocked tests.

But things do not end here; there is a bit of added complexity on top of this. Some of the testcases will call the main test loop more than once, because we have a few "multiplier" flags that can increase test coverage quite a bit. For instance we have the .all_filesystems flag which, when set, executes the test on top of the most commonly used filesystems. There is also a flag that runs the test for different variants, which is sometimes used to run the test for more than one syscall variant, e.g. for clock_gettime() we run the same test for both the syscall and the VDSO. All these multipliers have to be taken into account when the overall test duration is computed, as the sketch below illustrates. We now have all these flags in the metadata file, hence we are getting really close to a state where we will have a tool that can compute an accurate upper bound on the duration of a given test. But that is a completely different story for a different short article.
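To illustrate how the multipliers stack, here is a deliberately contrived sketch that enables both flags at once; the .test_variants and .all_filesystems fields and the tst_variant variable reflect the API as I remember it, and setup details (devices, mount points) are simplified. With N supported filesystems and two variants the test loop runs 2 * N times, so a rough upper bound on the duration is 2 * N * (timeout + max_runtime):

    #include "tst_test.h"

    static void run(void)
    {
            /*
             * The library reruns the whole test once per variant
             * and, with .all_filesystems set, once per supported
             * filesystem.
             */
            if (tst_variant == 0)
                    tst_res(TINFO, "testing the raw syscall");
            else
                    tst_res(TINFO, "testing the VDSO");

            /* ... the actual checks would go here ... */
            tst_res(TPASS, "variant done");
    }

    static struct tst_test test = {
            .test_all = run,
            .test_variants = 2,   /* e.g. syscall and VDSO */
            .all_filesystems = 1, /* repeat on each common filesystem */
            .mntpoint = "mntpoint",
            .max_runtime = 30,
    };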