How to reproduce autotest fails

Latest revision as of 03:16, 28 March 2023

The current CI system runs under high CPU load. Because its virtual machines share resources, something occasionally glitches under that load and autotests may fail or crash. Here are a few pointers on how to try to reproduce such failures on different platforms.

As a rule of thumb, you need to put load on your system, increase the test's niceness (lower its CPU priority), limit its memory, and fix its processor affinity.

If you don't want to jeopardize your own machine, you can ask someone from the CI team to clone a VM for you; anyone from the release team can help you create one. Just ask in #qt-qa or #qt-labs on IRC. You can also use minicoin (https://git.qt.io/vohilshe/minicoin) to bring up a VM that is usually suitable, for the platforms it supports. If you create a new bug report for a failing autotest, remember to add the labels 'autotest' and 'flaky' to the label field so that it gets tracked properly.

When diagnosing a crashing test, pass the command line option "-nocrashhandler" to the test. This stops QTestLib from dumping the stack trace itself and makes it possible to attach a debugger, or to have the OS launch one automatically (post-mortem debugging).
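A minimal sketch of doing this under gdb on Linux, assuming gdb is installed; debug_test is a hypothetical helper name. gdb starts the test, stops where it crashes and prints a backtrace of every thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a Qt autotest under gdb with QTestLib's crash
# handler disabled, so gdb stops at the real fault instead of the handler.
# -batch makes gdb exit after running the -ex commands.
debug_test() {
    local test_binary=$1
    shift
    gdb -batch -ex run -ex "thread apply all bt" \
        --args "$test_binary" -nocrashhandler "$@"
}

# Usage (tst_foo is a placeholder): debug_test ./tst_foo
```

Any further arguments, such as test function names, are passed on to the test after -nocrashhandler.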

Linux

1. Use 'stress' or 'stress-ng' to impose CPU, I/O and memory load on your system

Install

Ubuntu and/or Debian:	sudo apt-get install stress
openSUSE:	sudo zypper install stress-ng

Example usage:

Ubuntu and/or Debian:	stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
openSUSE:	stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M

You can check with 'top' that it is running.

For more information, see: https://linux.die.net/man/1/stress and https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
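To run a single test under load and stop the load again afterwards, something like the following sketch may help. with_background_load is a hypothetical helper; it assumes pkill (from procps) is available to stop the worker processes that stress forks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command while a load generator runs in the
# background, then stop the load and return the command's exit status.
# Usage: with_background_load "<load command>" <test command> [args...]
with_background_load() {
    local load_cmd=$1 load_pid status=0
    shift
    # Intentionally unquoted so "stress --cpu 8 ..." is split into words;
    # this keeps the load generator itself as the background process.
    $load_cmd &
    load_pid=$!
    "$@" || status=$?
    # stress forks worker processes: stop them first, then the parent.
    pkill -P "$load_pid" 2>/dev/null || true
    kill "$load_pid" 2>/dev/null || true
    wait "$load_pid" 2>/dev/null || true
    return "$status"
}
```

For example: `with_background_load "stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M" ./tst_example`. Also passing --timeout to stress or stress-ng is a cheap safety net in case the helper is interrupted before it can stop the load.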

2. Increase niceness (lower the test's CPU priority)

   nice -n 19 ./test

See: http://bencane.com/2013/09/09/setting-process-cpu-priority-with-nice-and-renice/
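Running nice without arguments prints the current niceness, which gives a quick way to check the setting. A minimal sketch (run_low_priority is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command at the lowest CPU priority.
# Niceness is clamped to 19, so the result is 19 whatever the starting value.
run_low_priority() {
    nice -n 19 "$@"
}

# `nice` with no arguments prints the niceness it runs with; this prints 19.
run_low_priority nice
```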

3. Use 'taskset' to set process affinity

   taskset -c 1 ./tst_foo

This runs tst_foo on the second core only (cores are numbered from 0), so the test cannot be spread across other cores.

From the 'taskset' man page: -c, --cpu-list "specify a numerical list of processors instead of a bitmask. The list may contain multiple items, separated by comma, and ranges. For example, 0,5,7,9-11." See more information: https://linux.die.net/man/1/taskset
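A sketch for pinning a test to one core and checking that it worked, assuming util-linux taskset and GNU coreutils nproc (which counts only the CPUs the process may use). first_allowed_cpu and run_on_core are hypothetical helpers; picking the first allowed core avoids failures in VMs or containers where core 1 is not available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowest-numbered core this shell may run on.
# `taskset -cp PID` prints e.g. "pid 4242's current affinity list: 0-7".
first_allowed_cpu() {
    taskset -cp "$$" | sed 's/.*: //' | cut -d, -f1 | cut -d- -f1
}

# Hypothetical helper: run a command pinned to a single core.
run_on_core() {
    local core=$1
    shift
    taskset -c "$core" "$@"
}

# nproc honours CPU affinity, so this prints 1 when pinning worked.
run_on_core "$(first_allowed_cpu)" nproc
```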

4. Run tests in a loop over and over

Launch 'stress' (see step 1 above), then run the tests in a loop:

       for i in {0..100}; do taskset -c 1 ./tst_example >> log.txt 2>&1; done

You can also try to run the same tests from two different terminals and set the process affinity.
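The loop above keeps running after a failure, so the interesting output can end up buried in the middle of log.txt. A sketch of a variant that stops at the first failing run (repeat_until_failure is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, appending all output to a
# log file, and stop at the first failing run so its output ends the log.
# Usage: repeat_until_failure <runs> <log file> <command> [args...]
repeat_until_failure() {
    local runs=$1 log=$2 i=1
    shift 2
    while [ "$i" -le "$runs" ]; do
        if ! "$@" >> "$log" 2>&1; then
            echo "run $i of $runs failed; see $log" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs runs passed" >&2
}
```

For example, `repeat_until_failure 100 log.txt taskset -c 1 ./tst_example`. To run two copies at once, start two instances in the background with different log files and wait for both, e.g. `repeat_until_failure 100 log1.txt ./tst_example & repeat_until_failure 100 log2.txt ./tst_example & wait`.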

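5. If the test segfaults or dumps core and you cannot get a crash log:

In a separate terminal window, set:

       export LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so
       ulimit -c unlimited

(libSegFault.so was removed in glibc 2.35, so on newer distributions skip the LD_PRELOAD line and rely on the core dump.)

Run the test in that same terminal. After it crashes, open the core file in gdb:

       gdb ./testCrash ./core

and, at the gdb prompt, print the backtrace:

       bt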
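The same steps can be wrapped in a small helper that enables core dumps for one run only and reports the signal that killed the test. This is a sketch; run_with_core_dumps is a hypothetical name. Where the core file is written depends on /proc/sys/kernel/core_pattern, which the helper prints: if the pattern starts with '|', cores are handed to a program such as systemd-coredump instead of being written to ./core, and `coredumpctl gdb` opens the most recent one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a test with core dumps enabled (only for this run)
# and report the signal that killed it, if any.
# Usage: run_with_core_dumps ./tst_example
run_with_core_dumps() {
    local status=0
    (
        # Raising the limit fails if the hard limit is lower; warn and go on.
        ulimit -c unlimited 2>/dev/null || echo "warning: could not enable core dumps" >&2
        exec "$@"
    ) || status=$?
    # The shell reports "killed by signal N" as exit status 128 + N.
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $(kill -l "$((status - 128))")" >&2
        echo "core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)" >&2
    fi
    return "$status"
}
```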

6. Use rr

See https://rr-project.org/, and especially rr's chaos mode (https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html), which runs threads with varying priorities and can help reproduce races.
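A minimal sketch of recording under chaos mode until a run fails, assuming rr is installed and can use the CPU's performance counters (many VMs do not expose them). record_until_failure is a hypothetical helper; it relies on rr record returning the recorded program's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record a test under rr chaos mode repeatedly until a
# run fails, so the failing execution is the latest rr trace.
# Usage: record_until_failure <max runs> <test command> [args...]
record_until_failure() {
    local max_runs=$1 i=1
    shift
    while [ "$i" -le "$max_runs" ]; do
        if ! rr record --chaos "$@"; then
            echo "run $i failed; debug it with: rr replay" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs" >&2
}
```

For example, `record_until_failure 50 ./tst_example`, then `rr replay` to step through the failing run in gdb.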

7. Limit the available memory

   systemd-run --user --scope -p MemoryMax=500M ./tst_example
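systemd-run is not available everywhere, for example in many containers. As a rough fallback, a different mechanism can be used: ulimit -v caps the test's virtual address space rather than its memory usage. It is stricter than MemoryMax and can break programs that reserve large address ranges (for example AddressSanitizer builds), so treat this as an approximation. A sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical fallback: cap the virtual address space (in KiB) of one command.
# NOTE: this is not the cgroup MemoryMax limit; it limits virtual memory only.
# Usage: run_with_address_space_limit 512000 ./tst_example   # 500 MiB
run_with_address_space_limit() {
    local limit_kib=$1
    shift
    # The subshell keeps the limit from affecting the calling shell.
    ( ulimit -v "$limit_kib" && exec "$@" )
}
```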

8. Combine several approaches

Limit memory, maximise niceness and use one core (it is hard to verify that these all work in conjunction, but they seem to):

   systemd-run --scope -p MemoryMax=500M --user nice -n 19 taskset -c 0 ./tst_example
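To see which of the limits are actually in effect, you can run a small probe through the same wrappers instead of the test. print_limits is a hypothetical helper; the memory line assumes cgroup v2.

```shell
#!/usr/bin/env bash
# Hypothetical probe: print the niceness, usable cores and cgroup v2 memory
# limit of the process it runs in.
print_limits() {
    local cg
    echo "niceness: $(nice)"
    echo "usable cores: $(nproc)"
    # On cgroup v2, /proc/self/cgroup holds a single line "0::<path>".
    cg=$(sed -n 's/^0:://p' /proc/self/cgroup 2>/dev/null) || true
    if [ -n "$cg" ] && [ -r "/sys/fs/cgroup$cg/memory.max" ]; then
        echo "memory.max: $(cat "/sys/fs/cgroup$cg/memory.max")"
    fi
}
# Export the function so a child `bash -c print_limits` can call it.
export -f print_limits
```

For example, `systemd-run --scope -p MemoryMax=500M --user nice -n 19 taskset -c 0 bash -c print_limits` should report niceness 19 and 1 usable core and, if the memory limit applies, a memory.max of 524288000 bytes (500 MiB). A missing memory.max line suggests the limit is not enforced for this scope.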

Mac OS

1. Stress-test the CPU

The 'yes' command prints “y” in an endless loop as fast as it can; with its output discarded to /dev/null, each instance keeps one CPU core fully busy. In a terminal, run:

   yes > /dev/null & yes > /dev/null & yes > /dev/null & yes > /dev/null &

Check with 'top' that you see 4 'yes' processes running. Run "killall yes" to kill all instances.

See: http://osxdaily.com/2012/10/02/stress-test-mac-cpu/
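Instead of typing one yes per core by hand, the sketch below starts one per online CPU and later stops only those processes (unlike killall yes, which also kills unrelated yes processes). start_yes_load and stop_yes_load are hypothetical helpers; getconf _NPROCESSORS_ONLN works on both macOS and Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one busy `yes` per online CPU (or N if given)
# and remember their PIDs in YES_PIDS.
start_yes_load() {
    local count=${1:-$(getconf _NPROCESSORS_ONLN)} i=0
    YES_PIDS=""
    while [ "$i" -lt "$count" ]; do
        yes > /dev/null &
        YES_PIDS="$YES_PIDS $!"
        i=$((i + 1))
    done
}

# Hypothetical helper: stop only the processes started by start_yes_load.
stop_yes_load() {
    [ -n "$YES_PIDS" ] || return 0
    kill $YES_PIDS 2>/dev/null || true   # intentionally unquoted: one PID per word
    wait $YES_PIDS 2>/dev/null || true
    YES_PIDS=""
}
```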

Windows

1. Stress-test the CPU:

Get CPUSTRES.EXE from https://blogs.msdn.microsoft.com/vijaysk/2012/10/26/tools-to-simulate-cpu-memory-disk-load/ and run it. Activate all threads (tick their check boxes), then set each thread's 'Thread Priority' to 'Time Critical' or 'Highest' and its 'Activity' to 'Busy'.

2. Launch the test on a single CPU core:

    start /B /WAIT /affinity 1 test.exe

The /affinity value is a hexadecimal bitmask of CPUs: 1 = CPU 0, 2 = CPU 1, 4 = CPU 2, 3 = CPUs 0 and 1, and so on. See the table at: https://blogs.msdn.microsoft.com/santhoshonline/2011/11/24/how-to-launch-a-process-with-cpu-affinity-set/

3. Run the tests in a loop:

   for /L %i in (1, 1, 10) do start /B /WAIT /affinity 1 tst_example.exe >> log.txt 2>&1

(Note that the >> ensures that the output is appended to log.txt, rather than overwriting its contents after each run - see https://technet.microsoft.com/en-us/library/bb490982.aspx)

3.1. Run several tests in a loop simultaneously

You can also run the same tests from two different terminals, each with its process affinity set.

Note: in a .bat script, double the percent sign: for /L %%i in (1, 1, 10)...
