Thursday, July 21, 2005

An interesting signal-delivery problem

Recently, we found an interesting performance problem using DTrace. The problem was with a virtual timer created using setitimer(2). The interval passed was 10ms (one clock tick), but the SIGVTALRM signal would arrive late, sometimes by 6 ticks or more. Now, how would you DTrace this code, and where would you start tracing? I'd start tracing from signal generation through delivery. In the Solaris kernel, a signal is posted using sigtoproc(), and eat_signal() is called on the target thread to make it notice the signal, with the action taken depending on the thread's state (TS_RUN, TS_SLEEP, TS_STOPPED). psig() is called when the kernel finds a pending signal (for instance, when returning from a trap).
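The test program boiled down to something like the following sketch (the original isn't shown; names here are illustrative):

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Sketch of a reproduction: arm ITIMER_VIRTUAL with a 10 ms
 * interval and spin in userland, so each SIGVTALRM should
 * arrive roughly one clock tick apart.
 */
static volatile sig_atomic_t ticks;

static void
handler(int sig)
{
	ticks++;
}

int
main(void)
{
	struct itimerval itv;

	(void) sigset(SIGVTALRM, handler);

	itv.it_interval.tv_sec = 0;
	itv.it_interval.tv_usec = 10000;	/* 10 ms: one clock tick */
	itv.it_value = itv.it_interval;
	(void) setitimer(ITIMER_VIRTUAL, &itv, NULL);

	while (ticks < 1000)
		;				/* spin, burning virtual time */

	(void) printf("got %d ticks\n", (int)ticks);
	return (0);
}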

The program spins in userland after setting up the timer. Since the thread's state would be TS_ONPROC, the kernel would need to poke the target CPU if the thread happens to be running on a different CPU. So I started tracing the following functions: sigtoproc(), eat_signal(), poke_cpu(), and psig().
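A minimal D script along these lines would do it (a sketch, not the original script; argument positions are assumptions from the function signatures sigtoproc(p, t, sig) and eat_signal(t, sig), and SIGVTALRM is signal 28 on Solaris):

#!/usr/sbin/dtrace -s

fbt::sigtoproc:entry
/arg2 == 28/
{
	printf("%d sig : %d", timestamp, arg2);
}

fbt::eat_signal:entry
{
	printf("%d sig : %d", timestamp, arg1);
}

fbt::poke_cpu:entry
{
	printf("%d cpu : %d", timestamp, arg0);
}

fbt::psig:entry
{
	printf("%d", timestamp);
}

Now let's take a look at the DTrace probe output: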


CPU  ID     FUNCTION:NAME       TIMESTAMP         ARGS
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
8 11263 eat_signal:entry 1027637990025440 sig : 28
8 2981 poke_cpu:entry 1027637990032160 cpu : 9
8 11263 eat_signal:entry 1027638000036320 sig : 28
8 2981 poke_cpu:entry 1027638000043600 cpu : 9
8 11263 eat_signal:entry 1027638010025520 sig : 28
8 2981 poke_cpu:entry 1027638010032240 cpu : 9
8 11263 eat_signal:entry 1027638020023840 sig : 28
8 2981 poke_cpu:entry 1027638020031280 cpu : 9
8 11263 eat_signal:entry 1027638030028720 sig : 28
8 2981 poke_cpu:entry 1027638030035920 cpu : 9
8 11263 eat_signal:entry 1027638040024480 sig : 28
[.]
9 8317 psig:entry 1027638170086480 sig : 28

If you calculate the difference between the timestamps of psig() and the first eat_signal(), you'll notice that it is huge:

1027638170086480 - 1027637980027920 = 190058560 ns
190058560 ns ≈ 190 ms = 19 clock ticks (at 10 ms per tick)
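Rather than subtracting timestamps by hand, you could let DTrace aggregate the latency directly. A sketch, assuming eat_signal()'s second argument is the signal number and that only one SIGVTALRM is in flight at a time:

fbt::eat_signal:entry
/arg1 == 28 && gen == 0/
{
	gen = timestamp;	/* generation time of the pending SIGVTALRM */
}

fbt::psig:entry
/gen/
{
	@["SIGVTALRM generation-to-delivery latency (ns)"] =
	    quantize(timestamp - gen);
	gen = 0;
}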


We also noticed that CPU 8 (where sigtoproc() is being called from clock_tick()) is poking CPU 9; however, CPU 9 is not preempting the currently running thread (the program that is spinning). So why and how does that happen? To understand it, I'll first describe a bit of how preemption works in Solaris. To preempt a running thread, the kernel sets t_astflag (using the aston() macro) and also sets the appropriate CPU preemption flag. There are two CPU preemption flags: cpu_runrun for user-level preemptions and cpu_kprunrun for kernel-level preemptions. RT threads can preempt TS, SYS, or IA class threads, since kernel-level preemption typically kicks in when the current running thread's priority is <= 100 (KPQPRI). For signals we don't set the CPU-level preemption flags; we just need to set t_sig_check and t_astflag, followed by a poke, as sketched below.
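A simplified sketch of that flag dance (from memory, not the actual Solaris source; cp is the CPU to preempt and newpri is the priority of the thread that wants it):

static void
sketch_preempt(cpu_t *cp, pri_t newpri)
{
	kthread_t *t = cp->cpu_thread;	/* thread currently on that CPU */

	if (newpri >= KPQPRI)
		cp->cpu_kprunrun = 1;	/* kernel-level preemption */
	else
		cp->cpu_runrun = 1;	/* user-level preemption */

	aston(t);			/* sets t_astflag */

	if (cp != CPU)
		poke_cpu(cp->cpu_id);	/* cross-call the remote CPU */
}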

Since we are interested in user-level preemption, we should know what happens when CPU 8 pokes CPU 9 (using a cross call). If the thread currently running on CPU 9 is in userland, user_rtt() calls trap() if the check for t_astflag succeeds. So let's check whether t_astflag is set when we call eat_signal(). And that's where the problem was: if the target thread in eat_signal() is TS_ONPROC, we should set t_astflag and then poke the CPU. The following probes make it clear that the running thread on CPU 9 was preempted only because its time quantum expired and clock set t_astflag in cpu_surrender():

9 15055 post_syscall:entry 1027637970269440
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
[.]
8 11263 eat_signal:entry 1027638040024480 sig : 28
8 2981 poke_cpu:entry 1027638040026800 cpu : 9
[.]
8 11263 eat_signal:entry 1027638110024160 sig : 28
8 2981 poke_cpu:entry 1027638110026560 cpu : 9
8 2435 cpu_surrender:entry 1027638170024720 t:3001b7af3e0
8 2981 poke_cpu:entry 1027638170027280 cpu : 9
8 11263 eat_signal:entry 1027638170032720 sig : 28
9 2919 poke_cpu_intr:entry 1027638170033760
8 2981 poke_cpu:entry 1027638170034400 cpu : 9
9 3390 trap:entry 1027638170037840 type :512, pc: 10984, ast:1
8 2981 poke_cpu:entry 1027638170038640 cpu : 9
9 2919 poke_cpu_intr:entry 1027638170045680
9 1497 trap_cleanup:entry 1027638170054880 0
9 8317 psig:entry 1027638170086480 sig : 28
9 2278 trap_rtt:entry 1027638170117440
9 15055 post_syscall:entry 1027638170143360
9 8317 psig:entry 1027638170150880 sig : 2
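Once eat_signal() sets t_astflag before poking the target CPU, the signal is noticed on the very next trap. In sketch form (simplified, not the actual Solaris source), the fix looks something like this:

	/* In eat_signal(), for a target thread currently on a CPU: */
	if (t->t_state == TS_ONPROC) {
		aston(t);		/* set t_astflag first ... */
		if (t->t_cpu != CPU)
			poke_cpu(t->t_cpu->cpu_id);	/* ... then poke it */
	}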

So DTrace did help us find out where the problem was. This is just one example. Happy DTracing...
