Thursday, July 21, 2005

An interesting signal delivery related problem

Recently, we found an interesting performance problem using DTrace. The problem was with a program that used a virtual interval timer created with setitimer(2). The interval passed was 10 ms (one clock tick), but the SIGVTALRM signal used to arrive late, sometimes by 6 ticks or more. Now, how would you DTrace this, and where would you start tracing? I'd start tracing from signal generation through delivery. In the Solaris kernel, sigtoproc() is used to post a signal; it calls eat_signal() on the chosen thread to make it runnable or to notify it, depending on its state (TS_RUN, TS_SLEEP, TS_STOPPED, TS_ONPROC). psig() is called when the kernel finds a pending signal (for instance, when returning from a trap).

The program spins in userland after setting up the timer. Since the thread's state would be TS_ONPROC, the target CPU has to be poked if the thread happens to be running on a different CPU. So I started tracing the following functions: sigtoproc(), eat_signal(), poke_cpu() and psig(). Now let's take a look at the DTrace probe output:


CPU  ID     FUNCTION:NAME      TIMESTAMP        DETAIL
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
8 11263 eat_signal:entry 1027637990025440 sig : 28
8 2981 poke_cpu:entry 1027637990032160 cpu : 9
8 11263 eat_signal:entry 1027638000036320 sig : 28
8 2981 poke_cpu:entry 1027638000043600 cpu : 9
8 11263 eat_signal:entry 1027638010025520 sig : 28
8 2981 poke_cpu:entry 1027638010032240 cpu : 9
8 11263 eat_signal:entry 1027638020023840 sig : 28
8 2981 poke_cpu:entry 1027638020031280 cpu : 9
8 11263 eat_signal:entry 1027638030028720 sig : 28
8 2981 poke_cpu:entry 1027638030035920 cpu : 9
8 11263 eat_signal:entry 1027638040024480 sig : 28
[.]
9 8317 psig:entry 1027638170086480 sig : 28

If you calculate the difference between the psig() timestamp and the first eat_signal() timestamp, you will notice that it is huge:

1027638170086480 - 1027637980027920 = 190058560 ns ≈ 190 ms = 19 clock ticks
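
For reference, a small fbt script along the following lines can collect a trace like the one above. This is only a sketch, not the exact script we ran; in particular, the argument positions (the signal number as eat_signal()'s second argument, the CPU id as poke_cpu()'s first argument) are assumptions, and 28 is SIGVTALRM on Solaris.

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Sketch only: argument positions are assumptions; 28 is SIGVTALRM. */
fbt::eat_signal:entry
/arg1 == 28/
{
        printf("%3d %6d %s:%s %d sig : %d\n",
            cpu, id, probefunc, probename, timestamp, arg1);
}

fbt::poke_cpu:entry
{
        printf("%3d %6d %s:%s %d cpu : %d\n",
            cpu, id, probefunc, probename, timestamp, arg0);
}

fbt::psig:entry
{
        printf("%3d %6d %s:%s %d\n",
            cpu, id, probefunc, probename, timestamp);
}

In practice you would also predicate the psig() and poke_cpu() probes on the victim process (for example /execname == "spin"/, where "spin" stands in for the test program's name) to cut down the noise.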


We also noticed that CPU 8 (from where sigtoproc() is being called by clock_tick()) is poking CPU 9, yet CPU 9 is not preempting the currently running thread (the program which is spinning). So why does this happen? To understand it, I'll first describe a bit of how preemption works in Solaris. To preempt a running thread, the kernel sets t_astflag (using the aston() macro) and also sets the appropriate CPU preemption flag. There are two CPU preemption flags: cpu_runrun for user-level preemptions and cpu_kprunrun for kernel-level preemptions. RT threads can preempt TS, SYS, or IA class threads, since kernel-level preemption typically kicks in when the current running thread's priority is <= 100 (KPQPRI). For signals we don't set the CPU-level preemption flags; we just need to set t_sig_check and t_astflag and then poke the CPU.
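
One quick way to see this from the spinning program's point of view is to sample it and tally the AST and CPU preemption flags it is running with. The sketch below is hedged: "spin" is a placeholder for the test program's name, and it simply relies on curthread and its t_cpu pointer.

#!/usr/sbin/dtrace -s

/*
 * Sketch: sample the spinning program ~997 times per second and tally
 * the AST / preemption flags it is running with.  "spin" is a
 * placeholder for the test program's name.
 */
profile-997
/execname == "spin"/
{
        @flag["t_astflag", curthread->t_astflag] = count();
        @flag["cpu_runrun", curthread->t_cpu->cpu_runrun] = count();
        @flag["cpu_kprunrun", curthread->t_cpu->cpu_kprunrun] = count();
}

If the signal is pending but t_astflag stays at 0 in this output, the thread will keep spinning in userland until something else forces it through trap().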

Since we are interested in user-level preemption, we should know what happens when CPU 8 pokes CPU 9 (using a cross call). If the thread currently running on CPU 9 is in userland, it goes through user_rtt(), which calls trap() if the check of t_astflag succeeds. So let's check whether t_astflag is set when we call eat_signal(). And that's where the problem was: if the target thread in eat_signal() is TS_ONPROC, we should set t_astflag and then poke the CPU. The following probes make it clear that the thread running on CPU 9 was finally preempted only because its time quantum expired and the clock would have set t_astflag in cpu_surrender():

9 15055 post_syscall:entry 1027637970269440
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
[.]
8 11263 eat_signal:entry 1027638040024480 sig : 28
8 2981 poke_cpu:entry 1027638040026800 cpu : 9
[.]
8 11263 eat_signal:entry 1027638110024160 sig : 28
8 2981 poke_cpu:entry 1027638110026560 cpu : 9
8 2435 cpu_surrender:entry 1027638170024720 t:3001b7af3e0
8 2981 poke_cpu:entry 1027638170027280 cpu : 9
8 11263 eat_signal:entry 1027638170032720 sig : 28
9 2919 poke_cpu_intr:entry 1027638170033760
8 2981 poke_cpu:entry 1027638170034400 cpu : 9
9 3390 trap:entry 1027638170037840 type :512, pc: 10984, ast:1
8 2981 poke_cpu:entry 1027638170038640 cpu : 9
9 2919 poke_cpu_intr:entry 1027638170045680
9 1497 trap_cleanup:entry 1027638170054880 0
9 8317 psig:entry 1027638170086480 sig : 28
9 2278 trap_rtt:entry 1027638170117440
9 15055 post_syscall:entry 1027638170143360
9 8317 psig:entry 1027638170150880 sig : 2
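
To confirm this kind of hypothesis directly, you can also look at the target thread's flags when eat_signal() returns, and watch cpu_surrender() alongside it. Again only a sketch: I'm assuming eat_signal()'s first argument and cpu_surrender()'s argument are the kthread_t in question, matching the "t:" value in the trace above.

#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::eat_signal:entry
/arg1 == 28/
{
        /* Assumption: arg0 is the target kthread_t, arg1 the signal. */
        self->t = arg0;
}

fbt::eat_signal:return
/self->t/
{
        printf("%3d %d eat_signal done: t_astflag %d t_sig_check %d\n",
            cpu, timestamp,
            ((kthread_t *)self->t)->t_astflag,
            ((kthread_t *)self->t)->t_sig_check);
        self->t = 0;
}

fbt::cpu_surrender:entry
{
        /* Assumption: arg0 is the thread being asked to yield. */
        printf("%3d %d cpu_surrender:entry t: %p\n",
            cpu, timestamp, (kthread_t *)arg0);
}

With the problem described above, you would expect t_astflag to still be 0 after eat_signal() in the TS_ONPROC case; once eat_signal() sets it before poking the CPU, the signal gets delivered on the next trip through trap().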

So DTrace did help us find out where the problem could be. This is just one example. Happy DTracing...

Tuesday, July 19, 2005

DTrace rocks...

Some time back I had a problem with my desktop: it started crawling whenever a Java ticker kicked in. I think I must share this with the rest of the world. I'll also share a performance-related kernel problem that we cracked. So DTrace has helped in solving many problems so far.

My desktop running Solaris 10 started crawling, and I noticed that Xsun was eating up 68% of the CPU. From prstat(1M):

# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   594 ******     85M   78M run     30    0  14:03:19  68% Xsun/1
   796 root       16M   13M sleep   59    0   1:06:25 5.8% stfontserverd/18
[.]

I then started DTracing Xsun and noticed that the lwp_sigmask() system call was being made very frequently by Xsun. Here is the data:

# ./syscall.d
^C
Ran for 26 seconds


writev 2832
pollsys 3261
read 5910
doorfs 27199
lwp_sigmask 217592

LWP ID COUNT
1 217592

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x78
libST.so.1`FSSessionDisposeFontInstance+0x8c
9063

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sigrelse+0x54
libST.so.1`fsexchange+0xc0
libST.so.1`FSSessionGetFontRenderingParams+0x8c

...and many more such stack traces from libST.so.1`fsexchange().

In fact, the full stack looks like this:

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x90
libST.so.1`FSSessionGetFontRenderingParams+0x8c
libST.so.1`GetRenderProps+0x344
libST.so.1`GlyphVectorRepQuery+0xf4
libST.so.1`STGlyphVectorQuery+0xd0
SUNWXst.so.1`_XSTUseCache+0x68

Notice that in these stack traces we are calling sighold() and sigrelse() very frequently, so this process is blocking and unblocking signals for some reason. It looks like we are rendering characters, but why do we block and unblock signals in this path? Here is the DTrace script that was used:

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        start = timestamp;
}

/* Count every system call made by Xsun, keyed by syscall name. */
syscall:::entry
/execname == "Xsun"/
{
        @s[probefunc] = count();
}

/* For lwp_sigmask, also record which LWP makes the call and its user stack. */
syscall::lwp_sigmask:entry
/execname == "Xsun"/
{
        @c[curthread->t_tid] = count();
        @st[ustack(6)] = count();
}

END
{
        printf("Ran for %d seconds\n\n", (timestamp - start) / 1000000000);

        /* Top five system calls by count. */
        trunc(@s, 5);
        printa(@s);

        printf("\n%-10s %-10s\n", "LWP ID", "COUNT");
        printa("%-10d %@d\n", @c);

        printa(@st);
}

In fact, DTrace can help us solve much more complex problems. Happy DTracing...