Thursday, July 21, 2005

An interesting signal delivery related problem

Recently, we found an interesting performance problem using DTrace. The problem was with a program that used a virtual interval timer created with setitimer(2). The interval passed was 10 ms (one clock tick), but the SIGVTALRM signal used to arrive late, sometimes by 6 ticks or more. Now, how would you DTrace this, and where would you start tracing? I'd start tracing from signal generation through delivery. In the Solaris kernel, sigtoproc() is used to post a signal; it calls eat_signal() on the chosen thread to make it runnable or to notify it, depending on its state (TS_RUN, TS_SLEEP, TS_STOPPED, TS_ONPROC). psig() is called when the kernel finds a pending signal (for instance, when returning from a trap).

The program spins in userland after setting up the timer. Since the thread's state would be TS_ONPROC, the target CPU has to be poked if the thread happens to be running on a different CPU. So I started tracing the following functions: sigtoproc(), eat_signal(), poke_cpu() and psig(). Now let's take a look at the DTrace probe output:


CPU  ID     FUNCTION:NAME      TIMESTAMP        DETAIL
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
8 11263 eat_signal:entry 1027637990025440 sig : 28
8 2981 poke_cpu:entry 1027637990032160 cpu : 9
8 11263 eat_signal:entry 1027638000036320 sig : 28
8 2981 poke_cpu:entry 1027638000043600 cpu : 9
8 11263 eat_signal:entry 1027638010025520 sig : 28
8 2981 poke_cpu:entry 1027638010032240 cpu : 9
8 11263 eat_signal:entry 1027638020023840 sig : 28
8 2981 poke_cpu:entry 1027638020031280 cpu : 9
8 11263 eat_signal:entry 1027638030028720 sig : 28
8 2981 poke_cpu:entry 1027638030035920 cpu : 9
8 11263 eat_signal:entry 1027638040024480 sig : 28
[.]
9 8317 psig:entry 1027638170086480 sig : 28

If you calculate the difference between the psig() timestamp and the first eat_signal() timestamp, you will notice that it is huge:

1027638170086480 - 1027637980027920 = 190058560 ns ≈ 190 ms = 19 clock ticks
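
For reference, a small fbt script along the following lines can collect a trace like the one above. This is only a sketch, not the exact script we ran; in particular, the argument positions (the signal number as eat_signal()'s second argument, the CPU id as poke_cpu()'s first argument) are assumptions, and 28 is SIGVTALRM on Solaris.

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Sketch only: argument positions are assumptions; 28 is SIGVTALRM. */
fbt::eat_signal:entry
/arg1 == 28/
{
        printf("%3d %6d %s:%s %d sig : %d\n",
            cpu, id, probefunc, probename, timestamp, arg1);
}

fbt::poke_cpu:entry
{
        printf("%3d %6d %s:%s %d cpu : %d\n",
            cpu, id, probefunc, probename, timestamp, arg0);
}

fbt::psig:entry
{
        printf("%3d %6d %s:%s %d\n",
            cpu, id, probefunc, probename, timestamp);
}

In practice you would also predicate the psig() and poke_cpu() probes on the victim process (for example /execname == "spin"/, where "spin" stands in for the test program's name) to cut down the noise.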


We also noticed that CPU 8 (from where sigtoproc() is being called by clock_tick()) is poking CPU 9, yet CPU 9 is not preempting the currently running thread (the program which is spinning). So why does this happen? To understand it, I'll first describe a bit of how preemption works in Solaris. To preempt a running thread, the kernel sets t_astflag (using the aston() macro) and also sets the appropriate CPU preemption flag. There are two CPU preemption flags: cpu_runrun for user-level preemptions and cpu_kprunrun for kernel-level preemptions. RT threads can preempt TS, SYS, or IA class threads, since kernel-level preemption typically kicks in when the current running thread's priority is <= 100 (KPQPRI). For signals we don't set the CPU-level preemption flags; we just need to set t_sig_check and t_astflag and then poke the CPU.
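
One quick way to see this from the spinning program's point of view is to sample it and tally the AST and CPU preemption flags it is running with. The sketch below is hedged: "spin" is a placeholder for the test program's name, and it simply relies on curthread and its t_cpu pointer.

#!/usr/sbin/dtrace -s

/*
 * Sketch: sample the spinning program ~997 times per second and tally
 * the AST / preemption flags it is running with.  "spin" is a
 * placeholder for the test program's name.
 */
profile-997
/execname == "spin"/
{
        @flag["t_astflag", curthread->t_astflag] = count();
        @flag["cpu_runrun", curthread->t_cpu->cpu_runrun] = count();
        @flag["cpu_kprunrun", curthread->t_cpu->cpu_kprunrun] = count();
}

If the signal is pending but t_astflag stays at 0 in this output, the thread will keep spinning in userland until something else forces it through trap().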

Since we are interested in user-level preemption, we should know what happens when CPU 8 pokes CPU 9 (using a cross call). If the thread currently running on CPU 9 is in userland, it goes through user_rtt(), which calls trap() if the check of t_astflag succeeds. So let's check whether t_astflag is set when we call eat_signal(). And that's where the problem was: if the target thread in eat_signal() is TS_ONPROC, we should set t_astflag and then poke the CPU. The following probes make it clear that the thread running on CPU 9 was finally preempted only because its time quantum expired and the clock would have set t_astflag in cpu_surrender():

9 15055 post_syscall:entry 1027637970269440
8 11263 eat_signal:entry 1027637980027920 sig : 28
8 2981 poke_cpu:entry 1027637980030560 cpu : 9
[.]
8 11263 eat_signal:entry 1027638040024480 sig : 28
8 2981 poke_cpu:entry 1027638040026800 cpu : 9
[.]
8 11263 eat_signal:entry 1027638110024160 sig : 28
8 2981 poke_cpu:entry 1027638110026560 cpu : 9
8 2435 cpu_surrender:entry 1027638170024720 t:3001b7af3e0
8 2981 poke_cpu:entry 1027638170027280 cpu : 9
8 11263 eat_signal:entry 1027638170032720 sig : 28
9 2919 poke_cpu_intr:entry 1027638170033760
8 2981 poke_cpu:entry 1027638170034400 cpu : 9
9 3390 trap:entry 1027638170037840 type :512, pc: 10984, ast:1
8 2981 poke_cpu:entry 1027638170038640 cpu : 9
9 2919 poke_cpu_intr:entry 1027638170045680
9 1497 trap_cleanup:entry 1027638170054880 0
9 8317 psig:entry 1027638170086480 sig : 28
9 2278 trap_rtt:entry 1027638170117440
9 15055 post_syscall:entry 1027638170143360
9 8317 psig:entry 1027638170150880 sig : 2
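
To confirm this kind of hypothesis directly, you can also look at the target thread's flags when eat_signal() returns, and watch cpu_surrender() alongside it. Again only a sketch: I'm assuming eat_signal()'s first argument and cpu_surrender()'s argument are the kthread_t in question, matching the "t:" value in the trace above.

#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::eat_signal:entry
/arg1 == 28/
{
        /* Assumption: arg0 is the target kthread_t, arg1 the signal. */
        self->t = arg0;
}

fbt::eat_signal:return
/self->t/
{
        printf("%3d %d eat_signal done: t_astflag %d t_sig_check %d\n",
            cpu, timestamp,
            ((kthread_t *)self->t)->t_astflag,
            ((kthread_t *)self->t)->t_sig_check);
        self->t = 0;
}

fbt::cpu_surrender:entry
{
        /* Assumption: arg0 is the thread being asked to yield. */
        printf("%3d %d cpu_surrender:entry t: %p\n",
            cpu, timestamp, (kthread_t *)arg0);
}

With the problem described above, you would expect t_astflag to still be 0 after eat_signal() in the TS_ONPROC case; once eat_signal() sets it before poking the CPU, the signal gets delivered on the next trip through trap().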

So DTrace did help us find out where the problem could be. This is just one example. Happy DTracing...

Tuesday, July 19, 2005

DTrace rocks...

Some time back I had a problem with my desktop: it started crawling whenever a Java ticker kicked in. I think I must share this with the rest of the world. I'll also share a performance-related kernel problem that we cracked. So DTrace has helped in solving many problems so far.

My desktop running Solaris 10 started crawling, and I noticed that Xsun was eating up 68% of the CPU. From prstat(1M):

# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   594 ******     85M   78M run     30    0  14:03:19  68% Xsun/1
   796 root       16M   13M sleep   59    0   1:06:25 5.8% stfontserverd/18
[.]

I then started DTracing Xsun and noticed that the lwp_sigmask() system call was being made very frequently by Xsun. Here is the data:

# ./syscall.d
^C
Ran for 26 seconds


writev 2832
pollsys 3261
read 5910
doorfs 27199
lwp_sigmask 217592

LWP ID COUNT
1 217592

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x78
libST.so.1`FSSessionDisposeFontInstance+0x8c
9063

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sigrelse+0x54
libST.so.1`fsexchange+0xc0
libST.so.1`FSSessionGetFontRenderingParams+0x8c

...and many more such stack traces from libST.so.1`fsexchange().

In fact, the full stack looks like this:

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x90
libST.so.1`FSSessionGetFontRenderingParams+0x8c
libST.so.1`GetRenderProps+0x344
libST.so.1`GlyphVectorRepQuery+0xf4
libST.so.1`STGlyphVectorQuery+0xd0
SUNWXst.so.1`_XSTUseCache+0x68

Notice that in these stack traces we are calling sighold() and sigrelse() very frequently, so this process is blocking and unblocking signals for some reason. It looks like we are rendering characters, but why do we block and unblock signals in this path? Here is the DTrace script that was used:

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        start = timestamp;
}

/* Count every system call made by Xsun, keyed by syscall name. */
syscall:::entry
/execname == "Xsun"/
{
        @s[probefunc] = count();
}

/* For lwp_sigmask, also record which LWP makes the call and its user stack. */
syscall::lwp_sigmask:entry
/execname == "Xsun"/
{
        @c[curthread->t_tid] = count();
        @st[ustack(6)] = count();
}

END
{
        printf("Ran for %d seconds\n\n", (timestamp - start) / 1000000000);

        /* Top five system calls by count. */
        trunc(@s, 5);
        printa(@s);

        printf("\n%-10s %-10s\n", "LWP ID", "COUNT");
        printa("%-10d %@d\n", @c);

        printa(@st);
}

In fact, DTrace can help us solve much more complex problems. Happy DTracing...