Tuesday, July 19, 2005

Dtrace rocks...

Sometime back I had a problem with my desktop and as a result it started crawling whenever Java ticker used to kick in. I think I must share this with the rest of the world. I'd also share a kernel problem that we cracked and it was related to performance. So Dtrace has helped in solving many problems so far.

My desktop running Solaris 10 started crawling when I noticed that Xsun is eating up 68% of CPU. From prstat(1M)

# prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
594 ****** 85M 78M run 30 0 14:03:19 68% Xsun/1
796 root 16M 13M sleep 59 0 1:06:25 5.8% stfontserverd/18
[.]

I then started Dtrac'ing Xsun and noticed that lwp_sigmask() syscall call is being made too frequently by Xsun. Here is the data :-

# ./syscall.d
^C
Ran for 26 seconds


writev 2832
pollsys 3261
read 5910
doorfs 27199
lwp_sigmask 217592

LWP ID COUNT
1 217592

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x78
libST.so.1`FSSessionDisposeFontInstance+0x8c
9063

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sigrelse+0x54
libST.so.1`fsexchange+0xc0
libST.so.1`FSSessionGetFontRenderingParams+0x8c

...and many more such stack traces from libST.so.1`fsexchange().

Infact the stack is like this:-

libc.so.1`__systemcall6+0x20
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
libc.so.1`sighold+0x54
libST.so.1`fsexchange+0x90
libST.so.1`FSSessionGetFontRenderingParams+0x8c
libST.so.1`GetRenderProps+0x344
libST.so.1`GlyphVectorRepQuery+0xf4
libST.so.1`STGlyphVectorQuery+0xd0
SUNWXst.so.1`_XSTUseCache+0x68

Notice that in this stack trace, we are calling sighold() and sigrelse() too frequently. So this process is disabling and enabling signals for some reason. Looks like we are rendering characters, but why do we block and unblock signals in this path?. Here is the Dtrace script which was used :-

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
start = timestamp;

}

syscall:::entry
/execname == "Xsun"/
{
@s[probefunc] = count();
}

syscall::lwp_sigmask:entry
/execname == "Xsun"/
{
@c[curthread->t_tid] = count();
@st[ustack(6)] = count();
}

END
{
printf("Ran for %d seconds\n\n", (timestamp - start) / 1000000000);

trunc(@s,5);
printa(@s);

printf("\n%-10s %-10s\n", "LWP ID", "COUNT");
printa("%-10d %@d\n", @c);

printa(@st);
}

In fact Dtrace could help us in solving much more complex problems. Happy Dtrac'ing...

No comments: