Thursday, January 6, 2005

inter-process mutex

Recently, Surya Prakki (a colleague of mine in the Kernel Sustaining Group) and I worked on an interesting problem. We initially suspected random memory corruption, but it later turned out to be an application problem. The customer also told us that the application had worked fine on Solaris 9, and that it started dumping core only after they migrated to Solaris 10.

The owner field of a mutex_t was getting corrupted with an invalid address, which resulted in the application dumping core (SIGSEGV). The problem was easily reproducible on a single-CPU machine. The application was dying in the mutex_trylock_adaptive() routine when it tried to dereference the owner. The owner field of the mutex had held strange data, yet by the time we looked at the core dump the owner field was zero. This surprised us a lot and made us believe there was some race.

All sorts of thoughts came to our minds: registers getting corrupted when the CPU switches context to another thread, some other thread overwriting the member in the mutex through an array overrun or underrun, and so on.

We started debugging the problem with procfs watchpoints. We set a watchpoint on the virtual address of the owner field of the mutex_t structure, using mdb's :w command for this. The watchpoint fired frequently because the application had two threads that contended for the lock quite often, so we drove it with a script containing "$c, $r, :c". But whenever the corruption happened, the target process never got the corresponding watchpoint trap. That surprised us a lot, and we started wondering how that could happen.
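As an aside, mdb's watchpoints sit on top of /proc's PCWATCH control message. Here is a rough, hypothetical sketch of arming a write watchpoint on a target process by hand (the pid and the address to watch come from the command line; this is an illustration, not the actual commands we ran):

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <procfs.h>

int
main(int argc, char **argv)
{
        char            ctlfile[64];
        int             ctlfd;
        long            buf[1 + sizeof (prwatch_t) / sizeof (long)];
        prwatch_t       *pw = (prwatch_t *)&buf[1];

        if (argc != 3) {
                (void) fprintf(stderr, "usage: %s pid addr\n", argv[0]);
                return (1);
        }

        /* open the target's /proc control file */
        (void) snprintf(ctlfile, sizeof (ctlfile), "/proc/%s/ctl", argv[1]);
        if ((ctlfd = open(ctlfile, O_WRONLY)) == -1) {
                perror("open");
                return (1);
        }

        /* PCWATCH control message: watch one word for writes */
        (void) memset(buf, 0, sizeof (buf));
        buf[0] = PCWATCH;
        pw->pr_vaddr = (uintptr_t)strtoull(argv[2], NULL, 16);
        pw->pr_size = sizeof (uintptr_t);
        pw->pr_wflags = WA_WRITE;

        if (write(ctlfd, buf, sizeof (long) + sizeof (prwatch_t)) == -1) {
                perror("PCWATCH");
                return (1);
        }
        (void) close(ctlfd);
        return (0);
}

Watching the word that contains the owner field for WA_WRITE access is roughly what the :w command arranges for you; mdb then reports the watchpoint trap and lets you run $c, $r and :c on it.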

We then turned to DTrace and truss to figure out what was happening between the point where mutex_unlock() clears the owner field and the point where the process dumps core. In the process we ruled out the things we had suspected in the beginning. We were running out of ideas when we looked carefully at the mutex_t members and noticed that the magic number was intact and the type of the mutex was USYNC_THREAD. We then used DTrace probes that fire when the kernel context-switches to another thread, to see whether context switching was playing any role here. During this exercise we noticed that another process was getting onto the CPU right after the process that later dumped core released the mutex. That rang a bell. We also noticed that the mutex address (the virtual address) was the same when this context switch happened.

We took a look at the pmap(1) output and noticed that the mutex was in a shared memory segment, and that the other process had used the same key (see the shmget(2) system call). In other words, the mutex was being used between processes. We also noticed that the corrupted value was a valid address in the other process (a thread address, in fact). This surprised us again, because the type of the mutex was USYNC_THREAD and we had *believed* it was only being used between threads of the same process. This was a little disappointing. If a mutex is to be used between processes, its type has to be USYNC_PROCESS, because you can't meaningfully dereference the owner when the mutex is shared across processes (an inter-process mutex).

From the mutex_init(3THR) man page:

USYNC_THREAD
    The mutex can synchronize threads only in this process. arg is ignored.

USYNC_PROCESS
    The mutex can synchronize threads in this process and other processes.
    arg is ignored. The object initialized with this attribute must be
    allocated in memory shared between processes, either in System V shared
    memory (see shmop(2)) or in memory mapped to a file (see mmap(2)). If
    the object is not allocated in such shared memory, it will not be
    shared between processes.
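For illustration, here is a minimal sketch of the fixed setup (the key, permissions and error handling are made up): the mutex_t is allocated in a System V shared memory segment and initialized with USYNC_PROCESS by one of the cooperating processes.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <synch.h>
#include <stdio.h>
#include <stdlib.h>

#define MUTEX_SHM_KEY   0x1234  /* hypothetical key, known to both processes */

int
main(void)
{
        int     shmid;
        mutex_t *mp;

        /* both processes attach the same System V shared memory segment */
        if ((shmid = shmget(MUTEX_SHM_KEY, sizeof (mutex_t),
            IPC_CREAT | 0600)) == -1) {
                perror("shmget");
                exit(1);
        }
        if ((mp = (mutex_t *)shmat(shmid, NULL, 0)) == (mutex_t *)-1) {
                perror("shmat");
                exit(1);
        }

        /*
         * USYNC_PROCESS is what makes the mutex usable across processes.
         * Only one of the cooperating processes should perform the
         * initialization.
         */
        if (mutex_init(mp, USYNC_PROCESS, NULL) != 0) {
                (void) fprintf(stderr, "mutex_init failed\n");
                exit(1);
        }

        mutex_lock(mp);
        /* ... touch the shared data ... */
        mutex_unlock(mp);

        return (0);
}

With USYNC_THREAD in place of USYNC_PROCESS, this is essentially the pattern that bit the customer: the mutex lives in shared memory, but libc still treats it as process-private and stores a thread address in the owner field that means nothing in the other process.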

We asked the submitter of the bug to make this modification, and it all worked fine. The customer came back saying our diagnosis was correct and that he had modified the application accordingly.

Having spent a week or so on this, the bottom line is: don't take things for granted :)