1. What's a dispatcher lock
A dispatcher lock is a one-byte lock (disp_lock_t) which is acquired at high PIL (DISP_LEVEL), the interrupt level at which dispatcher operations must be performed. There are other symbolic interrupt levels, viz. CLOCK_LEVEL and LOCK_LEVEL, defined in machlock.h. Following are the interfaces for dispatcher locks, which are described in disp_lock.c:
disp_lock_init() initializes a dispatcher lock.
disp_lock_destroy() destroys a dispatcher lock.
disp_lock_enter() acquires a dispatcher lock.
disp_lock_exit() releases a dispatcher lock and checks for kernel preemption.
disp_lock_exit_nopreempt() releases a dispatcher lock without checking for kernel preemption.
disp_lock_enter_high() acquires another dispatcher lock when the thread is already holding a dispatcher lock.
disp_lock_exit_high() releases a dispatcher lock while the thread still holds another one, leaving the PIL at DISP_LEVEL.
Here are some facts about dispatcher locks:
(a) Being spin locks acquired at high PIL, dispatcher locks should be held only for short durations, and code holding them must not make blocking calls.
(b) While releasing a dispatcher lock, you can be preempted if cpu_kprunrun (kernel preemption) is set. Use disp_lock_exit_nopreempt() if you don't want to be preempted.
(c) While holding a dispatcher lock, you are not preemptible.
(d) Since acquiring a dispatcher lock raises the PIL to DISP_LEVEL, the old PIL is saved in t_oldspl of the thread structure (kthread_t).
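To make facts (a)-(d) concrete, here is a minimal user-space sketch of the protocol: raise the "PIL" first, then spin on the one-byte lock, and hand the old PIL back to the caller (the kernel stashes it in t_oldspl). The names (my_disp_lock_enter(), spl_raise(), etc.) and the spl stand-ins are mine, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint8_t disp_lock_t;   /* one-byte lock, as in the kernel */

#define DISP_LEVEL 11                   /* illustrative value for the high PIL */

static int cur_pil = 0;                 /* stand-in for the CPU's current PIL */

static int spl_raise(int new_pil) {     /* raise the PIL, return the old one */
    int old = cur_pil;
    cur_pil = new_pil;
    return old;
}

static void spl_restore(int old_pil) {  /* drop back to the saved PIL */
    cur_pil = old_pil;
}

/* Acquire: raise the PIL to DISP_LEVEL first, then spin on the byte. */
static int my_disp_lock_enter(disp_lock_t *lp) {
    int oldspl = spl_raise(DISP_LEVEL);
    while (__sync_lock_test_and_set(lp, 0xFF) != 0)
        ;                               /* spin; never block at DISP_LEVEL */
    return oldspl;                      /* caller saves this, like t_oldspl */
}

static void my_disp_lock_exit(disp_lock_t *lp, int oldspl) {
    __sync_lock_release(lp);            /* clear the byte */
    spl_restore(oldspl);
    /* the real disp_lock_exit() would also check cpu_kprunrun here */
}
```

Note that a held lock reads as 0xff, which is exactly what shows up in the crash dump later in this post.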
2. What's a thread lock
Thread lock is a per-thread entity which protects t_state and the state-related flags of a kernel thread. The thread lock hangs off kthread_t as t_lockp. t_lockp is a pointer to a dispatcher lock, and the pointer is changed whenever the state of the kernel thread changes. One acquires the thread lock using the thread_lock() routine, passing the kernel thread pointer; thread_lock() is responsible for finding the correct dispatcher lock for the thread. The dance done by thread_lock() is interesting: because t_lockp is a pointer, it can change while we are spinning for the dispatcher lock. Hence thread_lock() saves the t_lockp pointer and, after acquiring the saved lock, re-checks it to ensure that we acquired the right thread lock.
Now let's take a look at the interfaces in the Solaris kernel, which are described in disp_lock.c and thread.h:
thread_lock() is called to acquire the thread lock.
thread_unlock() is called to release the thread lock; it checks for kernel preemption.
thread_lock_high() is called to acquire another thread lock while already holding one.
thread_unlock_high() is called to release a thread lock while still holding another.
thread_unlock_nopreempt() is called to release the thread lock without checking for kernel preemption.
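The t_lockp dance described above can be sketched as follows. This is an illustrative user-space model (my_thread_lock(), lock_set() and lock_clear() are my names, not the kernel's): sample the pointer, spin for that lock, then re-check that it is still the thread's lock, retrying if the thread changed state underneath us.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint8_t disp_lock_t;

typedef struct kthread {
    disp_lock_t *t_lockp;   /* points at the lock for the current state */
} kthread_t;

static void lock_set(disp_lock_t *lp) {     /* spin until acquired */
    while (__sync_lock_test_and_set(lp, 0xFF) != 0)
        ;
}

static void lock_clear(disp_lock_t *lp) {   /* release */
    __sync_lock_release(lp);
}

/* Acquire the thread lock; returns the dispatcher lock actually held. */
static disp_lock_t *my_thread_lock(kthread_t *t) {
    for (;;) {
        disp_lock_t *lp = t->t_lockp;   /* sample the pointer */
        lock_set(lp);                   /* spin for that lock */
        if (lp == t->t_lockp)           /* still the thread's lock? */
            return lp;                  /* yes: we hold the thread lock */
        lock_clear(lp);                 /* no: it moved; retry */
    }
}
```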
3. Various types of thread locks in Solaris Kernel
Now that I've described the thread lock, it's important to understand which dispatcher locks are acquired depending upon the state of the thread. To find this out, you first need to understand the mapping between the state of a thread and its corresponding dispatcher lock:
TS_RUN (runnable) ---> disp_lock of the dispatch queue in a CPU (cpu_t) or global preemption queue of a CPU partition
TS_ONPROC (running) ---> cpu_thread_lock in a CPU (cpu_t)
TS_SLEEP (sleep) ---> sleepq bucket lock or turnstile chain lock
TS_STOPPED (stopped) ---> stop_lock (a global dispatcher lock) for stopped threads.
There are two global dispatcher locks in the Solaris kernel: shuttle_lock and transition_lock. When the thread lock of a thread points to shuttle_lock, the thread is sleeping on a door; when the thread lock points to transition_lock, the thread is in transition to another state (for instance, when the state of a thread sleeping on a semaphore is changed from TS_SLEEP to TS_RUN, or during yield()). transition_lock is always held and is never released.
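The state-to-lock mapping above can be summarized in code. This is an illustrative sketch: the lock names mirror the ones above, and TS_TRANSITION here stands in for "t_lockp points at transition_lock", which is not a real t_state value.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint8_t disp_lock_t;

/* TS_TRANSITION is my own marker, not a real Solaris thread state. */
enum tstate { TS_RUN, TS_ONPROC, TS_SLEEP, TS_STOPPED, TS_TRANSITION };

static disp_lock_t disp_q_lock;      /* disp_lock of a CPU's dispatch queue */
static disp_lock_t cpu_thread_lock;  /* per-CPU lock for TS_ONPROC threads */
static disp_lock_t sleepq_lock;      /* sleepq bucket / turnstile chain lock */
static disp_lock_t stop_lock;        /* global lock for stopped threads */
static disp_lock_t transition_lock = 0xFF;  /* always held, never released */

/* Which dispatcher lock would t_lockp point at for a given state? */
static disp_lock_t *lock_for_state(enum tstate s) {
    switch (s) {
    case TS_RUN:     return &disp_q_lock;
    case TS_ONPROC:  return &cpu_thread_lock;
    case TS_SLEEP:   return &sleepq_lock;
    case TS_STOPPED: return &stop_lock;
    default:         return &transition_lock;
    }
}
```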
4. Examples of thread lock
Now let's understand which thread locks are involved in taking a thread from wakeup (or unsleep) to onproc (running). Assume that T1 (thread 1) is blocked on a condition variable CV1 and T2 (thread 2) signals T1 as part of a wakeup. First, cv_signal() grabs the sleepq bucket lock and decrements the waiters count on CV1. It then calls sleepq_wakeone_chan() to wake up T1. sleepq_wakeone_chan()'s responsibility is to unlink T1 from the sleepq list (using t_link of kthread_t) and call CL_WAKEUP (the scheduling-class-specific wakeup routine). Assuming T1 is in the time-sharing class (TS), ts_wakeup() gets called. ts_wakeup() in turn calls a dispatcher enqueue routine (setfrontdq() or setbackdq()), which changes the state of T1 to TS_RUN and changes t_lockp to point to the disp_lock of the chosen CPU. At last, sleepq_wakeone_chan() drops the disp_lock of the dispatch queue, and finally the sleepq dispatcher lock is released in cv_signal(). Once T1 is chosen to run, disp() removes T1 from the CPU's dispatch queue, changes its state to TS_ONPROC, and points t_lockp to the CPU's cpu_thread_lock.

void
cv_signal(kcondvar_t *cvp)
{
	condvar_impl_t *cp = (condvar_impl_t *)cvp;

	/* make sure the cv_waiters field looks sane */
	ASSERT(cp->cv_waiters <= CV_MAX_WAITERS);

	if (cp->cv_waiters > 0) {
		sleepq_head_t *sqh = SQHASH(cp);

		disp_lock_enter(&sqh->sq_lock);
		ASSERT(CPU_ON_INTR(CPU) == 0);
		if (cp->cv_waiters & CV_WAITERS_MASK) {
			kthread_t *t;

			cp->cv_waiters--;
			t = sleepq_wakeone_chan(&sqh->sq_queue, cp);
			/*
			 * If cv_waiters is non-zero (and less than
			 * CV_MAX_WAITERS) there should be a thread
			 * in the queue.
			 */
			ASSERT(t != NULL);
		} else if (sleepq_wakeone_chan(&sqh->sq_queue, cp) == NULL) {
			cp->cv_waiters = 0;
		}
		disp_lock_exit(&sqh->sq_lock);
	}
}
The second example is from the preemption path. There are two types of preemption in the Solaris kernel, viz. user preemption (cpu_runrun) and kernel preemption (cpu_kprunrun). Assume that T1 is being preempted in favour of a higher-priority thread. T1 will call preempt() once it realizes that it has to give up the CPU (there are hooks in the Solaris kernel to determine this). preempt() first grabs the thread lock (effectively cpu_thread_lock) on itself and calls THREAD_TRANSITION() to change t_lockp to transition_lock. Note that the state of T1 is still TS_ONPROC while t_lockp points to transition_lock, because T1 is in a transition phase (TS_ONPROC -> TS_RUN). THREAD_TRANSITION() also releases the previous dispatcher lock, because transition_lock is always held. preempt() then calls CL_PREEMPT(), the scheduling-class-specific preemption routine, to enqueue T1 on a particular CPU. From here on, it's the same as described in the first example.
void
preempt()
{
	kthread_t *t = curthread;
	klwp_t *lwp = ttolwp(curthread);

	if (panicstr)
		return;

	TRACE_0(TR_FAC_DISP, TR_PREEMPT_START, "preempt_start");

	thread_lock(t);

	if (t->t_state != TS_ONPROC || t->t_disp_queue != CPU->cpu_disp) {
		/*
		 * this thread has already been chosen to be run on
		 * another CPU. Clear kprunrun on this CPU since we're
		 * already headed for swtch().
		 */
		CPU->cpu_kprunrun = 0;
		thread_unlock_nopreempt(t);
		TRACE_0(TR_FAC_DISP, TR_PREEMPT_END, "preempt_end");
	} else {
		if (lwp != NULL)
			lwp->lwp_ru.nivcsw++;
		CPU_STATS_ADDQ(CPU, sys, inv_swtch, 1);
		THREAD_TRANSITION(t);
		CL_PREEMPT(t);
		DTRACE_SCHED(preempt);
		thread_unlock_nopreempt(t);
		TRACE_0(TR_FAC_DISP, TR_PREEMPT_END, "preempt_end");

		swtch();	/* clears CPU->cpu_runrun via disp() */
	}
}
5. An example of a dispatcher lock and Bug 5017148
Apart from illustrating dispatcher locks, I'll also describe a problem which I found a while back. It involves the kernel door implementation too.
I usually begin by looking at what the CPUs are doing whenever I examine a crash dump from a system hang:
> ::cpuinfo
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
0 0001041d2b0 1b 1 0 60 no no t-0 3001ba04900 cluster
1 30019fe4030 1d 2 0 101 no no t-0 3003d873a40 rgmd
2 3001a38aab8 1d 1 0 165 yes yes t-0 2a1003ebd20 sched
3 0001041b778 1d 2 0 60 yes yes t-0 3004fac3c80 cluster
> 0x30001d7cae0$
3004fac3c80
>
Let's disassemble cv_block(), where thread 3004fac3c80 is stuck:
cv_block+0x9c: add %i2, 8, %i0
cv_block+0xa0: call -0x460e0
cv_block+0xa4: mov %i0, %o0
> 0x3004fac3c80::print kthread_t t_lockp
t_lockp = cpu0+0xb8
> cpu0=J
1041b778 // CPU 3
> 0x3004fac3c80::print kthread_t ! grep wchan
lc_wchan = 0x3006fc52d20
> 0x10471d88::print sleepq_head_t
{
sq_queue = {
sq_first = 0x3001b476ee0
}
sq_lock = 0xff <----- dispatcher lock is held
}
> 3003d873a40::findstack
stack pointer for thread 3003d873a40: 2a1025964a1
[ 000002a1025964a1 panic_idle+0x1c() ]
000002a102596551 prom_rtt()
000002a1025966a1 thread_lock_high+0xc()
000002a102596751 sema_p+0x60()
000002a102596801 kobj_open+0x84()
000002a1025968d1 kobj_open_file+0x44()
[.]
000002a102597011 xdoor_proxy+0x20c()
000002a1025971f1 door_call+0x204()
000002a1025972f1 syscall_trap32+0xa8()
>
Since the hashing function SQHASH() returns the same index for 0x3006fc52d20 and 0x300819f3118, we see that sema_p() is stuck on a thread lock held by the thread running on CPU 3, while the thread running on CPU 3 is itself stuck because the sleep queue bucket lock is held by the thread running on CPU 1.
> 0x3003d873a40::print kthread_t t_lockp
t_lockp = cpu0+0xb8
> cpu0+0xb8/x
cpu0+0xb8: ff00
> 0x3003d873a40::print kthread_t ! grep cpu
t_bound_cpu = 0
t_cpu = 0x30019fe4030
t_lockp = cpu0+0xb8 // CPU 3's cpu_thread_lock
t_disp_queue = cpu0+0x78
The problem lies in door_get_server():

static kthread_t *
door_get_server(door_node_t *dp)
{
	[.]
	/*
	 * Mark the thread as ONPROC and take it off the list
	 * of available server threads. We are committed to
	 * resuming this thread now.
	 */
	disp_lock_t *tlp = server_t->t_lockp;
	cpu_t *cp = CPU;

	pool->dp_threads = server_t->t_door->d_servers;
	server_t->t_door->d_servers = NULL;

	/*
	 * Setting t_disp_queue prevents erroneous preemptions
	 * if this thread is still in execution on another processor
	 */
	server_t->t_disp_queue = cp->cpu_disp;
	CL_ACTIVE(server_t);

	/*
	 * We are calling thread_onproc() instead of
	 * THREAD_ONPROC() because compiler can reorder
	 * the two stores of t_state and t_lockp in
	 * THREAD_ONPROC().
	 */
	thread_onproc(server_t, cp);
	disp_lock_exit(tlp);
	return (server_t);
	[.]
As a result, the server thread's t_lockp points to the wrong cpu_thread_lock, because the client thread started running on a different CPU when it did a shuttle_resume() to the server thread. Note that door_return() (which returns results to the caller) releases the dispatcher lock without getting preempted, which is why we didn't notice this problem in door_return().
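The race can be modeled in a few lines. In this toy version (all names are mine, not the kernel's), the CPU pointer is sampled early, the thread migrates, and thread_onproc() then points t_lockp at the sampled CPU's lock rather than the lock of the CPU the thread is actually running on:

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint8_t disp_lock_t;

typedef struct cpu {
    disp_lock_t cpu_thread_lock;   /* per-CPU thread lock */
} cpu_t;

typedef struct kthread {
    disp_lock_t *t_lockp;          /* thread lock pointer */
    cpu_t *t_cpu;                  /* CPU the thread actually runs on */
} kthread_t;

/* thread_onproc(): point t_lockp at the given CPU's cpu_thread_lock. */
static void my_thread_onproc(kthread_t *t, cpu_t *cp) {
    t->t_lockp = &cp->cpu_thread_lock;
}

/* Does t_lockp name the lock of the CPU the thread is running on? */
static int lockp_matches_cpu(kthread_t *t) {
    return t->t_lockp == &t->t_cpu->cpu_thread_lock;
}
```

If the `cpu_t *cp = CPU` sampled at the top of door_get_server() no longer matches the CPU the thread ends up on, lockp_matches_cpu() is false, and anyone calling thread_lock() on the server spins on a lock owned by an unrelated CPU.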
On to cracking another problem now... In fact, we don't get any sleep if we don't take a look at a crash dump :-)