Tuesday, October 10, 2006

Multi-CPU Binding in Solaris

We are working on a framework that would allow processes/threads to have affinity to more than one CPU. The affinities fall into three categories: (a) strong affinity, (b) weak affinity, and (c) negative affinity.

(a) Strong affinity: processes/threads may run only on the specified CPUs.

(b) Weak affinity: processes/threads run on their home lgroup or on the specified CPUs, and fall back to any CPU when they can't run on either. The Solaris dispatcher follows the same order when choosing a CPU.

(c) Negative affinity: processes/threads may not run on the specified CPUs.
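To make the three categories concrete, here is a minimal C sketch, not the prototype's code. The type demo_cpuset_t, the enum aff_kind_t and the helpers aff_allows() and weak_pick() are hypothetical names; a 1024-bit array stands in for cpuset_t, and the weak-affinity order (home lgroup, then the specified CPUs, then any CPU) is our reading of (b).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 1024
#define NWORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for the kernel's cpuset_t: one bit per CPU id. */
typedef struct { uint64_t w[NWORDS]; } demo_cpuset_t;

static void set_add(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static bool set_has(const demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return (s->w[cpu / 64] >> (cpu % 64)) & 1;
}

typedef enum { AFF_NONE, AFF_STRONG, AFF_WEAK, AFF_NEGATIVE } aff_kind_t;

/* Hard constraint: may a thread with this affinity run on cpu at all? */
static bool aff_allows(aff_kind_t kind, const demo_cpuset_t *mask, int cpu)
{
    switch (kind) {
    case AFF_STRONG:
        return set_has(mask, cpu);      /* only the listed CPUs */
    case AFF_NEGATIVE:
        return !set_has(mask, cpu);     /* never the listed CPUs */
    default:
        return true;                    /* weak or none: no hard limit */
    }
}

/* Weak-affinity preference: 0 = home lgroup, 1 = listed CPU, 2 = any other. */
static int weak_rank(const demo_cpuset_t *home, const demo_cpuset_t *mask,
    int cpu)
{
    if (set_has(home, cpu))
        return 0;
    if (set_has(mask, cpu))
        return 1;
    return 2;
}

/* Choose the most preferred CPU among n candidates; -1 if there are none. */
static int weak_pick(const demo_cpuset_t *home, const demo_cpuset_t *mask,
    const int *cand, int n)
{
    int best = -1, best_rank = 3;

    for (int i = 0; i < n; i++) {
        int r = weak_rank(home, mask, cand[i]);
        if (r < best_rank) {
            best_rank = r;
            best = cand[i];
        }
    }
    return best;
}
```

Strong and negative affinity act as hard filters, while weak affinity only ranks candidates, which is why a weak-affinity thread can always end up somewhere.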

At present, only strong and negative affinity can change a thread's home lgroup, so users on a NUMA-aware machine need to be more cautious. Affinities are stored as a bitmask of CPUs (cpuset_t). When a CPU is offlined, it is removed from each thread's bitmask; if it happens to be the only CPU in that bitmask, we generate an event using contract fs so that applications can take appropriate action when their affinity is revoked, whether by an offline or by the CPU leaving a processor set.
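The offline handling described above can be sketched as follows, assuming a thread with strong affinity. demo_thread_t, on_cpu_offline() and the revoke_events counter are hypothetical; the counter merely stands in for posting a contract event, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 1024
#define NWORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for cpuset_t: one bit per CPU id. */
typedef struct { uint64_t w[NWORDS]; } demo_cpuset_t;

static void set_add(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static void set_del(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

static bool set_has(const demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return (s->w[cpu / 64] >> (cpu % 64)) & 1;
}

static bool set_empty(const demo_cpuset_t *s)
{
    for (int i = 0; i < NWORDS; i++)
        if (s->w[i] != 0)
            return false;
    return true;
}

/* Hypothetical per-thread state for a strong affinity. */
typedef struct {
    demo_cpuset_t mask;     /* CPUs the thread may run on */
    bool bound;             /* cleared once the affinity is revoked */
} demo_thread_t;

/* Counts revocations; real code would post a contract event instead. */
static int revoke_events;

/*
 * Called for each bound thread as cpu goes offline. Returns true if cpu
 * was the last CPU in the mask and the affinity had to be revoked.
 */
static bool on_cpu_offline(demo_thread_t *t, int cpu)
{
    if (!t->bound || !set_has(&t->mask, cpu))
        return false;
    set_del(&t->mask, cpu);
    if (set_empty(&t->mask)) {
        t->bound = false;   /* thread may now run on any CPU */
        revoke_events++;
        return true;
    }
    return false;
}
```

With a mask of 528-530, offlining 529 and then 528 leaves only 530; offlining 530 as well empties the mask and triggers the revocation.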

The boundaries set by CPU partitions remain in place: Multi-CPU binding will not allow processes/threads to cross partitions (processor sets).

The idle thread has also been modified to look for work accordingly. A thread with strong affinity cannot be stolen by a CPU that is not in its bitmask, while threads with weak affinity can be stolen. The run queue balancing done by setbackdq() applies to all affinity types.
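A rough sketch of the stealing rule, under stated assumptions: the idle CPU scans a remote run queue front to back and takes the first thread it is allowed to run. The types, stealable() and steal_from() are hypothetical, the scan is far simpler than the real dispatcher, and treating negative affinity as "never steal onto an excluded CPU" is an assumption, since only the strong and weak cases are described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 1024
#define NWORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for cpuset_t: one bit per CPU id. */
typedef struct { uint64_t w[NWORDS]; } demo_cpuset_t;

static void set_add(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static bool set_has(const demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return (s->w[cpu / 64] >> (cpu % 64)) & 1;
}

typedef enum { AFF_NONE, AFF_STRONG, AFF_WEAK, AFF_NEGATIVE } aff_kind_t;

typedef struct {
    int id;
    aff_kind_t kind;
    demo_cpuset_t mask;
} demo_thread_t;

/* May idle_cpu take thread t from another CPU's run queue? */
static bool stealable(const demo_thread_t *t, int idle_cpu)
{
    switch (t->kind) {
    case AFF_STRONG:
        return set_has(&t->mask, idle_cpu);   /* only onto a CPU in its mask */
    case AFF_NEGATIVE:
        return !set_has(&t->mask, idle_cpu);  /* assumption: not onto excluded CPUs */
    default:
        return true;                          /* weak or none: always stealable */
    }
}

/* Return the first thread in rq[0..n-1] that idle_cpu may steal, or NULL. */
static const demo_thread_t *steal_from(const demo_thread_t *rq, int n,
    int idle_cpu)
{
    for (int i = 0; i < n; i++)
        if (stealable(&rq[i], idle_cpu))
            return &rq[i];
    return NULL;
}
```

An idle CPU outside a strong-affinity thread's mask skips that thread and keeps scanning, so a weak-affinity thread further back in the same queue can still be picked up.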

Here is an example:

bash-3.00# ./pbind -s 528-530 `pgrep aff`

bash-3.00# dtrace -s ./a.d ## D script capturing context switches.
CPU no. of times ran
529 197
528 208
530 210

bash-3.00# ./pbind -q `pgrep aff`
process id 3211: not bound
process id 3211: strong affinity to: 528-530

bash-3.00# psradm -f 529 528

bash-3.00# dtrace -s ./a.d ## D script capturing context switches.
CPU no. of times ran
530 255


If you were to offline CPU 530 as well, we would revoke the affinity: the process has strong affinity, and no CPU would remain on which it could run. The purpose is to let the offline proceed (for DR or other FMA events). The same holds for processor sets: if a CPU is removed from a pset and it happens to be the last CPU in the thread's CPU bitmask, the affinity is revoked.
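The processor-set case combines revocation with the partition boundary: a thread can only use CPUs that are both in its bitmask and in its pset. A minimal sketch, with hypothetical names (usable(), on_pset_remove()) and a 1024-bit array in place of cpuset_t:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 1024
#define NWORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for cpuset_t: one bit per CPU id. */
typedef struct { uint64_t w[NWORDS]; } demo_cpuset_t;

static void set_add(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static void set_del(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

static bool set_has(const demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return (s->w[cpu / 64] >> (cpu % 64)) & 1;
}

static bool set_empty(const demo_cpuset_t *s)
{
    for (int i = 0; i < NWORDS; i++)
        if (s->w[i] != 0)
            return false;
    return true;
}

typedef struct {
    demo_cpuset_t mask;     /* CPUs named in the strong affinity */
    bool bound;
} demo_thread_t;

/* CPUs the thread can actually use: its mask limited to its pset. */
static demo_cpuset_t usable(const demo_thread_t *t, const demo_cpuset_t *pset)
{
    demo_cpuset_t r;
    for (int i = 0; i < NWORDS; i++)
        r.w[i] = t->mask.w[i] & pset->w[i];
    return r;
}

/* Remove cpu from the pset; revoke if no usable CPU remains for t. */
static bool on_pset_remove(demo_thread_t *t, demo_cpuset_t *pset, int cpu)
{
    set_del(pset, cpu);
    if (!t->bound)
        return false;
    demo_cpuset_t u = usable(t, pset);
    if (set_empty(&u)) {
        t->bound = false;
        return true;
    }
    return false;
}
```

A CPU that is in the bitmask but outside the pset never counts as usable, so the affinity is revoked as soon as the last CPU shared by both sets is removed.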

We could preserve a thread's affinity to a CPU when that CPU is offlined (provided it is not the last CPU in the bitmask), so that when the CPU is brought back online, users don't have to bother finding a suitable CPU again. I'm not sure whether this would be a good idea or whether we really want to do it, but I do have a prototype based on it.
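One hypothetical way to preserve affinity across an offline is to keep two masks per thread: the CPUs the user requested, which survive an offline, and the subset that is currently usable. This is a sketch of the idea under that assumption, not the prototype; all names and the layout are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 1024
#define NWORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for cpuset_t: one bit per CPU id. */
typedef struct { uint64_t w[NWORDS]; } demo_cpuset_t;

static void set_add(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static void set_del(demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    s->w[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

static bool set_has(const demo_cpuset_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return (s->w[cpu / 64] >> (cpu % 64)) & 1;
}

static bool set_empty(const demo_cpuset_t *s)
{
    for (int i = 0; i < NWORDS; i++)
        if (s->w[i] != 0)
            return false;
    return true;
}

typedef struct {
    demo_cpuset_t requested;    /* CPUs the user asked for; kept across offline */
    demo_cpuset_t effective;    /* requested CPUs that are currently online */
    bool bound;
} demo_thread_t;

/* Offline: drop cpu from the effective mask only; revoke if none remain. */
static bool on_cpu_offline(demo_thread_t *t, int cpu)
{
    if (!t->bound || !set_has(&t->effective, cpu))
        return false;
    set_del(&t->effective, cpu);
    if (set_empty(&t->effective)) {
        t->bound = false;       /* last usable CPU gone: revoke as before */
        return true;
    }
    return false;               /* cpu is still remembered in requested */
}

/* Online: restore cpu if the thread originally asked for it. */
static void on_cpu_online(demo_thread_t *t, int cpu)
{
    if (t->bound && set_has(&t->requested, cpu))
        set_add(&t->effective, cpu);
}
```

Offlining 529 and bringing it back restores it to the effective mask without the user rebinding, while offlining the last usable CPU still revokes the affinity.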

The demo above just illustrates what we are trying to achieve; the work is still in the prototyping stage.