Kernel concurrency

Concurrency mechanisms inside the kernel

An explanation of how to solve concurrency issues inside a kernel

If you want to make a kernel that works on machines with multiple CPUs, you need to solve the problem of CPUs accessing shared resources of all kinds. Concurrency is hard to get right for many reasons, and sometimes you need more than one approach before it finally clicks. I’ll share what worked for me by describing what the problem is and building solutions from the ground up.

1. What’s the problem with CPUs and shared resources?

Let’s assume that in our kernel we have a global integer variable, int counter = 0, and two processes that each increase that counter by 15. Why they do that and for what purpose is unimportant.

(...)
counter = counter + 15;
(...)

This wouldn’t be a problem if adding 15 to the counter were done in a single, indivisible (atomic) step, but it isn’t. This line in C, compiled by gcc for a 32-bit RISC-V machine, looks something like this:

        lui     a5,%hi(counter)
        lw      a5,%lo(counter)(a5)
        addi    a4,a5,15
        lui     a5,%hi(counter)
        sw      a4,%lo(counter)(a5)

These are five CPU instructions. Nothing prevents two processes on two different CPUs from executing them simultaneously, and I mean exactly instruction by instruction at the same time. Each process would load the current value of counter from shared memory into its own register a5, increase the value stored in a5 by 15, and then write the value from the register back into shared memory. There were two processes, there was one counter with a starting value of 0, and yet when both processes finish we see a counter value of 15, not 30. That’s not good.

The counter problem is just an illustration. Shared data might be much more complex: linked lists that are updated in intricate ways, where concurrent access leads to unpredictable results. We need to solve the problem of access to shared data to make these operations reliable and predictable. We need to make the counter = counter + 15 operation look like a single indivisible step that can be executed by just one process at a time.

2. How to make multiple steps indivisible and exclusive to a single process?

We need a way for processes that know nothing about each other to communicate. Each process only knows that there is a counter and that it needs to increment it.

There is a way to share information between processes; we already do it. counter is shared, for example. Maybe we can create a second global variable, int counterLocked = 0, that one process will use to tell the other processes that they can’t increment counter right now. If counterLocked == 0 then counter can be safely accessed and incremented; if counterLocked == 1 then some process is already working with counter, so a process that checks counterLocked can’t proceed and must wait for its turn.

That’s a great idea, but it has one problem, and it’s the same problem that led to introducing the counterLocked variable in the first place: concurrent access by multiple processes. There is still nothing that prevents two processes from reading the value of counterLocked, seeing that it holds 0, writing 1, and both claiming they now own it.

That’s a problem that can’t be solved in software alone. We need support from the CPU vendor in the form of an instruction that lets us read the value of a variable and write a new value in a single step. And that’s exactly what vendors provide: an instruction that writes to memory and hands back the old value in one step, exclusively for one CPU at a time.

When compiling with gcc for a RISC-V hardware target, we can call __sync_lock_test_and_set. From C’s point of view it is a function taking a pointer to a variable as its first argument and the value to assign to that variable as its second. The return value is the value the variable held before the new value was assigned. Now we have the ability to atomically write to a memory location and inspect its previous value.

int wasLocked = __sync_lock_test_and_set(&counterLocked, 1);

If wasLocked == 0 then we know it was the current process that set counterLocked to 1. But if wasLocked == 1 then the current process wrote 1 into a variable that already held 1: the lock was already taken by some other process, and nothing changed.

So there is a safe way to check whether some shared resource, like our counter, can be accessed: by checking the value of some other shared variable, like our counterLocked.

That’s great, but there are still some problems with it.

First, this mechanism doesn’t say what a process should do if counter is being used by another process. It is up to the process that encounters this situation to figure it out: it can’t continue without increasing counter, but it can’t access it either.

If a process can’t proceed without exclusive access to counter, we can make it wait in a loop.

while(__sync_lock_test_and_set(&counterLocked, 1) == 1)
    ;

This loop will not stop until the value in counterLocked is 0. If something goes wrong and no other process on another CPU ever assigns counterLocked = 0, it will spin indefinitely, leading to a deadlock.

At this point any process that wants to work with our counter needs to have something like this in its code:

// do something before counter access

while(__sync_lock_test_and_set(&counterLocked, 1) == 1)  // get exclusive access to counter
    ;

counter = counter + 15;

counterLocked = 0;  // give up exclusive access to counter

// do something after counter modified

In our example there is just a single line of work done with exclusive access to counter, but there could be as much code as we like and it would have the same benefit as our single line: only one process at a time can execute anything that requires counterLocked.

This almost solves our problem of making multiple instructions exclusive and atomic from the processes’ point of view, by creating a critical section that only one process on one CPU is allowed to execute at a time. Whoever set counterLocked from 0 to 1 has the access. Other processes will spin in a while loop until their turn comes. They busy wait, constantly checking if it’s their time to snap out of the loop.

I said we almost solved our problem, because there are interrupts. They can come from a device or from a timer, but they are asynchronous: they can show up and take over the CPU at any time. If the process that holds the lock gets interrupted, and the interrupt handler or a process scheduled in its place tries to acquire the same lock on the same CPU, it will spin forever; that’s a deadlock.

xv6 takes a safe approach and simply disables interrupts on a CPU before letting a process enter a critical section. If we take that approach, every process working with counter needs to execute a procedure that looks like this:

// do something before counter access
InterruptsDisable();
while(__sync_lock_test_and_set(&counterLocked, 1) == 1)  // get exclusive access to counter
    ;

counter = counter + 15;

counterLocked = 0;  // give up exclusive access to counter
InterruptsEnable();
// do something after counter modified

How InterruptsDisable and InterruptsEnable are implemented is platform specific; the important thing is that they do what their names say.
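For a concrete picture, here is a sketch of what they might look like on 32-bit RISC-V running in supervisor mode (the function names are ours, not a standard API). Real kernels, xv6 included, additionally track nesting depth (push_off/pop_off in xv6) so that releasing an inner lock doesn’t re-enable interrupts too early.

```c
/* SSTATUS_SIE is the supervisor-interrupt-enable bit of the sstatus CSR */
#define SSTATUS_SIE (1 << 1)

static inline void InterruptsDisable(void)
{
    /* csrc: atomically clear the given bits in a CSR */
    asm volatile("csrc sstatus, %0" : : "r"(SSTATUS_SIE));
}

static inline void InterruptsEnable(void)
{
    /* csrs: atomically set the given bits in a CSR */
    asm volatile("csrs sstatus, %0" : : "r"(SSTATUS_SIE));
}
```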

With all of the above we have a way of properly incrementing our global variable counter from different processes on different CPUs, but only if every one of them includes this code.

That’s a lot of work just to get one safe access to one shared variable, and a lot of copy-paste.

Maybe we can abstract something from this away and make it reusable.

3. How to make a generic way of creating critical sections in kernel code?

We’ve created a single critical section for access to counter, and from that exercise a pattern emerges for a generic critical section.

Every process has to disable interrupts and spin in a while loop trying to write 1 to counterLocked, the variable that processes use to communicate.

We could bundle all of that into a single function, CriticalSectionAccess, that takes a pointer to counterLocked as its argument.

void CriticalSectionAccess(int *criticalSectionLocked)
{
    InterruptsDisable();
    while(__sync_lock_test_and_set(criticalSectionLocked, 1) == 1)
        ;
}

This function will not return until access to the critical section has been granted to the process that executes it.

It is also very generic, in the sense that any int variable can serve as a critical section guard.

To make it clear that such a variable is more than just an int, we can hide it behind something more explicit.

struct spinlock
{
    int isLocked;
};

Now we can make our function that grants access to critical section even more generic:

void SpinlockAcquire(struct spinlock *sl)
{
    InterruptsDisable();
    while(__sync_lock_test_and_set(&sl->isLocked, 1) == 1)
        ;
}

This function is almost exactly the same as before, but now it is clear what it does and what kind of argument it works on. Also, the naming is now consistent with other resources.

We call our structure a lock because it guards access to a critical section. If it is locked you can’t enter. If it’s open you can acquire that lock and close it behind you. Others will see it as locked and wait. We call our structure a spinlock because of the way it is accessed: SpinlockAcquire will spin in a loop, constantly checking if the lock is free, and will not stop until it finds the spinlock open and acquires it.

Spinlock created this way is a generic way of creating critical sections of all kind in many processes.

If you have anything that needs to be shared between processes, you give it a spinlock and force every process to acquire that spinlock before accessing it.

// global process array
struct proc procArray[MAX_PROC_NUM];
struct spinlock procArrayLock;



// some function that can be called in a process to work on process array

SpinlockAcquire(&procArrayLock);

// work on array

SpinlockRelease(&procArrayLock);

SpinlockRelease bundles up releasing the lock the same way SpinlockAcquire bundled locking. A SpinlockInitialize should also be implemented and called to make sure the lock is initially open.

4. What should a process do if it doesn’t like what it sees in a critical section?

The current state of affairs is that we can safely enter a critical section, do some work on a shared resource, and leave safely.

That’s great, but what if we’re in a critical section and the shared resource is in a state that prevents us from doing our job?

Our global counter worked pretty nicely, so let’s use it again.

Now it looks like this:

int counter = 0;
struct spinlock counterLock;

We have our variable and a spinlock to guard access to it.

There are processes running around incrementing it in a safe manner, so let’s assume there is another one that doesn’t want to increment it but to do something else depending on the counter’s value. Maybe trigger some procedure when counter > 200? Maybe halve counter? Oh, and it’s this process’s only job: halve the counter when it’s > 200. Indefinitely.

We know we need to acquire counterLock before access and release after we’re done.

while(1)
{
    SpinlockAcquire(&counterLock);
    if(counter > 200)
    {
        counter = counter / 2;
    }
    else
    {
        // do what?
    }
    SpinlockRelease(&counterLock);
}

This would work; timer interrupts would have a chance to stop that process and give the scheduler a chance to run a different one. But is there a point in constantly giving the CPU to this process? It spins on a lock, checks counter, and leaves. If counter doesn’t change very often, that is very inefficient, and it takes precious CPU cycles away from processes that have meaningful work to do.

What we would like is a way for a process to give up the CPU (sleep) until some other process touches counter and notifies (wakes up) the processes interested in changes to counter.

That’s the idea behind xv6’s sleep() and wakeup() functionality.

sleep takes two arguments: a chan and the spinlock the caller currently holds. The chan is just an agreed-upon value (xv6 uses a pointer) that identifies who is sleeping and when it should be woken up. Internally, sleep releases the given spinlock, records the chan in the process structure, changes the process state to SLEEPING, and reacquires the lock before returning. The scheduler will skip sleeping processes from now on.

wakeup takes one argument, a chan; it iterates over all processes and changes the state of every process sleeping on that chan to READY. The scheduler will pick them up on its next iteration. The internals of how sleep and wakeup work are explained in more detail in other sources. For our purposes, we assume they work properly and never lead to missed wakeups, a situation where a wakeup races with a process going to sleep and gets lost.

Now, with sleep and wakeup, we can answer the earlier question: what to do when counter isn’t over 200.

while(1)
{
    SpinlockAcquire(&counterLock);
    while(counter <= 200)
    {
        sleep(&counter, &counterLock);
    }
    counter = counter / 2;
    SpinlockRelease(&counterLock);
}

Why is the check of counter inside a loop? There might be more processes waiting for counter to change, each wanting to modify it, and wakeup wakes all of them at once. The process that woke the others left counter at some value, but that value might change again before a sleeping process actually runs, so every process must re-check the condition after waking up.

For this to work, we need to modify the processes that increment counter, too. This sleep-wakeup setup requires further cooperation between processes. Previously they cooperated only through a lock; now they also have to notify each other of their actions. They still don’t know anything about each other, it’s not as if they know each other’s PIDs, but it is cooperation nonetheless.

// do some work
SpinlockAcquire(&counterLock);
counter = counter + 15;
if(counter > 200)
{
    wakeup(&counter);
}
SpinlockRelease(&counterLock);

From all of the above another pattern emerges: whenever we want a process not to busy wait for something to happen, we can put it to sleep.

SpinlockAcquire(&sl);

while(!cond)
{
    sleep(&cond, &sl);
}

SpinlockRelease(&sl);

We just need to make sure that some other process wakes it up.

SpinlockAcquire(&sl);

// do some work that might change the condition
wakeup(&cond);

SpinlockRelease(&sl);

To sum things up, we now have two important tools:

  • spinlocks to create critical sections in our code
  • condition variables, which together with the sleep/wakeup functions allow a process to step out of a critical section and sleep until the condition changes

5. What if a spinlock-guarded critical section takes very long?

Remember that SpinlockAcquire disables interrupts? That means our critical section, in its current form, can’t be sliced. Yes, a process in a critical section can give up its lock and go to sleep, but that has to be intentional. There’s nothing stopping a process from executing inside a critical section for as long as it likes, which means every process waiting for the same spinlock will busy wait.

What if we have 20 processes on 10 CPUs and 10 of them want to SpinlockAcquire the same lock? The first process acquires it and does its work. If it’s a couple of instructions, nothing bad happens.

But what if that process does some heavy calculation for a second, and the other 9 waiters are scheduled on the other 9 CPUs? They’d busy wait for that lock with interrupts disabled, so they can’t even be preempted. The 10 remaining processes don’t need that lock and could do their jobs just fine, but they won’t get the chance. As long as the first process doesn’t give up the lock, 9 CPUs are useless. That’s bad. Very bad.

It would be great if we could find a way for these processes to communicate: when the lock can’t be acquired, put the process to sleep until some other process releases the lock and wakes up everyone waiting.

Let me rephrase that.

When the condition in a critical section isn’t met, put the process to sleep until some other process changes the condition, and wake up the other processes waiting for that condition to change.

That’s a problem we have already solved. We have a pattern that allows a process to enter a critical section, check some condition, and sleep until some other process changes the condition and wakes it up.

If we replace the condition with a lock, we have described a new kind of critical section guard. It’s hard to picture: a lock inside a lock, a critical section guard built with another critical section guard, sounds terrible.

It’s better to see how it might look in practice.

Our condition-in-critical-section pattern is a starting point.

SpinlockAcquire(&sl);

while(!cond)
{
    sleep(&cond, &sl);
}

SpinlockRelease(&sl);

Let’s assume that our cond is an integer, isLocked, just like it worked before we packed that int inside a struct spinlock. We also need a spinlock to make access to that guard safe from multiple processes, the same way we had cond and sl above. counter is our shared resource.

int counter;
int isLocked;
struct spinlock guard;

Any process that wants to access counter needs this code to check the value of isLocked.

SpinlockAcquire(&guard);
while(isLocked)
{
    sleep(&isLocked, &guard);
}

isLocked = 1;

SpinlockRelease(&guard);

// proceed with modifying counter

// release the lock when done and wake up anyone sleeping on it
SpinlockAcquire(&guard);

isLocked = 0;
wakeup(&isLocked);

SpinlockRelease(&guard);

That should look familiar, because it is almost exactly like the situation with spinlocks before: lots of code to put into every process that wants to access some shared resource. With spinlocks we bundled everything together and made it as generic and easy to use as possible.

Let’s do the same thing here.

First, let’s bundle isLocked and guard together. These two belong together anyway, so let’s make sure we never forget which lock has which guard. Let’s call the result a sleeplock, to make sure we don’t confuse it with a spinlock. They are different: built differently, behaving differently. What they share is their purpose, creating critical sections.

struct sleeplock
{
    int isLocked;
    struct spinlock guard;
};

And now function that would acquire such a lock.

Remember that this function won’t spin on a CPU if the sleeplock is taken, but that doesn’t change the fact that it will not return until it has changed sl->isLocked from 0 to 1. It just gives up the CPU while it can’t do that immediately.

void
SleeplockAcquire(struct sleeplock *sl)
{
    SpinlockAcquire(&sl->guard);
    while(sl->isLocked)
    {
        sleep(&sl->isLocked, &sl->guard);
    }
    sl->isLocked = 1;

    SpinlockRelease(&sl->guard);
}

Releasing such a lock would look something like this:

void
SleeplockRelease(struct sleeplock *sl)
{
    SpinlockAcquire(&sl->guard);

    sl->isLocked = 0;

    wakeup(&sl->isLocked);

    SpinlockRelease(&sl->guard);
}

And this is how we would use them in code:

int counter;
struct sleeplock counterGuard;

Any process that wants to access counter needs just this pair of calls:

SleeplockAcquire(&counterGuard);

// proceed with modifying counter

SleeplockRelease(&counterGuard);

Sleeplocks provide two benefits:

  • processes waiting for a sleeplock don’t take up a CPU, they go to sleep
  • a process that holds a sleeplock can be interrupted; it won’t give up the lock, but it can be forced to yield the CPU to another process for a slice of time

We went from a lock that is basically an integer accessed by an atomic instruction hidden behind the __sync_lock_test_and_set call, packed that functionality nicely into a spinlock to build uninterruptible critical sections, and then packed a spinlock and a simple integer working as a lock into a sleeplock, to build critical sections whose holder can be interrupted.

There’s a lot left out of this subject. I’ll be returning to this entry and improving it over time.

OSDEV
concurrency osdev spinlock sleeplock