Semaphores

Last class, we hinted at a kernel-assisted means of obtaining a lock for a thread's critical section so as to solve the critical section problem.

As it turns out, OSes provide just such a mechanism in what is known as a semaphore (a word whose origins were unfamiliar to me; outside the context of an OS, a semaphore is apparently a signaling system typically done with flags, as in old-timey nautical communication).

This notion of "signaling" precisely encapsulates the purpose of a semaphore in the OS context as well.

A semaphore is a synchronization tool implemented as an integer variable, maintained in a table in kernel memory, that signals when cooperating processes are free to enter their critical sections.

A semaphore supports two atomic operations, non-preemptable in kernel mode, whose implementations can vary (and will be discussed shortly):

  • wait(S) locks semaphore S if it is unlocked before continuing; otherwise, it blocks execution of the calling process until semaphore S is unlocked

  • signal(S) unlocks semaphore S, indicating that the calling process has completed its critical section

Before we look at how to use semaphores programmatically, we should consider how they are actually implemented in the kernel.


Semaphore Implementation


If a semaphore is simply an integer variable, a reasonable first question of implementation is: "Where does that ish live?"

Semaphores can either be:

  • Named, in which case they are maintained by the kernel in a message-passing IPC scheme

  • Unnamed (memory-based), in which case their memory is reserved by the programmer in a region of shared memory.

Regardless of where the semaphores themselves are stored, the important part of their implementation is how the state of a semaphore's "lock" status is maintained.


Spinlock Implementation

The most basic implementation format follows our intuition from last time wherein we (in vain) attempted to implement a semaphore using only the tools available to us in the user space.

In the originally proposed implementation, the semaphore operations are quite basic but suffer from some computational overhead:

  wait (S) {
      while (S <= 0)
          ; // busy-waiting
      S--;
  }
  signal (S) {
      S++;
  }
  
  

Examining the above semaphore implementation, what is the chief weakness of the chosen approach?

The implementation of wait engages in busy waiting, wherein processes that are waiting for a semaphore to unlock must continuously loop in their entry sections, consuming CPU resources that could be better spent on other processes.

On the flip side, is there some benefit to the spinlock approach?

On multi-processor systems, a process can spin on one processor while another process completes its critical section on a separate processor; the spinning process then requires no context switch once it can finally acquire the lock.

That said, in the case where there are many processes waiting on a particular shared resource, we may have fewer processors than processes in spinlocks, and so it may be appropriate to consider alternative implementations.

Propose a way to implement a semaphore without "busy waiting" -- what data structures would you need to implement this approach?

We can instead associate a waiting queue with each semaphore, adding processes to the queue when they must wait, and dequeuing the process next in line when the semaphore becomes available.


Queue Implementation

In the queue implementation, each semaphore has two data members: (1) value, the integer value of the semaphore, and (2) list, a pointer to the queue of PCBs for the processes waiting on it.

Queue implementations thus rely on two additional operations:

  • block places the calling process into the waiting state

  • wakeup recalls a process from the waiting state and places it into the ready queue

This can be implemented as succinctly as a simple struct in C:

  typedef struct {
      int value;
      struct process* list;
  } semaphore;

And thus, we now have two slightly more complex implementations of the wait and signal operations:

  wait (semaphore* S) {
      S->value--;
      if (S->value < 0) {
          add process to S->list
          block();
      }
  }
  signal (semaphore* S) {
      S->value++;
      if (S->value <= 0) {
          remove process P from S->list
          wakeup(P);
      }
  }

Note: different OSes typically combine both implementations in some form; e.g.

  • Windows XP disabled kernel interrupts during semaphore operations on single-processor systems and used spinlocks on multi-processor systems

  • Linux: similar to XP

  • Mac OS X: has a more complex multi-layer mechanism of spinlocks and mutexes



Semaphores in C

Let's take a look at the POSIX system calls used to manipulate semaphores, whose full documentation can be found here:

POSIX Semaphore Overview

Included in the semaphore.h specification for POSIX semaphores are several key operations on semaphores of the special type sem_t:

  1. sem_init(sem, pshared, startvalue) initializes the unnamed semaphore pointed to by sem, which can optionally be shared between processes via the pshared flag, and sets its initial value to startvalue.

  2. sem_wait(sem) is the atomic wait operation described above.

  3. sem_post(sem) is the atomic signal operation described above.

  4. sem_destroy(sem) releases an unnamed semaphore's resources from the OS's semaphore table once it is no longer required.


Example


To appropriately synchronize a critical section, simply place a call to sem_wait in the entry section and sem_post in the exit section surrounding the critical section.

We'll return to our basic sync example from last time and see how to properly synchronize the critical section of our many threads:

  #include <pthread.h>
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>    // [!] Needed for calloc / free
  #include <semaphore.h>
  
  #define THREAD_COUNT 10000
  // Shared total which the threads will update.
  static int shared = THREAD_COUNT;
  sem_t sem; // [!] Shared semaphore
  void* race(void* arg);
  
  int main(int argc, char* argv[]) {
      pthread_t *threads = calloc(THREAD_COUNT, sizeof(pthread_t));
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      sem_init(&sem, 0, 1); // [!] Initialize semaphore
  
      int i;
      for (i = 0; i < THREAD_COUNT; i++) {
          pthread_create(&threads[i], &attr, race, NULL);
      }
      
      for (i = 0; i < THREAD_COUNT; i++) {
          pthread_join(threads[i], NULL);
      }
      
      // Should always be 0 if synchronized; is it?
      printf("Final value: %i\n", shared);
      sem_destroy(&sem); // [!] Release semaphore
      free(threads);
      return 0;
  }
  
  void* race(void* arg) {
      sem_wait(&sem); // [!] Atomic wait
      // [!] Critical section ---------------------
      shared--;
      printf("Decrementing; now at: %i\n", shared);
      // ------------------------------------------
      sem_post(&sem); // [!] Atomic signal
      return NULL;
  }

Pthreads API


Note: the above system calls are the "primitive" means of using semaphores for synchronization.

Most modern applications will use a portable library like Pthreads, which not only enables cross-platform support, but also helps prevent user errors in primitive synchronization system calls, such as:

  ...
  // [X] Double wait:
  sem_wait(&sem);
  sem_wait(&sem);
  // Critical section:
      // ...
  sem_post(&sem);
  ...

You can find the Pthread analogs to the system calls above here:

Pthread Semaphore Implementations


