Thread Libraries

Last time we hinted at how Linux systems distinguish between processes and threads (or should I say... don't), but suppose we want to design a multi-threaded application across similar platforms?

Towards this goal, we require the use of a thread library.

A thread library provides an API for creating, managing, and synchronizing threads, but provides only a specification for thread implementation.

This means that, although the API will make promises for behaviors of the interface, the implementation details are left to the supporting OS designers.

The OS designer has some challenges to address with thread APIs, for example:

  • Should a thread library / API be a completely user-level entity or should it be part of the OS?

  • Should a kernel have threads that correspond to each user-program thread? How should kernel threads be organized?

Should users of a thread library care about the answers to questions like the above?

No! That's what's meant by the separation of specification and implementation.

So what do thread libraries, regardless of their implementation, typically give us?


Thread API Commonalities


There exist a variety of different cross-platform and platform-specific thread libraries, including:
  • POSIX Pthreads and Java threads, implemented as both user- and kernel-level libraries (though Java threads map to threads of the JVM's host OS).

  • Win32 and Mach threads, implemented as kernel-level libraries in Windows and Mac OS, respectively.

Regardless of a library's specific implementation, each gives us some basic functionality:

  1. Thread objects that represent the thread itself: IDs in Pthreads and Mach, handles in Win32, and Thread objects in Java.

  2. Execution entry points that determine where the thread starts running, such as a function reference.

  3. Operations like thread creation and synchronization.

Enough prattle, shall we actually look at a thread library in practice?


Pthreads


Pthreads are a POSIX standard for thread manipulation common in UNIX operating systems (Solaris, Linux, Mac OS X).

Let's examine pthread properties and operations, then look at an example that employs them.

All pthreads have:

  • pthread_t thread identifier (unique ID)

  • pthread_attr_t thread attributes that determine how a thread's state should be initialized.

Pthread operations then proceed in several typical steps:

  • pthread_attr_init(&attr) provides default initialization values for the thread's state (stored in a pthread_attr_t variable pointed to by the attr argument).

  • pthread_create(&tid, &attr, func, arg) creates a new thread with the given tid and attributes to begin execution at the given function, called with the given argument.

  • pthread_exit(return_val), called by the spawned thread, terminates that thread and makes the given return value available to a joining thread.

  • pthread_join(tid, &res) is the thread version of wait: the calling thread (which you might consider the "eldest sibling") blocks until the thread with ID tid terminates, and that thread's return value is stored in res (pass NULL for res when we do not care about the return value).

To compile a C program using pthreads, #include <pthread.h> and link using gcc -pthread ....

The following example demonstrates a "sum of sums" application for threads whereby a call like: ./sums 5 10 8 will compute: $$\sum_{i=1}^{5} i + \sum_{i=1}^{10} i + \sum_{i=1}^{8} i$$

Notice that each of these individually summed terms is independent of the others and performs the same operation (summation), making them prime candidates for multi-threading.

  /**
   * This program uses the POSIX thread API to calculate multiple
   * summations from 1 to n.
   * Modified from Dondi who modified our text's example:
   *   https://github.com/dondi/bazaar/blob/master/thread-posix/sum.c
   */
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  
  /**
   * Shared total which the sums will update.
   */
  static int sumOfSums = 0;
  
  /**
   * Forward declaration of the summation function
   * (i.e., what gets threaded).
   */
  void *sumRun(void *maxStr);
  
  int main(int argc, char *argv[]) {
      /* One thread slot per command-line argument (count first, then size). */
      pthread_t *threads = calloc(argc - 1, sizeof(pthread_t));
  
      pthread_attr_t attr;
      pthread_attr_init(&attr);
  
      int i;
      for (i = 1; i < argc; i++) {
          pthread_create(&threads[i - 1], &attr, sumRun, argv[i]);
      }
  
      for (i = 1; i < argc; i++) {
          pthread_join(threads[i - 1], NULL);
      }
  
      printf("\nFinal sum of sums = %d\n", sumOfSums);
      free(threads);
      return 0;
  }
  
  void *sumRun(void *maxStr) {
      int max = atoi(maxStr);
      printf("Starting summation to %d...\n", max);
      int sum = 0;
      int i;
      for (i = 0; i <= max; i++) {
          printf(".%d.", i);
          sum += i;
      }
  
      printf("Summation to %d = %d\n", max, sum);
  
      // Add to sum of sums. Note: this read-modify-write of shared
      // state is unsynchronized -- a hazard we will revisit later.
      sumOfSums += sum;
      return NULL;
  }

Run sums.c several times with the same arguments. What do you notice about what prints out, and what is the cause of this peculiarity?

There are different orders of thread execution! This is due to the process scheduler.

Add a delay after each thread's addition to the sum (add usleep(20000) to the end of the for-loop in sumRun). What do you notice about what is printed out, and what does this tell you about the scheduler?

The scheduler does not like to sit idly by! If one thread is waiting (from the usleep call), another is given access to the CPU from the ready queue.


So it should now occur to us that, although we may have a general idea for the responsibilities of the scheduler and the different states of each active process, we have not yet investigated the algorithms used to decide which process / thread should have access to the CPU at any given time!



Scheduling - Motivation and Objectives

As we saw in our sums.c example above, we should investigate just how the OS decides what processes can employ the CPU at any given time.

The short-term scheduler is a kernel module that selects from among the processes in the ready queue to decide which is allocated to the CPU at any given time.

We had modeled the ready queue as a linked list (and it may still be implemented that way, as with Linux task_structs), but we should also note that the queue may be ordered in different ways.

Note: this means that the ready queue can be implemented as a variety of different data structures depending on the scheduling algorithm (FIFO queues, priority queues, trees, or linked lists, e.g.)


Scheduling Impetuses


Before any of that, we should consider: in what parts of a process' lifetime is a scheduler necessary?

Scheduling comes into play during a process' state transitions, and can be either cooperative (scheduler invoked after a process has volunteered to transition) or preemptive (scheduler invoked to interrupt a process).

What 2 process state transitions should be handled by a cooperative scheduling scheme?

When the process volunteers a transition, namely:

  • [Running \(\rightarrow\) Waiting] as by an I/O event or invocation of wait for spawned child processes / threads.

  • [Any State \(\rightarrow\) Terminated], at which point the process can be removed from the active queue.

What 2 process state transitions should be handled by a preemptive scheduling scheme?

When the process is interrupted, namely:

  • [Running \(\rightarrow\) Ready] as by any interrupt.

  • [Waiting \(\rightarrow\) Ready] as by completion of I/O, at which point the scheduler must determine if the newly-ready process should be immediately dispatched, or wait its turn.

Both of these scheduling schemes can have some predictable issues:

Cooperative schemes can allow a process to monopolize system resources, but preemptive schemes have complications with shared data (which we will explore in depth later).


Scheduling Criteria


Before we investigate the data structures and algorithms that go into particular schedulers, we should consider what metrics we want to optimize for a "good" scheduling algorithm.

What are some metrics of success on which we could judge different scheduling algorithms?

The five most common are:

  • [Maximize] CPU Utilization: the CPU should stay busy as much as possible.

  • [Maximize] Throughput: complete as many processes as quickly as possible. Throughput is a measure of completed processes per time unit.

  • [Minimize] Turnaround time: minimize the interval from time of process submission to the time of completion.

  • [Minimize] Waiting time: minimize only the time spent waiting in the ready queue (subset of the turnaround time).

  • [Minimize] Response time: minimize time it takes for process to produce first response to user.


Now that we have some targets set, next time we'll investigate the implementations that attempt to address these!

