Process Manipulation in C

As we saw last time, processes form a hierarchy of parent-child relationships in which parents spawn child processes.

We saw some examples of this using the Bash shell, in which each external command executed by a user is actually a child process spawned by Bash! (Builtins such as cd run inside Bash itself.)

However, we should take a step back and consider the purpose(s) of a hierarchical process structure to begin with.

What are some of the reasons why we might want to allow a parent process to spawn child processes?

A variety of reasons, including:

Parallelism: if a parent process is responsible for a large task but can split it into smaller pieces handled by other processes, those pieces can run in parallel, sometimes on multiple processors, to finish the job more quickly.
Separation of concerns: isolating each task in its own process keeps code modular and can make debugging easier.
Hierarchical control: parents should be able to terminate their children for a variety of reasons, such as when a child is no longer needed for its task or the parent is exiting, but should *not* be able to terminate their ancestors.

Motivated by the above, let's see how we can create child processes programmatically in Unix systems!

`fork()`

The most important tool in the OS's arsenal! Although process creation goes by other names in other operating systems, fork() is the variant used in the POSIX interface.

The fork() system call spawns a child process from the calling parent, performing the following steps:

A new PCB (process control block) is created for the child, which receives its own unique PID; the PCB also records the PID of the parent that created it.
The parent's address space is copied into the child's, byte for byte. (Modern kernels do this lazily with copy-on-write: pages are shared until one process writes to them, but each process still behaves as if it owns a private copy.)
Both parent and child continue executing from the instruction immediately following the fork() call, with one key difference, namely the value fork() returns:
- [Parent] in the parent process, fork() returns the PID of the newly created child, of type pid_t.
- [Child] in the child process, fork() returns 0.
- [Error] if the fork fails, no child is created, and fork() returns -1 in the parent, with errno set to indicate the cause.

Because fork() returns a different value in each process, a single source file can describe the behavior of both processes, and each process can tell from within the code whether it is the parent or the child.

Take the following source for example:

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/types.h>
  
  int main(void) {
      // pid_t is a signed integer type; using it instead of int keeps
      // the code portable across platforms with different int sizes
      pid_t pid;
  
      // The miracle of life!
      pid = fork();
      
      // [!] Both parent and child continue execution below
      if (pid < 0) {
          // fork() failed: no child was created
          perror("fork");
          return 1;
  
      // Child process here:
      } else if (pid == 0) {
          printf("I'm just a kid so I see pid = 0: %d\n", (int) pid);
      
      // Parent process here:
      } else {
          printf("I'm Papa Process; my kid has pid: %d\n", (int) pid);
      }
      
      return 0;
  }

These processes run "concurrently," so there is no guarantee about whether the parent's or the child's print statement executes first. Controlling this ordering is the problem of process "synchronization," to be discussed later.

However, there is an easy tool to ensure some level of synchronization between parent and child processes via the wait() syscall.

`wait()`

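The wait(&status) syscall suspends the calling process until one of its children terminates, then returns that child's PID (or -1 if the caller has no children to wait for). Information about how the child terminated is stored in the provided int status. This is an encoded value rather than the exit code itself, so the exit code should be extracted with the WIFEXITED(status) and WEXITSTATUS(status) macros from <sys/wait.h>.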

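The waitpid(pid, &status, options) syscall instead waits for the specific child whose PID is given (a pid of -1 means any child, as with wait()). The options argument is usually 0, or WNOHANG to return immediately if that child has not yet terminated.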

So, briefly modifying our code above:

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  
  int main(void) {
      // pid_t is a signed integer type; using it instead of int keeps
      // the code portable across platforms with different int sizes
      pid_t pid;
  
      // The miracle of life!
      pid = fork();
      
      // [!] Both parent and child continue execution below
      if (pid < 0) {
          // fork() failed: no child was created
          perror("fork");
          return 1;
  
      // Child process here:
      } else if (pid == 0) {
          printf("I'm just a kid so I see pid = 0: %d\n", (int) pid);
      
      // Parent process here:
      } else {
          printf("I'm Papa Process; my kid has pid: %d\n", (int) pid);
          int status;
          // Block until the child terminates
          wait(&status);
          // status is encoded; extract the exit code with the W* macros
          if (WIFEXITED(status)) {
              printf("All done! Child exit code: %d\n", WEXITSTATUS(status));
          }
      }
      
      return 0;
  }

`exec()`

Recall that a forked child starts out with a copy of its parent's memory footprint.

Under what circumstances might this copy be wasteful?

When the child process has no need for the data copied from the parent, including the parent's remaining instructions in the text section!

For this reason, another useful system call can actually replace the child's footprint in memory with that of a separate executable.

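The exec() family of functions replaces the calling process's image in memory with that of a separate executable. The process keeps its PID, but its code, data, heap, and stack are replaced, so on success exec() never returns; the new program starts from its own entry point as though it had been launched directly. (Strictly, only execve() is a system call; execl(), execlp(), execvp(), and the rest are library wrappers around it.)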

In the example below, the child process is replaced by the ls program as soon as it reaches the exec call:

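  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  
  int main(void) {
      pid_t pid;
  
      /* fork a child process */
      pid = fork();
  
      if (pid < 0) {
          /* fork failed: no child exists */
          perror("fork");
          return 1;
      }
      else if (pid == 0) {
          printf("Child %d\n", (int) pid);
          /* flush stdout: exec discards any unwritten stdio buffers */
          fflush(stdout);
          /* replace this process's image with ls (execlp searches PATH);
             the argument list must end with a null pointer */
          execlp("ls", "ls", (char *) NULL);
          /* only reached if exec failed */
          perror("execlp");
          exit(1);
      }
      else {
          printf("Parent %d\n", (int) pid);
          /* wait for the child (now running ls) to finish */
          wait(NULL);
          printf("Child Complete\n");
      }
  
      return 0;
  }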