Adversarial Search

In the first part of the course, we examined some basic search scenarios wherein our agent need only worry about its own actions.

These types of single-agent, initial-state-to-goal problems are usually referred to as classical search problems.

"I found Beethoven!" you proclaim! (well, no not that kind of classical search).

Conversely, today, we get to some more... combative variants of search problems wherein we have not only a single agent whose actions we must guide, but perhaps a variety of other agents in the same environment!

Multi-agent environments are those in which an agent's plan or solution to a problem is a function of other agents' actions as well.

Unlike in our Maze Pathfinding examples from the lectures of yore, in which our agent's decisions were strictly a consequence of non-reactive environmental features, now our agent's actions must be contingent on how other agents act!

There are a variety of interesting problems in which agents act cooperatively in an environment, such as with drone coordination and triangulation, but perhaps more interesting is the spice added by CONFLICT.

In these types of problems, we consider some other agents to be acting in their own self-interest, which may prevent our agent from acting in its own.

Adversarial search problems (AKA games) feature multi-agent environments in which agents' goals are in conflict.

Consider 2 agents in a maze pathfinding problem; ours is trying to reach some goal, and another is trying to block us. Will our previous approaches to search serve us here?

No! Our previous approaches planned a sequence of actions from the initial state to a goal, but that plan can be interrupted by another agent acting to prevent it. We'll need to make our agents reactive to adversaries!

Games were of intense interest in the early development of AI, as chess-playing bots became a sort of metric for the success of an AI design.

The modern era has artificial agents competing in even more complex games requiring astounding amounts of planning and foresight (see Google's recent advances in bots able to play the game of Go).


Canonical Games


In AI and a branch of mathematics called "game theory," the most commonly analyzed games are somewhat basic and have the following properties.

The qualities of a basic / canonical adversarial search game are as follows:

  • Two players (adversaries)

  • Turn-based, such that \(P_1\) acts first, then \(P_2\), then \(P_1\), etc.

  • Perfect information, such that the game state is fully observable

  • Zero-sum, such that one player "wins" and the other "loses," or both "tie"

Give some examples of games that match these qualities.

Plainly, there are many interesting games with perturbations to these qualities.

For example, card games are instances of those with imperfect information since each agent cannot see the hand of the other.

Give some examples of games that have perturbations of these qualities.

As we've seen, every time we perturb some quality of a problem, the environment is drastically changed, and so our approaches in pursuit of solutions change as well.

Next, we'll see an example of one such canonical game, and start to develop the tools well suited to a solution strategy.



Adversarial Problems

Just as our first task in classical search problems was to formalize the problem and its environment, so too will we need to do so in adversarial environments.

To get us started, we'll look at a motivating example game, followed by how we might formalize it for a reasoning system.


The Game of Nim


The game of Nim has a rich history with many variants to its otherwise simple ruleset. We'll look at one variant to motivate today's discussion:

The Game of Nim is a two-player game wherein:

  1. The game begins with some number \(N\) of stones in the "pool."

  2. On each of their alternating turns, two players are allowed to remove \(1, 2,\) or \(3\) stones from the pool.

  3. The player who removes the last stone is the winner!

Play a game of Nim starting with \(N = 12\) stones!

Now that we've seen it in action, let's formalize its components and see how to create an agent that plays it intelligently.


Adversarial Problem Formalization


An adversarial search problem is parameterized by 6 qualities (enumerated below).

I. The State formalizes all game-specific features that are liable to change.

What would be a good representation of the state for the game of Nim?

The number of stones left in the pool, duh.


Here's something extra we have to specify for adversarial search problems that we didn't need in classical search.

II. The Player formalizes whose turn it is to act in a given state \(s\), or formally: $$Player(s) \in \{P_1, P_2, ...\}$$

This is fairly simple in the game of Nim, since turns alternate; we'll see this in action shortly.


Now, some more standard concerns:

III. Actions formalize the set of all actions available from a given state \(s\): $$Actions(s) = \{a_1, a_2, ...\}$$

IV. Transitions formalize how some action \(a\) transforms the current state \(s\) into the next state \(s'\): $$Result(s, a) = s'$$


Whereas previously we had goal states for classical search problems, what must we similarly formalize for adversarial ones?

What states will terminate the game, and how good those terminal states are for each player.

Apropos, we'll define:

V. Terminal tests, parameterized by some state \(s\), determine whether that state ends the game: $$Terminal(s) \in \{T, F\}$$

VI. Utility functions, parameterized by some terminal state \(s\) and some player \(p\), return a numerical score \(u\) for that state from the perspective of \(p\): $$Utility(s, p) = u$$

Generally speaking, for two states \(s_1, s_2\), \(Utility(s_1, p) \ge Utility(s_2, p)\) implies that \(s_1\) is at least as desirable to player \(p\) as \(s_2\).

Note: the choices of \(u\) are arbitrary as long as we can defend how they scale to the "goodness" of a particular state.

In zero-sum games (better called "constant-sum"), the sum of utilities across all players is the same constant in every terminal state.

One reasonable utility score for Nim is to have, for some terminal state \(s_1\) in which \(P_1\) wins: $$Utility(s_1, P_1) = 1; Utility(s_1, P_2) = 0$$

In this case, terminal state utilities always sum to 1 across both players.
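
To make these components concrete, here is a minimal sketch of the Nim formalization in Python; the state representation (a (stones, playerToMove) pair) and the function names are illustrative choices, not part of the formalism itself.

  # A minimal sketch of the Nim formalization, assuming states are
  # (stones, playerToMove) pairs and players are labeled 1 and 2.

  def player(state):
      # Player(s): whose turn it is to act in state s
      stones, turn = state
      return turn

  def actions(state):
      # Actions(s): remove 1, 2, or 3 stones, but never more than remain
      stones, _ = state
      return [a for a in (1, 2, 3) if a <= stones]

  def result(state, action):
      # Result(s, a): remove the chosen stones and pass the turn
      stones, turn = state
      return (stones - action, 2 if turn == 1 else 1)

  def terminal(state):
      # Terminal(s): the game ends when no stones remain
      stones, _ = state
      return stones == 0

  def utility(state, p):
      # Utility(s, p): the player who removed the last stone (i.e., NOT the
      # player to move in the terminal state) scores 1; the other scores 0
      stones, turn = state
      winner = 2 if turn == 1 else 1
      return 1 if p == winner else 0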


Equipped with these qualities, we can now start to proceduralize the search for a plan of attack.



Mini-Max Search

Let's start by considering how we started our explorations for search in the classical domain, then see where to go from there.

How did we explore possible solution paths in classical search? What structure did we use and can we apply that here?

We used search trees! We can try to apply that to our adversarial search problem, except that every other state along a path will be decided by our opponent.

There's no reason we can't use a similar tactic for adversarial search; we'll just need a couple more mechanisms.

A game tree is a tree that shows:

  • All possible moves by all players, down to the game's terminal states.

  • Utility scores for each player at each of the game's terminal states.

Draw the game tree for a small game of Nim with \(N = 5\)!


Now that we have the game tree displayed in all of its gory detail (already quite a bit for just an \(N = 5\)!), we should start to consider how we can use this to decide how our agent should act.

In classical search, we had the notion of goals: terminal states that also satisfied the problem.

In adversarial search, we have terminal states, but what is different about them compared to classical search?

Some are goals for one player (i.e., meet some optimization criteria) and some are goals for another.

Recalling our components of an adversarial search problem, how then can we distinguish which states are good for which player?

Score them with a utility function! Recall that this is a zero-sum problem (more accurately, a "constant sum" problem) where a single player either loses (e.g., utility = 0) or wins (e.g., utility = 1).

As such, we can score our terminal states from the perspective of our agent appropriately, and then see how we can develop a strategy from there.

Score the terminal states of our game tree above from the perspective of the first player (i.e., the root); consider this to be our agent, \(P_1\) such that we compute: $$Utility(s, P_1)~ \forall ~ s \in \{s': Terminal(s')\}$$


Utility Scores


Now let's consider: how do we use these utility scores to plan the best action for our agent?

Since we have to worry about our opponent acting against our goals, we should plan for the worst and then adapt. What is the worst-case scenario for our opponent's action policy?

That they act optimally! In other words, they always choose the best option available to them.

Under this assumption, we want to act optimally assuming that our opponent does as well.

In adversarial search problems, optimal decision-making occurs when an agent plans for an optimal opponent.

In cases where our opponent does act optimally, we are not surprised; in those where they do not, we will do at least as well as projected (and possibly better)!

All that's left is to characterize how an optimal opponent will act, and then how we can respond.

If our opponent is always trying to win, is always acting optimally, and an opponent win means that we get a score of \(0\), then what are our opponent's actions trying to do to our score?

Minimize it!

Note: this is a consequence of Nim being a "zero-sum" game, because an opponent trying to maximize their score is the same as them trying to minimize ours.

By the same token, if our opponent is trying to minimize our score, then we're trying to maximize it.

Thus, in a zero-sum game tree, opponent states can be characterized as "Min Nodes" and our states as "Max Nodes".


Mini-Max Mechanics


Mini-max search is a search strategy that, given a current state and a problem specification, will determine the next optimal action for our agent, or formally: $$Minimax(state, problem) = bestActionFromState$$

Classical search looked like \(Search(problem) = [a_1, a_2, ...]\) where the result was a sequence of actions. Why does Mini-Max search return only a single action from the current state?

Because we cannot formulate a plan that is independent of our opponent's actions -- we must instead see how they act after our chosen optimal action from the current state!

The basic steps of Mini-Max are thus:

  1. Look ahead at possible terminal states from the actions available to us

  2. Consider what paths our (optimally acting) opponents will take

  3. Determine the best path accordingly

So, now we have to formalize how our agents are meant to make a decision based on all of the possible outcomes of those decisions.

Give some possible algorithms for how we should "guide" our agent to the optimal solution, given the game tree with utility scores at the terminal states.


The prescription of Mini-Max search suggests that we "score" each non-terminal state, and then make a decision that maximizes that score.

The answer is right in front of us:

We can associate a score with each node such that:

  • Every TerminalNode is scored via its utility.

  • Every MinNode attains the minimum score among its children.

  • Every MaxNode attains the maximum score among its children.

The optimal action (for Max) from the current state is thus the one that maximizes the score at the root.
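
Here is a minimal sketch of that scoring procedure in Python, reusing the illustrative Nim functions sketched earlier (player, actions, result, terminal, utility); scores are taken from Max's (\(P_1\)'s) perspective.

  # Mini-Max scoring: utility at terminal nodes, max over children at Max
  # (P1) nodes, min over children at Min (P2) nodes.
  def minimax_value(state):
      if terminal(state):
          return utility(state, 1)
      values = [minimax_value(result(state, a)) for a in actions(state)]
      return max(values) if player(state) == 1 else min(values)

  # The optimal action for Max is the one whose resulting state scores highest.
  def minimax_decision(state):
      return max(actions(state), key=lambda a: minimax_value(result(state, a)))

  # e.g., minimax_decision((5, 1)) selects removing 1 stone from N = 5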

Score each of the intermediary states in our Nim \(N = 5\) game tree, then determine the best action to take from the root.


Given that we need to score the terminal nodes first, what kind of search strategy is Mini-Max a variant of?

Depth-first, since we must consider the deepest nodes (the terminal states) before scoring the others.

Given the Mini-Max scores at each node, which action should we (the maximizing player) take at the root?

We should take just 1 stone from the pool.

And there you have it! Mini-Max Search in a nutshell.


Observant readers will note a couple of issues with our game-tree approach:

  • It examines some states unnecessarily, expanding sub-trees that cannot turn out any better than options already found.

  • It is not practical for larger problems; consider chess in which the state-space of any given node is massive!

We'll look at each of these issues next!



α-β Pruning

Let's start by trying to optimize the number of nodes we need to expand in our game tree to perform Mini-Max search.

Consider the following stage in our Mini-Max game tree generation using the depth-first strategy:


Once we have explored the subtree from the \(-3\) action at the root, and are considering the \(-2\) action, do we need to expand other children of the root's \(3\) child as soon as we've discovered that a winning move for Min exists?

No! Formally, we see that, from the \(-3\) action, that route's score would be 0 (an optimally playing opponent would win). So, as soon as we see that the \(-2\) action would lead to an opponent's win as well, we need not look any further -- that action will lead to an outcome scored no better than one we previously explored.

Consider the \(1\) child of the root's \(3\) node: this would lead to a victory for Max, but is not a move that the Min player would ever take, given that there exists a winning action for Min from the parent (and so the \(1\) node should be ignored).

As such, we have identified scenarios in which the Mini-Max search need not completely populate the game tree.

Plainly, this would lead to savings in computational and space complexity, especially for larger game trees.

We've now stumbled upon our next algorithm efficiency paradigm!

Pruning is a technique in search wherein we never expand a node in the search tree that we know will never lead to a solution.

Isn't that cute vocabulary? Pruning a tree? Computer scientists are top(-iary)-notch punsters.

Pruning is an approach that we'll see crop up later in the course as well! For now, however, it has a special application in adversarial search:


The α-β Pruning variant of Mini-Max search stops exploring an action's sub-tree as soon as it is determined that the action can score no better than some previously explored alternative.

This strategy gets its name from keeping track of two values at every node:

The two record-keeping values are (unimaginatively):

  • α, the minimum score that the maximizing player (Max) is already assured of along the current path; updated only on Max nodes when a child's score \(v > \alpha \).

  • β, the maximum score that the minimizing player (Min) is already assured of along the current path; updated only on Min nodes when a child's score \(v < \beta \).

Important components of the algorithm:

  • A parent's values for \( \alpha \) and \( \beta \) are passed down as the initial values of these variables for each child.

  • A node (and any resulting sub-trees) is pruned (i.e., not considered in the search) whenever \(\beta \le \alpha \)


The above gives us a "gist" understanding for the algorithm, but it helps to see it in all of its detail.

Wikipedia lists a nice pseudocode of the alpha-beta pruning algorithm, repeated here for your consumption. In particular, note how the variables \(v\), \( \alpha \), and \( \beta \) are maintained and compared:

  function alphabeta(node, α, β, maximizingPlayer)
    if node is a terminal node
      return the utility score of node
    if maximizingPlayer
      v := -∞
      for each child of node
        v := max(v, alphabeta(child, α, β, FALSE))
        α := max(α, v)
        if β ≤ α
          break;
      return v
    else
      v := ∞
      for each child of node
        v := min(v, alphabeta(child, α, β, TRUE))
        β := min(β, v)
        if β ≤ α
          break;
      return v

Additionally, assuming we're starting with the maximizing player's turn, we would start the ball rolling with the call:

  alphabeta(root, -∞, ∞, TRUE)
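
For concreteness, here is a sketch of the same algorithm in Python, adapted to the illustrative Nim functions from earlier (with utilities again taken from Max's, \(P_1\)'s, perspective):

  import math

  def alphabeta(state, alpha, beta, maximizing_player):
      if terminal(state):
          return utility(state, 1)
      if maximizing_player:
          v = -math.inf
          for a in actions(state):
              v = max(v, alphabeta(result(state, a), alpha, beta, False))
              alpha = max(alpha, v)
              if beta <= alpha:
                  break  # prune: Min would never allow this branch
          return v
      else:
          v = math.inf
          for a in actions(state):
              v = min(v, alphabeta(result(state, a), alpha, beta, True))
              beta = min(beta, v)
              if beta <= alpha:
                  break  # prune: Max would never allow this branch
          return v

  # Starting the ball rolling for N = 5 (Max to move):
  # alphabeta((5, 1), -math.inf, math.inf, True)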

It helps to see this in practice and even test your own skills:

The following site has great interactive α-β pruning for you to practice on!

Practice!

Try to fill in the blanks in several examples of the above.


Apply α-β pruning to our original Nim \(N = 5\) problem! (solution below)


Look at all that pruning! It's like grandma's fridge: there are so many prunes!


Asymptotic Performance


We'll spend a brief minute discussing the computational merits of α-β Pruning... in particular: just how effective is it?

Consider again our tree metrics of \(b\), the tree's branching factor, and \(d\), the depth of the game tree (deepest terminal node).

In terms of \(b, d\), what is the worst-case performance for α-β Pruning? Characterize what happens to achieve this worst case.

No different than depth- / breadth-first search, \(O(b^d)\); this happens whenever each player's best moves are considered last at each level, and so no pruning occurs.

Now, given that the worst case is when we discover each players' best moves last...

In terms of \(b, d\), what is the best-case performance for α-β Pruning? Characterize what happens to achieve this best case.

In this case, each player's best moves are considered first, and so we need not consider a second-best move (or third-best, etc.) of any player. This is tantamount to exploring only about the square root of the number of leaf nodes (an effective branching factor of \(\sqrt{b}\)), giving us \(O(b^{d/2}) = O(\sqrt{b^d})\)
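
For a sense of scale, suppose (purely for illustration) a chess-like branching factor of \(b \approx 35\) and a lookahead of \(d = 10\) plies: $$b^d = 35^{10} \approx 2.8 \times 10^{15} \quad \text{vs.} \quad b^{d/2} = 35^{5} \approx 5.3 \times 10^{7}$$ In the best case, then, α-β pruning lets us look roughly twice as deep for the same amount of work.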

Even that saving only goes so far, however; consider the metrics of a Mini-Max game tree in Chess:

Fun tidbit: there are approximately \(10^{120}\) possible game variations of chess, played out over roughly \(10^{43}\) legal board positions. For comparison, there are about \(10^{78}\) atoms in the known universe.

So, this yields an interesting question: plainly we can't reproduce the whole game tree for games like Chess. Is there any way we can apply Mini-Max to these larger state spaces?


Imperfect Real-Time Heuristics


For large state spaces, we can still apply Mini-Max and α-β pruning... if we're fine with having an approximately optimal solution.

Imperfect Real-Time Heuristics, rather than "bubbling up" a utility score for each terminal state, will instead provide a heuristic score for the nodes at some pre-determined cut-off depth, \(m\), treating them as the tree's leaves.

This might look like setting a cut-off depth of 5 levels from the current Chess board state, and then scoring the leaf states by some heuristic.
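
Here is a minimal sketch of that cut-off mechanic, again reusing the illustrative game functions from earlier and assuming a hypothetical evaluate(state) heuristic for scoring non-terminal cut-off nodes:

  # Depth-limited Mini-Max: true utility at terminal states, but a heuristic
  # estimate once the cut-off depth m is reached.
  def h_minimax(state, depth, maximizing_player):
      if terminal(state):
          return utility(state, 1)
      if depth == 0:
          return evaluate(state)  # hypothetical heuristic for the cut-off "leaf"
      values = [h_minimax(result(state, a), depth - 1, not maximizing_player)
                for a in actions(state)]
      return max(values) if maximizing_player else min(values)

  # e.g., a cut-off of m = 5 plies from the current state:
  # h_minimax(current_state, 5, True)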

What might be a simple heuristic score we could return for a configuration of a Chess board?


And that's all we have to say about Adversarial Search! Next time, we'll start to look at automated reasoning and inference engines 0_o


