Data Structures & Algorithms: Path finding II
By: abionnnn
14/1/2005

1. Intermission
===============
In the last episode of DSA, we looked at two pathfinding algorithms, DFS and BFS, along with queues and stacks. Now we do the fun part: actually implementing them in QB. The implementation given below is certainly not the only possible one. In fact, it's extremely inefficient in terms of memory usage! But it is short and simple enough to be understandable.

Throughout the code, I will try to give tips for those wanting to use these ideas in their own implementations.

2. Map definition
=================
This part is easy, but a simple trick which trades memory for speed (and code clarity) should be noted. Instead of testing whether we go beyond the map boundaries, we can just create an impassable border. This doesn't cost too much memory, but if you prefer not to do this you will need to test whether a surrounding square is beyond the array's boundaries.

Going straight for the kill:

'Arrays start at index 1
OPTION BASE 1
READ MapX%, MapY%
READ StrX%, StrY%, EndX%, EndY%
DIM Map%(MapX%, MapY%)

FOR I% = 1 TO MapY%
  FOR z% = 1 TO MapX%
    READ Map%(z%, I%)
  NEXT z%
NEXT I%

Now for the map, we set 0 to be an impassable square, while 1 is a passable square. This choice is of course arbitrary, and realistically any descriptive map could be used, provided you are careful later on to test for the correct passability criterion.

An example map:

'Map X, Map Y
DATA 10,10
'Map StartX, StartY, EndX, EndY
DATA 2,2,9,9
DATA 0,0,0,0,0,0,0,0,0,0
DATA 0,1,1,1,1,1,1,0,1,0
DATA 0,1,0,0,0,1,1,0,1,0
DATA 0,1,1,1,1,1,1,1,1,0
DATA 0,0,1,0,1,0,1,0,1,0
DATA 0,0,1,0,1,1,1,0,1,0
DATA 0,1,1,0,0,1,1,1,1,0
DATA 0,1,0,1,0,0,1,0,0,0
DATA 0,1,1,1,1,0,1,1,1,0
DATA 0,0,0,0,0,0,0,0,0,0
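
If you would rather not spend memory on the border, the test for a neighbouring square has to check the array bounds first. Here is a minimal sketch of what that might look like. The 3x3 borderless map and the names NewX%, NewY% and CanMove% are made up for this illustration and are not part of the example program:

```basic
'Hypothetical 3x3 map WITHOUT a wall border (1 = passable, 0 = wall)
OPTION BASE 1
MapX% = 3: MapY% = 3
DIM Map%(MapX%, MapY%)
FOR y% = 1 TO MapY%
  FOR x% = 1 TO MapX%
    READ Map%(x%, y%)
  NEXT x%
NEXT y%
DATA 1,1,0
DATA 0,1,1
DATA 1,1,1

'Test the square to the right of (3, 2). It lies outside the map.
CurX% = 3: CurY% = 2
NewX% = CurX% + 1: NewY% = CurY%

'QB evaluates both sides of AND, so the bounds test must sit in its
'own outer IF. Otherwise Map%(4, 2) would still be indexed.
CanMove% = 0
IF NewX% >= 1 AND NewX% <= MapX% AND NewY% >= 1 AND NewY% <= MapY% THEN
  IF Map%(NewX%, NewY%) = 1 THEN CanMove% = 1
END IF

IF CanMove% = 1 THEN PRINT "Can move right" ELSE PRINT "Blocked"
```

This reports the move as blocked. Four of these nested tests per step is the price of dropping the border, which is why the rest of the article sticks with the border trick.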

3. Depth First Search Implementation
====================================
The DFS implementation is marginally easier to understand and shorter than BFS. To save you the trouble of turning back to the original article, here is what we need for the DFS algorithm:

* A stack
* An array of already visited locations

The stack can hold both x and y coordinates; alternatively, it can hold the directions that we choose, but the latter is slightly more involved to implement. If you want to halve your memory usage for the stack, though, that is definitely a route to take. Here we store the coordinates:

'Allocate the stack
DIM X%(MapX% * MapY%), Y%(MapX% * MapY%)

There is nothing special about the already visited location array:

DIM Visited%(MapX%, MapY%)

Now that we have our ingredients, let's start baking! Errr ... let's start making! Taking this step by step:

(0) Initialisation: Empty stack, empty already visited locations.

To empty the stack, simply set the stack pointer to a value below the array's range (remember we invoked OPTION BASE 1, making the lower bound 1):

SP% = 0

This is a unique state we can test. If SP% = 0, then we know the stack is empty. When we push something to the stack, we add one to SP% then add it to the array. When we pop something off the stack, we remove X%(SP%), Y%(SP%) then decrement SP% by one. This way, when we remove the last element we will end up with SP% = 0. Thus we have passed the self-consistency condition. ;)
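
The push and pop rules just described can be tried out on their own. The following is only a small stand-alone sketch with made-up coordinates and a 10-element stack, not a piece of the path finder:

```basic
'Stand-alone demo of the stack convention (SP% = 0 means empty)
OPTION BASE 1
DIM X%(10), Y%(10)
SP% = 0

'Push (2, 2), then push (3, 2): increment SP% first, then store
SP% = SP% + 1: X%(SP%) = 2: Y%(SP%) = 2
SP% = SP% + 1: X%(SP%) = 3: Y%(SP%) = 2

'Pop until empty: read the top at index SP%, then decrement SP%
DO WHILE SP% <> 0
  PRINT "Popped:"; X%(SP%); Y%(SP%)
  SP% = SP% - 1
LOOP
PRINT "Stack is empty, SP% ="; SP%
```

The last location pushed, (3, 2), comes off first, and the loop ends with SP% back at 0.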

We will not need to empty the already-visited-locations (Visited%) in this implementation since we will only use these data structures once. If you are going to do this repeatedly, you will need to give this part some thought because for large maps it may take nearly as long as the actual path finding part of the algorithm!
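
One possible way around the clearing cost, offered as an idea rather than something the example program does: store the number of the current search in Visited% instead of a plain 1. A square then counts as visited only if it carries the current number, so old marks expire by themselves. SearchID% and N% are hypothetical names for this sketch:

```basic
'Sketch: stamp squares with a search number so that Visited%
'never has to be cleared between searches.
OPTION BASE 1
DIM Visited%(10, 10)
SearchID% = 0

FOR N% = 1 TO 3
  'Starting a new search: bump the number, old marks become stale
  SearchID% = SearchID% + 1

  'No square carries the new number yet, so (5, 5) reads as unvisited
  IF Visited%(5, 5) <> SearchID% THEN PRINT "Search"; N%; ": (5,5) is unvisited"

  'Mark the square as visited for this search only
  Visited%(5, 5) = SearchID%
NEXT N%
```

All three searches see the square as unvisited even though the array is never reset. With this scheme, a test of the form NOT Visited%(...) would have to become Visited%(...) <> SearchID%. Since SearchID% is an integer, the array would still need one real clear (and SearchID% a reset) before the counter passes 32767.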

(1) Push the start location on to the stack, mark it as visited. (This is the current location.)

Self-explanatory:

'Push the initial position on the stack
SP% = SP% + 1
X%(SP%) = StrX%: Y%(SP%) = StrY%
Visited%(StrX%, StrY%) = 1

(2) Check if the stack is empty. If so, terminate failing to find the goal.

We will need to start a loop now, since all steps that go to other steps come back to step 2:

DO WHILE SP% <> 0

Later on we can test if this condition was met:

(...)
LOOP
IF SP% = 0 THEN PRINT "No path found."

(3) Check if current location is the goal. If so, terminate successfully.

We will first need the current location, such that when we first enter the loop the current location is the start location:

  'Temporary reference to the current position (the top of the stack)
  CurX% = X%(SP%): CurY% = Y%(SP%)

The test for the end location is quite simple:

  'Are we at the end?
  IF CurX% = EndX% AND CurY% = EndY% THEN EXIT DO

If we exit the DO loop this way, we'll be sure that SP% will not be zero, thus we will not display the failure message and instead plot the path!

(4) Check if there are no reachable positions from current location. If so, pop the current location off the stack, then goto step 2.
(5) For the current location, find an adjacent location which is reachable and push it onto the stack, marking it as visited.
(6) Goto step 2.

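These three steps can be combined nicely in QB with a single IF ... ELSEIF ... ELSE block inside the loop. Note that QB's AND and NOT are bitwise operators, so the tests below rely on Map% and Visited% holding only 0 or 1. The ordering of the choices is arbitrary, but changing it will definitely change the path you take: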

  'Try going right
  IF Map%(CurX% + 1, CurY%) AND NOT Visited%(CurX% + 1, CurY%) THEN
    SP% = SP% + 1
    X%(SP%) = CurX% + 1: Y%(SP%) = CurY%
    Visited%(CurX% + 1, CurY%) = 1
  'Try going left
  ELSEIF Map%(CurX% - 1, CurY%) AND NOT Visited%(CurX% - 1, CurY%) THEN
    SP% = SP% + 1
    X%(SP%) = CurX% - 1: Y%(SP%) = CurY%
    Visited%(CurX% - 1, CurY%) = 1
  'Try going down
  ELSEIF Map%(CurX%, CurY% + 1) AND NOT Visited%(CurX%, CurY% + 1) THEN
    SP% = SP% + 1
    X%(SP%) = CurX%: Y%(SP%) = CurY% + 1
    Visited%(CurX%, CurY% + 1) = 1
  'Try going up
  ELSEIF Map%(CurX%, CurY% - 1) AND NOT Visited%(CurX%, CurY% - 1) THEN
    SP% = SP% + 1
    X%(SP%) = CurX%: Y%(SP%) = CurY% - 1
    Visited%(CurX%, CurY% - 1) = 1
  'Dead end
  ELSE
    SP% = SP% - 1
  END IF
LOOP

-------------------------------Distraction--------------------------------------
A smart way (at least for wide open maps) to choose which way to test first would probably be to pick the direction that gets you closer to the goal. This is called a "heuristic", since we're assuming that one direction is better than another. This is not always true. Think of this map for example:

XXXXXXXXXXXXXXX
XS        XXXFX
X XXXXXXXXX X X
X   X   X     X
X X   X   X   X
XXXXXXXXXXXXXXX

(S = Start, F = Finish, X = Wall)

Using a distance heuristic (the difference between your current coordinates and the goal's) implies that the smart way to go from the start would be to the right. This is obviously not true.
-------------------------------End of distraction-------------------------------
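
To make the idea from the distraction concrete, here is a rough sketch of how the first direction to try could be picked. It assumes the "distance" is the sum of the x and y differences (often called the Manhattan distance), which is only one way to read the description, and it ignores walls and visited squares entirely. DX%, DY%, Best% and BestDist% are names made up for this sketch:

```basic
'Sketch: pick the neighbour that is closest to the goal
OPTION BASE 1
CurX% = 2: CurY% = 2
EndX% = 9: EndY% = 9

'Direction offsets: 1 = right, 2 = left, 3 = down, 4 = up
'(unset elements start out as 0)
DIM DX%(4), DY%(4)
DX%(1) = 1: DX%(2) = -1
DY%(3) = 1: DY%(4) = -1

Best% = 0: BestDist% = 32767
FOR d% = 1 TO 4
  'Distance from this neighbour to the goal
  Dist% = ABS(EndX% - (CurX% + DX%(d%))) + ABS(EndY% - (CurY% + DY%(d%)))
  'Strict < keeps the earlier direction when two are equally good
  IF Dist% < BestDist% THEN BestDist% = Dist%: Best% = d%
NEXT d%

PRINT "Try direction"; Best%; "first, distance"; BestDist%
```

From (2, 2) with the goal at (9, 9), right and down tie at a distance of 13 and the sketch settles on right. As the map in the distraction shows, such a pick can be wrong, so it should only decide the order in which the four tests are made, never replace them.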

Now say that we have found the goal. So what? Where's the path that we're interested in? Well, if you've been paying attention you will have noticed that it is simply in the stack: X%() and Y%() hold the path from index 1 to index SP%. We can then take this path and do as we wish with it. In my case I decided to plot it out, along with the map!
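
For reference, here is one way the pieces of this section could be assembled into a single runnable listing. Treat it as a sketch under a few assumptions of mine: the 6x6 map is made up, the neighbour tests are written as explicit comparisons with a NewX% = 0 "none found" marker, and the text-mode plot ("#" wall, "." open, "*" path) is just a stand-in for whatever plotting you prefer.

```basic
'Assembled DFS sketch on a small made-up 6x6 map (wall border included)
OPTION BASE 1
READ MapX%, MapY%
READ StrX%, StrY%, EndX%, EndY%
DIM Map%(MapX%, MapY%), Visited%(MapX%, MapY%)
DIM X%(MapX% * MapY%), Y%(MapX% * MapY%)

FOR I% = 1 TO MapY%
  FOR z% = 1 TO MapX%
    READ Map%(z%, I%)
  NEXT z%
NEXT I%

'Push the start location and mark it as visited
SP% = 1
X%(SP%) = StrX%: Y%(SP%) = StrY%
Visited%(StrX%, StrY%) = 1

DO WHILE SP% <> 0
  CurX% = X%(SP%): CurY% = Y%(SP%)
  IF CurX% = EndX% AND CurY% = EndY% THEN EXIT DO

  'Pick the first open, unvisited neighbour: right, left, down, up.
  'NewX% = 0 means "none found" (0 is never a valid column here).
  NewX% = 0
  IF Map%(CurX% + 1, CurY%) = 1 AND Visited%(CurX% + 1, CurY%) = 0 THEN
    NewX% = CurX% + 1: NewY% = CurY%
  ELSEIF Map%(CurX% - 1, CurY%) = 1 AND Visited%(CurX% - 1, CurY%) = 0 THEN
    NewX% = CurX% - 1: NewY% = CurY%
  ELSEIF Map%(CurX%, CurY% + 1) = 1 AND Visited%(CurX%, CurY% + 1) = 0 THEN
    NewX% = CurX%: NewY% = CurY% + 1
  ELSEIF Map%(CurX%, CurY% - 1) = 1 AND Visited%(CurX%, CurY% - 1) = 0 THEN
    NewX% = CurX%: NewY% = CurY% - 1
  END IF

  IF NewX% = 0 THEN
    SP% = SP% - 1                  'Dead end: pop
  ELSE
    SP% = SP% + 1                  'Push the neighbour, mark it visited
    X%(SP%) = NewX%: Y%(SP%) = NewY%
    Visited%(NewX%, NewY%) = 1
  END IF
LOOP

IF SP% = 0 THEN
  PRINT "No path found."
ELSE
  'The stack holds the path from index 1 to SP%: mark it with 2
  FOR I% = 1 TO SP%
    Map%(X%(I%), Y%(I%)) = 2
  NEXT I%
  FOR I% = 1 TO MapY%
    FOR z% = 1 TO MapX%
      PRINT MID$("#.*", Map%(z%, I%) + 1, 1);
    NEXT z%
    PRINT
  NEXT I%
END IF

'Map X, Map Y
DATA 6,6
'Map StartX, StartY, EndX, EndY
DATA 2,2,5,5
DATA 0,0,0,0,0,0
DATA 0,1,1,1,1,0
DATA 0,1,0,0,1,0
DATA 0,1,1,0,1,0
DATA 0,0,1,1,1,0
DATA 0,0,0,0,0,0
```

On this map the search runs right along the top corridor and then straight down to the goal, so the stars trace the top row and the right-hand column of the open area.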

Download the full example program for DFS here: (INSERT LINK TO PATHDFS.BAS)

4. Breadth First Search Implementation
======================================

Rather than a stack and a bit-set (Visited%), the BFS algorithm requires the following:

* A queue
* An array of already visited locations
* An array containing the "parent" locations, i.e. the location which added this location (explained in the algorithm)

The queue can be stored in an array. Usually one would make the queue wrap around, but for simplicity this will not be done here. When you decide to make serious use of this algorithm, give this part some careful thought! Without wrapping around, you will need to allocate enough space to hold the entire map!

To wrap around, you would basically need to MOD all add/remove operations with the maximum queue length, checking for overflows at the same time.
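
A minimal sketch of the wrap-around idea, kept separate from the path finding code. The names QSize%, QX%, QY%, Head%, Tail% and Count% are all hypothetical, the queue is deliberately tiny so that it overflows, and the indices run from 0 to QSize% - 1 so that MOD can be applied directly:

```basic
'Sketch of a wrap-around queue holding at most QSize% items
QSize% = 3
DIM QX%(0 TO QSize% - 1), QY%(0 TO QSize% - 1)
Head% = 0       'Index of the next item to remove
Tail% = 0       'Index where the next item will be stored
Count% = 0      'Number of items currently queued

FOR i% = 1 TO 6
  'Add (i%, i%) at the tail, unless the queue is already full
  IF Count% = QSize% THEN
    PRINT "Queue full, cannot add"; i%
  ELSE
    QX%(Tail%) = i%: QY%(Tail%) = i%
    Tail% = (Tail% + 1) MOD QSize%
    Count% = Count% + 1
  END IF

  'On every second pass, remove one item from the head
  IF i% MOD 2 = 0 AND Count% > 0 THEN
    PRINT "Removed"; QX%(Head%); QY%(Head%)
    Head% = (Head% + 1) MOD QSize%
    Count% = Count% - 1
  END IF
NEXT i%
```

Items come out in the order they went in, the fourth add wraps around to index 0, and the sixth add is rejected because three items are still waiting. Keeping a separate Count% is one simple way to tell a full queue from an empty one, since Head% = Tail% in both cases.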

'Allocate the queue, big enough to avoid wrap-around problem.
DIM X%(MapX% * MapY%), Y%(MapX% * MapY%)

We will also need a Front-of-Queue Pointer and a Queue-Length Pointer. The latter is not quite what its name implies: it just tells us which index of the array holds the end of the queue.

As mentioned in the first article, we can combine the last two ingredients as follows:

'Here Parent means the following:
'-1 : No parent. (Start location)
' 0 : No parent yet! (UNVISITED)
' 1 : Parent to the left
' 2 : Parent to the right
' 3 : Parent up
' 4 : Parent down
DIM Parent%(MapX%, MapY%)

Thus, Parent% acts both as the array of already visited locations and the array containing the parent locations.

Cutting straight to the algorithm:

(0) Initialisation: Empty queue, empty already visited locations.

QP% = 0 'Queue Pointer
QL% = 0 'Queue Length

When adding to the queue, we first increment the Queue Length and then store the coordinates in X%(QL%), Y%(QL%). When we remove from the queue, we increment QP% and then read off X%(QP%), Y%(QP%). This is different from the stack case, where removing an item decrements the pointer.

(1) Push the start location onto the queue and mark it as visited, but set its parent location to some special value indicating that it is the start. The initial location is the current location.

'Add the initial position to the queue
'The start has no parent. (0 means not visited, -1 means parentless)
QL% = QL% + 1
X%(QL%) = StrX%: Y%(QL%) = StrY%
Parent%(StrX%, StrY%) = -1

(2) Check if queue is empty. If so, terminate failing to find the goal.

The only jump done in the algorithm jumps to 2, so we will again use a DO loop:

DO WHILE QP% < QL%

When QP% = QL% we have reached the end of the queue (the next removal from the queue would read past its last item!).

Testing for failure in this case is not as straightforward as before, since the very last point on the queue could be the goal! But that's all there is to it. Some time after the loop we have this:

(...)
LOOP
IF QP% = QL% AND (X%(QL%) <> EndX% OR Y%(QL%) <> EndY%) THEN PRINT "No path found."

(5) Remove the current location from the front of the queue, set current location to the item now at the front of the queue.
(3) Check if current location is the goal. If so, terminate successfully.

We will implement the loop by going from the second-last step to the first loop step. This is all right if you think about it, and we only need to do it because of the definition we chose for QP% (i.e. it starts at 0 when the loop begins). Sounds complicated ... nah. Just look at the simple code below =) :

  'Take the next position off the front of the queue
  QP% = QP% + 1
  CurX% = X%(QP%): CurY% = Y%(QP%)

  'Are we at the end?
  IF CurX% = EndX% AND CurY% = EndY% THEN EXIT DO

(4) Add each reachable location to the end of the queue, marking them as visited. For each of these added locations, make their parent location the current location.
(6) Goto step 2

This part will look surprisingly similar to the DFS one! But note that no ELSEIFs are used, since we will add ALL valid adjacent squares to the queue.

  'Try going right
  IF Map%(CurX% + 1, CurY%) AND Parent%(CurX% + 1, CurY%) = 0 THEN
    QL% = QL% + 1
    X%(QL%) = CurX% + 1: Y%(QL%) = CurY%
    Parent%(CurX% + 1, CurY%) = 1
  END IF

  'Try going left
  IF Map%(CurX% - 1, CurY%) AND Parent%(CurX% - 1, CurY%) = 0 THEN
    QL% = QL% + 1
    X%(QL%) = CurX% - 1: Y%(QL%) = CurY%
    Parent%(CurX% - 1, CurY%) = 2
  END IF

  'Try going down
  IF Map%(CurX%, CurY% + 1) AND Parent%(CurX%, CurY% + 1) = 0 THEN
    QL% = QL% + 1
    X%(QL%) = CurX%: Y%(QL%) = CurY% + 1
    Parent%(CurX%, CurY% + 1) = 3
  END IF

  'Try going up
  IF Map%(CurX%, CurY% - 1) AND Parent%(CurX%, CurY% - 1) = 0 THEN
    QL% = QL% + 1
    X%(QL%) = CurX%: Y%(QL%) = CurY% - 1
    Parent%(CurX%, CurY% - 1) = 4
  END IF
LOOP

That gives us the goal, but how do we pry out the path??? Well, it's all hidden in the Parent% array. Notice that with each move we have noted the direction which leads back to the parent square, so we can infer how to return from the end to the beginning! Reversing the resulting list will give us the path from the beginning to the end, which is exactly what we are after! We can reuse the queue's space for the backtrace, since we don't need the queue anymore.

'Backtrace, here we start at the end and then make it back to the beginning
'Since we know which move caused the arrival to each cell we can just
'take the reverse direction.
CurX% = EndX%: CurY% = EndY%

'We can make use of the space of X%, Y% now
i% = 0
DO
  i% = i% + 1
  X%(i%) = CurX%: Y%(i%) = CurY%
  
  'Remember our definitions from before here.
  SELECT CASE Parent%(CurX%, CurY%)
    CASE 1
      CurX% = CurX% - 1
    CASE 2
      CurX% = CurX% + 1
    CASE 3
      CurY% = CurY% - 1
    CASE 4
      CurY% = CurY% + 1
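    CASE -1
      'Reached the start location: the backtrace is complete
      EXIT DO
    CASE ELSE
      '0 = unvisited square, so no path was recorded through here
      EXIT DO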
  END SELECT
LOOP

In the example code, no reversal is done (it's trivial), since the path is plotted on the screen instantly (unless you are running it on your old 286 without the turbo button! Nah, I tried it on mine and it was still too fast ;)).
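
For completeness, the reversal could look something like the sketch below. It assumes the backtrace has left the path in X%(1) to X%(i%) and Y%(1) to Y%(i%), end location first. The four points are made-up stand-ins for a real backtrace, and k% is just a loop counter introduced here:

```basic
'Sketch: reverse a backtraced path in place
OPTION BASE 1
DIM X%(4), Y%(4)
i% = 4
X%(1) = 4: Y%(1) = 3   'End location
X%(2) = 3: Y%(2) = 3
X%(3) = 3: Y%(3) = 2
X%(4) = 2: Y%(4) = 2   'Start location

'Swap the first with the last, the second with the second last, ...
'A middle element (odd lengths) stays where it is.
FOR k% = 1 TO i% \ 2
  SWAP X%(k%), X%(i% - k% + 1)
  SWAP Y%(k%), Y%(i% - k% + 1)
NEXT k%

'The path now runs from the start to the end
FOR k% = 1 TO i%
  PRINT "("; X%(k%); ","; Y%(k%); ")"
NEXT k%
```

After the loop the path reads (2, 2), (3, 2), (3, 3), (4, 3), i.e. start first.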

Download the full example program for BFS here: (INSERT LINK TO PATHBFS.BAS)

5. Conclusion II
================
BFS and DFS are important to understand before moving on to better (or at least more practical) algorithms, but they are lacking in the performance department. For example, if you are making a strategy game you should seriously consider the A* algorithm instead. To code it with any real performance gain, you will need to use a priority queue (e.g. a binary heap).

Well, that's it for now. I will be pretty busy with University work in the months to come, but if there is demand this series will return next time with more data structures, and maybe even an algorithm or two! (hopefully A*)

-abionnnn
The author can be contacted at moc.liamg@nnnnoiba (reverse it ;))

