QB CULT MAGAZINE
Vol. 3 Iss. 3 - December 2003

Chapter 12 - ASSEMBLY LANGUAGE PROGRAMMING

By Ethan Winer <http://www.ethanwiner.com>

This book has consistently presented programming techniques that reduce the size of your programs, and make them run faster. Most of the discussions focused on ways to write efficient BASIC code, and several showed how to access system interrupt services. Where speed was critical or BASIC was inflexible, I presented subroutines written in assembly language.

Assembly language is the most powerful way to communicate with a PC, and it offers speed and flexibility unmatched by any other language. Indeed, assembly language is in many ways the ultimate programming language because it lets you control fully every aspect of your PC's operation. Anything that a PC is capable of doing can be accomplished using assembly language. This final chapter explains assembly language in terms that most BASIC programmers can understand.

Why, you might ask, would a BASIC programmer be interested in assembly language? After all, the whole point of a high-level language such as BASIC is to shield the programmer from the underlying hardware. Without having to worry about CPU registers and memory addresses, a BASIC programmer can be immediately productive, and probably write programs with fewer initial bugs. However, there are three important reasons for using assembly language:

It is important to understand that any high-level language will benefit from the appropriate use of assembler. And while it is possible to write a major application using only assembly language, the increased complexity and added time to develop and debug it are often not worth the trouble. Using a high-level language--especially BASIC--for the majority of a program and then coding the size- and speed-critical portions in assembly language often is the most practical solution.

Many BASIC programmers mistakenly believe that to achieve the fastest and smallest programs they should learn C. In my opinion, nothing could be further from the truth. Assembly language is barely more difficult to use than C, and in fact the code is often more readable. Further, no high-level language can come even close to what raw 8086 code can achieve. If you truly desire to become an advanced programmer, you owe it to yourself to at least see what assembly language is all about. I believe there is no deeper satisfaction than that gained by understanding fully what your computer is doing at the lowest level.

This chapter assumes that you already understand basic programming concepts such as variables, arrays, and subroutines. As we proceed, most of the examples will provide parallels to BASIC where possible. But please remember one important point: There is nothing inherently difficult about assembly language. Attitude is everything, and if you can think of assembler as a stripped-down version of BASIC, you will be successful that much sooner.

For ease of reading, I will refer to the 8088 microprocessor used in the IBM PC throughout this chapter. However, everything said about the 8088 also applies to the 8086, the 80286, the 80386/486, and the NEC V series found in some older PC compatible computers. I will also use the terms assembly language and assembler interchangeably, although assembler can also be used to mean the program that assembles your source files.

All of the examples in this chapter are meant to be assembled with the Microsoft Macro Assembler (MASM) version 5.1 or later. MASM requires that you save your source files as standard ASCII text, and most word processor programs can do this.

Some of the examples in this chapter are derived from those that used CALL Interrupt in Chapter 11. In most cases I have not bothered to restate the same information from that chapter, and you may want to refer back for additional information.

Finally, many entire books have been written about assembly language, and there is no way I can possibly teach you everything you need to know here. Rather, my intent is to provide a gentle introduction to the concepts using practical and useful examples.

As easy as BASIC

Assembly language uses the same general form as a BASIC program. That is, commands are performed in sequence until a GOTO or GOSUB is encountered. In assembly language these are called Jump and Call, respectively. Many BASIC instructions have a direct assembler equivalent, although the syntax is slightly different. One important difference, however, is that the 8088 microprocessor can operate on integer numbers only. Another is that for the greatest efficiency, you are limited to only a few working variables. I will begin by showing some rudimentary assembly language instructions, so you can see how they are analogous to similar commands in BASIC. Consider the following BASIC program fragment:

   AX = 5

Here, the value 5 is assigned to the variable AX. The 8088 has several built-in variables called *registers*, and one of them is called AX. To move the value 5 into the AX register you use the Mov instruction:

   Mov AX,5

As with BASIC, the destination variable in an assembly language program is always shown on the left, and the source is on the right. Now consider addition and subtraction. To add the value 12 to AX in BASIC you do this:

   AX = AX + 12

The equivalent 8088 command is:

   Add AX,12

Again, the variable or register on the left is always the one that receives the results of any adding, moving, and so on. Subtraction is very similar to addition, replacing Add with Sub:

       BASIC:  AX = AX - 100
   Assembler:  Sub AX,100

Comparing and branching in assembly language is also quite similar to BASIC. But instead of this:

   AX = AX + 2
   IF AX > 60 GOTO Finished

You'd do it in assembler this way:

   Add AX,2
   Cmp AX,60
   Ja  Finished

This tells the 8088 to add 2 to AX, then compare AX to 60, and finally to *jump if above* to the code at label Finished. There are several kinds of conditional jump instructions in assembly language, and they often follow a comparison as shown here. In fact, all you can really do after a compare is jump somewhere based on the results. And while there is no direct equivalent for this BASIC statement:

   IF AX = 10 THEN BX = BX - 1

You can change the strategy to this:

   IF AX <> 10 GOTO Not10
   BX = BX - 1
   Not10:
    .
    .

Now a direct translation is simple:

   Cmp AX,10
   Jne Not10
   Dec BX
   Not10:
    .
    .

Jne stands for *Jump if Not Equal*. Also, notice the command Dec, which means decrement by 1. This is one case in which an assembler instruction is actually more to the point than its BASIC counterpart, and is equivalent to the BASIC command BX = BX - 1. While Sub BX, 1 would work just as well, using Dec is faster and generates less code, and we all know that speed is the name of the game.

The complement to Dec is Inc, short for *increment by one*. You can use Inc and Dec with most of the 8088's registers, as well as on the contents of any memory location, which brings up an important issue. At some point, many programs will require more variables than can be held within the CPU's registers. All of the available free memory in a PC can be used as variable storage, with only a few limitations:

Besides the CPU registers and conventional memory addresses, a special portion of memory called the *stack* is also available for storage. The stack is much like the temporary memory on a four-function calculator, and it is often used to store intermediate results. The stack is also commonly used to pass variables between programs, because all programs can access it without having to know exactly where in memory it is located. Again, assembly language doesn't usually require you to deal with absolute memory addresses at all--especially for subroutines that will be added to a BASIC program. The only exceptions might be when writing directly to the display screen, or when looking at low memory, perhaps to see whether the Caps Lock key is engaged.

Spaghetti Code?

To write a routine that converts lower case letters to capital letters in BASIC, you might use something like this:

   IF AL$ >= "a" AND AL$ <= "z" THEN
     AL$ = CHR$(ASC(AL$) - 32)
   END IF

In assembly language each compare must be done separately, followed by a jump based on the results. Let's rephrase the BASIC example slightly:

   IF AL$ < "a" GOTO Done
   IF AL$ > "z" GOTO Done
   AL$ = CHR$(ASC(AL$) - 32)
   Done:
    .
    .

Now a conversion to assembler is easy:

   Cmp AL,"a"     ;compare AL to "a"
   Jb  Done       ;Jump if Below to Done
   Cmp AL,"z"     ;compare AL to "z"
   Ja  Done       ;Jump if Above to Done
   Sub AL,32      ;subtract 32 from AL
   Done:
    .
    .

Notice how the assembler allows the use of quoted constants. When it sees a character or string in double or single quotes, it knows you mean to use the character's ASCII value. Unlike BASIC with its strong variable typing that prevents you from performing numeric operations on a string, assembly language has very few such restrictions. Also notice how much jumping around is necessary to accomplish even the simplest of actions.

As I mentioned earlier, assembly language can certainly be more tedious than BASIC, although the logic is not really that different. Such frequent jumping around is called spaghetti code by some programmers, and it is often used in a derogatory fashion when discussing BASIC's GOTO statement. But this is the way that computers work, and I am amused by programmers who argue so strongly against all use of the GOTO command. While nobody could seriously object to a well organized and structured programming style, all programs are eventually converted to equivalent assembly language jumps and branches.

The Registers

There are six general purpose registers available for you to use: AX, BX, CX, DX, SI, and DI. Each register may be used for the most common operations like adding and subtracting, although most of them also have a specialty. For example, AX is the only register that can be multiplied or divided. The A in AX stands for Accumulator, and it is often used for math operations such as accumulating a running total. Also, several assembler instructions result in one byte less code when used with AX, when compared to the same instructions using other registers.

The B in BX means Base, and this register is frequently used to hold the base address of a collection of variables or other data. If you have a text string in memory to be examined, you could put the address of the first character in BX. The rest of the string can then be found by referencing BX.

BX can also be used to specify computed addresses using addition or subtraction. For example, the instruction Mov AX,[BX+4] means to load AX with the word four bytes beyond the address held in BX. Likewise, the instruction Add DL,[BX+SI-10] adds the value of the byte at that computed address to the current contents of DL. You may use BX this way with either a constant number, the SI or DI register, or one of those registers and a constant number. However, only addition and subtraction may be used, as opposed to multiplication or division. I will return to computed and indirect addressing later in this chapter.
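
As a minimal sketch of computed addressing, the fragment below reads four words through BX. The name Table and its values are hypothetical, DS is assumed to point to the segment that holds Table, and the Offset operator it borrows is explained later in this chapter.

   Table DW 10, 20, 30, 40     ;hypothetical data, four words
    .
    .
   Mov  BX,Offset Table        ;BX holds the base address
   Mov  AX,[BX]                ;AX = 10, the first word
   Add  AX,[BX+2]              ;AX = 30, each word is two bytes long
   Mov  SI,4                   ;SI holds a byte offset
   Add  AX,[BX+SI]             ;AX = 60, the third word
   Add  AX,[BX+SI+2]           ;AX = 100, the fourth word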

The C in CX stands for Count, since CX is most often used as the counter in an assembly language FOR/NEXT loop. In fact, the assembly language command Loop uses CX to perform an operation a specified number of times. The comparison below illustrates this.

   BASIC:
        FOR CX = 1 TO 5
          GOSUB BeepTone
        NEXT

   Assembler:
        Mov  CX,5
        Do:  Call Beep_Tone
        Loop Do

Here, the Loop instruction decrements CX and branches back to the label Do: until CX reaches zero, so Beep_Tone is called five times. That is much faster and more efficient than this:

   Mov  CX,5
   Do:  Call Beep_Tone
   Dec  CX
   Cmp  CX,0
   Jne  Do
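
The fragment below is a small sketch of Loop doing useful work, in this case adding the numbers 10 down through 1. The label More is a hypothetical name chosen for illustration.

   Mov  AX,0                ;AX accumulates the total
   Mov  CX,10               ;perform the loop ten times
   More:
   Add  AX,CX               ;add the current count: 10, 9, ... 1
   Loop More                ;decrement CX, jump to More if it is not zero
   ;-- AX now holds 55 and CX holds 0.

One thing to watch for is entering such a loop with CX already zero. Loop decrements CX before testing it, so the code would then repeat 65536 times. The Jcxz (Jump if CX is Zero) instruction placed just ahead of the loop is one way to guard against that.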

The DX register is a general purpose Data register, and is named accordingly. DX is also used in conjunction with AX when multiplying and dividing.

The last two general purpose registers are SI and DI. SI stands for Source Index, while DI means Destination Index. It is not hard to guess that these registers are well suited for copying data from one memory location to another. The 8088 has a rich set of instructions for moving and comparing strings, using SI and DI to show where they are.

Like BX, SI and DI may be used with a constant offset such as [SI+100] to compute a memory address, or with a constant value and/or BX. But again, SI and DI are still general purpose registers, and they can be used for common chores as well. In many situations it really doesn't matter whether you use BX or DI or SI or AX.
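
To tie the earlier pieces together, here is one possible sketch that capitalizes an entire string, using SI as a pointer and CX as the counter. The names Phrase, NextChar, and NoChange are hypothetical, DS is assumed to point to the segment that holds Phrase, and the Offset operator is explained later in this chapter.

   Phrase DB "hello, world"    ;hypothetical data, 12 characters
    .
    .
   Mov  SI,Offset Phrase       ;SI points to the first character
   Mov  CX,12                  ;the number of characters to examine
   NextChar:
   Mov  AL,[SI]                ;fetch the current character
   Cmp  AL,"a"
   Jb   NoChange               ;below "a", leave it alone
   Cmp  AL,"z"
   Ja   NoChange               ;above "z", leave it alone
   Sub  AL,32                  ;convert to upper case
   Mov  [SI],AL                ;store the converted character
   NoChange:
   Inc  SI                     ;advance to the next character
   Loop NextChar               ;repeat for all 12 characters
   ;-- Phrase now holds "HELLO, WORLD".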

There are two specialized registers called BP and SP. BP (Base Pointer) is another Base register like BX, only it is intended for use with the stack. When you need to access data on the stack, BP is the most appropriate register to use. Like BX, BP can reference computed addresses with a constant offset, with SI or DI, or with a constant and SI or DI. The SP (Stack Pointer) register holds the current address of the stack, and it should never be altered unless you have a very good reason to do so.

The last four registers are the segment registers, but I will mention them only briefly right now. As you undoubtedly know, the 8088 uses a segmented architecture; although it can utilize a megabyte of memory, it can do so only in 64K portions at a time. The CS register holds the current Code Segment (your program code), DS holds the Data Segment (your memory variables), SS holds the Stack Segment, and ES is an Extra Segment that is often used to access arrays located in far memory.

Each of the 8088 registers can hold one word (two bytes), allowing you to store any integer number between 0 and 65535. This range of values can also be considered as -32768 to 32767. But AX, BX, CX, and DX may also be used as two separate one-byte registers with a range of either 0 to 255 or -128 to 127. One byte is often sufficient--for example, when manipulating ASCII characters--and this ability to access each half individually effectively adds four more registers. Remember, the more variables you can keep within registers, the faster and more efficient a program will be. When using the registers separately, the two halves are identified by the letters H and L, for High and Low. That is, the high portion of AX is referred to as AH, while the low portion of DX is called DL. This would be represented with BASIC variables as follows:

   AX = AL + 256 * AH

Each half can also be represented as bit patterns:

              AX
   ┌──────────────────────┐
    1011  0110  0111  0101
   └──────────┘└──────────┘
        AH          AL

Notice that SI, DI, BP, and SP cannot be split this way, nor can the segment registers CS, DS, SS, and ES.
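
A short sketch may help show that the two halves really are independent. It uses the same bit pattern as the illustration above, which is B675 in Hex.

   Mov  AX,0B675h           ;AH = 0B6h (1011 0110), AL = 75h (0111 0101)
   Mov  AL,0                ;changes only the low half:  AX = 0B600h
   Inc  AH                  ;changes only the high half: AX = 0B700h
   Mov  BL,AH               ;byte registers can be copied freely: BL = 0B7h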

There is also another register called the Flags register, though it is not intended for you to use directly. After performing calculations and comparisons, certain bits in the Flags register are set or cleared by the CPU automatically, depending on the results. For example, if you add a register that holds the value 40000 to another register whose value is 30000, the Carry flag will be set to show that the result exceeded 64K. The 8088 flags are also set or cleared to reflect the result of a Cmp (Compare) instruction. Although you will not usually access these flags directly, they are used internally to process Jne, Ja, and the other conditional jump commands.
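
The addition just described can be sketched as follows. Jc (Jump if Carry) is one of the conditional jumps that tests the Carry flag, and TooBig is a hypothetical label.

   Mov  AX,40000
   Mov  BX,30000
   Add  AX,BX               ;70000 is too large to fit in 16 bits
   Jc   TooBig              ;Carry is set, so this jump is taken
    .
    .
   TooBig:                  ;AX holds 4464 here, which is 70000 - 65536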

Variables in Assembly Language

All of the example routines shown so far have used the 8088 registers as working variables. Indeed, using registers whenever possible is always desirable because they can be accessed very quickly. But in many real-world applications, more variables are needed than can fit into the few available registers. As with BASIC, MASM lets you define variables using names you choose, and you must also specify the size of each variable. The first step is to define the amount of space that will be set aside with the assembler instructions DB and DW. These stand for Define Byte and Define Word respectively, and they allocate either one byte of storage or two. You can also use DD to define a double word long integer variable. Notice that these are not commands that the 8088 processor will execute; rather, they inform the assembler to leave room for the data. Some examples are shown below:

   MyByte DB 12h                     ;one byte, preset to 12h
   Buffer DB 15 Dup(0)               ;fifteen bytes, all 0
   Dummy  DW ?                       ;one word (two bytes), value unknown
   Msg    DB "Test message",13,10    ;message, CR, LF

In the first example one byte of memory is allocated using the name MyByte, and the value 12 Hex is placed there at assembly time. The second example illustrates using the Dup (duplicate) command, and tells MASM to set aside fifteen bytes filling each with the specified value. In this case that value is zero. Initialized data is an important feature of assembly language, and one that is sorely missing from BASIC. By being able to allocate data values at assembly time, additional code to assign those values at runtime is not needed.

Filling an area with zeroes can also be accomplished with a question mark, and this is frequently used when the value that will eventually end up there is not known in advance. Both do the same thing in most cases; however, using "?" implies an unknown, as opposed to an explicit zero. You may use whichever method seems more appropriate at the time. The last example shows how text may be specified, as well as combining values in a single statement.
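
A few more definitions are sketched below to show DD, a Dup with an unknown value, and a list of different values in one statement. All of the names here are hypothetical.

   Total   DD 0                ;one double word (four bytes), preset to 0
   Scores  DW 10 Dup(?)        ;ten words, contents not specified
   Blanks  DB 80 Dup(" ")      ;eighty space characters
   Primes  DW 2, 3, 5, 7, 11   ;five words, each with its own value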

Since the assembler lets you use names for your data, fetching or storing values can be done with the normal Mov instruction like this.

   Error_Code  DB ?
   Mov Error_Code,AL

This puts the contents of register AL into memory location Error_Code. Getting it back again later is just as easy:

   Mov DH,Error_Code

Sometimes the assembler needs a little help when you assign variables. When you move AL or DH in and out of a memory location, the assembler knows that you are dealing with a single byte. And if you specify BX or SI as the source or destination operand, the assembler understands this to mean two bytes, or one word. But when literal numbers are used, the size of the value is not always obvious. Consider the following:

   Mov [BX],3Ch

Does this mean that you want to put the value 3Ch into the byte at the address held in BX, or the value 003Ch into the *word* at that address? There is no way for MASM to know what your intentions are, so you must specify the size explicitly. This is done with the Byte Ptr and Word Ptr directives. Here, Ptr stands for Pointer, and two examples are shown:

   Mov Byte Ptr [BX],15
   Mov Word Ptr ES:[DI],100

The first example specifies that the memory at address BX is to be treated as a single byte. Had Word been used instead, a 15 would be placed into the byte at the address held in BX, and a zero would be put into the byte immediately following. Words are always stored with the low-byte before the high-byte in memory.

Memory variables are accessed using the normal complement of instructions. For example, to add 15 to the variable Counter you will use Add Counter,15. And to multiply AX by the word variable Number you will use Mul Word Ptr Number. In MASM versions 5.0 and later, the Word Ptr argument is not strictly necessary. That is, if Number had been defined using DW, then MASM knows that you mean to multiply by a word rather than a byte. But earlier versions of the assembler were not so smart, and an explicit Word Ptr or Byte Ptr was required.

Note, however, that you must still use Byte Ptr or Word Ptr to override a variable's type. For example, if Value was defined as a word but you want to access just its lower byte, you must use Mov AL,Byte Ptr Value. Here, stating Byte Ptr explicitly tells MASM that you are intentionally treating Value as a different data type. Otherwise, it will issue a non-fatal warning message.
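
The following sketch shows both the type override and the low-byte-first storage order at work. Value is a hypothetical word variable, and DS is assumed to point to its segment.

   Value DW ?                    ;hypothetical word variable
    .
    .
   Mov  Value,1234h              ;stores 34h at Value and 12h at Value+1
   Mov  AL,Byte Ptr Value        ;AL = 34h, the low byte
   Mov  AH,Byte Ptr Value+1      ;AH = 12h, the high byte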

Sometimes you may want to refer to the address of a variable, as opposed to its contents. For example, Mov AX,Variable tells MASM to move the value held in Variable into the AX register. But many DOS services require that you specify a variable's address in a register. This is done using the Offset operator: Mov DX,Offset Buffer. Where Mov DX,Buffer places the first two bytes of the buffer into DX, using Offset tells MASM that you instead want the starting address of the buffer.
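
As one example of a DOS service that expects an address, the sketch below prints a line of text using DOS service 9, which displays the characters at DS:DX up to a terminating dollar sign. The name Prompt and its text are hypothetical, and DS is assumed to point to the segment that holds it.

   Prompt DB "Press a key",13,10,"$"   ;hypothetical text, "$" marks the end
    .
    .
   Mov  DX,Offset Prompt    ;DX holds the address of the text
   Mov  AH,9                ;specify the DOS print string service
   Int  21h                 ;call DOS to do it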

You can also use the Lea (Load Effective Address) command to obtain an address, but that is less frequently used. Although Lea DX,Buffer can be used to load DX with the starting address of Buffer, it is a slightly slower instruction. Lea is needed only when an address must be computed. For example, the instruction Lea SI,[BX+DI] loads SI with the sum of the BX and DI registers. You may notice that Lea can provide a shortcut for adding certain register combinations. Although this use of Lea is uncommon, Lea can replace the following two instructions:

   Mov SI,BX
   Add SI,DI

Lea can also add or subtract a constant value, as in Lea SI,[BP-10]. It cannot subtract one register from another, however, because the 8088 forms an address only by adding a base register, an index register, and a constant.
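
Here is a brief sketch of Lea used purely for arithmetic. Notice that Lea only calculates the address; it never reads the memory located there.

   Mov  BX,100
   Mov  DI,20
   Lea  SI,[BX+DI]          ;SI = 120
   Lea  AX,[BX+DI-10]       ;AX = 110
   Lea  BX,[BX+2]           ;BX = 102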

Calculations in Assembly Language

When adding or subtracting you may use two registers, a register and a memory variable, or a register or memory variable and a constant. It is not legal to specify two memory variables as in Add Var1,Var2.
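
The usual way around this restriction is to go through a register, as in this sketch. Var1 and Var2 are assumed to be word variables, and AX is an arbitrary choice.

   Mov  AX,Var2             ;fetch the second variable
   Add  Var1,AX             ;now Var1 = Var1 + Var2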

Multiplying and dividing are not so flexible; only AL and AX may be multiplied. When dividing, the dividend must be either in AX, or in the long integer made up of DX:AX. In this case, DX holds the upper word and AX holds the lower one. However, you may multiply or divide these registers using either a register or a memory location. Because of this restriction, it is not necessary to specify the target operand size. That is, Mul CL means to multiply AL by CL leaving the result in AX, and Div WordVariable divides DX:AX by the contents of WordVariable leaving the result in AX and the remainder in DX. Although you could use the commands Mul AL,CL and Div AX,WordVariable, this is not necessary or common.

All of the allowable combinations for multiplying and dividing are shown in Figure 12-1.

Instruction          Operand    Result    Remainder
════════════════     ═══════    ══════    ═════════
Mul ByteRegister        AL        AX         n/a
Mul ByteVariable        AL        AX         n/a
Mul WordRegister        AX       DX:AX       n/a
Mul WordVariable        AX       DX:AX       n/a

Div ByteRegister        AX        AL          AH
Div ByteVariable        AX        AL          AH
Div WordRegister      DX:AX       AX          DX
Div WordVariable      DX:AX       AX          DX

Figure 12-1: The allowable register/memory combinations for multiplying and dividing.

In Figure 12-1 ByteRegister means any byte-sized register such as AL or CH; WordRegister indicates any word-sized register like CX or BP. Likewise, ByteVariable and WordVariable specify byte- and word-sized integer memory variables respectively.
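
The sketch below works through one case of each size from Figure 12-1, using unsigned values chosen only for illustration.

   Mov  AL,12
   Mov  CL,20
   Mul  CL                  ;AL * CL: AX = 240

   Mov  AX,1000
   Mov  BX,500
   Mul  BX                  ;AX * BX: DX:AX = 500000 (DX = 7, AX = 41248)

   Mov  CX,1000
   Div  CX                  ;DX:AX / CX: AX = 500, DX = 0 (the remainder)

   Mov  AX,17
   Mov  BL,5
   Div  BL                  ;AX / BL: AL = 3, AH = 2 (the remainder)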

It's important to understand that you must never divide by zero, because that will generate a critical error. Because the result from dividing by zero is infinity, the 8088 has no way to handle that--it can't simply ignore the error. Therefore, dividing by zero causes the CPU to generate an Interrupt 0. In a BASIC program that error is routed to BASIC's internal error handling mechanism which either invokes the ON ERROR handler if one is in effect, or ends your program with an error message. In a purely assembly language program, DOS intervenes printing an error message on the screen, and then it ends the program.

Related to division by zero is dividing when the result cannot fit into the destination register. For example, if AX holds the value 20000 and you divide it by a byte-sized operand that holds 2, the resulting 10000 cannot fit into AL. Since this is another unrecoverable error that cannot be ignored, the 8088 generates an Interrupt 0 there as well.
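
One way to avoid both problems with unsigned values is to test the divisor first, and then use the word form of Div so the result has a full 16 bits available. This is only a sketch, and BadDivisor is a hypothetical label for your own error handling code.

   Mov  AX,20000            ;the value being divided
   Mov  BL,2                ;a byte-sized divisor
   ;-- Div BL here would fail because 10000 cannot fit into AL.
   Mov  BH,0                ;widen the divisor to a full word in BX
   Cmp  BX,0                ;never divide by zero
   Je   BadDivisor
   Mov  DX,0                ;clear the upper half of DX:AX
   Div  BX                  ;AX = 10000, DX = 0 (the remainder)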

Besides the Div and Mul instructions, there are also signed versions called Idiv and Imul. Where Div and Mul treat the contents of AX or DX:AX as an unsigned value, Idiv and Imul treat them as being signed. You'll use whichever command is appropriate, so the 8088 knows if values having their highest bit set are to be treated as negative. BASIC always uses Idiv and Imul in the code it generates, since all integer and long integer values are treated by BASIC as signed.
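
The difference is easy to see by dividing the same bit pattern both ways, as sketched below. The Cwd (Convert Word to Doubleword) instruction used here has not been described in this chapter; it copies the sign of AX into every bit of DX to create a signed value in DX:AX.

   Mov  AX,-10              ;the same 16 bits as 65526
   Mov  BX,3
   Mov  DX,0                ;unsigned: the upper word is simply zero
   Div  BX                  ;AX = 21842, DX = 0

   Mov  AX,-10
   Cwd                      ;signed: extend the sign of AX into DX
   Idiv BX                  ;AX = -3, DX = -1 (the remainder)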

Because only AX and DX:AX may be used for multiplying and dividing, this affects your choice of registers. The short example that follows shows how you might select registers when translating a simple BASIC-like expression that uses only integer (not long integer) variables.

   BASIC:
        Result = (Var1 + Var2 * (Var3 - Var4)) \ 100


   Assembler:
        Mov  AX,Var3          ;work from the innermost level out
        Sub  AX,Var4          ;so first perform Var3 - Var4
        Imul Word Ptr Var2    ;then multiply that by Var2
        Add  AX,Var1          ;add Var1 to what we have so far
        Cwd                   ;next extend the sign of AX into DX
        Mov  CX,100           ;use CX for the divisor
        Idiv CX               ;do the division
        Mov  Result,AX        ;then assign Result ignoring the
                              ;  remainder left in DX

Because dividing by a word value uses both DX and AX, DX must be prepared before the division. For a signed division with Idiv, the Cwd (Convert Word to Doubleword) instruction copies the sign of AX into every bit of DX. Simply clearing DX with Mov DX,0 is correct only before an unsigned Div, or when AX is known to be positive. The use of CX to hold the value 100 is arbitrary. If CX were currently in use, any available word-sized register or memory location could be used. If you compile this program statement and view the resultant code using CodeView, you will see that BASIC does an even better job of translating this particular expression to assembly language.

String Processing Instructions

Besides being able to add, subtract, multiply, and divide, the 8088 provides four very efficient instructions for manipulating strings and other data in memory. Movs copies, or moves a string from one place to another; Cmps compares two ranges of memory; Stos fills, or stores one or more addresses with the same value; and Scas scans a range of memory looking for a particular value. These instructions require either a byte or word specifier. For example, you would use Movsb to copy a byte, and Cmpsw to compare two words.

There are two important factors that contribute to the power and usefulness of these string instructions: each is only one byte long, and they automatically increment or decrement the SI and DI registers that point to the data being manipulated. Thus, they are both convenient to use, and also very fast. Because it is common to access blocks of memory sequentially a byte or word at a time, automatically advancing SI and DI saves you from having to do that manually with additional instructions. For example, after one pair of words has been compared, SI and DI are already set to point at the next pair.

You can also specify that SI and DI are to be decremented by first using the Std (Set Direction) command. The Direction Flag stores the current string operations direction, which is either up or down. If a previous Std was in effect, then you'd use Cld (Clear Direction) to force string operations to proceed forward again. In fact, BASIC *requires* you to clear the direction flag to forward before returning from a routine that set it to backwards.

MOVS and CMPS

Movs and Cmps use the DS:SI register pair to point to the first range of memory being copied or compared, and ES:DI to point to the second range. Each time a byte is being copied or compared, SI and DI are incremented or decremented by one to point to the next address. And when a word is being accessed, SI and DI are incremented or decremented by two.

Notice that there is no protection against SI or DI being incremented or decremented through address zero, nor is there any indication that this has happened. Also notice that the name Movs is somewhat of a misnomer. To me, moving something implies that it is no longer at its original location. Movs does not alter the source data at all--it merely places a new copy at the specified destination address.
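
Copying backwards is most useful when the two ranges overlap. The sketch below shifts five letters up by one byte within the same buffer, working from the end so no letter is overwritten before it has been copied. The names Letters and CopyByte are hypothetical, and DS is assumed to point to the segment that holds Letters.

   Letters DB "ABCDE",0          ;hypothetical data: five letters and a spare byte
    .
    .
   Mov  AX,DS                    ;both ranges are in the same segment,
   Mov  ES,AX                    ;  so make ES the same as DS
   Mov  SI,Offset Letters + 4    ;SI points to the last letter, "E"
   Mov  DI,Offset Letters + 5    ;DI points to the spare byte after it
   Mov  CX,5                     ;there are five bytes to copy
   Std                           ;SI and DI will now count downward
   CopyByte:
   Movsb                         ;copy one byte, SI and DI each drop by one
   Loop CopyByte                 ;repeat for all five bytes
   Cld                           ;restore the forward direction
   ;-- Letters now holds "AABCDE".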

SCAS and STOS

Scas compares the value in AL or AX with the range of memory pointed to by ES:DI. That is, Scasb compares AL and Scasw uses AX. Stos also uses ES:DI to show where the data being written to is located; Stosb stores the contents of AL in the address at ES:[DI] and then increments or decrements DI by one. Likewise, Stosw stores the value in AX there and increments or decrements DI by two.

Repeating String Operations

If these four instructions merely acted on the data and incremented SI and DI automatically, that would be very useful indeed. But they also have another talent: they recognize a Rep (Repeat) prefix to perform their magic a specified number of times. The number of iterations is specified by the count held in CX. Furthermore, the number of repetitions can be made conditional when comparing and scanning, based on the data encountered. If you have, say, 20 bytes of data that need to be copied from one place to another, you would first set CX to 20 and then use Rep Movsb. And to compare 100 words you would load CX with the value 100 and use Rep Cmpsw. Stos also accepts a Rep prefix; Rep Stosb places the value in AL into CX bytes of contiguous memory starting at the address specified in ES:DI. For each iteration the 8088 decrements CX, and when it reaches zero the copying or comparing is complete.
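
As a sketch of Rep Stosw, the fragment below clears the color text screen by filling it with blank spaces. It assumes an 80 by 25 color text mode whose display memory begins at segment B800 Hex.

   Mov  AX,0B800h           ;the color text screen segment
   Mov  ES,AX               ;assign that to ES through AX
   Mov  DI,0                ;start at the top-left corner
   Mov  AX,0720h            ;AL = 20h (a space), AH = 07h (white on black)
   Mov  CX,2000             ;80 columns times 25 rows
   Cld                      ;ensure that DI counts upward
   Rep  Stosw               ;store AX at ES:[DI] 2000 times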

It is usually not valuable to scan a range of memory unconditionally and repeatedly. Therefore Scas is generally used in conjunction with either Repe (Repeat while Equal) or Repne (Repeat while Not Equal). Cmps is also generally used with these conditional prefixes, to avoid wasting time comparing bytes after a match or a difference was found. In either case, however, you load CX with the total number of bytes or words being compared or scanned.
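
For instance, Repe Scasb can skip over leading blanks to find the first meaningful character in a line of text. This sketch assumes that ES:DI already points to an 80-byte line, and AllBlank is a hypothetical label.

   Cld                      ;scan in the forward direction
   Mov  AL," "              ;the value to skip over
   Mov  CX,80               ;examine up to 80 bytes
   Repe Scasb               ;continue while ES:[DI] = AL
   Je   AllBlank            ;every one of the 80 bytes was a blank
   Dec  DI                  ;back up to the first non-blank byte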

Because each iteration decrements CX, you can easily calculate how many bytes or words were actually processed. Also, you can test the results of scanning and comparing using the normal methods such as Je and Jne. The following few examples show some ways these commands can be used.

   See if two 40-byte ranges of memory are the same:

        Mov  CX,20              ;comparing 20 words is faster than 40 bytes
        Repe Cmpsw              ;compare them
        Je   Match              ;they matched

   Copy a 2000-element integer array to color screen memory:

        Mov  AX,ArraySeg        ;set DS to the source segment
        Mov  DS,AX              ;through AX
        Mov  SI,ArrayAdr        ;point SI to the array start
        Mov  AX,0B800h          ;the color text screen segment
        Mov  ES,AX              ;assign that to ES
        Mov  DI,0               ;clear DI to point to address 0
        Mov  CX,2000            ;prepare to copy 2000 words
        Rep  Movsw              ;copy the data

   Search a DOS string looking for a terminating zero byte:

        Mov  AX,StringSeg       ;set ES to the string's segment
        Mov  ES,AX              ;(ES cannot be assigned directly)
        Mov  DI,Offset ZString  ;point DI to the string data
        Mov  CX,80              ;search up to 80 bytes
        Mov  AL,0               ;looking for a zero value
        Repne Scasb             ;while ES:[DI] <> AL
        ;-- Now DI points just past the terminating zero byte.
        ;-- The length of the string is (80 - CX - 1).

In the first example, it is assumed that DS:SI and ES:DI already point to the correct segment and address. By asking to compare only while the bytes are equal, the result of the most recent byte comparison can be tested using Je. A common mistake many programmers make is comparing the bytes, and then checking if CX is zero. The reasoning is that if CX is zero then they must have all matched; otherwise, the 8088 would have aborted the comparisons early. But CX will also be zero if all but the last byte matched! Therefore, you must check the zero flag using Je (or Jne if that is more appropriate).

Notice in the first example how 20 words are compared, rather than 40 bytes. Although the net result is the same, word operations are faster on 80286 and later processors when the blocks of memory begin at an even numbered address. [Though you can't always know if a variable or block of memory will begin at an even address, using the word version will be more efficient at least some of the time.]

The second and third examples include the code needed to set up the appropriate segment and address values in DS:SI and ES:DI. Although this may seem like a lot of work, you can often do this setup only once and then use the same registers repeatedly within a routine. Unfortunately, you are not allowed to assign a segment register from a constant number. You must first assign the number to a conventional register, and then use Mov to copy it to the segment register.
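
The same two-step rule applies when copying one segment register to another, since the 8088 has no instruction that moves a segment register directly into another segment register. As a minimal sketch, this makes ES match DS when both ranges of memory are in the same segment:

        Mov  AX,DS              ;a segment register cannot be copied directly
        Mov  ES,AX              ;so go through AX: now ES = DS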

The Stack

The primary purpose of the stack is to retain the return address of a program when a subroutine is called. This is true not only for assembly language, but for BASIC as well. For example, when you use the BASIC statement GOSUB 1200, BASIC must remember the location in memory of the next command to execute when the routine returns. It does this by placing the address of the next instruction onto the stack *before* it jumps to the subroutine. Then when a RETURN instruction is encountered, the address to return to is available. The 8088 understands Calls and Returns directly, and it places and restores the addresses on the stack automatically. The stack is not unlike a stack of books on a table, and one of its great advantages is that you don't need to know where in memory it is actually located. Items can be placed onto the stack either manually with the Push instruction, or automatically by the 8088 processor as part of its handling of Call and Return statements. Values are retrieved from the stack with the Pop command, among other methods.

One important feature of the stack is that when items are added and removed, the stack pointer register is updated automatically to reflect the next available stack location. Thus, a program can access items on the stack based on the stack pointer, rather than having to know the exact address at any given time. This simplifies exchanging information between programs, since neither has to know how the other operates. This mechanism also makes it possible for programs written in one language to communicate with subroutines written in another.

Figure 12-2 shows how the stack operates.

│          │
│          │
│          │
│          │
├──────────┤
│  Item 1  │ <── first item that was pushed
├──────────┤
│  Item 2  │ <── second item that was pushed
├──────────┤
│  Item 3  │ <── third item that was pushed
├──────────┤
│  Item 4  │ <── last item that was pushed (SP points here)
├──────────┤
│   Next   │ <── next available stack location
├──────────┤
│          │ ┌── the stack grows downward
│          │ │   as new items are added
│          │ │
│          │ │
             \/
Figure 12-2: The organization of the CPU stack.

As each item is pushed onto the stack, the stack pointer is first decremented by two, and the item is then stored at the address the stack pointer now holds. Each new item therefore lands two bytes below the previous one, and the stack grows downward as new items are added. Note that only full words may be pushed onto the stack, so all of the items shown here are two bytes in size. Also note that the stack pointer always holds the address of the last item that was pushed; the next available location is two bytes below it.
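
To make the last-in, first-out order concrete, here is a minimal sketch that saves two registers and later restores them. The starting SP value of 1000h used in the comments is an assumption, chosen only to make the arithmetic easy to follow.

   Push AX          ;SP goes from 1000h to 0FFEh, and AX is stored there
   Push BX          ;SP goes from 0FFEh to 0FFCh, and BX is stored there
   ;-- AX and BX may be changed freely here.
   Pop  BX          ;BX is reloaded from 0FFCh, and SP returns to 0FFEh
   Pop  AX          ;AX is reloaded from 0FFEh, and SP returns to 1000h

The registers must be popped in the reverse of the order in which they were pushed; popping AX first would exchange the two values.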

Passing Parameters

Imagine you have a BASIC subroutine that does something to the variable X. The code to assign X, process, and print X might look like this:

   X = 12
   GOSUB 2000     'the routine at line 2000 manipulates X
   PRINT X

In assembly language you could push the value 12 onto the stack, and then call the subroutine. The subroutine, expecting the value there, would retrieve it, do its work, and then place the result back again before returning. This is similar, but not identical, to how variables are passed between programs. Most high-level languages including BASIC pass variables to subroutines by placing their *addresses* on the stack. A called routine can then access the variable via its address, either to read it or to assign a new value.

If BASIC let you access the registers directly, it could pass variables through them, as you saw when telling DOS which of its services to do. But BASIC doesn't allow that and moreover, with a limited number of registers, only a few variables or addresses could be accommodated. The stack can hold any number of arguments, by pushing the address of each in turn. When you use the BASIC CALL command and pass a variable name to a SUB or FUNCTION procedure, BASIC first pushes the address of that variable onto the stack, before jumping to the code being called. And if more than one variable is specified, all of the addresses are pushed. The example below shows how you might call a routine that returns the current default drive.

   CALL GetDrive(Drive%)

When GetDrive begins, it knows that the stack is holding the address of Drive%. The segment and address to return to in the calling BASIC program are also on the stack; however, GetDrive is not concerned with those. The important point is that it can find the address on the stack using the SP (Stack Pointer) register. When GetDrive begins, the stack is set up as shown in Figure 12-3.

│          │ ^
│          │ │
│          │ │
│          │ └── higher addresses
├──────────┤
│  Drive%  │ <── the address of Drive% that BASIC pushed
├──────────┤
│ Ret Seg  │ <── BASIC's segment to return to
├──────────┤
│ Ret Adr  │ <── BASIC's address to return to (SP holds this address)
├──────────┤
│   Next   │ <── the next available stack location
├──────────┤
│          │
│          │
│          │
│          │

Figure 12-3: The state of the stack within a procedure when one variable address was passed.
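
In assembly language terms, the CALL to GetDrive behaves roughly like the sequence below. This is an illustration rather than the code BASIC actually generates, and the label Drive is a hypothetical name standing in for wherever BASIC stores Drive%.

   Mov  AX,Offset Drive     ;AX = the address of Drive% (hypothetical label)
   Push AX                  ;place that address on the stack
   Call Far Ptr GetDrive    ;pushes the return segment, then the return address

Because the far Call pushes the segment first and the address second, the return address ends up lowest in memory, which is why SP points to it when GetDrive begins.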

Notice that while GetDrive can get at the address of Drive% through SP, an extra step is still required to get at the *data* held in Drive%. Let's digress for a moment to reconsider the difference between memory addresses and values. The assembler command Mov AX,12 puts the value 12 into register AX. But suppose you want to put the contents of *memory location* 12 into AX. You indicate this to the assembler by using brackets, as shown in the two equivalent examples following.

   Mov AX,[12]    ;load AX from address 12

   Mov BX,12      ;assign BX to the value 12
   Mov AX,[BX]    ;load AX from the address held in BX

The first statement loads AX from the contents of memory at address 12. The second first loads BX with the number 12, and then uses BX to identify that address, moving the contents of that address into AX. This is an important distinction, and is illustrated in Figure 12-4 using parallels to BASIC's PEEK and POKE commands.

     BASIC                      Assembler
════════════════════       ═════════════════════
BP = SP                    Mov BP,SP
AL = PEEK(BP + 8)          Mov AL,[BP+8]
SI = 12                    Mov SI,12
POKE SI, 12                Mov Byte Ptr [SI],12
Figure 12-4: Similarities between BASIC's PEEK and POKE, and the assembly language Mov instruction.

Although you can easily find the address of Drive% by looking at SP, an extra step is required to get at the actual value. The example that follows shows how to do this, except there is one added complication. You are not allowed to use SP for addressing, except with 386 and later microprocessors. Since you undoubtedly want your programs to work with as many computers as possible, a different strategy must be used.

As I mentioned earlier, the BP register is a base register that is meant for accessing data on the stack. Therefore, you must first copy SP into BP, and then use BP to access the stack. Then you can find where Drive% is located, and put the current drive number into that address as shown following:

   Mov  BP,SP      ;put the current stack pointer into BP
   Mov  SI,[BP+4]  ;put the address of Drive% into SI
   Mov  AH,19h     ;tell DOS we want the default drive
   Int  21h        ;call DOS to do it
   Mov  [SI],AL    ;put the answer into Drive%

Notice how brackets are used to indicate the addresses. You must first determine the address of Drive%'s address (whew!), before you can put the value held in AL there. This is called indirect addressing, because a register is used to hold the address of the data. Again, notice how the 8088 accepts addition on the fly when you tell it BP+4.

The complete working GetDrive routine has two small added complications. Besides being unable to use SP for addressing memory, you are also required by BASIC to leave BP unchanged. The obvious solution, therefore, is to first save BP on the stack before changing it, and then restore BP later before returning to BASIC. The other complication is that BASIC put extra information (Drive%'s address) onto the stack, and that must be removed before returning. But neither is insurmountable, as shown here:

   Push BP          ;save BP before changing it
   Mov  BP,SP       ;put the stack pointer into BP
   Mov  SI,[BP+6]   ;put the address of Drive% into SI
   Mov  AH,19h      ;tell DOS we want default drive
   Int  21h         ;call DOS to do it
   Mov  [SI],AL     ;put the answer into Drive%
   Pop  BP          ;restore BP to its original value
   Ret  2           ;return to BASIC

Notice that here, the address of Drive% is at [BP+6] rather than [BP+4] as it was in the previous listing. Since BP was pushed at the start of the procedure, the stack pointer is two bytes lower when it is subsequently assigned to BP. When SI is loaded, [BP] holds the saved copy of BP, [BP+2] and [BP+4] hold the address and segment to return to, and [BP+6] holds the address of Drive%. This is illustrated in Figure 12-5.

│          │
│          │
│          │
│          │
├──────────┤
│  Drive%  │ <── [BP+6] points here
├──────────┤
│ Ret Seg  │ <── [BP+4] points here
├──────────┤
│ Ret Adr  │ <── [BP+2] points here
├──────────┤
│ Saved BP │ <── [BP] points here
├──────────┤
│   Next   │ <── the next available stack location
├──────────┤
│          │
│          │
│          │
│          │
Figure 12-5: The state of the stack within a procedure after BP has been pushed.

Normally when a Ret command is encountered in a far procedure such as this one, the 8088 pops the last four bytes from the stack automatically, and returns to the segment and address contained in those bytes. But that would leave the 2-byte address of Drive% still cluttering up the stack. To avoid this problem the 8088 lets you specify a *parameter count*, expressed in bytes, as part of the Ret instruction.

For each variable address that is passed with a CALL from BASIC, you must add 2 to the Return instruction in your assembler routine. This is the number of bytes to remove from the stack, with two being used for each incoming two-byte address. Had two variables been passed, the program would have used Ret 4 instead. Although it is possible to have the calling program clean up the stack itself, that would be wasteful. For every occurrence of every call that passes parameters, BASIC would have to include additional code following the call to increment SP accordingly. Pushing a parameter's address onto the stack leaves that much less stack space available. Therefore, someone has to reverse the process and either pop the addresses or use Add SP,Num to adjust the stack pointer. By having the called routine handle it, that code is needed only once. In fact, this is an important deficiency of C, because by design C requires the caller to clean up the stack.
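
As a sketch of the two-variable case, suppose a hypothetical routine is called with CALL SwapInts(A%, B%). BASIC pushes the addresses in the order the variables are listed, so the first one ends up at the higher stack address. The routine name is invented for illustration, and as in the GetDrive listing, Ret is assumed to assemble as a far return.

   Push BP          ;save BP before changing it
   Mov  BP,SP       ;put the stack pointer into BP
   Mov  BX,[BP+8]   ;BX = the address of A% (pushed first)
   Mov  AX,[BX]     ;AX = the value of A%
   Mov  BX,[BP+6]   ;BX = the address of B% (pushed last)
   Xchg AX,[BX]     ;B% gets the old A%, AX gets the old B%
   Mov  BX,[BP+8]   ;BX = the address of A% again
   Mov  [BX],AX     ;A% gets the old B%
   Pop  BP          ;restore BP to its original value
   Ret  4           ;return to BASIC, removing both 2-byte addresses

Had BASIC been responsible for the cleanup instead, every CALL to this routine would need its own Add SP,4 afterward.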

[If you've managed to persevere this far you'll be pleased to know that in practice, the assembler can be told to handle most or all aspects of stack addressing for you. This is discussed in the sections that follow.]

It is also possible to tell BASIC to pass some types of parameters by value using the BYVAL option in the DECLARE or CALL statements. When BYVAL is used, BASIC places the actual value of the variable onto the stack, rather than its address. This has several important benefits. First, the assembly language routine can use one less instruction. Second, when a constant number is passed, BASIC does not need to make a copy of it in DGROUP. This copying was described in Chapter 2.

However, BYVAL is appropriate only when a parameter does not have to be returned, and only when the values are integers. If you pass a double precision parameter using BYVAL, all eight bytes are placed on the stack using four separate instructions, rather than the two needed to pass the address. You can also instruct BASIC to pass the full, segmented address of a parameter, and that is discussed in the section "Dynamic Arrays."
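
Here is a minimal sketch of how a routine changes when its parameter arrives by value. It assumes a hypothetical routine declared in BASIC with DECLARE SUB SetDrive (BYVAL Drive%), where 0 means drive A; the name and purpose are illustrative only.

   Push BP          ;save BP before changing it
   Mov  BP,SP       ;put the stack pointer into BP
   Mov  DX,[BP+6]   ;DX = the value of Drive% itself, not its address
   Mov  AH,0Eh      ;tell DOS we want to set the default drive
   Int  21h         ;call DOS to do it (the drive number is in DL)
   Pop  BP          ;restore BP to its original value
   Ret  2           ;the 2-byte value is removed just like an address

Compared with GetDrive, there is no second step through SI to reach the data, because the word at [BP+6] already is the data.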