By Ethan Winer <http://www.ethanwiner.com>
All of the discussions so far have focused on how to write the instructions for an assembly language subroutine. However, none have described how these routines are added to a BASIC program, or how a complete procedure is defined. Furthermore, the previous examples have not shown a key step that is needed with all such external routines: establishing the code and data segments.
Before an external routine can be linked to a BASIC program you must establish a public procedure name that LINK can identify. I will first show the formal method for defining a procedure and its segments, and then show the newer, simplified methods that were introduced with MASM version 5.1. The simplified syntax is used for all of the remaining examples in this chapter [so don't worry if the setup details for this first example appear overwhelming].
The simplest complete subprogram you are likely to encounter is probably the PrtSc routine that follows--all it does is call Interrupt 5 to send the contents of the current display screen to LPT1.
    Code Segment Word Public 'Code'
      Assume CS:Code
      Public PrtSc
    PrtSc Proc Far      ;this is equivalent to SUB PrtSc STATIC in BASIC
      Int 5             ;call BIOS interrupt 5
      Ret               ;return to BASIC
    PrtSc Endp          ;this is equivalent to BASIC's END SUB
    Code Ends
    End
The first three lines tell the assembler that the code is to be placed in the segment named Code, and that the name PrtSc is to be made public. The fourth line defines the start of a procedure. The actual code occupies the next two lines. Of course, you must tell the assembler where the procedure ends, which in this case is also the end of the code segment. Had several procedures been included within the same block of code, each procedure would show a start and end point, but there would only be a single code segment. The final End statement is needed to tell the assembler that this is the end of the listing, although you might think that MASM would be smart enough to figure that out by itself!
Notice that there are two kinds of procedures: Far and Near. External routines that are called from BASIC are always Far, because BASIC uses what is called a *medium model*. This means the procedure does not necessarily have to be within the same code segment as the main BASIC program. The medium model allows the combined programs to exceed the usual 64k limit when linked to a final .EXE file.
When BASIC executes a CALL command, it uses a two-word address as the location to jump to. One of the words contains a segment, and the other an address within that segment. Then when your program finally returns, the 8088 must know to remove two words from the stack--a segment and an address--to find where to return to in the calling BASIC program.
A near procedure, on the other hand, calls an address that is only one word long. And when the procedure returns, only a single word is popped from the stack. Again, the assembler does the bulk of the dirty work for you. You just have to remember to use the word Far.
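The difference is easiest to see when one source file holds both kinds of procedure. The following is only a sketch: the names Beep2 and BeepOnce are hypothetical, and it uses the simplified .Model and .Code directives rather than the formal segment declarations. Beep2 is the far procedure that BASIC would call, and BeepOnce is a near helper that is called only from within the same code segment.

```asm
;NEARFAR.ASM, a far procedure that calls a near helper (hypothetical names)
.Model Medium, Basic
.Code

Beep2 Proc              ;procedures are Far by default in medium model
    Call BeepOnce       ;near call, pushes a one-word return address
    Call BeepOnce       ;and again for a second beep
    Ret                 ;far return, pops an address and a segment
Beep2 Endp

BeepOnce Proc Near      ;helper used only within this code segment
    Mov  AH,2           ;DOS service 2 displays the character in DL
    Mov  DL,7           ;ASCII 7 is the bell character
    Int  21h            ;call DOS to do it
    Ret                 ;near return, pops only an address
BeepOnce Endp
End
```

From BASIC this would be invoked as CALL Beep2. Calling BeepOnce directly from BASIC would fail, because BASIC would push two words and the near Ret would remove only one.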
Fortunately, Microsoft realized what a pain dealing with segments and procedures and offsets from BP can be, and they enhanced MASM beginning with version 5.0 to handle these details automatically for you. Rather than require the programmer to define the various code and data segments, all that is needed are a few simple key words.
The first is .Model Medium, which tells MASM that the procedures that follow will be Far. Used in conjunction with .Code and .Data, .Model Medium tells MASM that any data you define should be placed into a group named DGROUP. Adding ,Basic after the .Model directive also declares your procedures as Public automatically, so BASIC can access them when your program is linked.
By using the name DGROUP, the linker automatically gathers all of your DB and DW data variables, and places them into the same segment that BASIC uses. While this has the disadvantage of impinging on BASIC's near data space, it also means that on entry to the routine the DS register (which BASIC sets to hold the DGROUP segment) holds the correct segment value for your variables as well.
To show the advantages of simplified directives, contrast the earlier PrtSc with this version that does exactly the same thing:
    .Model Medium, Basic
    .Code
    PrtSc Proc
      Int 5
      Ret
    PrtSc Endp
    End
MASM 5.1 introduced additional simplified directives that let you access incoming parameters by name, rather than as offsets from BP. All of the remaining examples in this chapter take advantage of simplified directives, as the following revised listing for GetDrive illustrates.
    ;Syntax: CALL GetDrive(Drive%)

    .Model Medium, Basic
    .Data
      ;-- if variables were needed they would be placed here
    .Code

    GetDrive Proc, Drive:Word
      Mov  AH,19h       ;tell DOS we want the default drive
      Int  21h          ;call DOS to do it
      Mov  BX,Drive     ;put the address of Drive% into BX
      Cbw               ;clear AH to make a full word
      Mov  [BX],AX      ;then store the answer into Drive%
      Ret               ;return to BASIC
    GetDrive Endp       ;indicate the end of the procedure
    End                 ;and the end of the source file
As you can see, this looks remarkably like a BASIC SUB or FUNCTION procedure, with the incoming parameter listed by name and type as part of the procedure declaration. This greatly simplifies maintaining the code, especially if you add or remove parameters during development. If incoming parameters are defined as shown here using Drive%, code to push BP and then move SP into BP is added for you automatically. When you refer to one of the parameters, the assembler substitutes [BP+##] in the code it generates. Note, however, that the Word identifier for Drive refers to the 2-byte size of its address, and not the fact that Drive% is a 2-byte integer.
Also notice the new Cbw command, which is used here to clear the AH register. Cbw (Convert Byte to Word) expands the byte value held in AL to a full word in AX. A full word is needed to ensure that both the high- and low-byte portions of Drive% are assigned, in case it held a previous value. If the value in AL is positive (between 0 and 127), AH is simply cleared to zero. And if AL is negative (between -128 and -1 when viewed as a signed byte, which is the same as between 128 and 255 unsigned), Cbw instead sets all of the bits in AH to be on. Thus, the sign of the original number in AL is preserved.
A complementary statement, Cwd (Convert Word to Double Word), converts the word in AX to a double-word in DX:AX. Again, if AX is positive when considered as a signed number, DX is cleared to zero. And if AX is currently negative, DX is set to FFFFh (-1) to preserve the sign. Cbw and Cwd are both one-byte instructions, so they are a more compact way to clear AH or DX than Mov AH,0 and Mov DX,0, which require two bytes and three bytes respectively. With unsigned values this shortcut is safe only when the high bit of AL or AX is known to be clear, as it is for the small drive number that GetDrive receives from DOS.
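To make the sign extension concrete, here is a small sketch that steps through three cases. The procedure name SignDemo is hypothetical, and the routine does nothing useful beyond loading registers; the comments show what each register holds after the conversion.

```asm
;SIGNDEMO.ASM, hypothetical demonstration of Cbw and Cwd
.Model Medium, Basic
.Code

SignDemo Proc
    Mov  AL,5           ;a positive byte
    Cbw                 ;AX is now 0005h
    Mov  AL,0FEh        ;-2 as a signed byte, or 254 unsigned
    Cbw                 ;AX is now 0FFFEh, which is -2 and not 254
    Mov  AX,8000h       ;-32768 as a signed word, or 32768 unsigned
    Cwd                 ;DX is now 0FFFFh, so DX:AX holds -32768
    Ret                 ;return to BASIC
SignDemo Endp
End
```

The second case shows why Cbw is not a general substitute for clearing AH: a byte of 128 or higher comes out with AH set to FFh.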
Finally, the Ret command that exits the procedure is translated by MASM to include the correct stack adjustment value, based on the number of incoming parameters. If you have multiple exit points from the procedure (equivalent to EXIT SUB), the exit code will be generated multiple times. That is, each occurrence of Ret is replaced with a code sequence to pop the saved registers, and perform the 3-byte Ret # instruction. Therefore, you should always use a single exit point in a routine, and jump to that when you need to exit from more than one place.
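For comparison, this is roughly what GetDrive looks like when the same work is written out by hand. Treat it as a sketch of the equivalent code rather than MASM's exact output. It assumes the usual medium-model stack layout on entry: the saved BP at [BP], the 4-byte far return address at [BP+2], and the single 2-byte parameter at [BP+6].

```asm
;GetDrive written without the parameter extensions (a sketch)
.Model Medium
.Code
Public GetDrive

GetDrive Proc           ;Far by default in medium model
    Push BP             ;save BASIC's BP register
    Mov  BP,SP          ;address the stack through BP
    Mov  AH,19h         ;tell DOS we want the default drive
    Int  21h            ;call DOS to do it
    Mov  BX,[BP+6]      ;this is what Mov BX,Drive stands for
    Cbw                 ;clear AH to make a full word
    Mov  [BX],AX        ;store the answer into Drive%
    Pop  BP             ;restore BP
    Ret  2              ;far return that also discards one parameter
GetDrive Endp
End
```

Every additional exit point would need its own Pop BP and Ret 2, which is why jumping to a single exit label keeps the generated code smaller.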
Chapter 11 explained how interrupts work, and mentioned that only assembly language can call an interrupt directly. An assembler program uses the Int instruction, and this tells the 8088 to look in the interrupt vector table in low memory to obtain the interrupt procedure's segment and address. Then the procedure is called as if it were a conventional subroutine.
All of the DOS and BIOS services are accessed using interrupts, though there are so many different services that you also have to pass a service number to many of them. Most of the DOS services are accessed through interrupt 21h. Where BASIC uses the &H prefix to indicate a hexadecimal value, assembly language uses a trailing letter H. If you specify a number without an H it is assumed by MASM to be regular decimal. Note that MASM doesn't care if you use upper- or lowercase letters, and knows that either means hexadecimal.
When specifying hexadecimal values to MASM, the first character must always be a digit. That is, 1234h is acceptable, but &HB800 must be entered as 0B800h. Using B800h will generate a syntax error.
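The following sketch shows these number formats side by side. The procedure name HexDemo is hypothetical, and each instruction simply loads a constant so that the notation can be compared.

```asm
;HEXDEMO.ASM, hypothetical demonstration of numeric constants
.Model Medium, Basic
.Code

HexDemo Proc
    Mov  AX,1234        ;no suffix means decimal 1234
    Mov  AX,1234h       ;hexadecimal 1234, which is 4660 decimal
    Mov  AX,0b800H      ;leading zero required, and case does not matter
    Mov  AL,0FFh        ;255, again with the required leading zero
    Ret                 ;return to BASIC
HexDemo Endp
End
```

Without the leading zero, MASM would treat B800h or FFh as the name of a label or variable rather than as a number.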
You have already seen how to call the BIOS routine that prints the screen and the DOS routine that returns the current drive. Let's continue and see how to call some of the other useful routines in the BIOS and DOS.
The next example program, DosVer, shows how to call the DOS service that returns the DOS version number. Like many of the assembler routines that you can use with BASIC, DosVer relies on an existing DOS service to do the real work. In this program you will also learn how to push and pop values on the stack.
The syntax for DosVer is CALL DosVer(Version%), where Version% returns with the DOS version number times 100. That is, if your PC is running DOS version 3.30, then Version% will be assigned the value 330. Manipulating floating point numbers is much more difficult than integers, and the added complexity is not justified for this routine.
The DOS service that retrieves the version number returns with two separate values--the major version number (3 in this case) and the minor number (30). These values are returned in AL and AH respectively. The strategy here is to first multiply AL by 100, and then add AH. The last step is to assign the result to the incoming parameter Version%.
Unfortunately, when you use AL for multiplication, the value 100 must be in a register or memory location. You can't just use MUL AL,100 though it would sure be nice if you could. Further, whenever AL is multiplied the result is placed into the entire AX register. Therefore, DosVer also uses BX to temporarily store the original contents of AX before the two are added together.
As you already have learned, the only register that can be multiplied is AX, or its low-byte portion, AL. MASM knows if you plan to multiply AX or AL based on the size of the argument. For example, Mul BX means to multiply AX by BX and leave the result in DX:AX. Mul CL instead multiplies AL by CL and leaves the answer in AX.
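A short sketch of both forms may help. The procedure name MulDemo is hypothetical, and the values were chosen so the results are easy to check by hand.

```asm
;MULDEMO.ASM, hypothetical demonstration of byte and word multiplication
.Model Medium, Basic
.Code

MulDemo Proc
    Mov  AL,200         ;byte multiply, AL times CL
    Mov  CL,100
    Mul  CL             ;AX is now 20000 (4E20h)
    Mov  AX,1000        ;word multiply, AX times BX
    Mov  BX,1000
    Mul  BX             ;DX:AX is now 1,000,000 (DX = 000Fh, AX = 4240h)
    Ret                 ;return to BASIC
MulDemo Endp
End
```

Note that the word form overwrites DX even when the result would have fit into AX alone.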
The complete DosVer routine is shown following, and comments explain each step.
    ;DOSVER.ASM, retrieves the DOS version number

    .Model Medium, Basic
    .Code

    DOSVer Proc, Version:Word
      Mov  AH,30h       ;service 30h gets the version
      Int  21h          ;call DOS to do it
      Push AX           ;save a copy of the version for later
      Mov  CL,100       ;prepare to multiply AL by 100
      Mul  CL           ;AX is now 300 if running DOS 3.xx
      Pop  BX           ;retrieve the version, but in BX
      Mov  BL,BH        ;put the minor part into BL for adding
      Mov  BH,0         ;clear BH, we don't want it anymore
      Add  AX,BX        ;add the major and minor portions
      Mov  BX,Version   ;get the address for Version%
      Mov  [BX],AX      ;assign Version% from AX
      Ret               ;return to BASIC
    DOSVer Endp
    End
Notice the extra switch that is done with BH and BL. AX is saved onto the stack because multiplying the byte in AL leaves the result as a full word in AX, thus destroying AH. When the version is popped into BX, the minor part is in BH. But you are not allowed to add registers that are different sizes (AX and BH). Further, any number in the high half of a register is by definition 256 times the value of the same number in a low half. Therefore, BH is first copied to BL to reflect its true value. BH is then cleared so it won't affect the result, and finally AX and BX are added.
A better way to save AX and then restore it to BX would be to simply use Mov BX,AX immediately after the call to Interrupt 21h. I used Push and Pop just to show how this is done. As you can see, it is not necessary to pop the same register that was pushed. However, every Push instruction must always have a corresponding Pop, to keep the stack balanced. If a register or other value is on the stack when the final Ret is encountered, that value will be used as the return address which is of course incorrect.
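Here is how that variation might look. The name DosVer2 is hypothetical, chosen only so this sketch cannot be confused with the listing above; everything else follows the original routine.

```asm
;DOSVER2.ASM, a hypothetical variation that copies AX instead of pushing it
.Model Medium, Basic
.Code

DosVer2 Proc, Version:Word
    Mov  AH,30h         ;service 30h gets the version
    Int  21h            ;call DOS to do it
    Mov  BX,AX          ;copy the version to BX before Mul destroys AH
    Mov  CL,100         ;prepare to multiply AL by 100
    Mul  CL             ;AX now holds the major version times 100
    Mov  BL,BH          ;put the minor part into BL for adding
    Mov  BH,0           ;clear BH, we don't want it anymore
    Add  AX,BX          ;add the major and minor portions
    Mov  BX,Version     ;get the address for Version%
    Mov  [BX],AX        ;assign Version% from AX
    Ret                 ;return to BASIC
DosVer2 Endp
End
```

Because nothing is pushed, there is also nothing that could be left on the stack by mistake.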
Division also acts on AX, or the combination of DX:AX. When you use the command Div BL, the 8088 knows you want to divide AX because BL is a byte-sized argument. It then leaves the result in AL and the remainder, if any, is placed into AH. Similarly, Div CX means that you are dividing the long integer in DX:AX, because CX is a word. The result of this division is assigned to AX, with the remainder in DX. (DX itself is not a sensible divisor here, because it already holds the upper half of the value being divided.)
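The sketch below shows one division of each size. The procedure name DivDemo is hypothetical, and the word-sized divisor is kept in CX because DX is already occupied by the upper half of the dividend. In both cases the quotient must fit into the destination (AL or AX), or the 8088 generates a divide overflow interrupt.

```asm
;DIVDEMO.ASM, hypothetical demonstration of byte and word division
.Model Medium, Basic
.Code

DivDemo Proc
    Mov  AX,1000        ;byte divide, AX divided by BL
    Mov  BL,7
    Div  BL             ;AL is now 142, and the remainder 6 is in AH
    Mov  DX,000Fh       ;word divide, DX:AX divided by CX
    Mov  AX,4240h       ;DX:AX holds 1,000,000
    Mov  CX,1000
    Div  CX             ;AX is now 1000, and the remainder 0 is in DX
    Ret                 ;return to BASIC
DivDemo Endp
End
```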
As Chapter 2 explained, strings are stored very differently than regular numeric variables. BASIC lets you find the address of any variable with the VARPTR function. For integer or floating point numbers, the value VARPTR returns is the address of the actual data. But for strings, VARPTR instead returns the address of a string descriptor.
DOS employs a different method entirely for its strings, using a CHR$(0) to mark the end. This is described separately later in the section "DOS Strings."
A BASIC string descriptor is a table containing information about the string--that is, its length and address. In Microsoft compiled BASIC a string descriptor is comprised of two words of information. For QuickBASIC and near strings when using BASIC PDS, the first word contains the length of the string and the second holds the address of the first character. Consider the following BASIC instructions:
    X$ = "Assembler"
    V = VARPTR(X$)
V now holds the starting address of the four-byte descriptor for X$. For the sake of argument, let's say that V is now 1234. Addresses 1234 and 1235 will together contain the length of X$ which is 9, and addresses 1236 and 1237 will contain yet another address--that of the first character in X$. You can therefore find the length of X$ using this formula:
Length = PEEK(V) + 256 * PEEK(V + 1)
And the first character "A" can be located with this:
Addr = PEEK(V + 2) + 256 * PEEK(V + 3)
You could then print the string on the screen like this:
    FOR C = Addr TO Addr + 8
      PRINT CHR$(PEEK(C));
    NEXT
Therefore, this is a BASIC model for how strings are located by an assembly language program. When you call an assembler routine with a string argument, BASIC first pushes the address of the descriptor onto the stack, before calling the routine. The next example is called Upper, because it capitalizes all of the characters in a string. Even though BASIC offers the UCASE$ and LCASE$ functions, these are relatively slow because they return a copy of the data that has been manipulated. Upper instead capitalizes the data in place very quickly.
The strategy is to first get the descriptor address from the stack. Then Upper puts the length into BX and the address of the string data into SI. Upper steps through the string starting at the end, decrementing BX by one for each character. When BX crosses zero, it is done. A BASIC version is shown first, followed by the assembly language equivalent.
Upper in BASIC:
    SUB Upper(Work$) STATIC

      '-- load SI with the address of Work$ descriptor
      SI = VARPTR(Work$)
      '-- assign LEN(Work$) to BX
      BX = PEEK(SI) + 256 * PEEK(SI + 1)
      '-- the address of the first character goes in SI
      SI = PEEK(SI + 2) + 256 * PEEK(SI + 3)

    More:
      BX = BX - 1                  'point to the end of Work$
      IF BX < 0 GOTO Done          'no more characters to do
      AL = PEEK(SI + BX)           'get the current character
      IF AL < ASC("a") GOTO More   'skip conversion if too low
      IF AL > ASC("z") GOTO More   'or if too high
      AL = AL - 32                 'convert to upper case
      POKE SI + BX, AL             'put character back in Work$
      GOTO More                    'go do it all again

    Done:                          'return to caller (EXIT is reserved in BASIC)
    END SUB
Upper in assembly language:
    .Model Medium, Basic
    .Code

    Upper Proc, Work:Word
      Mov  SI,Work      ;load SI with Work$'s descriptor address
      Mov  BX,[SI]      ;put LEN(Work$) into BX
      Mov  SI,[SI+2]    ;SI holds address of the first character
    More:
      Dec  BX           ;point to the next prior character
      Js   Exit         ;if sign is negative BX is less than 0
      Mov  AL,[BX+SI]   ;put the current character into AL
      Cmp  AL,"a"       ;compare it to ASC("a")
      Jb   More         ;jump if below to More
      Cmp  AL,"z"       ;compare AL to ASC("z")
      Ja   More         ;jump if above to More
      Sub  AL,32        ;convert AL to upper case
      Mov  [BX+SI],AL   ;put AL back into Work$
      Jmp  More         ;jump to More
    Exit:
      Ret               ;return to BASIC
    Upper Endp
    End
Notice that for expediency, these routines work backwards from the end of the string. There are a number of shortcuts that you can use in assembly language, and one important one is being able to quickly test the result of the most recent numeric operation. If the program worked forward through the string, it would take three lines of code to advance to the next character, and also require saving the string length separately:
    Inc  BX             ;point to the next character
    Cmp  BX,Length      ;are we done yet?
    Jne  More           ;no, continue
Notice the use of a new form of conditional jump--Js which stands for *Jump if Signed*. Here the code tests the sign of the number in BX, and jumps if it is negative. Though I haven't mentioned this yet, a conditional jump doesn't always have to follow a compare. Although a comparison will set the flags in the 8088 that indicate whether a particular condition is true, so will several other instructions. Some of these are Add, Sub, Dec, and Inc, but not Mov. So instead of having to include an explicit comparison:
    Dec  BX             ;decrement BX
    Cmp  BX,0           ;compare it to zero
    Jl   More           ;jump if less to More
All that is really needed is this:
    Dec  BX
    Js   More
The Dec instruction sets the Sign Flag automatically, just as if a separate compare had been performed.
Besides Je, Jne, and Js, there are a few other forms of conditional jump instructions you should understand. Figure 12-6 lists all of the ones you are likely to find useful.
    Command   Meaning
    =======   ======================================
    Je        Jump if equal
    Jne       Jump if not equal
    Ja        Jump if above (unsigned basis)
    Jna       Jump if not above (unsigned basis)
    Jb        Jump if below (unsigned basis)
    Jnb       Jump if not below (unsigned basis)
    Jg        Jump if greater (signed basis)
    Jng       Jump if not greater (signed basis)
    Jl        Jump if less (signed basis)
    Jnl       Jump if not less (signed basis)
    Jc        Jump if Carry Flag is set
    Jnc       Jump if Carry Flag is clear
    Js        Jump if sign flag is set
    Jns       Jump if sign flag is not set
    Jcxz      Jump if CX is zero
You should know that Je and Jne also have an alias command name: Jz and Jnz. These stand for *Jump if Zero* and *Jump if Not Zero* respectively, and they are identical to Je and Jne. In fact, though I didn't mention this earlier, the Repe and Repne string repeat prefixes are sometimes called Repz and Repnz.
Because Je and Jz cause MASM to generate the identical machine code bytes, they may be used interchangeably. In some cases you may want to use one instead of the other, depending on the logic in your program. For example, after comparing two values you would probably use Je or Jne to branch if they are equal or not equal. But after testing for a zero or non-zero value using Or AX,AX you would probably use Jz or Jnz. This is really just a matter of semantics, and either version can be used with the same results.
Also, please understand that Jnb is not the same as Ja. Rather, the case of being Not Below is the same as being Above Or Equal. In fact, MASM recognizes Jae (Jump if Above or Equal) to mean the same thing as Jnb. Likewise, Jbe (Jump if Below or Equal) is the same as Jna, Jge (Jump if Greater or Equal) is the same as Jnl, and Jle (Jump if Less or Equal) is identical to Jng. Again, which form of these instructions you use will depend on how you are viewing the data and comparisons.
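The difference between the signed and unsigned forms is worth seeing with an actual value. In this sketch (the procedure name CmpDemo is hypothetical) the same byte is 255 when viewed as unsigned and -1 when viewed as signed, so one comparison produces two different answers. Both jumps test the flags from the single Cmp, because a conditional jump never changes the flags itself.

```asm
;CMPDEMO.ASM, hypothetical demonstration of signed versus unsigned jumps
.Model Medium, Basic
.Code

CmpDemo Proc
    Mov  AL,0FFh        ;255 when unsigned, -1 when signed
    Cmp  AL,1           ;compare AL to 1
    Jb   Exit           ;not taken, because 255 is not below 1
    Jl   Exit           ;taken, because -1 is less than 1
    Nop                 ;this instruction is never reached
Exit:
    Ret                 ;return to BASIC
CmpDemo Endp
End
```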
Note the special form of conditional jump, Jcxz. Jcxz stands for Jump if CX is Zero, and it combines the effects of Cmp CX,0 and Je label into a single fast instruction. Jcxz is also commonly used prior to a Loop instruction. When you use Loop to perform an operation repeatedly, CX must be assigned initially to the number of times the loop is to be executed. But if CX is zero the loop will execute 65536 times! Thus, adding Jcxz Exit avoids this undesirable behavior if zero was passed accidentally.
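As a sketch of this guard in practice, the routine below fills a string with blanks using Loop, and relies on Jcxz to skip the loop entirely when the string is null. The name BlankStr is hypothetical, and the code assumes a near string as used by QuickBASIC, so the descriptor holds the length followed by the address.

```asm
;BLANKSTR.ASM, hypothetical routine that fills a near string with blanks
;Syntax: CALL BlankStr(Work$)
.Model Medium, Basic
.Code

BlankStr Proc, Work:Word
    Mov  BX,Work        ;get the address of the descriptor
    Mov  CX,[BX]        ;load LEN(Work$) into CX for Loop
    Jcxz Exit           ;a null string would otherwise loop 65536 times
    Mov  BX,[BX+2]      ;BX now holds the address of the first character
More:
    Mov  Byte Ptr [BX]," "   ;replace the current character with a blank
    Inc  BX             ;point to the next character
    Loop More           ;decrement CX and continue until it reaches zero
Exit:
    Ret                 ;return to BASIC
BlankStr Endp
End
```

Byte Ptr is needed because MASM cannot otherwise tell whether the blank should be stored as a byte or as a word.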
Finally, you must be aware that a conditional jump cannot be used to branch to a label that is more than 128 bytes earlier, or 127 bytes farther ahead in the code. A conditional jump instruction is only two bytes, with the first indicating the instruction and the other holding the branch distance. If you need to jump to a label farther away than that you must reverse the sense of the condition, and jump to a near label that skips over another, unconditional jump:
      Cmp  AX,BX        ;we want to jump to Label: if AX is greater
      Jna  NearLabel    ;so jump to NearLabel if it's NOT greater
      Jmp  Label        ;this goes to Label: which is farther away
    NearLabel:
      .
      .
As used here, the unconditional Jmp instruction can branch to any location within the current code segment. There is also a short form of Jmp, which requires only two bytes of code instead of three. If you are jumping backwards in the program and the address is within 128 bytes, MASM uses the shorter form automatically. But if the jump is forward, you should specify Short explicitly: Jmp Short Label. Some non-Microsoft assemblers do not require you to specify Short; the newest MASM version 6.x also adjusts its generated code to avoid the extra wasted byte.
When string information is passed to a DOS routine, for example when giving a file or directory name, the string must end with a CHR$(0). In DOS terminology this is called an ASCIIZ string. (Do not confuse this with a CHR$(26) Ctrl-Z which marks the end of a file.) Unlike BASIC, DOS does not use string descriptors, so this is the only way DOS can tell when it has reached the end. By the same token, when DOS returns a string to a calling program, it marks the end with a trailing zero byte.
When passing a string to a DOS service from BASIC you must either concatenate a CHR$(0) manually, or add extra code within the assembler routine to copy the name into local storage and add a zero byte to the copy. From BASIC you would therefore use something like this:
CALL Routine(FileName$ + CHR$(0))
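On the assembler side, a routine that receives such a string only has to hand the data address to DOS. The sketch below is an illustration under stated assumptions: the name KillFile and its parameters are hypothetical, it assumes a near string, and it uses DOS service 41h, which deletes the file whose ASCIIZ name is at DS:DX and sets the Carry Flag with an error code in AX if it fails.

```asm
;KILLFILE.ASM, hypothetical routine that passes an ASCIIZ name to DOS
;Syntax: CALL KillFile(FileName$ + CHR$(0), ErrCode%)
.Model Medium, Basic
.Code

KillFile Proc, FileName:Word, ErrCode:Word
    Mov  BX,FileName    ;get the address of the descriptor
    Mov  DX,[BX+2]      ;DS:DX now points to the ASCIIZ file name
    Mov  AH,41h         ;service 41h deletes a file
    Int  21h            ;call DOS to do it
    Jc   Done           ;if Carry is set AX holds the DOS error code
    Mov  AX,0           ;otherwise report zero to indicate success
Done:
    Mov  BX,ErrCode     ;get the address for ErrCode%
    Mov  [BX],AX        ;assign ErrCode%
    Ret                 ;return to BASIC
KillFile Endp
End
```

The length word in the descriptor is never examined, because DOS stops reading at the zero byte.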
Fixed-length strings and the string portion of a TYPE variable do not use a string descriptor, which you might think would require a different strategy to access them. But whenever a fixed-length string is used as an argument to an assembler routine or BASIC subprogram, BASIC first copies it into a temporary conventional string, and it is the temporary string that is passed to the routine. When the routine returns, BASIC copies the characters back into the original fixed-length string. Thus, any routine written in assembly language that expects a descriptor will work correctly, regardless of the type of string being sent.
Of course, this copying requires BASIC to generate many extra bytes of assembler code for each call. If you do not want BASIC to create a temporary copy of a fixed-length string, you must first define the string as a TYPE like this:
    TYPE FLen
      S AS STRING * 20
    END TYPE
    DIM FString AS FLen
Though this appears to be the same as defining FString as a string with a fixed length of 20, there is an important difference: declaring it as a TYPE tells BASIC not to make a copy. That is, BASIC does not treat FString as a string, as long as the ".S" portion that identifies it as a string is not used. Here's an example based on the FLen TYPE that was defined above:
    DIM FString AS FLen             'FString is a TYPE variable
    FString.S = "This is a test"    'assign the string portion
    CALL Routine(FString)           'call the routine without .S
Here, the address of the first character in the string is passed to the routine, as opposed to the address of a temporary string descriptor. We have told BASIC to call Routine, and pass it the entire FString TYPE but without interpreting the .S string component. This next example does cause BASIC to create a temporary copy:
    CALL Routine(FString.S)
The short assembly language routine that follows expects the address of a fixed-length string with a length of 20, as opposed to the address of a string descriptor. The routine then copies the characters to the upper-left corner of a color monitor.
      Push BP           ;access the stack as usual
      Mov  BP,SP
      Mov  SI,[BP+6]    ;SI points to the first character
      Mov  DI,0         ;the first address in screen memory
      Mov  AX,0B800h    ;color monitor segment when in text mode
      Mov  ES,AX        ;move into ES through AX
      Mov  CX,20        ;prepare to copy 20 characters
      Cld               ;clear the direction flag to copy forward
    More:
      Movsb             ;copy a byte to screen memory
      Inc  DI           ;skip over the attribute byte
      Loop More         ;loop until done
      Pop  BP           ;restore BP
      Ret  2            ;return to BASIC
Recall that the color monitor segment value of 0B800h must be assigned to ES through AX, because it is not legal to assign a segment register from a constant. Also, notice the way that DI is cleared to zero. Although Mov DI,0 indeed moves a zero into DI, this is not the most efficient way to clear a register. Any time a numeric value is used in a program (0 in this case), that much extra space is needed to store the actual value as part of the instruction. A preferred method for clearing a register is with the Xor instruction. That is, Xor DI,DI gives the same result as Mov DI,0 except it is one byte shorter and slightly faster.
When Xor is performed on any two values, only those bits that are different are set to 1. But since the same register is used here for both operands, all of the result bits will be cleared to 0. The code for using Xor is decidedly less obvious, but you'll see Xor used this way very often in assembly listings in magazines and books. Another, equally efficient way to clear a register is to subtract it from itself using Sub AX,AX.
Accessing near strings in QuickBASIC and BASIC PDS is a relatively simple task, because both the descriptor and the string data are known to be in near DGROUP memory. But BASIC PDS also supports far strings, where the data may be in a different segment. The composition of a far string descriptor was shown in Chapter 2; however, you do not need to manipulate these descriptors yourself directly.
BASIC PDS includes two routines--StringLength and StringAddress--that do the work of locating far strings for you. Further, because Microsoft could change the way far strings are organized in the future, it makes the most sense to use the routines Microsoft supplies. If the layout of far string descriptors changes, your program will still work as expected.
StringLength and StringAddress expect the address of the string descriptor, and they return the string's length and segmented address respectively. Note that while far string data may be in nearly any segment, the descriptors themselves are always in DGROUP. Also note that these routines are not very well-behaved. In particular, registers you may be using are changed by the routines. To solve this problem and also to let you get all of the information in a single call, I have written the StringInfo routine. StringInfo is contained in the FAR$.ASM file on the accompanying disk.
    ;from an idea originally by Jay Munro

    .Model Medium, Basic
      Extrn StringAddress:Proc  ;these are part of PDS
      Extrn StringLength:Proc
    .Code

    StringInfo Proc Uses SI DI BX ES
      Pushf                 ;save the flags manually
      Push ES               ;save ES for later
      Push SI               ;pass incoming descriptor
      Call StringAddress    ;call the PDS routine
      Pop  ES               ;restore ES for StringLength
      Push AX               ;save offset and segment
      Push DX               ;  returned by StringAddress
      Push SI               ;pass incoming descriptor
      Call StringLength     ;get the length
      Mov  CX,AX            ;copy the length to CX
      Pop  DX               ;retrieve the saved Segment
      Pop  AX               ;and the address
      Popf                  ;restore the flags manually
      Ret                   ;restore registers and return
    StringInfo Endp
    End
StringInfo is called with DS:SI pointing to the string descriptor, and it returns the length in CX and the address of the string data in DX:AX. Although StringInfo could be designed to return the segment in DS or ES, it is safer to assign the segment registers yourself manually.
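To show the calling side, here is a sketch of a routine that uses StringInfo to read the first character of a string, whether it is near or far. The name FirstChar and its parameters are hypothetical. It returns the character's ASCII value in Result%, or -1 for a null string, and it loads ES from DX itself before reading the data through an ES: segment override.

```asm
;FIRSTCHR.ASM, hypothetical routine that calls StringInfo
;Syntax: CALL FirstChar(Work$, Result%)
.Model Medium, Basic
    Extrn StringInfo:Proc   ;the routine from FAR$.ASM
.Code

FirstChar Proc Uses SI ES, Work:Word, Result:Word
    Mov  SI,Work            ;DS:SI points to the string descriptor
    Call StringInfo         ;CX = length, DX:AX = address of the data
    Mov  BX,-1              ;assume a null string for now
    Jcxz Done               ;there is nothing to read if the length is zero
    Mov  ES,DX              ;assign the segment register ourselves
    Mov  SI,AX              ;ES:SI now points to the first character
    Mov  BL,ES:[SI]         ;read it from the segment held in ES
    Mov  BH,0               ;clear BH to make a full word
Done:
    Mov  SI,Result          ;get the address for Result%
    Mov  [SI],BX            ;assign Result%
    Ret                     ;restore registers and return to BASIC
FirstChar Endp
End
```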
Notice the Uses clause--this tells MASM that the named registers must be preserved, and generates additional code to push those registers upon entry to the procedure, and pop them again upon exit.
Also notice the new Extrn directive at the beginning of the source file. Each Extrn statement tells the assembler that the stated routine is not in the current source file. MASM then places the external name in the object file header, with instructions to LINK to fill in the address portion of the Call. Data must also be declared as external if it is not in the same source file as the routine being assembled. When a data item is to be made available to other modules, you must also have a corresponding Public statement in the file that defines it, for the same reason:
    .Model Medium, Basic
    .Data
      Public MyData
      MyData DW 12345
      .
      .
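The other half of the arrangement is the module that uses the data. The sketch below assumes a second source file with a hypothetical routine named GetMyData. Placing the Extrn statement inside the .Data section tells MASM that MyData lives in DGROUP, so it can be reached through DS in the usual way.

```asm
;hypothetical second module that reads MyData from the first one
;Syntax: CALL GetMyData(Result%)
.Model Medium, Basic
.Data
    Extrn MyData:Word   ;MyData is defined and made Public elsewhere
.Code

GetMyData Proc, Result:Word
    Mov  AX,MyData      ;read the external word through DS
    Mov  BX,Result      ;get the address for Result%
    Mov  [BX],AX        ;assign Result%
    Ret                 ;return to BASIC
GetMyData Endp
End
```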
As you have seen, a conventional variable is passed to an assembly language subroutine by placing its address onto the stack. If the variable is a string, then the address passed is that of its descriptor, and the string data address is read from there. Accessing array elements is only slightly more involved, because array elements are always stored in adjacent memory locations. Let's look first at integer arrays.
When BASIC encounters the statement DIM X%(100) in your program, it allocates a contiguous block of memory 202 bytes long. (Unless you first used the statement OPTION BASE 1, dimensioning an array to 100 means 101 elements.) The first two bytes in this block hold the data for X%(0), the next two bytes hold X%(1), and so forth. When you ask VARPTR to find X%(0), the address it returns is the start of this block of memory.
The address of subsequent array elements may then be easily computed from this base address. But with a dynamic array, the segment that holds the array may not be the same as the segment where regular variables are stored. Also, huge arrays that span more than 64K require extra care when crossing a 64K segment boundary.
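As a sketch of how a routine walks through adjacent elements, the following totals an integer array starting at whatever element it is given. The name SumArray and its parameters are hypothetical. It assumes a static array in near memory, so the address BASIC passes really is the element's address in DGROUP, and it ignores the possibility that the total overflows a single word.

```asm
;SUMARRAY.ASM, hypothetical routine that totals a near integer array
;Syntax: CALL SumArray(Array%(0), NumEls%, Total%)
.Model Medium, Basic
.Code

SumArray Proc, Array:Word, NumEls:Word, Total:Word
    Mov  BX,NumEls      ;get the address of NumEls%
    Mov  CX,[BX]        ;load the number of elements into CX
    Mov  AX,0           ;AX accumulates the total
    Jcxz Done           ;avoid looping 65536 times if NumEls% is zero
    Mov  BX,Array       ;BX holds the address of the first element
More:
    Add  AX,[BX]        ;add the current element to the total
    Add  BX,2           ;each integer element occupies two bytes
    Loop More           ;continue until CX reaches zero
Done:
    Mov  BX,Total       ;get the address for Total%
    Mov  [BX],AX        ;assign Total%
    Ret                 ;return to BASIC
SumArray Endp
End
```

For the array dimensioned above, NumEls% would be 101 to include every element from X%(0) through X%(100).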
String arrays are structured in a similar fashion, in that each element follows the previous one in memory. For each string array element that is dimensioned, four bytes are set aside. These bytes comprise a table of descriptors which contain the length and address words for each element in the array. But the important point is that once you know where one element or string descriptor is located, it is easy to find all of those that are adjacent. Following is a QuickBASIC example that shows how to locate Array$(15), based on the VARPTR address of Array$(0).
    DIM Array$(100)
    Array$(15) = "Find me"
    Descriptor = VARPTR(Array$(0))
    Descriptor = Descriptor + (4 * 15)
    Length = PEEK(Descriptor) + 256 * PEEK(Descriptor + 1)
    PRINT "Length ="; Length
    Addr = PEEK(Descriptor + 2) + 256 * PEEK(Descriptor + 3)
    PRINT "String = ";
    FOR X = Addr TO Addr + Length - 1
      PRINT CHR$(PEEK(X));
    NEXT
Most of the routines shown so far manipulated variables that are located in near memory. BASIC can store numeric, TYPE, and fixed-length string arrays in far memory, and additional steps are needed to read from and write to those arrays.
When an assembly language routine receives control after a call from BASIC, it can access your regular variables because they are in the default data segment. Most memory accesses assume the data is in the segment held in the DS register. For example, the statement Mov [BX],AX assigns the value in AX to the memory location identified by BX within the segment held in DS. Likewise, Sub [DI+10],CX subtracts the value held in CX from the memory address expressed as DI+10, where that address is again in the default data segment.
It is also possible to specify a segment other than the current default. One way is with a *segment override* command, like this:
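A minimal fragment, assuming the color text-mode video segment as the far data and an offset already in BX, might look like this:

```asm
    Mov  AX,0B800h      ;a segment outside DGROUP (color text video memory)
    Mov  ES,AX          ;segment registers must be loaded through AX
    Mov  AL,ES:[BX]     ;read the byte at offset BX in ES rather than DS
```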
Here, the segment held in ES is used instead of DS. A segment override adds only one byte of code, so it is quite efficient. If you plan to access data in a different segment many times, you can optionally set DS to that segment. However, it is mandatory that you reset DS to its original value before returning to BASIC. You must also understand that changing DS means you no longer have direct access to DGROUP. In that case you could use the stack segment as an override, since the stack segment is always the same as the data segment in a BASIC program. The next short example shows this in context.
    Push DS                 ;save DS
    Mov DS,FarSegment       ;now DS points to your far data
      .                     ;access that far data here
      .
    Mov AX,SS:[Variable]    ;access Variable in DGROUP
      .                     ;access more far data here
    Pop DS                  ;restore DS before returning
When Microsoft introduced QuickBASIC version 2.0, one of the most exciting new features it offered was support for dynamic numeric arrays. Unlike QuickBASIC near strings, string arrays, and non-array variables, these arrays are always located outside of BASIC's near 64K data segment. This means that an assembler routine needs some way to know both the address and the segment for an array element that is passed to it.
In general, routines you design that work on an entire array will be written to expect a particular starting element. The routine can then assume that all of the subsequent elements lie before or after it in memory. Unfortunately, this does not always work unless you add extra steps. If you call an assembly language routine passing one element of a far-memory dynamic array like this:

    CALL Routine(Array%(1))

BASIC makes a copy of the array element into a temporary variable in near memory, and then passes the address of that copy to the routine. Thus, while the routine can still receive an array element's value, it has no way to determine its true address. And without the address, there is no way to get at the rest of the array.
Since being able to pass an entire array is obviously important, BASIC supports two options to the CALL command--SEG and BYVAL. The SEG keyword indicates that both the address and the segment are to be passed on the stack, and it also tells BASIC not to make a copy of the array element. SEG is used with an array element (or any variable, for that matter) like this:
CALL Routine(SEG Array%(1))
You could also send the segment and address manually, like this:
CALL Routine(BYVAL VARSEG(Array%(1)), BYVAL VARPTR(Array%(1)))
In both cases, BASIC first pushes the segment where the element resides onto the stack, followed by the element's address within that segment. By pushing them in this order the routine can conveniently use either Lds (Load DS) or Les (Load ES) to get both the segment and address in one operation:
Les DI,[BP+6] ;if using manual stack addressing
Les BX,[StackArg] ;if using MASM's simplified directives
Les loads four bytes in one operation, placing the lower word at [BP+6] into the named register (DI in the first example case), and the higher word at [BP+8] into ES. Lds works the same, except the higher word is instead moved into DS. Once the segment and address are loaded, you can access all of the array elements:
    Push DS             ;save DS
    Lds SI,[BP+6]       ;now DS:SI points at first element
    Mov [SI],AX         ;assign Array%(1) from AX
    Add SI,2            ;now SI points at the next element
    Mov [SI],BX         ;assign Array%(2) from BX
    Pop DS              ;restore DS
      .                 ;continue
      .
If Les were used instead of Lds, then an ES: override would be needed to assign the elements. Although you must always preserve the contents of DS regardless of the version of BASIC, some registers need to be saved only when using BASIC PDS far strings. Other registers do not need to be saved at all. Figure 12-7 shows which registers must be preserved based on the version of BASIC.
    QuickBASIC and       BASIC PDS
    PDS near strings     far strings
    ═══════════════      ══════════
          DS                 DS
          SS                 SS
          BP                 BP
          SP                 SP
                             ES
                             SI
                             DI
Besides having to save and restore the registers shown in Figure 12-7, you must also be sure that the Direction Flag is cleared to forward before returning to BASIC. The Direction Flag affects the 8088 string operations, and is by default set to forward. You can usually ignore the direction flag unless you set it to backwards explicitly with the Std instruction. In that case, you must use a corresponding Cld command.
A huge array is one that spans more than one 64K segment, and as you can imagine, it requires extra steps to access all of the elements. That is, the assembler routine must know which elements are in what segment, and manually load those segments as needed. The following code fragment shows how to walk through all of the elements in a huge long integer array, and just for the sake of the example adds each element to determine the sum of all of them.
A simple setup example and call syntax for this routine is as follows:
    REDIM Array&(1 TO 30000)
    FOR X% = 1 TO 30000
      Array&(X%) = X%
    NEXT
    CALL SumArray(SEG Array&(1), 30000, Sum&)
    PRINT "Sum& ="; Sum&
And here's the code for the SumArray routine:
    .Model Medium, Basic
    .Code

    SumArray Proc Uses SI, Array:DWord, NumEls:Word, Sum:Word

      Push DS             ;save DS so we can restore it later
      Push SI             ;PDS far strings require saving SI too

      Xor AX,AX           ;clear AX and DX which will accumulate
      Mov DX,AX           ;  the total

      Mov BX,NumEls       ;get the address for NumElements%
      Mov CX,[BX]         ;read NumElements% before changing DS
      Lds SI,Array        ;load the address of the first element
      Jcxz Exit           ;exit if NumElements = 0

    Do:
      Add AX,[SI]         ;add the value of the low word
      Adc DX,[SI+2]       ;and then add the high word

      Add SI,4            ;point to the next array element
      Or SI,SI            ;are we beyond a 32k boundary?
      Jns More            ;no, continue

      Sub SI,8000h        ;yes, subtract 32k from the address
      Mov BX,DS           ;copy DS into BX
      Add BX,800h         ;adjust the segment to compensate
      Mov DS,BX           ;copy BX back into DS

    More:
      Loop Do             ;loop until done

    Exit:
      Pop SI              ;restore SI for BASIC
      Pop DS              ;restore DS and gain access to Sum&
      Mov BX,Sum          ;get the DGROUP address for Sum&
      Mov [BX],AX         ;assign the low word
      Mov [BX+2],DX       ;and then the high word

      Ret                 ;return to BASIC

    SumArray Endp
    End
The segment bounds checking is handled by the six lines that start with Or SI,SI. The idea is to see if the address is beyond 32767, subtract 32768 if it is, and then adjust the segment to compensate. The most direct way would have been with Cmp SI,32767 and then Jbe More, but Cmp used this way generates four bytes of code, whereas Or creates only two bytes. Since Or sets the Sign flag if the number is negative (above 32767), you can use it to know when the address adjustment is needed.
Because it is not legal to add or subtract a segment register, DS is first copied to BX, 800h is added to that, and the result is then copied back to DS. 800h is used instead of 8000h (32768) because a new segment begins every 16 bytes. [That is, adding 800h to a segment value is the same as adding 8000h to the address.]
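To see why subtracting 8000h from the address and adding 800h to the segment leaves the physical location unchanged, the walk can be modeled in a few lines of Python. This is a sketch rather than a translation of SumArray: the starting address 1000h:0000h is an arbitrary assumption, and linear, next_element, and sum_array are names invented for the illustration.

```python
def linear(segment: int, offset: int) -> int:
    """Physical address of segment:offset in 8088 real mode."""
    return segment * 16 + offset


def next_element(segment: int, offset: int, size: int = 4):
    """Step to the next element the way the SumArray loop does."""
    offset = (offset + size) & 0xFFFF              # Add SI,4
    if offset & 0x8000:                            # Or SI,SI / Jns More
        offset -= 0x8000                           # Sub SI,8000h
        segment = (segment + 0x800) & 0xFFFF       # Add BX,800h
    return segment, offset


def sum_array(memory: bytes, segment: int, offset: int, count: int) -> int:
    """Add `count` long integers (low byte first) starting at segment:offset."""
    total = 0
    for _ in range(count):
        address = linear(segment, offset)
        value = int.from_bytes(memory[address:address + 4], "little")
        total = (total + value) & 0xFFFFFFFF       # DX:AX holds 32 bits
        segment, offset = next_element(segment, offset)
    return total


# Assumed placement: the array starts at 1000h:0000h.
START_SEGMENT, COUNT = 0x1000, 30000
memory = bytearray(linear(START_SEGMENT, 0) + 4 * COUNT)
for i in range(COUNT):
    address = linear(START_SEGMENT, 0) + 4 * i
    memory[address:address + 4] = (i + 1).to_bytes(4, "little")

print(sum_array(bytes(memory), START_SEGMENT, 0, COUNT))   # 450015000
```

The model reads each element through its physical address, so it only demonstrates the bookkeeping; it does not reproduce what happens on a real 8088 when an element straddles the end of a segment.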
SumArray also introduces a new instruction: Adc means Add with Carry, and it is used to add long integer values that by definition span two words. When you add two registers--say, AX and BX--if the result exceeds 65535 only the remainder is saved. However, the Carry Flag is set to indicate the overflow condition. Adc takes this into account, and adds one extra to its result if the Carry Flag is set. Therefore, whenever two long integers are added you'll use Add to combine the lower words, and Adc for the high words. Similarly, subtracting long integers requires that you use Sub to subtract the lower words and then Sbb (Subtract with Borrow) on the upper words.
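The carry and borrow handling can be checked with ordinary integer arithmetic. The Python sketch below models a long integer as a pair of 16-bit words; add32 and sub32 are hypothetical helper names that mimic an Add/Adc pair and a Sub/Sbb pair, and are not part of any listing in this chapter.

```python
def add32(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Add two long integers held as (low word, high word) pairs."""
    lo = a_lo + b_lo                       # Add: combine the low words
    carry = lo >> 16                       # 1 if the sum exceeded 65535
    hi = (a_hi + b_hi + carry) & 0xFFFF    # Adc: high words plus the carry
    return lo & 0xFFFF, hi


def sub32(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Subtract the second long integer from the first."""
    lo = a_lo - b_lo                       # Sub: subtract the low words
    borrow = 1 if lo < 0 else 0            # set when the low word underflows
    hi = (a_hi - b_hi - borrow) & 0xFFFF   # Sbb: high words minus the borrow
    return lo & 0xFFFF, hi


# 0001FFFFh + 1 = 00020000h, and subtracting 1 again undoes it.
print(add32(0xFFFF, 0x0001, 0x0001, 0x0000))   # (0, 2)
print(sub32(0x0000, 0x0002, 0x0001, 0x0000))   # (65535, 1)
```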
Although the details are hidden from you, when more than one parameter is passed to an assembly language routine it is the last in the list that is at [BP+6] on the stack. The previous argument is at [BP+8], and the one before that is at [BP+10]. Because the stack grows downward as new items are pushed onto it, each subsequent item is at a lower address.
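As a rough check on those offsets, the following Python sketch computes the [BP+n] displacement for each parameter. It assumes a far call with BP saved on entry, so that the return address and the saved BP occupy the six bytes below the last parameter; stack_offsets is a hypothetical helper written for this illustration.

```python
def stack_offsets(params):
    """Map each parameter name to its [BP+n] displacement.

    `params` lists (name, size_in_bytes) in the order the parameters
    appear in the CALL statement. The last one sits at [BP+6].
    """
    offsets = {}
    displacement = 6
    for name, size in reversed(params):
        offsets[name] = displacement
        displacement += size
    return offsets


# SumArray(SEG Array&(1), NumEls%, Sum&): SEG passes four bytes
# (segment and address), the other two arguments are near addresses.
print(stack_offsets([("Array", 4), ("NumEls", 2), ("Sum", 2)]))
# {'Sum': 6, 'NumEls': 8, 'Array': 10}
```

With MASM's simplified directives these displacements are what the parameter names stand for, which is why the SumArray listing never mentions BP directly.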
Finally, in a real program this routine would probably be designed as a function. Using a function avoids having to pass the Sum& parameter to receive the returned value, and helps reduce the size of the program.
Designing a procedure as a function lets you return information to a program, but without the need for an extra passed parameter. Functions are also useful because BASIC performs any necessary data type conversion automatically. For example, if you have written a function that returns an integer value, you can freely assign the result to a single precision variable.
You can also test the result of a function directly using IF, display it directly with PRINT, or pass it as a parameter to another procedure. Some typical examples are shown here:
    SingleVar! = MyFunction%
    IF YourFunction&(Argument%) > 1004 THEN ...
    PRINT HisFunction$(Any$)
Beginning with QuickBASIC version 4.0, functions written in assembly language may be added to a BASIC program. To have a function return an integer value, simply place the value into the AX register before returning to BASIC. If the function is to return a long integer, both DX and AX are used. In that case, DX holds the higher word and AX holds the lower one.
String functions are only slightly more complicated to design. A string function also uses AX as a return value, but in this case AX holds the address of a string descriptor you have created. The complete short string function that follows accepts an integer argument, and returns the string "False" if the argument is zero or "True" if it is not.
    ;Syntax:
    ;DECLARE FUNCTION TrueFalse$(Argument%)
    ;Answer$ = TrueFalse$(Argument%)

    .Model Medium, Basic
    .Data
      DescLen DW 0
      DescAdr DW 0
      True    DB "True"
      False   DB "False"

    .Code

    TrueFalse Proc, Argument:Word

      Mov DescLen,4               ;assume true
      Mov DescAdr,Offset True
      Mov BX,Argument             ;get the address for Argument%
      Cmp Word Ptr [BX],0         ;is it zero?
      Jne Exit                    ;no, so we were right
      Inc DescLen                 ;yes, return five characters
      Mov DescAdr,Offset False    ;and the address of "False"

    Exit:
      Mov AX,Offset DescLen       ;show where the descriptor is
      Ret                         ;return to BASIC

    TrueFalse Endp
    End
Although the function is declared using a dollar sign in the name, the actual procedure omits that. [The dollar sign merely tells BASIC what type of information will be returned. It is not part of the actual procedure name.] TrueFalse begins by defining a string descriptor in the .Data segment. It is also possible to store strings and other data in the code segment and access it with a CS: segment override. However, data that is returned by a function must be in DGROUP, and so must the descriptor.
The first two statements assign the descriptor to an output string length of four characters, and the address of the message "True". Then, the address of Argument is obtained from the stack, and its value is compared to zero. If it is not zero, then the descriptor is already correct and the function can proceed. Otherwise, the descriptor length is incremented to reflect the correct length, and the address portion is reassigned to show where the string "False" begins in memory. In either case, the final steps are to load AX with the address of the descriptor, and then return to BASIC.
MASM also lets you access data using simple arithmetic. For example, the descriptor could have been defined as a single pair of words with one name, and the second word could be accessed based on the address of the first one like this:
    .Data
      Descriptor DW 0, 0
      True       DB "True"
      False      DB "False"

    .Code
      .
      .
      Inc Descriptor
      Mov Descriptor+2,Offset False
      .
      .
Far string functions require more work to write than near string functions, because of the added overhead needed to support far strings. Fortunately, BASIC includes routines that simplify the task for you. Actually, the routines to create and assign strings have always been included; it's just that Microsoft never documented how to do it before BASIC 7.0. Later in this chapter I'll show code to create strings that works with all versions of BASIC 4.0 or later.
The StringAssign routine expects six arguments on the stack, for the segment, address, and length of both the source and destination strings. StringAssign can assign from or to any combination of fixed- and variable- length strings. If the length argument for either string is zero, then StringAssign knows that the address is that of a descriptor. Otherwise, the address is of the data in a fixed-length string.
Because of the added overhead of obtaining values and pushing them on the stack, I have created a short wrapper program that does this for you. MakeString accepts the same arguments as StringAssign, but they are passed using registers rather than on the stack. Of course, calling one routine that in turn calls another takes additional time. But the savings in code size when MakeString is called repeatedly will overshadow the very slight additional delay.
MakeString is called with DX:AX holding the segmented address of the source string, and CX holding its fixed length. If the source is a conventional string, CX is set to zero to indicate that. The destination address is identified with DS:DI, using BX to hold the length. Again, BX holds zero if the destination is not a fixed-length string.
    ;from an idea originally by Jay Munro

    .Model Medium, Basic
      Extrn STRINGASSIGN:Proc

    .Code

    MakeString Proc Uses DS

      Push DX             ;push the segment of the source string
      Push AX             ;push the address of the source string
      Push CX             ;push the string length
      Push DS             ;push the segment of the destination
      Push DI             ;push the address of the destination
      Push BX             ;push the destination length
      Call STRINGASSIGN   ;call BASIC to assign the string
      Ret

    MakeString Endp
    End
Now, with the assistance of MakeString, TrueFalse$ can be easily modified to work with BASIC 7 far strings:
    .Model Medium, Basic
      Extrn MakeString:Proc        ;this is in FAR$.ASM

    .Data
      Descriptor DW 0, 0           ;the output string descriptor
      True       DB "True"
      False      DB "False"

    .Code

    TrueFalse Proc Uses ES DS SI DI, Argument:Word

      Mov CX,4                     ;assume true
      Mov AX,Offset True
      Mov BX,Argument              ;get the address for Argument%
      Cmp Word Ptr [BX],0          ;is it zero?
      Jne @F                       ;no, so we were right
      Inc CX                       ;yes, assign five characters
      Mov AX,Offset False          ;and use the address of "False"

    @@:
      Mov DX,DS                    ;assign the segment and address
      Mov DI,Offset Descriptor     ;  of the destination descriptor
      Xor BX,BX                    ;assign to a descriptor
      Call MakeString              ;let MakeString do the work
      Mov AX,DI                    ;AX = address of output descriptor
      Ret                          ;return to BASIC

    TrueFalse Endp
    End
Notice the introduction of the new at-symbol (@) assembler directive. The at-symbol and double at-symbol label are quite useful, because they let you avoid having to create unique label names each time you specify the target of a jump. As with BASIC, creating many different label names is a nuisance, and also impinges on the assembler's working memory. When a label is defined using @@: as a name, you can jump forward to it using @F or backwards using @B. Multiple @@: labels may be used in the same program, and @F and @B always branch to the nearest one in the stated direction.
Single and double precision functions are handled in yet another manner. Although a single precision value could be returned in the DX:AX register combination, a double precision result would need four registers, which is impractical. Further, a floating point number is most useful to BASIC if it is stored in a memory location, rather than in registers.
When BASIC invokes a floating point function it adds an extra, dummy parameter to the end of the list of arguments you pass. If no parameters are being used, it creates one. This parameter is the address into which your routine is to place the outgoing result. Because of this added parameter, it is essential that you account for it when returning to BASIC. Thus, a function without arguments must use Ret 2, a function with one argument needs Ret 4, and so forth. Since we're using MASM's simplified directives, all that is needed is to create an extra parameter name.
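The Ret values quoted above follow a simple rule, sketched here in Python. The helper ret_bytes is hypothetical, and it assumes that every argument is passed as a two-byte near address, which is BASIC's default when neither SEG nor BYVAL is used.

```python
def ret_bytes(num_args: int, floating_point: bool) -> int:
    """Bytes a routine must remove from the stack when it returns."""
    hidden = 1 if floating_point else 0    # BASIC's extra result address
    return 2 * (num_args + hidden)


print(ret_bytes(0, floating_point=True))   # 2, as in Ret 2
print(ret_bytes(1, floating_point=True))   # 4, as in Ret 4
```

When the simplified directives are used, MASM generates the operand for Ret from the parameter list, which is why declaring the extra parameter name is sufficient.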
The short double precision function that follows squares a double precision number much faster than using Value# ^ 2, and also shows how to perform simple floating point math using assembly language. You will declare and invoke Square like this:
    DECLARE FUNCTION Square#(Variable#)
    Result = Square#(Variable#)
    ;SQUARE.ASM, squares a double precision number
    ;
    ;WARNING: This file must be assembled using /e (emulator).

    .Model Medium, Basic
    .Code
    .8087                     ;allow 8087 instructions

    Square Proc, InValue:Word, OutValue:Word

      Mov BX,InValue          ;get the address for InValue
      FLd QWord Ptr [BX]      ;load InValue onto the 8087 stack
      FMul QWord Ptr [BX]     ;multiply InValue by itself

      Mov BX,OutValue         ;get the address for OutValue
      FStp QWord Ptr [BX]     ;store the result there
      FWait                   ;wait for the 8087 to finish

      Mov AX,BX               ;return DX:AX holding the full
      Mov DX,DS               ;  address of the output value
      Ret                     ;return to BASIC

    Square Endp
    End
This Square function illustrates several important points. The first is the use of MASM's /e switch, which lets an assembly language routine share BASIC's floating point emulator. When a BASIC program begins, it looks to see if an 8087 coprocessor is installed in the host PC. If so, it uses one set of library routines; otherwise it uses another.
The library routines that use an 8087 simply modify the caller's code to change the floating point interrupts that BASIC generates into actual 8087 instructions. They then return to the instruction just created, which is then executed. Although this adds to the time needed to perform a floating point operation, the code is patched only once. Thus, statements within a FOR or DO loop operate very quickly after the first iteration. This is very much like the method used by the BRUN library described in Chapter 1.
When no coprocessor is detected, the floating point interrupts that BASIC generates are used to invoke routines in BASIC's floating point software emulator. As its name implies, an emulator imitates the behavior of a coprocessor using assembly language commands. A coprocessor can perform a variety of floating point operations, including addition, multiplication, and rounding, as well as some transcendental functions such as logarithms and arctangents.
When you use the /e switch, MASM adds extra information to the object file header that tells LINK where to patch your 8087 instructions. LINK can then change your code to the equivalent floating point interrupts, similar to the way BASIC patches its own code to change the interrupts to 8087 instructions. Therefore, when you write floating point code that will be called from BASIC, your routine can tie into BASIC's emulator, and use it automatically if no coprocessor is installed.
Also, notice the .8087 directive which tells MASM not to issue an error message when it sees those instructions. Other, similar directives are .80287 and .80387, and also .80286 and .80386. These directives inform MASM that you are intentionally using advanced commands that require these processors, and have not made a typing error.
The actual body of the Square function is fairly simple. First, the address of the incoming value is retrieved from the system stack, and then the data at that address is loaded onto the coprocessor's stack using the FLd (Floating point Load) instruction. Since this is a double precision value, QWord Ptr (Quad Word Pointer) is needed to indicate the size of the data. Had the incoming value been single precision, DWord Ptr (Double Word Pointer) would be used instead. One important feature of an 8087 or software emulator is that a number may be converted from one numeric format to another simply by loading it as one data type, and then saving it as another.
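The load-as-one-type, store-as-another idea can be imitated with Python's struct module, which reads and writes IEEE single and double precision values. This is only an analogy: it assumes the IEEE formats, and it says nothing about the 8087's internal representation.

```python
import struct

single_bytes = struct.pack("<f", 1.5)            # a 4-byte single precision value
(value,) = struct.unpack("<f", single_bytes)     # "load" it as a DWord
double_bytes = struct.pack("<d", value)          # "store" it as a QWord

print(len(single_bytes), len(double_bytes))      # 4 8
print(struct.unpack("<d", double_bytes)[0])      # 1.5

# Widening does not restore precision that the single format never had.
(tenth,) = struct.unpack("<f", struct.pack("<f", 0.1))
print(tenth == 0.1)                              # False
```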
The next instruction, FMul (Floating point Multiply), multiplies the value currently on the 8087 stack by the value at the same address. Since the original value is still present, there's no need to make a new copy. Next, the destination address is placed into BX, and the result now on the 8087 stack is stored there. The trailing letter p in the FStp instruction specifies that the value loaded earlier is to be popped from the coprocessor stack.
A complete discussion of 8087 instructions and how the coprocessor stack operates goes beyond what I can hope to cover here. When in doubt about what instruction is needed, I suggest that you code a similar sample in BASIC, and then examine the code BASIC generates using CodeView. There are also several books that focus on writing floating point instructions in assembly language.
The last 8087 instruction is FWait, and it tells the 8088 to wait until the coprocessor has finished, before continuing. Because an 8087 is a true coprocessor, it operates independently of the main 8088 CPU. Once a value is loaded and the 8087 is instructed to perform an operation, the 8087 returns immediately to the program that issued the instruction and continues to process the numbers in the background. If Square exited immediately and BASIC read the returned value, there's a good chance that the 8087 did not finish and the value has not yet been stored! In that case, whatever happened to be in memory at that time would be the value that BASIC uses, which is obviously incorrect.
Experienced 8087 programmers know how long the various coprocessor instructions take to complete, and with careful planning the number of FWait commands can be kept to a minimum. However, the code that BASIC generates always finishes with an FWait. Of course, there is no need to wait when the emulator is in use. In fact, an FWait is patched by BASIC to do nothing (Mov AX,AX), rather than waste time invoking an empty interrupt handler repeatedly.
As shown, Square can be added to a Quick Library for use with either QuickBASIC or BASIC PDS. Unfortunately, the information LINK needs to patch 8087 instructions is available only with BASIC PDS. Therefore, the following file is included in the libraries on the accompanying disk, to supply the external data that LINK requires.
    ;FIXUPS.ASM, deciphered by Paul Passarelli

    FIARQQ Equ 0FE32h
    FJARQQ Equ 04000h
    FICRQQ Equ 00E32h
    FJCRQQ Equ 0C000h
    FIDRQQ Equ 05C32h
    FIERQQ Equ 01632h
    FISRQQ Equ 00632h
    FJSRQQ Equ 08000h
    FIWRQQ Equ 0A23Dh

    Public FIARQQ
    Public FJARQQ
    Public FICRQQ
    Public FJCRQQ
    Public FIDRQQ
    Public FIERQQ
    Public FISRQQ
    Public FJSRQQ
    Public FIWRQQ
    End
These values are added to the floating point instruction bytes during the linking process, and the addition converts those statements into equivalent BASIC floating point interrupt commands. For example, the 8087 statement Fld DWord Ptr [1234h] is represented in memory as the following series of Hexadecimal bytes:
9B D9 06 34 12
After LINK adds the value FIDRQQ (5C32h) to the first two bytes of this command the result is:
CD 35 06 34 12
And when disassembled back to assembler mnemonics, the CD35h displays as Int 35h. The three bytes that follow are always left unchanged, and they specify the type of operation--DWord Ptr on a memory location--and the address of that location.
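The addition is easy to verify by hand or with a short script. The Python sketch below treats the first two instruction bytes as one word stored low byte first, adds FIDRQQ, and discards the carry out of sixteen bits. The helper name apply_fixup is hypothetical, and the sketch models only the arithmetic shown in this example, not LINK's actual fixup processing.

```python
FIDRQQ = 0x5C32


def apply_fixup(code: bytes, fixup: int) -> bytes:
    """Add a 16-bit fixup value to the first word of an instruction."""
    word = int.from_bytes(code[:2], "little")      # 9B D9 reads as D99Bh
    patched = (word + fixup) & 0xFFFF              # carry beyond 16 bits is lost
    return patched.to_bytes(2, "little") + code[2:]


original = bytes([0x9B, 0xD9, 0x06, 0x34, 0x12])   # Fld DWord Ptr [1234h]
patched = apply_fixup(original, FIDRQQ)
print(" ".join(f"{b:02X}" for b in patched))       # CD 35 06 34 12
```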
At the core of any sorting or searching routine is an appropriate comparison function. Previous chapters showed how to compare string data, and as you can imagine comparing floating point values is much more complex. But now that you know how to tap into BASIC's floating point routines it is almost trivial to effect a floating point comparison. The routines that follow let you compare either single- or double precision values, by passing them as arguments.
    ;COMPAREFP.ASM, compares floating point values
    ;WARNING: This file must be assembled using /e (emulator)

    .Model Medium, Basic
      Extrn B$FCMP:Proc       ;BASIC's FP compare routine

    .8087                     ;allow coprocessor instructions
    .Code

    CompareSP Proc, Var1:Word, Var2:Word

      Mov BX,Var2             ;get the address of Var2
      Fld DWord Ptr [BX]      ;load it onto the 8087 stack
      Mov BX,Var1             ;same for Var1
      Fld DWord Ptr [BX]
      FWait                   ;wait until the 8087 says it's okay
      Call B$FCMP             ;compare the values, (and pop both)

      Mov AX,0                ;assume they're the same
      Je Exit                 ;we were right
      Mov AL,1                ;assume Var1 is greater
      Ja Exit                 ;we were right
      Dec AX                  ;Var1 must be less than Var2
      Dec AX                  ;decrement AX to -1

    Exit:
      Ret                     ;return to BASIC

    CompareSP Endp


    CompareDP Proc, Var1:Word, Var2:Word

      Mov BX,Var2             ;as above
      Fld QWord Ptr [BX]
      Mov BX,Var1
      Fld QWord Ptr [BX]
      FWait
      Call B$FCMP

      Mov AX,0
      Je Exit
      Mov AL,1
      Ja Exit
      Dec AX
      Dec AX

    Exit:
      Ret

    CompareDP Endp
    End
Like the Compare3 function shown in Chapter 8, CompareSP and CompareDP are integer functions that return -1, 0, or 1 to indicate if the first value is less than, equal to, or greater than the second. Therefore, to use these from BASIC you would invoke them like this:
    IF CompareSP%(Value1!, Value2!) = -1 THEN
      'the first value is smaller than the second
    END IF
And to test if the first is equal to or greater than the second you would instead do this:
    IF CompareSP%(Value1!, Value2!) >= 0 THEN
      'the first value is equal or greater
    END IF
You can also use these functions from assembly language. But if you do this, I suggest a simple modification. A comparison routine meant to be called from another assembler routine would not generally return the result in the registers. Rather, it would leave the flags set appropriately for a subsequent Ja or Jne branch.
Fortunately, BASIC's B$FCMP routine already does this. Therefore, you will make a copy of the COMPAREF.ASM source file, and delete the six lines between the call to B$FCMP and the Ret instruction. You can also remove the Exit: label if you like, although its presence causes no harm. Of course, the code itself is so simple that the best solution may be to simply duplicate the same instructions inline in your routine.
Each example I have shown so far introduced another useful MASM feature. For example, you learned how MASM lets you establish data memory with an initial value, so you don't have to assign it explicitly. But there are several other features you should know about as well. One is conditional assembly.
With conditional assembly you can specify that only certain portions of a file are to be assembled. This makes it easier to maintain two different versions of a routine, for example one for near strings and one for far strings. If you had to create two separate copies of the source file, any improvements or bug fixes that you add would have to be done twice.
There are two ways that a section of code can be optionally included or excluded. One is to define a constant at the beginning of the source file, and then test that constant using a form of IF and ELSE test. Like BASIC, MASM lets you define constant values using meaningful names. The problem with this method--albeit a minor one--is that you must alter the code prior to assembling each version. The example that follows shows how this kind of conditional assembly is employed.
    MyConst = 1
      .
      .
    IF MyConst
      ;do whatever you want here
    ELSE                ;the ELSE is optional
      ;do whatever else you want here
    ENDIF
      .
      .
The idea is that if you want the code that follows the IF test to be assembled, you would use a non-zero value for MyConst. If you wanted to create an alternate version using the code within the optional ELSE block, you would change the value to be zero.
You can also use IFE (If Equal to zero) to test if a constant is zero. And this brings up another interesting MASM feature. There are actually two types of constants you can define. The constant MyConst shown above is called a *redefinable* constant, because you can actually change its value during the course of a program. The other type of constant is defined using the Equ (Equate) directive, and may not be changed:
YourConst Equ 100
Redefinable constants are often used in repeating macros, and macros are discussed later in this section.
The other way to tell MASM that it is to assemble just a portion of the file is with IFDEF. IFDEF (If Defined) tests if a constant has been defined at all, as opposed to comparing for a specific value. The value of this approach is that you can define a constant on the MASM command line when you run it. The first example below tells MASM to assemble the code within the IFDEF block, and the second tells it not to.
    C:\ASM\> masm program /def myconst ;
    C:\ASM\> masm program ;
Here's the portion of the routine that is being assembled conditionally:
    IFDEF MyConst
      ;do something optional here
    ENDIF
Likewise, IFNDEF (If Not Defined) tests if a constant has not been defined, for those cases where reversing the logic is more sensible to you. MASM includes a great number of such conditional tests, and only by reading that section of the MASM manual will you become familiar with those that are the most useful.
Another useful MASM feature that I personally would love to see added to BASIC is multi-line comment blocks. The Comment command accepts any single character you choose as a delimiter, and considers everything thereafter to be comments until the same character is encountered. Many programmers use a vertical bar, because it is not a common character:
    Comment |
      This program is intended to blah blah blah,
      and it works by loading AX with blah blah blah.
    |
Besides avoiding the need to place an explicit semicolon on each comment line, this also makes it easy to remark out large sections of code while you are debugging a routine.
Yet another useful feature is MASM's willingness to use either single or double quotes to indicate ASCII text and individual characters. In BASIC, if you want to specify a double quote you must use CHR$(34)--it simply is not legal to use """, where the quote in the middle is the character being defined. [With the introduction of VB/DOS triple quotes may now be used for this purpose.] If you need to define a double quote simply surround it with apostrophes like this:
    SomeData DB '"'
    Mov AH, '"'
Or you can place a single quote within double quotes like this:
Add DL, "'"
MASM can use either convention as needed, which is a feature I personally like a lot.
Whenever MASM sees the dollar sign ($) operator it interprets that to mean *here*, or the current address. This can be used both for data and code, though it is more common with data as the example below illustrates.
    .Data
      Descriptor DW MsgLen, Address
      Message    DB "This is a message."
      Address    = Offset Message
      MsgLen     = $ - Address
The expression $ - Address tells the assembler to take the current data address, and subtract from that the address where Message begins. This is a very powerful concept because it frees the programmer from many tedious calculations. In particular, if the string contents are changed at a later time, the new length is recalculated by MASM automatically.
To assist you in manipulating data structures, MASM offers the Struc directive. This is identical to BASIC's TYPE statement, whereby you define the organization of a collection of related data items. The example below shows how to define a custom data structure using BASIC, followed by an equivalent MASM Struc definition.
    BASIC:
      TYPE MyType
        LastName  AS STRING * 15
        FirstName AS STRING * 12
        ZipCode   AS STRING * 5
        RecordPtr AS LONG
      END TYPE
      DIM MyVar AS MyType
    MASM:
      MyStruc Struc
        LastName  DB 15 Dup (?)
        FirstName DB 12 Dup (?)
        ZipCode   DB  5 Dup (?)
        RecordPtr DD  ?
      MyStruc Ends
      MyVar DB Size MyStruc Dup (?)
Like BASIC, defining a structure merely establishes the number and type of data items that will be stored; memory is not actually set aside until you do that manually. In BASIC, you must use DIM to establish the memory that will hold the TYPE variable. In assembly language you instead use DB in conjunction with the Size directive, to set aside the appropriate number of bytes.
Each component of the Structure is defined using an identifying name and a corresponding data type. Then, whenever a structure member is referenced in your assembler routine, MASM replaces it with a number that shows how far into the structure that member is located. MASM uses the same syntax as BASIC, with a period between the data name and the structure identifier. Here are a few examples:
    Mov AL,[BX].FirstName       ;same as Mov AL,[BX+15]
    Les DI,[MyVar.RecordPtr]    ;loads ES:DI from RecordPtr
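The numbers that stand behind each member name are just running totals of the sizes that precede it. The Python sketch below works them out for the structure defined above; field_offsets is a hypothetical helper, and the struct format codes are Python's, chosen to match the DB and DD sizes with no padding between members.

```python
import struct

# (member name, Python struct code matching the MASM definition)
FIELDS = [
    ("LastName", "15s"),    # DB 15 Dup (?)
    ("FirstName", "12s"),   # DB 12 Dup (?)
    ("ZipCode", "5s"),      # DB 5 Dup (?)
    ("RecordPtr", "l"),     # DD ?
]


def field_offsets(fields):
    """Return each member's distance from the start, plus the total size."""
    offsets, position = {}, 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)   # "<" means no padding
    return offsets, position


offsets, size = field_offsets(FIELDS)
print(offsets)   # {'LastName': 0, 'FirstName': 15, 'ZipCode': 27, 'RecordPtr': 32}
print(size)      # 36, the value Size MyStruc yields for this layout
```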
In many cases you will store the variables your routines need in DGROUP using the .Data directive. As with static subprograms and functions in BASIC, this data will not change between subroutine calls. But this also means that these variables are combined into the same 64k segment that is shared with BASIC. When there are many variables or many different routines each with their own variables, this can significantly reduce the amount of near memory available to BASIC. There are two effective solutions to this problem.
One way to reduce the DGROUP impact of many variables is to place some of them onto the system stack. MASM lets you do this automatically with its Local directive, or you can do it manually by subtracting the requisite number of bytes from SP. Of course, there is only so much room on the stack, so this approach is most useful when there are many routines and each has less than 1K or so of data. Stack variables are also useful when programming for OS/2 or Windows. These operating systems require that all of your procedures be reentrant so static variables cannot be used.
The example below creates room for fifty words of local storage on the stack, and then clears the variables to zero.
  Routine Proc Uses ES DI, Param1:Word, Param2:Word

    Sub  SP,100         ;50 words = 100 bytes
    Push SS             ;assign ES from SS
    Pop  ES
    Mov  DI,SP          ;point DI to the start of storage
    Xor  AX,AX          ;fill with zeros
    Mov  CX,50          ;clear fifty words
    Rep  Stosw          ;store AX CX times at ES:[DI]
      .                 ;the routine continues
      .
    Add  SP,100         ;restore SP to what it had been
    Ret                 ;return to BASIC

  Routine Endp
MASM can also do this automatically for you using Local like this:
  Routine Proc Uses ES DI, Param1:Word, Param2:Word

    Local Buffer[100]:Byte

    Lea  DI,Buffer      ;clear the stack variables here
      .                 ;the routine continues
      .
    Ret                 ;return to BASIC

  Routine Endp
As you can see, Local lets you refer to the start of the local stack data area by name. Notice how Lea is required here, because the address of Buffer is expressed as an offset from BP. That is, MASM translates the Lea instruction to Lea DI,[BP-100]. You cannot use Mov DI,Offset Buffer because Buffer's address (which is based on the current setting of the stack pointer) is not known when the routine is assembled or linked.
In this case only one local block is defined, so you could also use Mov DI,SP to set DI to point to the start of the data. It is not strictly necessary to clear the stack space before using it, but it is important to understand that whatever junk happened to be in memory at that time will still be there after using Local.
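Combining the two listings above gives one way to clear a Local buffer before it is used. This is a sketch rather than a tested routine; the 100-byte size and the procedure and parameter names are simply carried over from the earlier examples.

```asm
.Model Medium, Basic
.Code

Routine Proc Uses ES DI, Param1:Word, Param2:Word

    Local Buffer[100]:Byte    ;reserve 100 bytes below BP

    Push SS                   ;the buffer is on the stack,
    Pop  ES                   ;  so ES must equal SS
    Lea  DI,Buffer            ;ES:DI now points to the buffer
    Xor  AX,AX                ;fill with zeros
    Mov  CX,50                ;100 bytes = 50 words
    Rep  Stosw                ;clear the entire buffer

    ;the routine continues here

    Ret                       ;return to BASIC
Routine Endp
End
```

As with the manual version, Rep Stosw counts upward only when the direction flag is clear, which this sketch assumes.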
It is also important to be aware of a number of bugs with the Local directive. I have found that limiting the use of Local to a single set of data as shown here is safe with all MASM versions through 5.1. Using multiple Local directives defined with data structures can result in the wrong part of the stack being written to when a structure member is accessed by name.
Another time-honored technique for conserving DGROUP memory is to place selected variables into the code segment. In most cases storing data for a routine in the code segment will make your programs slightly larger and slower, because of the need for an added CS: segment override. But when large amounts of data must be accommodated, this can be very valuable indeed. One advantage to using the code segment is that you can establish initial values for the data, which is not possible when using the stack.
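A minimal sketch of the idea follows. GetLimit and Limit are hypothetical names chosen for illustration; the routine simply returns an initialized word that is stored in the code segment, and it could be declared in BASIC as an integer function.

```asm
.Model Medium, Basic
.Code

Limit DW 1000                 ;initialized data kept in the code segment

GetLimit Proc
    Mov  AX,CS:Limit          ;the CS: override reaches the data
    Ret                       ;return the value to BASIC in AX
GetLimit Endp
End
```

Because Limit is not in DGROUP, it consumes none of BASIC's near memory, at the cost of the one-byte override on each access.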
As an example of this technique, I have written a string function called Message$ that stores a series of messages in the code segment. In this case only a single CS: segment override is needed, so the impact of using the code segment for data is insignificant. Message$ is designed to be declared and invoked as follows:
  DECLARE FUNCTION Message$(BYVAL MsgNumber%)
  Result$ = Message$(AnyInt%)
Message$ is table driven, which makes it simple to modify the routine to change or add messages without having to make any changes to the function's structure. As shown here, Message$ is designed to return the name of a weekday, given a value between one and seven. You can easily modify it to return other strings of nearly any length.
  .Model Medium, Basic
    Extrn B$ASSN:Proc         ;BASIC's assignment routine

  .Data
    Descriptor DD 0           ;the output string descriptor
    Null$      DD 0           ;use this to return a null
                              ;  (needed for BASIC PDS only,
  .Code                       ;   but okay with QuickBASIC)

  Message Proc Uses SI, MsgNumber:Word

    Mov  SI,Offset Messages   ;point to start of messages
    Xor  AX,AX                ;assume an invalid value
    Mov  CX,MsgNumber         ;load the message number
    Cmp  CX,NumMsg            ;does this message exist?
    Ja   Null                 ;no, return a null string
    Jcxz Null                 ;ditto if they pass a zero

  Do:                         ;walk through the messages
    Lods Word Ptr CS:0        ;load and skip over this message's length
    Dec  CX                   ;show that we read another
    Jz   Done                 ;this is the one we want
    Add  SI,AX                ;skip over the message text
    Jmp  Short Do             ;continue until we're there

  Done:
    Or   AX,AX                ;are we returning a null?
    Jz   Null                 ;yes, handle that differently
    Push CS                   ;no, pass the source segment

  Done2:
    Push SI                   ;and the source address
    Push AX                   ;and the source length
    Push DS                   ;pass the destination segment
    Mov  AX,Offset Descriptor ;and the destination address
    Push AX
    Xor  AX,AX                ;0 means assign a descriptor
    Push AX                   ;pass that as well
    Call B$ASSN               ;let B$ASSN do the dirty work
    Mov  AX,Offset Descriptor ;show where the output is
    Ret                       ;return to BASIC

  Null:
    Push DS                   ;pass the address of Null$
    Mov  SI,Offset Null$
    Jmp  Short Done2

  Message Endp

  ;----- DefMsg macro that defines messages
  DefMsg Macro Message
    LOCAL MsgStart, MsgEnd    ;;local address labels
    NumMsg = NumMsg + 1       ;;show we made another one
    IFB <Message>             ;;if no text is defined
      DW 0                    ;;just create an empty zero
    ELSE                      ;;else create the message
      DW MsgEnd - MsgStart    ;;first write the length
      MsgStart:               ;;identify the starting address
      DB Message              ;;define the message text
      MsgEnd Label Byte       ;;this marks the end
    ENDIF
  Endm

  Messages Label Byte         ;the messages begin here
  NumMsg = 0                  ;tracks number of messages
                              ;DO NOT MOVE this constant
  DefMsg "Sunday"
  DefMsg "Monday"
  DefMsg "Tuesday"
  DefMsg "Wednesday"
  DefMsg "Thursday"
  DefMsg "Friday"
  DefMsg "Saturday"
  End
After declaring BASIC's B$ASSN routine as being external, Message$ defines two string descriptors in the Data segment. The first is used for the function output when returning a normal message, and the second is used only when returning a null string. In truth, the separate output descriptor and the few added steps that detect the special case of a null output string are needed only with BASIC PDS far strings. And this brings up an important point.
It is impossible to write one assembly language subroutine that can work with both QuickBASIC and BASIC PDS far strings using the normal, documented methods. To create a string function for use with QuickBASIC and PDS near strings, you define and fill in a string descriptor in DGROUP, and place its address in AX before returning to BASIC. But returning a far string as a function for PDS requires calling the internal STRINGASSIGN routine that Microsoft provides with PDS. STRINGASSIGN works with both near and far strings in PDS, but is not available in QuickBASIC.
The trick is to use the *undocumented* name B$ASSN, which is really the same thing as STRINGASSIGN. The big difference, though, is that B$ASSN is available in all versions of BASIC 4.0 and later. When near strings are used the B$ASSN routine is extracted from the near strings library. When linking with far strings a different version is used, extracted by LINK from the far strings library. This is a powerful concept to be sure, and one we will use again for other examples later on in this chapter.
Message$ begins by loading SI with the starting address of a table of messages. These messages are located at the end of the source file in the code segment, and each is preceded with the length of the text. Although it may not be obvious from looking at the source listing, the message data is actually structured like this:
  DW 6
  DB "Sunday"
  DW 6
  DB "Monday"
    .
    .
Next, AX is cleared to zero just in case the incoming string number is illegal. Later in the program AX holds the length of the output string; clearing it here simply makes the program's logic more direct.
CX is then loaded with the message number the caller asked for. If CX is either higher than the available number of messages or zero, the program jumps to the code that returns a null string. Otherwise, a small loop is entered that walks through each message, decrementing CX as it goes. When CX reaches zero, SI is pointing at the correct message and AX is holding its length. Otherwise, the current length is added to SI, thus skipping over that data.
Notice the unusual form of the Lodsw statement, to allow it to work with a CS: override. MASM has a number of quirks that are less than intuitive, and this is but one of them. Normally you would use either Lodsb or Lodsw, to indicate loading either a byte into AL or a word into AX. But when you use a segment override MASM requires omitting the "b" or "w" Lods suffix, and you must state Byte Ptr or Word Ptr explicitly. Then, a dummy argument must be placed after the override colon.
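For comparison, here are three ways to load the word at CS:[SI] into AX and advance SI by two. They are alternatives, not a sequence, and all assume SI holds an offset within the code segment. The second spells out SI as the operand instead of using a dummy value, and the third avoids Lods altogether; treat both as sketches of equivalent code rather than as the form Message$ requires.

```asm
    Lods Word Ptr CS:0        ;the form used in Message$

    Lods Word Ptr CS:[SI]     ;the same load, with SI written explicitly

    Mov  AX,CS:[SI]           ;an ordinary load with an override,
    Add  SI,2                 ;  then step past the word by hand
```

Unlike Lods, the Mov and Add pair alters the flags and does not depend on the direction flag, which may matter in other routines.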
The last new feature this listing introduces is the use of macros. The most basic use of MASM macros is to define a block of code once, and then repeat it multiple times with a single statement. This is not unlike keyboard macro programs such as Borland's SuperKey, that let you assign a string of text to a single key. For example, you could press Alt-S and SuperKey will type "Very truly yours", five Enter keys, and then your name.
MASM macros also offer many other interesting and useful capabilities, including the ability to accept arguments. [I should mention that the main point of the DefMsg macro is to make this function easy to modify, so you can create other, similar string functions from this same routine.] Before attempting to explain the DefMsg (Define Message) macro I designed for use with Message$, let's consider some macro basics.
Say, for example, you find that a particular routine needs to push the same five registers many times during the course of a procedure. To simplify this task you could define a macro--perhaps named PushRegs--that performs the code sequence for you. Such a macro definition would look like this:
  PushRegs Macro
    Push AX
    Push BX
    Push SI
    Push DS
    Push ES
  Endm
Now, each time you want to execute this series of instructions you would simply use the command PushRegs. Please understand that a macro is not the same as a called subroutine. The assembler still places each Push command in sequence into your source code each time the macro is invoked. But a simple macro like this can reduce the amount of typing you must do, and minimize errors such as pushing registers in the wrong order. And in some cases macros also make your code easier to read.
As I mentioned, a MASM macro can accept arguments, and it can even be designed to accept a varying number of them. If you need to push three registers, but which three varies from one place to the next, you would define PushRegs like this:
  PushRegs Macro Reg1, Reg2, Reg3
    Push Reg1
    Push Reg2
    Push Reg3
  Endm
Then to push AX, SI, and DI you would invoke PushRegs as follows:
  PushRegs AX, SI, DI
Of course, a corresponding PopRegs macro would be defined similarly. Once a macro has been defined you can pass any legal argument to it. For example, you could also use this:
  PushRegs AX, Word Ptr [BP-20], IntVar
Here, you are pushing AX, the word 20 bytes below where BP points to on the stack, and the integer variable named IntVar.
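One possible form of the PopRegs macro mentioned above is sketched here. Popping the arguments in reverse order inside the macro is my own design choice for this illustration; it lets you give PopRegs the same argument list you gave PushRegs.

```asm
PopRegs Macro Reg1, Reg2, Reg3
    Pop  Reg3                 ;;the last item pushed comes off first
    Pop  Reg2
    Pop  Reg1
Endm

    PushRegs AX, SI, DI       ;expands to Push AX, Push SI, Push DI
    ;code that changes AX, SI, and DI goes here
    PopRegs  AX, SI, DI       ;expands to Pop DI, Pop SI, Pop AX
```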
A useful enhancement to this macro would let you pass it a varying number of parameters. The PushM macro that follows accepts any number of arguments (up to eight), and pushes each in sequence.
  PushM Macro A,B,C,D,E,F,G,H        ;;add more place-holders to suit
    IRP CurArg, <A,B,C,D,E,F,G,H>    ;;repeat for each argument
      IFNB <CurArg>                  ;;if this arg is not blank
        Push CurArg                  ;;push it
      ENDIF
    Endm                             ;;end of repeat block
  Endm                               ;;end of this macro
From this you can create a complementary PopM macro by changing the name, and also changing the Push instruction to Pop.
The IRP command works much like a FOR/NEXT loop in BASIC, and tells MASM to repeat the following statements for each argument that was given. IFNB (If Not Blank) then tests each argument to see if it was in fact present in the incoming list of parameters. In this case, CurArg assumes the name of the argument, and the Push instruction is expanded to specify that name.
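Following that recipe, a PopM macro might look like the sketch below. Note that it pops its arguments in the order given, so the caller must list them in the reverse of the order used with PushM.

```asm
PopM Macro A,B,C,D,E,F,G,H          ;;add more place-holders to suit
    IRP CurArg, <A,B,C,D,E,F,G,H>   ;;repeat for each argument
        IFNB <CurArg>               ;;if this arg is not blank
            Pop CurArg              ;;pop it
        ENDIF
    Endm                            ;;end of repeat block
Endm                                ;;end of this macro

    PushM AX, BX, SI                ;generates Push AX, Push BX, Push SI
    ;code that changes AX, BX, and SI goes here
    PopM  SI, BX, AX                ;generates Pop SI, Pop BX, Pop AX
```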
There is no disputing that the syntax of a MASM macro is confusing at best. Having to enclose some arguments in angle brackets but not others requires frequent visits to the MASM manual. Further, a MASM macro is virtually impossible to debug. If you write a macro incorrectly or create a syntax error, MASM reports an error at the line where the macro was invoked, rather than at the line containing the error in the macro. It is not uncommon to receive a number of errors all pointing to the same source line, with no indication whatsoever where the error really is.
Now consider how the DefMsg macro operates. DefMsg begins by defining a single incoming parameter named Message. Two local labels--MsgStart and MsgEnd--are defined, and these are needed so MASM can calculate the length of the messages. Although labels within a macro do not have to be declared as local, you would get an error if the macro were used more than once. Like BASIC, the assembler requires that each label have a unique name. By using local labels MASM generates a new, unique internal name for each macro invocation, instead of the actual label name given.
The next statement increments a MASM variable named NumMsg. To avoid an error caused by calling Message$ with an invalid message number, the routine compares the number you pass to the number of messages that are defined. This test occurs in the fourth line of the procedure, at the Cmp CX,NumMsg statement. NumMsg behaves like a constant, except that it may be redefined as the source file is assembled. (When a numeric constant is assigned using Equ, its value may not be changed by either your source code or by a macro.) But when a value is defined using an equals sign (=), MASM allows it to be altered as it assembles your program. Understand that the resulting number is added to your program as a constant; its value can change only during the course of assembly. Therefore, each time DefMsg is invoked, it increments NumMsg. MASM places the final value into the Cmp instruction, as if you had defined it using a fixed known value.
The IFB (If Blank) test checks to see if DefMsg was given a parameter when it was invoked. In most cases you will probably want to define a series of consecutive messages. As it is used here, seven different day names are returned in sequence. But there may be times when you want to leave a particular message number blank. For example, you could create a series of messages that correspond to BASIC's error numbers. BASIC file error numbers range from 50 through 76, but there are no messages numbered 60, 65, or 66. You could therefore leave those blank, and invoke a modified copy of Message$ like this:
  PRINT DOSMessage$(ERR - 49)
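The difference between the two kinds of definition can be seen in a few lines. MaxLen and Count are hypothetical names used only for this sketch.

```asm
MaxLen  Equ 80                ;a numeric Equ is fixed once it is defined
Count   =   0                 ;an equals sign creates a redefinable value
Count   =   Count + 1         ;legal: Count is now 1
Count   =   Count + 1         ;legal: Count is now 2
;MaxLen Equ 81                ;not allowed: MASM would report an error

    Mov  CX,Count             ;assembles the same as Mov CX,2
    Mov  DX,MaxLen            ;assembles the same as Mov DX,80
```

Either way the finished program contains only immediate numbers; no memory is set aside for Count or MaxLen.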
When DefMsg is used with no argument, it merely creates a zero word at that point in the code segment. Otherwise, the length of the message is stored, followed by the message text. The statement DW MsgEnd - MsgStart is replaced with the difference between the addresses, which MASM calculates for you. This is similar to the earlier example that showed how a dollar sign ($) can simplify defining strings that may change.
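To make the effect of a blank entry concrete, here is a sketch of what three invocations would generate. The message texts are hypothetical, and the DefMsg macro and the NumMsg = 0 line from the listing above are assumed to be in place.

```asm
DefMsg "Path not found"       ;generates: DW 14
                              ;           DB "Path not found"
DefMsg                        ;generates: DW 0
DefMsg "Disk full"            ;generates: DW 9
                              ;           DB "Disk full"
;NumMsg is now 3, and message number 2 returns a null string
```

The walking loop in Message$ handles the empty entry without any special code, because adding a length of zero to SI simply leaves it pointing at the next length word.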
The last macro I will describe here is Rept, which means "Repeat the following statements a given number of times". In the simplest sense, Rept could be used to generate a series of the same instructions:
  Rept 100
    Xor  AX,AX
    Push AX
    Call SomeProc
  Endm
A Rept macro is not invoked by name; rather, it is added inline to a program (or included within a macro that is called by name). In most cases you would use a coding loop to repeat a block of code, since a Rept macro actually generates the same code repeatedly in the program. But there are situations where timing is very critical, and a loop is always somewhat slower than a sequence of inline instructions.
Another good use for Rept is in conjunction with redefinable equates, such as this example which defines the letters of the alphabet:
  Alphabet:
    Char = 0
    Rept 26               ;;do this 26 times
      DB "A" + Char       ;;define ASC("A") + Char
      Char = Char + 1     ;;increment Char
    Endm
Although the MASM manual states that you must use double semicolons for remarks within a macro as shown here, I have used a single semicolon without problems.
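The same pairing of Rept with a redefinable value can build a numeric lookup table at assembly time. The sketch below, using the hypothetical names Squares and Num, defines ten words holding the squares of 0 through 9.

```asm
Squares Label Word            ;the table begins here
Num = 0                       ;a redefinable value that drives the loop
Rept 10                       ;;do this 10 times
    DW Num * Num              ;;define the square of Num
    Num = Num + 1             ;;advance to the next number
Endm
```

MASM performs all of the arithmetic as it assembles the file, so the finished program contains only the ten data words.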
There are other macro commands and features I will not describe here, because I have not found them to be particularly useful. However, macros can be recursive, they may be nested, and they can even be redefined on the fly. I urge you to refer to the documentation that Microsoft provides for more information on those advanced features.