Issue 7 of qb:tm


This time: Controlling Program Flow

(Editor's Note: I put up the wrong version of Petter's Part III assembly series. The code that I posted was incorrect. I apologize. You can grab a correct .txt version of the article right here.)

Hello! The fourth part of my assembly tutorial is here! The last part was really huge and covered many ways to manipulate registers and QB variables. I hope you've done some test programs during the last month to test what you've learnt. Now it's time for something new!

Until now, we've only seen simple assembly programs that basically could only manipulate numbers in different ways. As interesting as this may be, it would be great to know a little more, wouldn't it?

Grab Absolute Assembly 2.1

Controlling the program flow:
In QBASIC, we're used to instructions that can control the way that snippets of code are run. I'm talking about instructions like GOTO, IF, FOR/NEXT and DO/LOOP. These kinds of instructions are essential in all programming languages. Naturally, they also exist in assembler.

Many professional programmers don't like BASIC because of one particular instruction. I guess most of you know what I'm talking about... GOTO! If you don't know the sad story of GOTO, here it is:

Before QuickBASIC was created, no BASIC compiler was procedural, i.e. they didn't support the creation of SUBs and FUNCTIONs. You had to build your program around a messy structure of jumps back and forth through the code. Some common routines could be called with GOSUB/RETURN which made it a little easier to keep up a good program structure, but sooner or later you still ended up with a really messy program that was hard to debug and update unless you planned the program very well. The term "spaghetti code" is often used to describe program code that is really messy, with jumps between lines all over the place. Most old BASIC programs looked like that, so naturally the BASIC language wasn't the choice of professional programmers. When you're coding in QBASIC/QuickBASIC you should never use GOTO and, if you can avoid it, not GOSUB/RETURN either. With a good procedural structure you will never need them, and the program won't turn into spaghetti. In assembler, you don't have that luxury. You MUST use instructions similar to GOTO and GOSUB if you want to get anything done!

Let's begin by explaining the equivalent to GOTO:

GOTO in asm:
Whenever you need to make an unconditional jump in an assembly program, you use the instruction JMP. (JMP stands for JuMP, as you probably guessed). The general syntax is:

JMP offset

Where offset is a number that describes the distance in bytes from the JMP instruction to the instruction you want to jump to. (Strictly speaking, it's counted from the byte right after the machine language equivalent of the JMP instruction.) The number can also be in a register or at a specific position in the memory. Does it sound complicated to you? Yes, I thought so! Actually, it's a pain to use JMP like this. If you want to make a correct jump, you must go through all of the asm code between the JMP instruction and the destination instruction and count the number of bytes they take up. And if you need to insert new assembly instructions in the middle of a program using JMP, you'll mess up everything and you have to recalculate all the offsets.
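To make the counting a bit more concrete, here is a small sketch in Python (not assembler) of how the offset for a jump could be worked out by hand. It assumes the two-byte "short" form of JMP, where the offset is one signed byte counted from the byte right after the jump. The function names and the example addresses are made up for the illustration.

```python
JMP_SIZE = 2  # assumption: short JMP = one opcode byte + one offset byte


def short_jump_offset(jmp_address, target_address):
    """Return the signed offset a short JMP at jmp_address needs to reach target_address."""
    offset = target_address - (jmp_address + JMP_SIZE)
    if not -128 <= offset <= 127:
        raise ValueError("target is out of reach for a short jump")
    return offset


def encode_offset(offset):
    """Return the single byte that would be stored after the opcode."""
    return offset & 0xFF


# Hypothetical layout: a 3-byte instruction at 0100h, followed by the JMP at 0103h
# that jumps back to 0100h.
backward = short_jump_offset(0x0103, 0x0100)
forward = short_jump_offset(0x0100, 0x0110)
```

In this made-up layout the backward jump needs an offset of -5, which is stored as the byte FBh. Insert one more instruction between the two addresses and the number has to be recalculated, which is exactly the problem described above.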

In DEBUG, you can always see the memory position of every assembly instruction, so in order to make it easier to use jumps, you just type the memory position of the instruction you want to jump to, and DEBUG translates this into an offset for you. This makes it easier to use JMP, but you cannot say it has become very much easier.

One of my primary concerns when writing Absolute Assembly 2.0, the first really useful version, was to make it much easier to use JMP. So I included support for line labels, just like the ones you use in QBASIC. Absolute Assembly and DEBUG together take care of the translation to offset numbers for you. With Absolute Assembly, JMP is as easy to use as GOTO is in BASIC. Consider this very short program:

LineLabel: MOV AX, 1
JMP LineLabel

This program moves a 1 into AX, at a line labelled LineLabel, and then the JMP instruction makes the program go back to that line again. 1 is moved to AX again, and the same jump is performed again. As you can easily understand, this program would result in an infinite loop if executed. Since I didn't spend too much time perfecting the label feature of Absolute Assembly, there are some limits to the ways that you can use them. Read the notes in the beginning of the program source for more information. You should use JMP carefully in order to avoid spaghetti code. Now that we're at it, let's look at another feature of Absolute Assembly:

Comments:
It's hard to understand source that other people have written. Most of the time it's also hard to understand the code that you've written yourself after a couple of weeks. Understanding assembly code, even if you wrote it yourself only an hour ago can be a nightmare! Just look at the routine that we did in the end of the last part of this tutorial series:

PUSH BP
MOV BP, SP
MOV BX, [BP+A]
MOV AX, [BX]
MOV BX, [BP+8]
MOV CX, [BX]
ADD AX, CX
MOV BX, [BP+6]
MOV [BX], AX
POP BP
RETF 6

Do you remember what it did? Can you instantly explain how it works? I can't. Due to this problem, I knew that it was necessary to allow comments in Absolute Assembly. In BASIC, the "'" sign, or the older REM instruction can be used for commenting. In the most popular assemblers, like Microsoft's MASM and Borland's TASM, (I'll get back to them in another part of this tutorial series) the semicolon, ";", is used for comments, so I made this an Absolute Assembly standard too. Commenting assembly code is as simple as commenting BASIC code. Let's try:

; Assembly routine that adds two integer variables together:
PUSH BP ; Allow the reading of variables from QBASIC:
MOV BP, SP
MOV BX, [BP+A] ; Move variable 1 into AX:
MOV AX, [BX]
MOV BX, [BP+8] ; Move variable 2 into CX:
MOV CX, [BX]
ADD AX, CX ; Add AX and CX together and put the result in variable 3:
MOV BX, [BP+6]
MOV [BX], AX
POP BP ; Get back to QB:
RETF 6

Wow! What a difference a few comments can make, right? Now the purpose and the basic functions of the routine are clearly explained. The details of each operation can now be understood by examining very few lines of code. This commented version of the addition routine would be correctly handled by Absolute Assembly. It just ignores everything written on a line after a semicolon has been encountered. Now, let's get back to those jumps!

CALL and RET:
Instead of using only JMP to jump around in your asm code, you can use the assembly instructions CALL and RET. They are the asm equivalents to GOSUB and RETURN in BASIC. So if you have a routine that needs to be used several times in your code, you can put it in a subroutine in the end of your asm code and use CALL to get there. The syntax for CALL is:

CALL address

Just like with JMP, you would have to calculate the memory offset to the asm instruction that you want to jump to, but with Absolute Assembly all you have to do is to specify a line label.

When a CALL is executed, some things happen that you maybe will find interesting: First of all, the CPU pushes the offset address of the asm instruction after CALL on the stack, thus reducing the SP register by two. This is important to know if you're using the stack in the program. Then, the IP register, which always contains the offset address of the current machine language instruction being executed, is changed to the offset address of the assembly instruction you want to jump to. The IP register cannot be changed manually. The only way to modify it is to use JMP, CALL or similar instructions.

When you've jumped to a subroutine using CALL and want to get back again, you must use the instruction RET, short for RETurn. RET will pop the address of the instruction after the CALL instruction back from the stack and change IP. The next assembly instruction to be executed is the one after CALL. The syntax for RET is simply:

RET number

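The number can usually be left out, but if additional numbers were pushed on the stack before the CALL (parameters for the subroutine, for example), you can let the RET instruction pop them away for you. The number tells RET how many extra bytes to remove from the stack after it has popped the return address.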

If you look at the example program used in the description of comments above, you can see the instruction RETF 6. RETF works just like RET, but with the difference that it returns from a FAR call, i.e. a call that has been made from another segment address in the memory. It's possible to specify a full memory address, containing both segment and offset after CALL, but we won't need to do that in Absolute Assembly programs. However, I strongly suspect that QBASIC executes such a CALL instruction when you use CALL ABSOLUTE. Far calls require that both the segment and offset of the next asm instruction are pushed on the stack, so also the CS register, containing the segment address of the instruction currently being executed, is pushed. That's four bytes instead of two, and that's why you have to use RETF instead of RET. The number 6 after RETF pops away 6 extra bytes from the stack. Those are the two-byte addresses of the three integer variables that were passed from the BASIC program.
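The stack bookkeeping around a far call can be sketched in a few lines of Python. This is only a model under simple assumptions: the stack is a dictionary of 16-bit words, the starting value of SP and the addresses pushed are invented, and far_call and retf are hypothetical helper names that mimic what the CPU does.

```python
class Stack16:
    """Tiny model of the 16-bit stack: SP counts bytes and grows downwards."""

    def __init__(self, sp=0x1000):
        self.sp = sp
        self.words = {}  # address -> 16-bit value

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.words[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.words[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value


def far_call(stack, cs, return_ip):
    """A far call pushes CS first and then the offset of the next instruction."""
    stack.push(cs)
    stack.push(return_ip)


def retf(stack, extra_bytes=0):
    """RETF pops offset and segment, then discards extra_bytes more bytes."""
    ip = stack.pop()
    cs = stack.pop()
    stack.sp = (stack.sp + extra_bytes) & 0xFFFF
    return cs, ip


stack = Stack16()
start_sp = stack.sp
for address in (0x0010, 0x0012, 0x0014):  # three made-up variable addresses
    stack.push(address)
far_call(stack, cs=0x2000, return_ip=0x0123)
sp_inside_routine = stack.sp
returned_to = retf(stack, extra_bytes=6)
```

Inside the routine SP is 10 bytes lower than before the call (6 bytes of parameters plus 4 bytes of return address), and after RETF 6 it is back where it started.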

This program is a simple, useless example of using CALL and RET:
(The numbers to the right specify in which order the instructions are executed)

PUSH BP ; 1
MOV BP, SP ; 2
MOV AX, 1 ; 3
CALL Subroutine ; 4
MOV CX, 3 ; 7
POP BP ; 8
RETF 2 ; 9

Subroutine:
MOV BX, [BP+6] ; 5
RET ; 6

Just as a reminder: The four lines that were presented to you earlier in this tutorial series as a base for all your assembly routines are not necessary if you don't pass any BASIC variables to the routine. Thus, three of the four lines below won't be necessary:

PUSH BP
MOV BP, SP
POP BP
RETF x

The only important instruction is RETF. You don't need the others. All right, now on to something else.

Conditional jumps:
It's very likely that your assembly routines sometimes have to do different things depending on, say, the numbers you've put in the variables you pass to them. In QBASIC, you frequently have to use IF/ELSE or SELECT CASE to control such things. How can we do this in assembler? The answer is: Through conditional jumps!

If you want to control the program flow depending on input data, you can use the assembly instruction CMP, which is short for CoMPare. The syntax is:

CMP destination, source

Earlier parts of Petter Holmberg's assembly series can be found in the Archive. They are in issues 4, 5, and 6.

The destination and source can be registers, immediate numbers or memory pointers.

CMP actually works a little like SUB. It subtracts the source from the destination. The difference is that it doesn't store the result in the destination like the SUB instruction would do. What's the use of it then? Well, something very important actually happens, but you can't see it. Now I have no choice but to present another new feature of assembler to you: The flags.

The flags are a very important part of assembly programming, even though it's something you rarely have to worry about. The flags are all located in a register, and that register is simply called the FLAGS register. This register is different from all of the others, because its individual bits all have separate tasks, and they are very important for the execution of a program. Each bit in the FLAGS register is called a flag, and they all have names. I'm not going to present them to you here because you'll never need to use most of them, but the important thing to know is that almost every assembly instruction modifies some of the flags in different ways. One example is SUB. There's one flag called the Sign Flag, SF, and it will be set to 1 if the result of the subtraction gets negative or 0 if it gets positive. One of the few times you really make use of the whole FLAGS register is when you push or pop it. PUSHF and POPF are the instructions for this, and they're used when you need to preserve the state of the flags to a later time. So, even though CMP won't store the result of a subtraction, it will modify the flags in the same way that SUB would do. What use can we have of this then? Well, here comes the answer:
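Here is a rough Python model of what CMP does with two 16-bit numbers: it performs the subtraction, throws the result away and keeps only the flags. Only three flags are modelled (the Zero Flag ZF, the Sign Flag SF and the Carry Flag CF), and the function name cmp16 is just something I made up for this sketch.

```python
def cmp16(destination, source):
    """Model CMP on 16-bit values: subtract, keep only the flags."""
    destination &= 0xFFFF
    source &= 0xFFFF
    result = (destination - source) & 0xFFFF
    return {
        "ZF": int(result == 0),               # result was zero: operands are equal
        "SF": int((result & 0x8000) != 0),    # highest bit of the result is set
        "CF": int(destination < source),      # an unsigned borrow was needed
    }


equal = cmp16(5, 5)
smaller = cmp16(3, 5)
larger = cmp16(5, 3)
```

Comparing 5 with 5 sets ZF, comparing 3 with 5 sets both SF and CF, and comparing 5 with 3 leaves all three flags at 0. Neither operand is changed in any of the cases.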

There are a number of assembly instructions that can be used to perform conditional jumps in the code. Their names all begin with a J, for Jump.

Here are the most common ones:

Name: Description:
----- ------------
JB    Jump if Below
JBE   Jump if Below or Equal
JE    Jump if Equal
JAE   Jump if Above or Equal
JA    Jump if Above
JL    Jump if Less (signed)
JLE   Jump if Less or Equal (signed)
JGE   Jump if Greater or Equal (signed)
JG    Jump if Greater (signed)
JNB   Jump if Not Below
JNBE  Jump if Not Below or Equal
JNE   Jump if Not Equal
JNAE  Jump if Not Above or Equal
JNA   Jump if Not Above
JNL   Jump if Not Less (signed)
JNLE  Jump if Not Less or Equal (signed)
JNGE  Jump if Not Greater or Equal (signed)
JNG   Jump if Not Greater (signed)

How do these instructions work then? Well, the general syntax for all of them is:

Jxxx linelabel

The linelabel thing works just like it does with JMP.

The idea is that you should use CMP to compare two operands, and then use one of the conditional jump instructions to go where you want to go depending on the state of the flags that a CMP between the two operands changed. Let's try an example!

Suppose you have two numbers in AX and BX. If the number in AX is greater than the one in BX, CX should be set to 1. If not, CX should be left unchanged. Then you could use this code snippet to test it:

.
.
.
CMP AX, BX ; Compare AX against BX.
JBE NotGreater ; IF AX is below or equal to BX; skip the next line.
MOV CX, 1 ; Set CX to 1. (This line is only executed if AX > BX)
NotGreater: ; The label used for the skipping of the previous line.
.
.
.

Get it? Now CX will only be changed if AX is greater than BX, because if it's below or equal to BX, one line will be skipped.

As you can see, I've put a space before the MOV CX, 1 instruction. I usually do this when writing conditional jump code in asm, just to make it look more like in QBASIC where you often do this between IF and END IF. This is just one of my tricks to make asm code more readable so you don't have to care about it.

There's another thing you may be wondering about. In the list of jump instructions above, some of the descriptions have the comment "(signed)" in them. As I mentioned briefly in the previous tutorial, a signed number is a number that is allowed to be negative. I'll wait with the explanation of how negative numbers are stored, but it's important that you know when to use what jump instruction. If you want to compare two numbers where one or both of them may be negative, you must use a conditional jump instruction that can handle signed numbers. So instead of using JA (Jump if Above), you use JG (Jump if Greater) and so forth. The JE and JNE instructions work with all types of numbers. I promise to explain the nature of signed and unsigned numbers later, and then you'll understand why you need so many different jump instructions.
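Without going into the details of how negative numbers are stored, a short Python sketch can show why the choice of jump instruction matters. The only assumption it relies on is that the same 16 bits can be read either as a number from 0 to 65535 or as a number from -32768 to 32767. The helper names are invented, and the two functions only mimic the decisions JA and JG would make after a CMP.

```python
def as_signed16(value):
    """Read a 16-bit pattern as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def would_ja_jump(a, b):
    """JA after CMP a, b: the comparison treats both operands as unsigned."""
    return (a & 0xFFFF) > (b & 0xFFFF)


def would_jg_jump(a, b):
    """JG after CMP a, b: the comparison treats both operands as signed."""
    return as_signed16(a) > as_signed16(b)


minus_one = 0xFFFF  # the pattern QBASIC would show as -1 in an integer variable
unsigned_view = would_ja_jump(minus_one, 1)
signed_view = would_jg_jump(minus_one, 1)
```

With FFFFh in one operand and 1 in the other, JA would jump (65535 is above 1) while JG would not (-1 is not greater than 1). Same bits, different answer, so you have to pick the instruction that matches how you think of your numbers.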

Let's try another example just for the sake of clarity: Consider the following code snippet:

.
.
.
CMP AX, 0
JL Negative
MOV BX, CX
JMP EndOfTest
Negative:
MOV BX, DX
EndOfTest:
.
.
.

What this code snippet does is the following: It first compares AX with 0. If it is less than zero, i.e. if it's negative, BX will be set to the value in DX. If it wasn't negative, no jump will occur and BX will get the value in CX instead. But in order to avoid setting BX to DX right after that, which would destroy everything, an unconditional jump to the code after that line must be made. It's a bit ugly, but that's the only way to do it. If it makes you feel any better you can think of the "JL Negative" instruction as IF, the "Negative:" label as ELSE, and the "EndOfTest:" label as END IF.

There's another assembly instruction that can be used for conditional jumps: TEST. The syntax for TEST is:

TEST destination, source

TEST is used exactly like CMP and for the same purpose. The difference is that CMP performs a subtraction between the two operands, but TEST performs an AND between them. This can be useful if your conditional jumps depend on the bit settings of the operands instead of the value of the whole operand.

Loops:
Another important feature of QBASIC and other high-level languages is the ability to execute a code snippet repeatedly, a feature known as looping. In QBASIC, you have the FOR and NEXT instructions for loops that you want to perform a certain number of times, and DO/LOOP and WHILE/WEND for loops that should run until something special occurs. Slow loops are one of the major reasons for slow program execution, and one of the major reasons to use assembler in QBASIC is to speed up things. Therefore, you'll soon discover that the most important parts of your program to rewrite in assembler often are the loops.

You already know how to perform DO/LOOP type of loops in assembly. You can use JMP together with conditional jumps, like this:

.
.
.
XOR AX, AX ; Set AX to 0.
StartOfLoop:
INC AX ; Increase AX.
CMP AX, 10 ; If AX is 10: Get out of the loop.
JE EndOfLoop
JMP StartOfLoop ; Jump back to the start of the loop.
EndOfLoop:
.
.
.

This example would increase AX ten times and then continue the program. Note how I use three spaces before the instructions inside the loop here, just like loops are usually written in QBASIC. This is just another way of making the code more readable. Feel free to invent your own tricks if you don't like mine. (This article is formatted to html, which unfortunately makes 3 blank spaces a wicked pain to code, which is why you'll notice only one space- editor)

Although this is an acceptable way of performing loops in asm, there is a special loop instruction that does the job even better. Let's take a look at it!

The special loop instructions I'm talking about are made for the FOR/NEXT type of loops, i.e. loops that you want to run a certain number of times. The CX register plays an important role here. It's the register used to store the number of times a loop should be executed.

The instruction used to perform loops is... LOOP! The general syntax is:

LOOP Label

The Label operand works the same as with other jump instructions. Here's an example of using LOOP:

.
.
.
MOV CX, 10 ; Loop ten times.
StartOfLoop: ; This label specifies the start of the loop.
ADD AX, 4 ; A useless asm instruction.
SUB BX, DX ; Another useless asm instruction.
LOOP StartOfLoop ; Jump back to the StartOfLoop label.
.
.
.

This example demonstrates how you use the CX register and LOOP to make a set of asm instructions execute a certain number of times. You just put the number specifying how many times you want to execute the loop in the CX register. When the LOOP instruction is executed, the number in CX is decremented by one (without touching the stack), and if the result isn't zero, a jump back to the specified label is performed. This means that the lines between the line label you jump to and the LOOP instruction will be executed the same number of times as the number you set CX to before the loop. The only bad thing with this loop technique is that the CX register is occupied as long as you're inside the loop. You can use CX for other stuff, but then you would have to save its value somewhere else (the stack for example) and restore the value before the LOOP instruction.
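The behaviour of LOOP can be modelled in Python like this. It is a sketch, not real assembler: run_loop is a made-up name, the loop body is passed in as a function, and the registers live in an ordinary dictionary.

```python
def run_loop(cx, body):
    """Model 'Label: ... LOOP Label': run body, decrement CX, repeat while CX is not zero."""
    cx &= 0xFFFF
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF  # CX is a 16-bit register, so it wraps around
        if cx == 0:
            break
    return iterations


registers = {"AX": 0}


def add_four():
    """Stand-in for the 'ADD AX, 4' line inside the loop."""
    registers["AX"] = (registers["AX"] + 4) & 0xFFFF


ten_rounds = run_loop(10, add_four)
zero_rounds = run_loop(0, lambda: None)
```

With CX set to 10 the body runs ten times, as expected. One thing the model makes visible is that the body always runs at least once, and that starting with CX = 0 does not skip the loop: CX wraps around to FFFFh on the first decrement and the body runs 65536 times. So it's a good idea to make sure CX isn't 0 when you enter a LOOP.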

Limits of jumps:
The conditional jump instructions and LOOP have a limit: The jumps cannot be too long. What I mean is that you can't jump across too many assembly instructions. Or actually, the limit depends on the number of bytes the machine language equivalents of the assembly instructions take up. The limit is between 128 bytes upwards (backwards in the code) to 127 bytes downwards (forwards in the code). JMP and CALL also exist in longer forms that can reach any offset in the segment, so they are not restricted in the same way. It's not easy to know how many bytes your assembly instructions occupy, but it generally varies between one and five. An instruction such as PUSH AX only takes up one byte, but an instruction like MOV [BP+10], BX needs three bytes. Usually this limit is nothing to worry about. 100 bytes are a lot of assembly code. I've never had a problem with this limit.

Memory transfers fast and easy:
I thought we should make a really cool example of assembler in QBASIC this time. But for that example, we need to know three new assembly instructions. There are lots of asm instructions for handling strings. I'm not going to discuss all of them here, but there are three of them that are really useful: LODS, STOS and MOVS. Their respective syntaxes are:

LODSx
STOSx
MOVSx

As you can see, there are no operands or anything. The "x" should be replaced with either a B for Byte or a W for Word. The first instruction, LODS, works like this: If you use LODSB, a byte from the address pointed out by DS:SI will be loaded into AL, and SI will be increased by one. (Actually it may be decreased instead if the Direction Flag, DF, is set.) LODSW works in the same way except that a whole word (two bytes) will be loaded into the entire AX register and SI will be increased (or decreased) by two. STOS is the opposite of LODS: It copies the value of AL/AX to the address pointed out by ES:DI, and increases/decreases DI by one or two depending on if you use STOSB or STOSW.

MOVS is a combination of LODS and STOS. A byte or a word is loaded from DS:SI, but it doesn't go to a register. Instead it is copied to ES:DI and both SI and DI are incremented. MOVSB/MOVSW can therefore be used to transfer bytes from one position in the memory to another without touching the registers, something you cannot do with the standard MOV. We're going to use MOVS in the example routine I'm now going to present to you.
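The byte versions of the three instructions can be modelled in Python as follows. To keep the sketch short it makes some simplifying assumptions: memory is one flat bytearray, so the segment registers DS and ES are ignored, and the Direction Flag is assumed to be clear so that the pointers always move upwards. The class name StringUnit is invented.

```python
class StringUnit:
    """Toy model of LODSB, STOSB and MOVSB on a flat block of memory."""

    def __init__(self, memory, si=0, di=0):
        self.memory = memory  # assumption: one flat bytearray, no segments
        self.si = si
        self.di = di
        self.al = 0

    def lodsb(self):
        """Load the byte at SI into AL, then step SI."""
        self.al = self.memory[self.si]
        self.si += 1

    def stosb(self):
        """Store AL at DI, then step DI."""
        self.memory[self.di] = self.al
        self.di += 1

    def movsb(self):
        """Copy the byte at SI to DI without going through AL, then step both."""
        self.memory[self.di] = self.memory[self.si]
        self.si += 1
        self.di += 1


memory = bytearray(b"ABCD\x00\x00\x00\x00")
unit = StringUnit(memory, si=0, di=4)
for _ in range(4):
    unit.movsb()
```

After four MOVSB steps the first four bytes have been copied to the last four positions, SI has moved from 0 to 4 and DI from 4 to 8, and AL has not been touched.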

An example program:
To summarize this part of my assembly tutorial, we're going to make a really cool example program. You may not have realized it, but you already have a great knowledge of assembly programming. If you don't believe me, you'll only have to look at this example:

What we're going to do is an assembly version of the popular QBASIC instruction PUT. I'm not talking about the file handling version, but the graphics instruction used to put a sprite on the screen.

We're only going to use SCREEN 13 for this example, since this screen mode is really easy to use. The whole screen is built up by a 320 * 200 pixel bitmap with 256 possible colors. Each pixel occupies one byte in the VGA memory, and that memory starts at the address A000:0000h. The first byte contains the index of the pixel at coordinates 0,0, the second one of the pixel at 1,0, the 321st one of the pixel at 0,1 and so forth.

The sprite is stored in a QBASIC array, where the first word specifies the width of the sprite times 8. The second word specifies the height of the sprite, and then the pixel indexes come as bytes stored in the same way as the screen 13 bitmap.
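The layout just described can be sketched in Python with the standard struct module. The helper names make_sprite and read_sprite_header are invented for the illustration, and the words are assumed to be stored low byte first, as they are on the PC.

```python
import struct


def make_sprite(width, height, pixels):
    """Build the byte layout described above: width * 8, height, then the pixels."""
    return struct.pack("<HH", width * 8, height) + bytes(pixels)


def read_sprite_header(data):
    """Return (width, height) in pixels from the first two words of a sprite."""
    width_times_8, height = struct.unpack_from("<HH", data, 0)
    return width_times_8 // 8, height


sprite = make_sprite(32, 32, [0] * (32 * 32))
size_in_bytes = len(sprite)
size_in_integers = size_in_bytes // 2
dimensions = read_sprite_header(sprite)
```

A 32 * 32 pixel sprite takes 4 + 1024 = 1028 bytes this way, which is the same as 514 two-byte QBASIC integers.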

It would be nice with a routine that was a little better than PUT. Of course it will be faster than PUT, but let's add another feature. Many times you wish that you could draw sprites with an "invisible" color. This means that one of the colors in the sprite won't be drawn on the screen. With this feature, your sprites can have irregular edges and "holes" in them because a certain color will be skipped when the sprite is drawn on the screen. We'll use color 0 as the "invisible" color.

Before we start writing the asm code, we'll make a demo program in QBASIC to test it with:

' Demonstration of using assembler in QBASIC to put a sprite in SCREEN 13
' much faster than with PUT and with an invisible color.
SCREEN 13
' Initialization of the assembly routine:
' (Here we're going to put stuff later on.)
DIM testsprite%(513)
DIM background%(513)
LINE (0, 0)-(31, 31), 32, BF
LINE (4, 4)-(27, 27), 0, BF
LINE (8, 8)-(23, 23), 40, BF
LINE (12, 12)-(19, 19), 0, BF
GET (0, 0)-(31, 31), testsprite%
' Demonstrate the routine:
CLS
LINE (0, 24)-(319, 199), 32
LINE (0, 199)-(319, 24), 40
spritex% = 144
spritey% = 96
COLOR 40
PRINT " Demo of PUT routine in assembler:"
COLOR 32
PRINT "Use the cursor keys to move the sprite."
PRINT "Pressing Escape exits the demonstration."
GET (spritex%, spritey%)-(spritex% + 31, spritey% + 31), background%
' (Here we're going to make a call to the asm PUT routine later.)
DO
   key$ = INKEY$

   SELECT CASE key$
      ' Up:
      CASE CHR$(0) + "H":
         PUT (spritex%, spritey%), background%, PSET
         IF spritey% > 30 THEN spritey% = spritey% - 10
         GET (spritex%, spritey%)-(spritex% + 31, spritey% + 31), background%
         ' (Here we're going to make a call to the asm PUT routine later.)
      ' Down:
      CASE CHR$(0) + "P":
         PUT (spritex%, spritey%), background%, PSET
         IF spritey% < 158 THEN spritey% = spritey% + 10
         GET (spritex%, spritey%)-(spritex% + 31, spritey% + 31), background%
         ' (Here we're going to make a call to the asm PUT routine later.)
      ' Left:
      CASE CHR$(0) + "K":
         PUT (spritex%, spritey%), background%, PSET
         IF spritex% > 10 THEN spritex% = spritex% - 10
         GET (spritex%, spritey%)-(spritex% + 31, spritey% + 31), background%
         ' (Here we're going to make a call to the asm PUT routine later.)
      ' Right:
      CASE CHR$(0) + "M":
         PUT (spritex%, spritey%), background%, PSET
         IF spritex% < 278 THEN spritex% = spritex% + 10
         GET (spritex%, spritey%)-(spritex% + 31, spritey% + 31), background%
         ' (Here we're going to make a call to the asm PUT routine later.)
   END SELECT
LOOP UNTIL key$ = CHR$(27)

That's it! Let's save this code in the file PUTDEMO.BAS.

Now we need to figure out what input values the asm routine needs:

First of all we need to pass to the assembly routine the x and y coordinates of the screen where we want the sprite to be drawn. The asm routine also needs to know where in the memory to find the sprite. The sprite should be stored in an array, and all arrays start with the offset address 0 in the memory, so we just need to pass the segment address. The asm routine also needs to know the dimensions of the sprite, but that's stored in the sprite data so we can obtain these values inside the assembly routine itself. So, let's make the call to it look something like this:

CALL ABSOLUTE(BYVAL x%, BYVAL y%, BYVAL VARSEG(testsprite%(0)), SADD(asmput$))

Now we can start typing in the asm code in our favourite text editor. Let's take it step by step! First we want to allow the reading of QBASIC variables:

; A PUT routine with clipping:

PUSH DS ; Push DS and BP and move SP into BP.
PUSH BP
MOV BP, SP

As you can see, we do not only push BP, but also DS. It's necessary to preserve the value of DS in all assembly routines called by QBASIC if you're going to use it, or else your computer will crash.

Now we need to point DS:SI to the start of the sprite in the memory: This requires that we get the segment address of it from QBASIC. Where in the stack can we find it? Well, first we remember from the last part of this tutorial series that QBASIC first pushes the variables we passed to the routine in left to right order and then pushes four extra bytes on the stack, so the computer knows how to get back to the BASIC program. Then we push DS and BP on the stack ourselves. Since the stack grows downwards, this means that the last variable would be found at BP+2+2+4=BP+8, the middle one at BP+2+2+4+2=BP+10 and the first one at BP+2+2+4+2+2=BP+12. Since DEBUG treats all numbers as hexadecimal, we will have to define the stack offsets of the variables in the following ways:

x% = BP+0C
y% = BP+0A
VARSEG(testsprite%(0)) = BP+08
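The offset arithmetic above follows a simple pattern, and a small Python sketch can reproduce it. The function stack_offsets and its parameter names are hypothetical. It assumes, as described in the text, that every pushed value is two bytes, that the return address takes four bytes and that the last parameter in the call ends up closest to BP.

```python
def stack_offsets(parameter_names, saved_registers=2, return_address_bytes=4):
    """Return the BP offset of each parameter, given how many registers the routine pushed."""
    offset = saved_registers * 2 + return_address_bytes
    offsets = {}
    for name in reversed(parameter_names):  # last parameter sits closest to BP
        offsets[name] = offset
        offset += 2
    return offsets


offsets = stack_offsets(["x%", "y%", "segment"])
as_debug_hex = {name: format(value, "02X") for name, value in offsets.items()}

# The addition routine shown earlier pushed only BP, so it has one saved register.
earlier_routine = stack_offsets(["var1", "var2", "var3"], saved_registers=1)
```

For our routine this gives 0C, 0A and 08, matching the list above. With only BP pushed the same three parameters end up at BP+10, BP+8 and BP+6, which is where the addition routine from the last part found them.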

Calculating stack offsets like this may seem a little tricky, but you'll get the hang of it after a while. Now, let's load the address of the sprite into DS:SI!

MOV BX, [BP+08] ; Get the segment address of the sprite.
MOV DS, BX
XOR SI, SI ; Set SI to 0.

Now we know where to begin fetching the data. But where should we put it? In order to determine the correct position on the screen, we need to know the pixel co-ordinates of the upper left corner of the sprite. These numbers should've been passed to the routine and pushed on the stack. All we have to do is go and get them:

MOV DX, [BP+0C] ; DX = X coordinate.
MOV AX, [BP+0A] ; AX = Y coordinate.
MOV BX, AX ; BX = AX = Y coordinate.

And why do I load the y co-ordinate into two registers, you ask? Since the screen pixels are stored in the memory from left to right, row by row from the top to the bottom, the correct memory position to begin moving data to depending on the coordinates x and y is: A000h:y * 320 + x. As you can see, the offset calculation includes a multiplication. Even though this only needs to be calculated once per call to the routine, making the use of MUL despite its slowness acceptable, we're going to use another method which is much more interesting. Remember from the last part of this tutorial that you could use shift instructions for multiplications and divisions if the number to multiply or divide by was in the series of numbers defined as 2^x, like 2, 4, 8, 16, 32, 64, 128, 256 and so on. 320 isn't one of these numbers, but it can be expressed as the sum of two of them: 320 = 256 + 64. This suggests that if we multiply the y coordinate first with 256 and then with 64 and add the results together, it will be the same as if we multiplied it with 320 in the first place. And that is certainly true! So with two shifts and one addition, we can perform a multiplication with 320 really fast:

MOV CL, 8 ; Multiply y with 256.
SHL AX, CL
MOV CL, 6 ; Multiply y with 64.
SHL BX, CL
ADD AX, BX ; Add the results together.

Now you can see why the y coordinate needed to be copied to more than one register. We need the value twice.
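If you want to convince yourself that the trick works, here is a quick check in Python. The function names are made up, and the results are masked to 16 bits to imitate what fits in a register.

```python
def times_320(y):
    """Multiply by 320 the way the asm code does: y * 256 + y * 64."""
    return ((y << 8) + (y << 6)) & 0xFFFF


def screen_offset(x, y):
    """Offset of pixel (x, y) in the SCREEN 13 bitmap: y * 320 + x."""
    return (times_320(y) + x) & 0xFFFF


all_rows_match = all(times_320(y) == y * 320 for y in range(200))
last_pixel = screen_offset(319, 199)
```

The shifted version agrees with an ordinary multiplication for every row on the screen, and the offset of the very last pixel, 63999, still fits in a 16-bit register.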

Now we have y * 320 stored in AX and x stored in DX. Now we only need to add them together to get the correct memory offset for the sprite. The segment address should be set to A000h, and then we have the correct destination memory address in ES:DI:

MOV BX, A000 ; ES = A000h.
MOV ES, BX
ADD AX, DX ; DI = y * 320 + x.
MOV DI, AX

All right. Now we have DS:SI pointing at the start of the sprite data and ES:DI pointing at the screen memory position where the data should be written.

Now it would be nice to know the width and height of the sprite. This is stored in the beginning of the sprite data. First comes the width of the sprite. We load the first two bytes of the sprite data into AX, using LODSW. The number we get is eight times bigger than the actual width. Therefore we need to do a division by 8 to get it right. Luckily, this division can be handled with a shift instruction. Then we move the value to BX, where it will be stored:

LODSW ; Get width info from sprite data.
MOV CL, 3 ; Divide the number by 8 to get correct width.
SHR AX, CL
MOV BX, AX ; Store the width in BX.

Next comes the height. It's simple to retrieve it from the sprite data, since it comes after the width and is stored in the correct form. We won't need any shifts to correct it. We won't support sprites higher than 200 pixels, so only one byte needs to be stored. Let's keep the height in AH. AL must be left free for later:

LODSW ; Store the height in AH.
MOV AH, AL

Finally, we need to know what 320 - the sprite width is, because this number will be used in the drawing loop. This is simple to do. We can use DX to store this number, and the sprite width is already in BX. So all we need to do is this:

MOV DX, 140 ; DX = 320 - Sprite width
SUB DX, BX

Now we have loaded all the data we need. We still haven't put anything in CX, and that's good because we're going to need it in the drawing loop. AL will also be used, since we use LODSB in the code.

Let's take a look at the registers and see how many we have left:

DS = Source segment
SI = Source offset
ES = Destination segment
DI = Destination offset
AH = Sprite height
AL = Reserved for drawing loop
BX = Sprite width
CX = Reserved for drawing loop
DX = 320 - Sprite width

Phew! We made it without having to use the stack to stuff away data. All of the basic registers are used. If we had needed to load more information, using the stack would have been inevitable.

OK! Now it's time for the main loop. Actually it has to be two loops, one nested inside the other. I'll show it first and explain it later:

YLoop:
CMP AH, 0 ; Stop drawing if the remaining height is zero.
JE EndOfDrawing
MOV CX, BX ; CX = Sprite width.
XLoop:
LODSB ; Load a pixel from DS:SI into AL.
CMP AL, 0 ; Is the pixel color 0?
JE SkipPixel
STOSB ; No: Copy the pixel in AL to ES:DI.
DEC DI
SkipPixel:
INC DI ; Yes: Increase DI by one.
LOOP XLoop
ADD DI, DX ; Move screen memory pointer to next line.
DEC AH ; Decrease height.
JMP YLoop
EndOfDrawing:

POP BP ; Return to QBASIC.
POP DS
RETF 6

All right! This loop does the actual drawing. First we test if the height of the sprite (the number in AH) is 0. If it is, we instantly jump out of the loop and return to QBASIC, since there's nothing left to draw. If it's over 0, CX is loaded with the width of the sprite. Now we can load the first pixel from the sprite into AL, using LODSB. Remember that this also increases SI by 1. We test to see if this pixel is 0, the invisible color. If it is, we just increase DI by 1, thus skipping a pixel on the screen. If it isn't 0, we use STOSB to copy the pixel to the screen and then decrease DI by 1 to compensate for the increase that comes right below it (STOSB has already moved DI forward by 1). Then we use LOOP to repeat this process. Each time, CX is decreased, until it reaches 0. When this first happens, the first row of the sprite has been drawn. Now we continue below the LOOP instruction, where 320 - the sprite width is added to DI. This ensures that the next row will be drawn in the correct position on the screen. Finally we decrease AH, containing the remaining sprite height, by 1 and return to the start of the outer loop. This process will of course continue until AH is 0, and then the sprite drawing is complete! All that's left to do is to exit from the routine, returning us to the demo program.
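
If the two loops are hard to follow in assembler, it may help to see roughly the same logic in plain QBASIC. This is only a comparison sketch and is far too slow for real use. It assumes SCREEN 13 and that testsprite%(), spritex% and spritey% are set up as in the demo program. The names p&, w%, h% and c% are made up for this example, and PSET stands in for the direct screen memory writes:

```basic
' Sketch: the same two loops in plain QBASIC (slow, for comparison only).
DEF SEG = VARSEG(testsprite%(0))
p& = VARPTR(testsprite%(0))      ' Like SI: offset into the sprite data.
w% = testsprite%(0) \ 8          ' Like BX: sprite width.
h% = testsprite%(1)              ' Like AH: sprite height.
p& = p& + 4                      ' Skip the two header words.
FOR y% = 0 TO h% - 1             ' Outer loop: one pass per row.
  FOR x% = 0 TO w% - 1           ' Inner loop: one pass per pixel.
    c% = PEEK(p&)                ' Like LODSB: fetch a pixel...
    p& = p& + 1                  ' ...and step to the next one.
    IF c% <> 0 THEN PSET (spritex% + x%, spritey% + y%), c%
  NEXT x%
NEXT y%
DEF SEG
```

The IF line is the counterpart of the CMP AL, 0 / JE SkipPixel pair: pixels with color 0 are simply never drawn.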

Now paste together the pieces of code we've collected and save them in a file called ASMPUT.ASM. Then run Absolute Assembly. Make ASMPUT.ASM your assembly source file and PUTDEMO.BAS your QBASIC destination file, make sure it appends the code to PUTDEMO.BAS instead of erasing its original contents, and skip the adding of CALL ABSOLUTE code. We can do that ourselves. Now start QBASIC and make sure the asm code was added to the program. Take the added code and move it up to the beginning of the demo program, replacing the line that says: ' (Here we're going to put stuff later on.) Then look up the five places that say: ' (Here we're going to make a call to the asm PUT routine later.) and replace each of them with the following lines:

DEF SEG = VARSEG(asmput$)
CALL ABSOLUTE(BYVAL spritex%, BYVAL spritey%, BYVAL VARSEG(testsprite%(0)), SADD(asmput$))
DEF SEG

And that's it! Test the program and see the asm code we've written in action!

There are some limits to this sprite drawing routine though: It won't work in any other screen mode than SCREEN 13, because the screen memory works differently in the other modes. The routine also lacks border checking, i.e. if you put the sprite at a position where parts of it are "outside" the screen borders, it will not be drawn correctly. If you feel like it, you can try writing a sprite routine that "clips" the sprite correctly at the screen edges. But anyway, it draws sprites in a way that the standard PUT cannot do, and it's much faster too! Neat, huh? :-)
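
Until you have written a clipping version, one simple workaround is to check the position in QBASIC and only call the routine when the whole sprite fits on the screen. The sketch below is not clipping, just a guard around the call. It assumes testsprite%() holds the sprite in the format described above, and w% and h% are made-up names:

```basic
' Sketch: only draw when the whole sprite fits on the 320x200 screen.
w% = testsprite%(0) \ 8
h% = testsprite%(1)
IF spritex% >= 0 AND spritey% >= 0 AND spritex% + w% <= 320 AND spritey% + h% <= 200 THEN
  DEF SEG = VARSEG(asmput$)
  CALL ABSOLUTE(BYVAL spritex%, BYVAL spritey%, BYVAL VARSEG(testsprite%(0)), SADD(asmput$))
  DEF SEG
END IF
```

With this guard a sprite that touches a screen edge simply isn't drawn at all, which looks abrupt but never produces a wrapped or garbled image.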

Oh no! I've done it again! These tutorial parts keep growing with each new month :-) Last time I promised to cover a bit more than I actually did in this part, but the things that you've just read are enough for now. Now go and experiment with assembly flow control on your own and see if you can come up with something cool! We've come a long way now, but basically we still only know how to do clever calculations and memory transfers in asm. This can be used for a lot, but there's still more to learn. Next time I will introduce some ways to communicate with the different parts of your computer in assembler. This is the part of programming that is usually referred to as... I/O!

Happy hacking everyone!

This tutorial originally appeared in QBasic: The Magazine Issue 7.