Assembly language programming tutorial part 2: The basics
By Petter Holmberg of Enhanced Creations
Edited version (original version posted in QB:tm)

Hello again! The first part of this tutorial was written in a hurry, but this one wasn't, so I hope you will find it better. Last time I discussed the history of assembler and told you when to use it and when not to. I also gave you some information about the binary and hexadecimal systems, and briefly explained how the base memory is addressed. This background information was needed to give you a good start in the learning process. This time I will teach you the basics of the assembly language and show you how to use it in QuickBASIC. Let's get to it!

How the heck can I execute assembly code in QuickBASIC?

This might be the first question you're asking yourselves. How can you make QuickBASIC understand assembly code? Well, you can't. QuickBASIC can only understand regular BASIC statements. However, it is possible to make QuickBASIC execute snippets of machine language code in a program. Machine language is the only language the processor really understands, and the BASIC code you usually see is translated into machine language instructions when you run the program. But as assembly code is basically machine language represented in a more human-readable way, all you need is a program that translates your assembly code into machine code, and the knowledge of how to get QuickBASIC to run that machine code.

When you convert a program written in a high-level language such as QuickBASIC to machine code, you say that you compile the program. When you do the same with an assembly program, you say that you assemble the program. A program that does this is called an assembler. Do not mix up the expressions here!

Now you may be thinking that you don't have an assembler on your hard drive, but that's where you're wrong. All Microsoft operating systems, from MS-DOS to Win98, have a program called DEBUG somewhere.
This program was included in Microsoft operating systems as a tool for advanced users, and it can convert raw assembly code to machine code and back again. This program can be very useful, but it's also very hard to use. Luckily, you won't need to worry about that. I will explain this later.

There are two ways to run machine code in QuickBASIC. I will start by only explaining the first one. QBASIC and QuickBASIC both have a built-in routine called CALL ABSOLUTE. With CALL ABSOLUTE you can execute a machine language routine and then return to QuickBASIC. If you use the standard QBASIC, you can use CALL ABSOLUTE directly, but in QuickBASIC it's included in the external library QB.QLB/QB.LIB. So if you use QuickBASIC, you must start it with the syntax QB /L, which loads this library.

To begin with, we will work with DEBUG and CALL ABSOLUTE as our tools to learn assembler. But as I told you, DEBUG is hard to use, so I have created a program that makes everything a whole lot easier. It is called Absolute Assembly, and it can be downloaded at the Enhanced Creations website at http://ec.quickbasic.com. The last version was written many months ago, but it works fine. This program relieves you from the pain of using DEBUG manually to create a program. It takes a raw text file with assembly code as input, and through DEBUG generates a snippet of QuickBASIC code in a file of your choice. I'll give you the details as we continue.

The basics of assembler:

Now we are ready to begin discussing some serious stuff: The first steps into the asm world! First of all: When working with assembler, you mainly process a lot of numbers. These numbers need to be stored somewhere. You can of course use the memory to store your data, but there's another way to do it: Through registers. Registers are a bit like variables, but they're not stored in the memory as variables are. They are located inside the microprocessor itself, where they can be accessed much faster than the memory.
However, there are only a few of them, so you'll have to use them carefully and keep track of what you're doing with them. Many registers also have special uses, so you can, and must, use them only in certain places. This may seem a little confusing to you right now, but you will soon understand how it works.

There are four basic registers that can be used for almost anything. They are called AX, BX, CX and DX. You can think of these registers as INTEGER variables in QuickBASIC. They are small memory cells that can each store a 16-bit number. If you want to use only one of the two bytes in these registers, you can do so by calling them AH/AL, BH/BL, CH/CL and DH/DL. The H and L stand for "high" and "low". So you can use only the upper 8 bits of the AX register by calling it AH, and the lower 8 bits by calling it AL. On 386 computers and later, you can also call these registers EAX, EBX, ECX and EDX, and use them to store 32-bit numbers. I'd better draw this to make you understand it:

    <----------- 32 bits ----------->
    ---------------------------------
    |              EAX              |
    ---------------------------------
                    |      AX       |
                    -----------------
                    |  AH   |  AL   |
                    -----------------
                    <--- 16 bits --->

Writing data into AL doesn't affect AH, but it affects AX and EAX. The same rules go for BX, CX and DX. Remember that these are all just different names for accessing different parts of the same register. As DEBUG cannot handle any 386 or above processor instructions, we won't be using 32-bit registers as long as we're working with it.

Although these four general purpose registers can be used for almost everything, they also have special purposes. The A, B, C and D in the register names actually stand for Accumulator, Base, Counter and Data. I will tell you when they should be used when we come to such situations. There are many other registers with more special uses that you will need to learn, but I will return to them later when we need them.

Now we're going to learn our first assembly instruction!
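Before moving on, the register picture above can be checked with a small model. This is only a sketch in Python, not assembly: the class name `Register32` and its properties are made up for this illustration, and all it does is show with bit masks that AL, AH and AX are windows into the same 32-bit value.

```python
class Register32:
    """Toy model of EAX and its named parts AX, AH and AL (hypothetical helper)."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF  # the full 32-bit register (EAX)

    @property
    def ax(self):
        return self.value & 0xFFFF  # lower 16 bits

    @ax.setter
    def ax(self, v):
        self.value = (self.value & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def ah(self):
        return (self.value >> 8) & 0xFF  # upper byte of AX

    @ah.setter
    def ah(self, v):
        self.value = (self.value & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def al(self):
        return self.value & 0xFF  # lower byte of AX

    @al.setter
    def al(self, v):
        self.value = (self.value & 0xFFFFFF00) | (v & 0xFF)


eax = Register32(0x12345678)
eax.al = 0xFF  # write only the low byte
```

After the write to `al`, the high byte `ah` still holds &H56, while `ax` and the full value both show the new low byte, which is exactly the rule stated above.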
MOV, your key to data transfer:

The most common and important assembly instruction is called MOV. Assemblers on most platforms have this instruction. Its purpose is to move or copy values between memory and registers. As you probably guessed, MOV is short for move. Many assembly instructions are three letters long. The name is a little misleading, because moving a value would mean taking it away from the source, but MOV actually copies the value. The general syntax for MOV is:

    MOV destination, source

Beginners always tend to mix up the positions of the source and destination with MOV. It may seem more natural to put the source first, but if you think about it you'll see that you do the same in BASIC (destination = source). The source can be a direct number, a register or a value in the memory. The destination can be a register or a memory position. Here are some examples:

If you want to put the value 8 in the AX register, you type:

    MOV AX, 8

If you would like to copy the contents of the CH register into BL, you type:

    MOV BL, CH

A thing that you cannot do is to move a 16-bit value into an 8-bit register. An instruction like MOV AL, BX is therefore not possible.

But what about values in the memory? Well, then you must learn to use three new registers: DS, SI and DI.

Accessing the memory:

In order to be able to read and write in the memory, you need the special memory addressing registers. If you want to read data from the memory, you must put the memory address into the two registers DS and SI. Their full names are the Data Segment register and the Source Index register. In DS you should put the segment address of the memory position you want to read from, and in SI you should put the offset address. Segments and offsets were explained in part 1 of this tutorial series. Now, suppose you want to copy byte number 18 in the memory into AL. Then you need the following assembly code:

    MOV BX, 1
    MOV DS, BX
    MOV SI, 2
    MOV AL, [SI]

This requires some explaining.
First of all: It's impossible to move a direct number into the DS register. Don't ask me why, but it has to do with the Intel PC processor architecture. So what you need to do is to put the value in one of the general purpose registers (here I use BX, which is often used in this situation) and then copy the contents of that register into DS. That explains the first two lines. Next we put a value into the SI register. Luckily this can be done directly. We wanted memory position 18, and now the DS register is 1 and the SI register is 2. Remember that the actual memory position is the segment times 16 plus the offset. In our case this gives us 1*16+2=18.

The final line copies the byte located at this memory position into AL. The brackets [] around SI mean that the computer should fetch the byte at the memory position pointed out by SI (DS is automatically assumed to hold the correct segment address) instead of the value in SI itself. Without the brackets, the value of SI itself (2) would get into AL. Well, actually that wouldn't work, because AL is only 8 bits and SI is 16 bits. If it had been the whole AX register it would work though. But with the brackets, the value from memory position DS * 16 + SI is read.

If you want to do the opposite and write to memory, you use DI instead of SI. DI is short for Destination Index. Writing data is done in almost the same way as reading data. If you would like to write the value 5 into the same memory position as we were reading from in the previous example, you type this:

    MOV BX, 1
    MOV DS, BX
    MOV DI, 2
    MOV AL, 5
    MOV [DI], AL

Easy, huh? Here the DS register is also used for the segment address. In this example, AL, an 8-bit register part, is used, and so 8 bits will be written to the memory. If you had changed AL to AX, 16 bits would have been written.

Now you should know how to read and write values from/to registers and how to access specific memory positions. But you can't do much with it yet.
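If you want to convince yourself of the address arithmetic, here is a minimal sketch of it in Python rather than assembly. The names `MEMORY`, `physical_address`, `read_byte`, `write_byte` and `write_word` are invented for this illustration, and the 1 MB byte array only pretends to be the base memory. The one detail not spelled out in the text is the byte order of a 16-bit write: on Intel processors the low byte (AL) ends up at the lower address.

```python
MEMORY = bytearray(1024 * 1024)  # 1 MB of pretend base memory, all zeros


def physical_address(segment, offset):
    """Segment times 16 plus offset, as described in the text."""
    return segment * 16 + offset


def read_byte(ds, si):
    """Rough equivalent of MOV AL, [SI] with DS holding the segment."""
    return MEMORY[physical_address(ds, si)]


def write_byte(ds, di, value):
    """Rough equivalent of MOV [DI], AL."""
    MEMORY[physical_address(ds, di)] = value & 0xFF


def write_word(ds, di, value):
    """Rough equivalent of MOV [DI], AX: two bytes, low byte first."""
    addr = physical_address(ds, di)
    MEMORY[addr] = value & 0xFF             # AL goes to the lower address
    MEMORY[addr + 1] = (value >> 8) & 0xFF  # AH goes to the next byte


write_byte(1, 2, 5)   # DS = 1, DI = 2, AL = 5
al = read_byte(1, 2)  # DS = 1, SI = 2
```

Note that `physical_address(1, 2)` and `physical_address(0, 18)` give the same result, 18. Several segment and offset pairs can point at the same byte in memory.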
Now it's time to learn how to call asm routines from QB!

The interface:

Now you need to have both DEBUG and Absolute Assembly, the two programs I talked about earlier. You won't have to know how to use DEBUG, because Absolute Assembly will do all the dirty work for you. You can use your favorite text editor when creating your assembly routines. Just make sure the code is saved as a plain ASCII text file. When you want the code to be fed into QB, you just start Absolute Assembly. You will be asked to type in the name of the text file containing the asm source, the QB program file to put the code in, and the name of a code string. Make sure you have saved the BAS file in standard ASCII format (this only applies to QuickBASIC users). The code string is a string variable name that is going to be used in the QB program to access the code. If you're making an asm routine to draw pixels on the screen, you should call it drawpixel$ or something like that. You don't have to type in the $ sign in Absolute Assembly.

Next, you will be asked to answer yes or no to two questions. The first one asks you if you want to append the code to the BASIC source file. If you answer no, your BAS file will be cleared before the code is written to it. If you answer yes, the asm code will end up at the bottom of the file without erasing its old contents. The other question is if you want to add CALL ABSOLUTE lines to the program. This is a little hard to explain right now. I'll get back to it soon.

Our first assembly routine:

It's time to try writing an asm routine for QB. The first test routine won't do anything, it's just a test. First, start your favorite text editor and type in the following lines:

    PUSH BP
    MOV BP, SP
    POP BP
    RETF

Now you wonder what this means, but don't worry! I will explain it to you. The first line is a new assembly instruction for you: PUSH. This introduces a new part of assembly programming. PUSH is an instruction used to put values on the stack.
What is the stack then? Well, it's a part of the memory that can be used to store temporary values. You will often have the need to keep track of more numbers than there are registers, and the most convenient way to go then is to use the stack. The stack is a place where you can tuck away values until you want to use them. PUSH is the instruction you use to copy a value to the stack. When you want to get the value back, you use the opposite of PUSH, an instruction called POP. You can see that POP is used on the third line of the program.

There's a special register that keeps track of where in the memory the stack is located. It is called the Stack Pointer, or SP. When you use PUSH, SP is first changed so that it points at a new memory position in the stack, and the value is then stored at the memory address in SP. This works a little strangely. You can think of the stack as a stack of plates. When you use PUSH, it's like putting a new plate on the stack. When you use POP, you remove it, revealing the plate underneath. So you must keep track of the order of the values you put on the stack. Consider this example:

    PUSH AX
    PUSH BX
    POP AX
    POP BX

First, AX is copied to the stack, and then BX. When the first POP instruction is executed, the value that was last put on the stack, the BX value, is returned to AX. When the second POP is executed, the original value of AX gets into BX. So this example will actually swap the values in AX and BX, using the stack. If you want the values to get back in the right order, you must POP them in the opposite order:

    PUSH AX
    PUSH BX
    POP BX
    POP AX

This would correctly return the values to their original registers. This is a technique called LIFO, and it stands for Last In, First Out. Get it? Of course, there's no reason to PUSH and POP values like in the example above, but if you needed AX and BX for other things between the PUSH and POP instructions, you would find it very useful.

There's also a strange thing about the stack that you need to know.
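Before getting to that, the two PUSH/POP examples above can be checked with a small model. This is a sketch in Python, not assembly: the function `run` and the way a program is written as a list of (instruction, register) pairs are invented for this illustration, and the stack is just a Python list, so the SP register is not modelled at all.

```python
def run(program, ax, bx):
    """Run a list of ("PUSH"/"POP", register) pairs on a toy stack."""
    regs = {"AX": ax, "BX": bx}
    stack = []
    for instruction, reg in program:
        if instruction == "PUSH":
            stack.append(regs[reg])  # copy the register value onto the stack
        elif instruction == "POP":
            regs[reg] = stack.pop()  # the last value pushed comes off first
        else:
            raise ValueError("unknown instruction: " + instruction)
    return regs


# First example: popping in the same order as pushing swaps the registers.
swapped = run(
    [("PUSH", "AX"), ("PUSH", "BX"), ("POP", "AX"), ("POP", "BX")],
    ax=1, bx=2,
)

# Second example: popping in the opposite order restores them.
restored = run(
    [("PUSH", "AX"), ("PUSH", "BX"), ("POP", "BX"), ("POP", "AX")],
    ax=1, bx=2,
)
```

With AX = 1 and BX = 2 going in, `swapped` ends up with AX = 2 and BX = 1, while `restored` has both registers back where they started.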
The image of a stack of plates is not entirely correct. The value in the SP register is not increased after each PUSH, it's actually decreased! So it would be more like a stack of plates turned upside down, even though earthly physics obviously wouldn't accept stacks of plates hanging upside down from the ceiling. :-) This is not something you need to care about now though.

Now let's return to the test program. As you can see, the PUSH instruction uses a register called BP. This is another new thing for you. BP is short for Base Pointer, and it is a value that points to the base of the stack. So in the plate example, it would point at the plate at the bottom of the stack (if you ignore the upside-down thing for a while). It's essential that this value is the same before and after the call to the asm routine, because QB uses it too. So therefore we always PUSH it at the beginning of the routine, and then POP it back at the end.

Now we have come to the second line. By now you should understand what it does: It puts the value of the SP register in BP. Now the computer will think that the bottom of the stack is at what's actually the top of the stack. In the next part of this tutorial series I will explain why we do this.

After the first two lines, we would be free to write anything, but as we don't want the routine to do anything yet, we'll just POP the old value of BP back and return to QB. The last line, with an instruction called RETF, which stands for Return Far, will make sure we get back to the BASIC program.

Let's try this out:

- First, type the four lines into a text file and save it. Let's call it ASMTEST.TXT!
- Then run Absolute Assembly.
- First you will be asked to type in the name of the asm source file. Type ASMTEST.TXT. Usually, assembly source files have the extension .ASM, but it doesn't matter what name you use.
- Then you type in the name of the BASIC source file. Since we don't have one, just type ASMTEST.BAS, and this file will be created.
- Then you must type in the name of the code string. Call it test.
- The program now asks if you want to append the asm code to the BASIC file. Since the BAS file is empty, press the N key for no.
- Finally the program asks if you want to add CALL ABSOLUTE lines. Press Y.
- Now DEBUG should be executed by the program, and it will show you what is happening. If everything was done correctly, the program will ask you if you want to convert another file. Press N and exit the program.
- Now open the file ASMTEST.BAS in QB. You will see this:

    ' ------ Created with Absolute Assembly 2.1 by Petter Holmberg, -97. ------- '

    test$ = ""
    test$ = test$ + CHR$(&H55)                ' PUSH BP
    test$ = test$ + CHR$(&H89) + CHR$(&HE5)   ' MOV BP,SP
    test$ = test$ + CHR$(&H5D)                ' POP BP
    test$ = test$ + CHR$(&HCB)                ' RETF

    offset% = SADD(test$)
    DEF SEG = VARSEG(test$)
    CALL ABSOLUTE(offset%)
    DEF SEG

    ' ------ Created with Absolute Assembly 2.1 by Petter Holmberg, -97. ------- '

As you see, there's now a string variable called test$. For each line, numbers (converted to characters with CHR$) are added to the string. On the right, you can see the assembly instructions you typed in earlier as comments. For each line, one assembly instruction, or more correctly, the machine language equivalent of one assembly instruction, is added to the string.

Because we answered yes to the question if we wanted to add CALL ABSOLUTE lines to the BAS file, there are also four other lines under the test$ declaration. The offset% variable gets the offset address of the string test$, and the DEF SEG statement makes sure the default segment is the segment of the test$ string. (DEF SEG in QB is almost the same as typing MOV DS, BX in assembler.) And now comes the CALL ABSOLUTE call. This line will execute the code located at the start of the test$ string. As this test program doesn't really do anything, you won't see anything happening. Finally, the second DEF SEG resets the default QB segment.
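To make it clear that test$ really is nothing more than a handful of bytes, here is the same string built in Python instead of BASIC. This is only an illustration: the names `ROUTINE` and `build_code_string` are made up, and the byte values are simply copied from the CHR$ calls in the listing above. Python cannot run the machine code, it can only show it.

```python
# One entry per assembly instruction: (comment text, machine code bytes).
# The byte values are taken from the generated BASIC listing.
ROUTINE = [
    ("PUSH BP", [0x55]),
    ("MOV BP,SP", [0x89, 0xE5]),
    ("POP BP", [0x5D]),
    ("RETF", [0xCB]),
]


def build_code_string(routine):
    """Concatenate the bytes of every instruction, like test$ = test$ + CHR$(...)."""
    code = bytearray()
    for _mnemonic, opcode_bytes in routine:
        code.extend(opcode_bytes)
    return bytes(code)


code = build_code_string(ROUTINE)
hex_dump = " ".join("{:02X}".format(b) for b in code)
```

The whole routine is only five bytes long, and `hex_dump` shows them as 55 89 E5 5D CB. Those five bytes are what CALL ABSOLUTE jumps to.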
Run the program and make sure it actually works! It won't do anything, but just the fact that it didn't crash is enough to make an assembly programmer happy!

This is the end of the second part of my assembly tutorial. I've tried to go slowly in the beginning so that you would understand everything, and by now the vital basics of assembler should be crystal clear to you. Next time we can start the fun! I will teach you more assembler instructions, and we will start writing programs that can do something useful, such as manipulating BASIC variables and returning the answer. As last time, make sure you understand everything I explained in this part, and I'll see you in January!

Have a Merry Christmas and a Happy New Year!

Petter Holmberg