Assembly language programming tutorial part 1: Getting started
By Petter Holmberg of Enhanced Creations
This article is written for all of you who want to learn how to program in assembler in order to enhance your QuickBASIC programs. I know this is a dream for many QB programmers, but many feel it's too complicated to learn, and they haven't found any good sources of information to get started with. If you are one of these programmers, this article is written for you. You will find that it's not easy to learn assembly language programming, but you will probably also find that it's much easier than you first thought. This article will not delve too deeply into assembly language programming, but it will give you a solid foundation to build on.
So what is assembler then?
The early ancestors of today's computers, developed in the period of about 1940 to 1960, were a real pain to program. The circuits in these computers could perform simple arithmetic operations, they could take data as input, write data as output, and do other operations needed to solve problems for the people who had built them. In order to make the computers understand what they should do, they needed to be fed with instructions. These instructions were given to the computers as series of codes. Let's say the number 1 was the code for adding, the number 2 was the code for subtracting, and the number 3 was the code for outputting the result. The programmers would figure out a program and input it into the computer by turning switches, or by punching holes in paper cards and feeding them to the computer. If the program didn't work, the programmers had to go through each instruction again to see where the error was, and then reprogram the computer. Not very convenient, especially as the programs were all written as a series of ones and zeroes.
In order to make programming easier, they started writing the programs in hexadecimal numbers instead of binary numbers. That turned every 4 binary digits into one hexadecimal digit, making the programs shorter and easier to read. But the programs were still just a sequence of numbers, hard to remember and understand for any programmer. So someone had the great idea of writing the instructions as short words that could be translated directly into numbers and fed to the computer. So instead of saying 1 for an addition, the programmers could say "add", and instead of 2 for subtraction, they could say "sub". Now you could see more clearly what the program did, and finding errors was not as hard anymore. Assembly language was invented.
Later on, computer engineers found out that you could make programming a lot easier if you replaced long sequences of assembly instructions with code that looked much more like human language. These were called high-level programming languages, and BASIC was one of the first ones. Today's microprocessors still perform their duties as a series of simple instructions, such as "add" and "sub", but programming languages like BASIC make sure that we usually don't have to worry about it.
Why do I need to learn assembler?
There are many reasons to use a high-level language like BASIC instead of assembler. A simple instruction such as PRINT can take more than 100 lines of code in assembler. It is therefore pretty obvious that BASIC programs are easier to write and debug, and you don't have to worry about what the processor actually does when it writes a letter on the screen. It just works. Another reason to use high-level languages is that you can easily convert the BASIC program on your PC to work on an Amiga computer, using an Amiga BASIC compiler. If you had written your program in assembler, you would find that the Amiga wouldn't understand it, because its CPU doesn't work like a PC processor.
There are still reasons to use assembler instead of a high-level language. QuickBASIC cannot do everything. There are sometimes things you want to do with the computer that no BASIC instruction can do, and you often find that your BASIC program needs to do so many calculations that it gets slow. The problem is that an instruction such as PRINT takes many possibilities into account. It makes sure you have a valid string to print, it checks what screen mode you use and what color you want to print the text in, and so on. Usually you know all these details when you want to print the text, and you don't need the processor to perform all these checks. The only way to remove them is to use assembler code instead of PRINT. There's usually no point in writing a full program in assembler, though. Only use it when you need to do something really fast or something really low-level.
What do I need to know?
When you write a BASIC program, you don't really need to know much about how the computer works. In assembler you work with the computer on its own level, and therefore you need to know what you're actually doing. You don't need to know very much to get started though, and you will learn the rest as you're learning assembler. The first thing that you will find useful to know is how to count in the binary and hexadecimal systems instead of the decimal. This is pretty easy to learn.
Usually we count in the decimal system. We then have 10 digits, ranging from 0 to 9. The lowest digit we can use is 0, and as we count upwards we use the digits 1, 2, 3, 4, 5, 6, 7, 8 and 9. That's all the digits we have, so in order to continue we need to use two of them. We reset the 9 to 0, and add a 1 to the left of it. The first digit is now worth 10 times the second one. We can now use all combinations of digits up to 99, and then we need to reset them and add a third digit. This means that the number 1234 can be expressed as 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0. See the pattern?
What if you didn't have 10 digits to play with? Well, it works just as well anyway. The binary system, on which computer technology is based, has only 2 possible digits, 0 and 1. You start counting from 0, and when you reach 1 you have used all of your digits and need to add a second one, and you get the number 10. Each new digit is worth 2 times the digit to its right. The binary number 10110 can thus be expressed as 1*2^4 + 0*2^3 + 1*2^2 + 1*2^1 + 0*2^0, or 22 in decimal.
The hexadecimal system works with 16 different digits. Since we have only invented 10 symbols for digits, we use letters to represent the higher ones. The hexadecimal system therefore uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. The hexadecimal number F3 can therefore be expressed as 15*16^1 + 3*16^0, or 243 in decimal. It's easier to understand if we put the three systems in a table for comparison:
Decimal   Hexadecimal   Binary
0         0             0
1         1             1
2         2             10
3         3             11
4         4             100
5         5             101
6         6             110
7         7             111
8         8             1000
9         9             1001
10        A             1010
11        B             1011
12        C             1100
13        D             1101
14        E             1110
15        F             1111
16        10            10000
As you can see, the number F in hexadecimal is the same as the number 1111 in binary, and this shows why the hexadecimal system is often used in assembly language programming instead of the decimal: one hexadecimal digit always corresponds to exactly four binary digits. If you want the binary number 1111000011110000, you can split it into the groups 1111, 0000, 1111 and 0000 and write it in hexadecimal as F0F0. This makes it easy to convert binary numbers to hexadecimal and hexadecimal numbers to binary.
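The grouping trick can be tried out in a few lines of code. The following is only a small sketch in Python (not a language this article otherwise uses), and the function name binary_to_hex is made up for the illustration. It translates each group of four binary digits into one hexadecimal digit, one group at a time.

```python
# Illustrative helper, not part of the article: convert a string of binary
# digits to hexadecimal by translating four binary digits at a time.
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    # Pad with zeroes on the left so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Cut the padded string into groups of four binary digits.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each group has a value from 0 to 15, which is one hexadecimal digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(binary_to_hex("1111000011110000"))  # F0F0
print(binary_to_hex("10110"))             # 16 (hexadecimal 16 is decimal 22)
```

Notice that no arithmetic on the whole number is needed: every group is converted on its own, which is exactly why hexadecimal is so handy as shorthand for binary.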
The number of different digits you can use is called the base of the counting system. You can use any number as a base. If your number in any counting system is, say, 3 digits long, it can be expressed as: a*base^2 + b*base^1 + c*base^0, where a, b, and c are your three digits. The most important thing when using different systems simultaneously is to keep track of what system you use for a certain number. For example, is the number 10 the usual decimal for 10, or the binary version of the decimal number 2? If you still haven't understood this, read it again until you do or ask someone who understands it to explain it to you. It's very useful to know about this when you program in assembler.
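If you want to experiment with the formula, here is a minimal sketch in Python. The name to_decimal and the limit of base 16 are my own choices for the illustration. It adds up digit*base^position for every digit, starting from the rightmost one, which has position 0.

```python
# Illustrative sketch: evaluate a number written in any base from 2 to 16
# using the formula  a*base^2 + b*base^1 + c*base^0  (for any digit count).
DIGITS = "0123456789ABCDEF"


def to_decimal(text, base):
    total = 0
    # Walk through the digits from right to left; position 0 is the rightmost.
    for position, symbol in enumerate(reversed(text.upper())):
        digit = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a digit in base {base}")
        total += digit * base ** position
    return total


print(to_decimal("1234", 10))   # 1234
print(to_decimal("10110", 2))   # 22
print(to_decimal("F3", 16))     # 243
print(to_decimal("10", 2))      # 2
print(to_decimal("10", 10))     # 10
```

The last two lines show why you must keep track of which system a number is written in: the same two digits give different values in different bases.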
The second thing that is necessary to know when programming in assembler is the PC memory architecture. I'm not going to explain this in detail, because it's a complicated issue.
A PC has 640 kilobytes of basic memory, and additional megabytes in special memory circuits that you can insert into the computer yourself. The terms EMS and XMS refer to this extra memory. That is not the memory I'm going to talk about here. The interesting thing is the basic 640 kilobytes that every PC has. If you are going to be an assembly programmer, you need to know how to find a certain position in this memory in order to use it.
Each position in the memory has an address, a number telling the computer where to read or write data. It would have been easy if this address had just been a number from 0 to 640k, but that's not the system used. A memory position is described by two numbers, called the segment address and the offset address. The actual memory position is a combination of the segment and the offset address.
The segment address describes the memory as groups of 16 bytes. The first byte in the memory, byte 0 if you like, has the segment address 0. The segment address 1 points at byte 16 in memory, and the segment address 2 points at byte 32. The offset address is a number telling you how far from the segment position in memory the byte you want is. So if you want to access byte 3 in memory, you use the segment address 0, and the offset address 3. Together they form a number pointing at an exact memory position. Written as a formula this can be expressed as: actual memory address = segment*16 + offset. If you want to access byte 20 in the memory, you can use the segment 1, giving you the position 16, and the offset 4, adding 4 bytes to the position, for the final number 20. But you can also use a segment address of 0, and the offset 20, giving you the same memory position!
The segment and the offset address numbers can both range from 0 to 65535, giving you several possible combinations when you want to use a certain memory position. This system makes memory addressing a little complicated to understand for beginners. You can see what segment and offset a certain BASIC variable is located at by using the functions VARSEG and VARPTR. Try it!
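The formula is easy to play with on its own. Below is a minimal sketch in Python. The function name linear_address is invented for this example and does not come from QuickBASIC or from any assembler. It simply applies actual memory address = segment*16 + offset.

```python
# Illustrative sketch of the formula: address = segment * 16 + offset.
def linear_address(segment, offset):
    # Both numbers must fit in the range 0 to 65535.
    if not (0 <= segment <= 65535 and 0 <= offset <= 65535):
        raise ValueError("segment and offset must be in the range 0..65535")
    return segment * 16 + offset


print(linear_address(0, 3))    # 3
print(linear_address(1, 4))    # 20
print(linear_address(0, 20))   # 20, the same byte as segment 1, offset 4
print(linear_address(2, 0))    # 32
```

The second and third lines give the same result, which shows that one memory position can be described by more than one segment and offset pair.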
Now you might be wondering how it is possible for both the segment and the offset to be 65535. That gives you the biggest possible memory position of: 65535 * 16 + 65535 = 1114095, which is bigger than 640k. Well, this memory certainly exists, but it is not accessible in the same way as the first 640k of memory, and I'm not going to delve deeper into this here and now. Later on, I will discuss memory access in more detail.
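The arithmetic can be checked quickly. This small Python sketch uses names of my own invention and assumes that one kilobyte means 1024 bytes. It compares the biggest address the formula can produce with the size of the basic 640 kilobytes.

```python
# Illustrative check of the numbers above (assumes 1 kilobyte = 1024 bytes).
HIGHEST_ADDRESS = 65535 * 16 + 65535   # biggest segment plus biggest offset
BASIC_MEMORY_BYTES = 640 * 1024        # the basic 640 kilobytes

print(HIGHEST_ADDRESS)                       # 1114095
print(BASIC_MEMORY_BYTES)                    # 655360
print(HIGHEST_ADDRESS > BASIC_MEMORY_BYTES)  # True
```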
Again, if you didn't understand this, read it again, and if that didn't help, ask someone to explain it to you.
That was all for the first part of this article: a very brief introduction to what's about to come. Next time I will start describing the basics of assembler and how you use it in QuickBASIC. Make sure you understand the different numbering systems and the memory addressing scheme by then.
Bye for now!
(Editor's Note: Petter's asm series will continue in Issue 5. Check it out!)