QFASTMAN
This article was originally written by Jim Phillips. If you would like to download this tutorial, as well as all six QFastMan example programs, click here.
QFASTMAN.txt
A tutorial-by-example for writing faster code
in QBasic
WHAT IS QFASTMAN?
This is a tutorial which illustrates, by example, how to increase the
execution speed of a QBasic program. The example, QFASTMAN, is a
Mandelbrot Set imager. It's not productivity software, but it is a severe
test of a program's speed. And whether or not you are familiar with
fractals like the Mandelbrot Set, I think you'll find it's instructive and
a lot of fun.
To get QFASTMAN to run at a worthwhile rate, we'll do a lot of creative
hi-jinks. We will NOT rely on assembly language, but as you will see, the
biggest speed gains are not in the coding language, they're in the concept
- the creative side of programming. Furthermore, by staying with QBasic,
things remain easier to understand or to modify for your own uses. And now
that I'm on that subject, I hereby release all of the code in the QFASTMAN
files to the public domain. If you can use any of this in your programs,
more power to you. All I ask is that you give me credit where it is due.
I wrote much of the QFASTMAN project in retrospect. I tried hundreds of
ideas, and wrote more than 75 program versions. Then I went back and
distilled it all into a few logical steps, while leaving out the chaff and
dead ends. You'll note that the programs are not modular. I don't
generally code this way, but these programs are small enough that I felt it
worked better to leave them as they are; it's easier (at least for me) to
picture the flow of the calculations and spot opportunities for
optimization.
I wish to thank the writers of the FRACTINT fractal program for the
valuable hints in their documentation, and Michael Abrash for his
first-rate book The Zen of Code Optimization. Without these sources,
QFASTMAN would not exist.
WHY QBASIC?
First of all, because it's free to almost everyone who has MS-DOS or
Windows on their PC. They can dig right into programming without buying
additional software.
Second, BASIC is great for creative "hacking"-style work. I do some
programming in C but I prefer BASIC (the QBasic interpreter and the
PowerBASIC compiler) for projects like this one because the syntax is far
less error-prone. When I'm spending less time zapping bugs I can try a lot
more ideas in one turn of the hour hand.
Third, if history is any guide (and in this instance it is), then the BASIC
code in these files will still be useable ten years from now. A number of
commercial and shareware compilers and interpreters supported in 2008 will
run this code. I wouldn't feel comfortable making that claim about any
other language.
Fourth, if you are going to translate code into your other favorite
language, BASIC is the easiest language to translate from.
And last but not least, QBasic is a cult thing, right? 8^)
WHAT IS THE MANDELBROT SET?
A deep discussion of the Mandelbrot Set and fractals in general is not the
purpose of this tutorial. I expect that if you have read this far, you may
already be familiar with the subject. But briefly, the Mandelbrot Set is
an area on a two-dimensional x-y number plane. (You probably drew graphs
on an x-y plane in high school.) The difference is that here we work with
the "complex" plane: we multiply the values on the y axis by the imaginary
number "i" which is defined as the square root of minus one.
To calculate the Mandelbrot Set, we take each x,y coordinate on this number
plane and perform a certain repetitive operation on it (the operation is
fairly simple and is in the code listings). If the successive results of
this operation get larger and eventually expand toward infinity, the x,y
coordinate is outside of the Mandelbrot Set. If the result of the
operation instead enters into a repetitive loop (in other words, it never
expands toward infinity) then the x,y coordinate is inside the Set.
For us, the casual explorers, the fascination in the Mandelbrot Set lies
near the boundary between those coordinates which are inside the Set and
those which are not. This boundary area is infinitely complex; it has been
called the most complex structure in mathematics.
If we "zoom" in to look at a small part of this boundary, we find that
there is additional detail which was not visible when we looked at the
entire Set as a whole. It's like looking at a coastline from orbit. If we
zoom in to look from a few hundred feet up, all kinds of rocks and pools
and new details appear. If we kneel right down on the shoreline, we see
the water curling around individual pebbles and grains of sand. Exploring
the Mandelbrot Set is like this, with one remarkable difference: you can
zoom into the Set boundary as far as you want, yet never reach the end.
Your only limits are the magnifying ability of your program, and its speed.
If you dig into the QFASTMAN programs the idea will become a lot clearer.
Seeing how the formula is laid out and then running the programs will
explain more than I could in many pages of text. (Also see "For Further
Study" below.)
QFASTM1 - THE (Q)BASIC APPROACH
QFASTM1 is simply the "mathematically correct" method for calculating and
displaying the Mandelbrot Set. I made no attempt to code an approach which
is intentionally slow or fast; I coded in a straightforward way, as you
might if you were copying the formulas from a textbook. QFASTM1 runs at
about the same speed as several independently-written MSet programs I have
seen.
The program will display on your screen a strange, warty and bulbous black
shape surrounded by white. The screen has become the complex x-y number
plane, and the black shape contains those coordinates which are inside of
the Mandelbrot Set. You will see some dark pixels apparently sitting out
on their own, separate from the main feature. These are in the Set, and
are connected to it. It's just that at this low magnification the
connecting filaments are below our limit of resolution. Later on we'll
zoom into an area and you'll see how finer details become apparent.
If you run QFASTM1 on an older, slower machine ("slower" meaning anything
less than a Pentium) you will notice something else. QFASTM1 is s-l-o-w.
On my Pentium 133 MHz PC, running in DOS, it takes 38 minutes 56 seconds to
finish. (All times in this text are for running in DOS. Windows times are
slower and less consistent.) On a 386sx20, say, it would take enough time
to see your lawn grow. I'm not kidding. If we zoom in to look at details,
the calculations get heavier; even the Pentium will bog down.
The "correct" way of doing something isn't necessarily the best way. In
the lower part of the screen the program prints the reminder "Ctrl+Break to
exit" in case you don't want to wait for the finish. If you want to see
the whole thing come out on that 386sx, fire it up and leave it for a
weekend....
QFASTM2 - SPEEDING IN THE SLOW LANE
In QFASTM1 we got an inkling of how slow this program can be without even
trying to make it slow. Often you'll have a program which is not so
terribly slow, but still needs a 50%-100% speed-up. Let's see if we can
get this kind of speed gain with just a few ordinary changes.
Nearly all of QFASTM1's work is done in the innermost loop, the one
beginning "FOR Iter% = 1 TO MaxIteration%". Obviously, if we reduce
MaxIteration we'll get a proportionate speed boost, but I'd like to avoid
this. A greater number of iterations on each coordinate yields a more
accurate - and more intricate - boundary for us to explore. Also, I wish
to eventually use the VGA's 256 colors. They will be tied to the number of
iterations performed on each coordinate before it "kicks out". Large color
palettes help give the Mandelbrot images their fascinating beauty.
So let's look elsewhere. How about that next line in the inner FOR loop:
it tests the square root of a sum against the value 2. We don't need to do
that slow function call or that math either. Let's just test the sum
against the value 4; the result is the same, and faster. Also, we
calculate Real Part squared and Imaginary Part squared at 2 different
places within the loop; let's do it just once and save the values in
temporary variables. Our third trick involves the ^2 format used to find
the squares. This is actually another function call; an ordinary multiply
is faster. Finally, remove the remarks from the most heavily used code.
QBasic is an interpreter, not a compiler. On every pass through the loop,
it spends a snippet of time checking each remark line before it moves on.
Those snippets add up.
How did we do? With these 4 simple changes, QFASTM2 is faster than QFASTM1
by 10.4 times on the Pentium133. I'd wager you can use that kind of quick
and easy speed gain in your programs.
It's still pretty slow, however, at 3 minutes 45 seconds. We're wrestling
with that devil floating-point math. Even on a PC with a coprocessor,
integer math is faster than the floating-point variety; on an sx PC they
are worlds apart.
What about the rest of the program? Isn't there code all over that could
be streamlined? Yes, but as Michael Abrash would say, "Know where your
clock cycles are going". Remove the inner FOR loop in QFASTM2 and you'll
find that the rest of the program uses only 1% of the total time on the
Pentium133. We could streamline the rest of the code to the point where it
runs in no time at all, yet QFASTM2 would speed up by only 1%.
You see, we calculate MSet values for 64,000 pixels. Placing a counter
inside the inner FOR loop reveals that we make a grand total of 3,526,188
iterations, each iteration involving several floating-point calculations.
The overhead in setting up 64,000 pixels is tiny in comparison.
If you want the cpu to work faster, reduce its work load. But it sure
would be nice if we could use that speedy integer math....
QFASTM3 - TURNING APPLES INTO ORANGES
How about this: you want to multiply two decimal numbers; let's say 1.22
and 0.15. That's a floating point calculation: 1.22 x 0.15 = 0.183.
Now do it again, but first multiply each number by 100. Now we are
multiplying 122 and 15. That's an integer math calculation: 122 x 15 =
1830. Since we multiplied each factor by 100, we need to divide 1830 by
(100*100), and we get our answer of 0.183. The first and last steps are
floating-point calculations, but the intermediate steps are integer math.
If we are dealing with an operation where the intermediate numbers are
manipulated many times (as we are with QFASTM2's inner loop) the time saved
with integer math will be more than the time lost doing the conversions at
either end. The conversions become a part of the relatively small
overhead, and the loop which is eating up 99% of the clock gets converted
from floating-point-intensive to integer-intensive.
QFASTM3 is about as simple a conversion to this idea as I had time to come
up with. It uses long integers to avoid overflows. The long int's also
make it easier to deal with rounding errors, which are the big bugaboo of
this approach (more on that later). The larger the factor by which we
scale up things, the less bothersome are the rounding errors, but the
slower the program. And if we get too large with our scaling factor we'll
run into overflows again. Here we use a scale factor of 1000 and it works
pretty well.
How well? On the Pentium133 our new version runs in 1 minute flat. It's
faster than QFASTM2 by 3.7 times. You'll notice that the image is not
quite the same, due to rounding. But the floating-point versions have
rounding errors also; QFASTM3's depiction of the x-axis as an unbroken line
is actually more accurate.
The lesson: if you don't like the rules, change them....
QFASTM4 - GO JUMP IN THE LAKE
So far we have sped up the program by 38.9 times on the Pentium133. It
appears that we will end up with a fast program if our creativity holds out
(and it usually does if you give it a positive attitude and occasional
rest). So let's enhance QFASTMAN's appearance a bit, with 255 colors
courtesy of the VGA mode 13. (Mode 13 has 256 colors, but we'll save one
for later use in QFASTM5.)
The colorful beauty of Mandelbrot images comes mainly from tying the
palette colors to the value of the iteration at which the MSet formula
"kicks out". Doing it is pretty simple: we take the iteration value and
write it into the coordinate's location in video memory. The default
palette is not particularly attractive, but QFASTM4 will give you the idea,
and we'll see in QFASTM6 that changing the palette is also simple to do.
Okay, back to the hot-rodding. Our images show the complex x-y plane with
the Mandelbrot Set as a central dark area. Literature about the MSet often
refers to the central area as the Mandelbrot "lake". Coordinates in the
lake are the ones which never cause the repeated iteration to grow to
infinity, no matter how many times we iterate. In other words, whenever we
test a coordinate which is in the lake, the result of the iterated formula
settles into an infinite loop. These coordinates are also the slowest ones
to calculate, since each one guarantees that the inner loop will go through
the maximum number of iterations. Wouldn't it be nice if we could
short-circuit these most time-consuming coordinates of all?
It turns out we can. The fact that the iterated formula gets into an
infinite loop is the clue: it cannot be an infinite loop unless the results
repeat. As soon as there is a repeat, all succeeding iterations must
repeat. Luckily for us, the repetition begins soon enough that we can
usually detect it long before we get to even the 100th iteration.
Of the approaches I tried, the most effective seems to be to test the value
of (RealPartSquared + ImaginaryPartSquared). When this value begins to
periodically repeat, we have a coordinate that is in the lake; we set Iter%
to the maximum value and exit the loop.
Since there is some rounding in our method due to our quest for speed, it's
slightly possible for a coordinate outside of the M-Set to falsely test
positive for periodicity. Furthermore, it's counter-productive to test
every coordinate for periodicity, since the majority are not in the MSet
lake. Borrowing a suggestion from FRACTINT, our method is more foolproof
and much more efficient if we test for periodicity only in those cases
where the previous coordinate proved to be in the lake.
The result: testing of coordinates inside the MSet lake goes much faster.
QFASTM4 runs in 24.2 seconds under DOS on the Pentium133 - a 2.5 times
speed boost. In your quest for speed, look for clues in the data....
QFASTM5 - PAINTING THE PIXELS
A program can do a lot more useful calculating in a given time if it avoids
some of the drudge work. Take a look at QFASTM5.
Here we borrow another idea from FRACTINT. Almost any image from the MSet
will contain many areas of many pixels all the same color. QFASTM5
calculates a pixel, skips a few, calculates a pixel, skips a few, and so
on. Then it checks each of these small "boxes" to see if it has pixels of
the same color at all four corners. If it does, we assume the area inside
this box is also the same color and we simply "paint" it in without
bothering to calculate the blank pixels. When we've gone over the whole
screen, we go back and calculate those pixels which are not yet colored.
QFASTM5 uses boxes 4 pixels on a side, which seems to be a good compromise
between speed and accuracy. And it is fairly accurate; the image is pretty
much as it was before. Speed is good, too: the execution time on the
Pentium133 is now down to 11.3 seconds, a 2.1 times speed gain.
The fastest way to calculate something is to not calculate it at all....
QFASTM6 - TIME OUT FOR SOME COMPUTER ART
Here ends the main part of this tutorial, but I want to leave you with a
little taste for more. QFASTM6 displays a low-power (8X magnification)
zoom into the MSet, using a modified palette. This is just an inkling of
the explorations that are possible.
Something you might wish to code: 320 x 240 256-color Mode X. You'd get
square pixels and 20% better resolution for more attractive images. And
beyond that, many home PC's now have SVGA....
JUST HOW FAST CAN THIS SUCKER GO?
Actually, a good deal faster, at least in some cases. From QFASTM1 to
QFASTM5 we got a speed gain of 207 times (!). But now that you've learned
the concept behind creative optimizing, I'll give you a suggestion.
You may have noticed that the Mandelbrot Set image is symmetrical around
the x-axis. That is, the iteration value for the coordinate (x,y) is
identical to the value for the coordinate (x,-y). Once you calculate one,
you've got the other; color one coordinate and you can jump over and color
the other one. That's right: in every QFASTMAN from QFASTM1 through
QFASTM5 we did twice as much figuring as we needed to! If you code this
efficiently, any time the x-axis is inside the screen frame you'll get a
speed boost. If the x-axis is centered, as it is with the full MSet image,
you'll draw your image nearly twice as fast.
QFASTM5 is not fully streamlined in other ways, either. We have sped up
the program so much that the overhead in setting up the 64,000 pixels is
now a significant fraction of the total run time. You could find a modest
gain there. And maybe there is a still-undiscovered way of looking at the
coordinate calculations that will get rid of much of the math...?
MAKING ZOOMS ZOOM
QFASTM6 reveals the shortcoming with the integer approach in QBasic: we run
into overflow problems almost immediately. How can we get around this?
(1) Go back to floating point. The program will slow down a lot, but
we'll keep most of the gains we've made.
(2) Use fixed point arithmetic, in which one integer represents the
integer part of a value (i.e. the 2 in 2.345) and a second integer
represents the decimal part (i.e. the 345 in 2.345). I won't go into
this any further here; I'll leave it to you to work out an approach,
or consult FRACTINT, etc.
(3) Figure out a different approach using integer math. Any number, no
matter how many significant digits, can be represented by a series of
smaller numbers. For example, 123456 can be represented as
(123 x 1000) + 456. I think there is a lot of possibility in such
an extended-precision math approach; I frankly have not had time to
work on it. Remember: it will have to be efficient or the program
will take ages to run.
Note that going back to floating-point is only a band-aid solution. It
will only let you zoom in so far, then you'll run out of significant
digits. FRACTINT, using a compiled assembly-language extended-precision
library, can zoom into the MSet to magnifications equal to 1 followed by
1300 zeroes.(!) If you are interested in writing a comprehensive (Q)BASIC
MSet program, you would do well to download the free FRACTINT program and
its documentation. Running the program will give you lots of ideas, not to
mention a severe case of insomnia from deep-zoom exploring. And, as I said
before, you are free to use any of my QFASTMAN code in your program - just
give me due credit.
WHEN DO WE GO TO ASSEMBLY? or WHO SAID THAT NASTY WORD?
Simply put: never. That is not the purpose of this tutorial. Creative
optimizing will almost always pay bigger dividends than simply translating
a program into assembly language. Re-coding a program in well-written
assembly typically gains 2-3 times in speed. This utterly pales in
comparison to the gains possible from creative thinking. Remember, we have
sped up QFASTMAN 207 times over, while staying with QBasic, and a few
paragraphs ago I gave you a hint for nearly doubling the speed again.
Besides, when you optimize a program it tends to get larger and more
abstract. An optimized assembly listing is one of the most obscure
documents you'll ever see; optimized high-level code is bad enough. When
you have absolutely run out of ideas for improving your QBasic code and you
still need more performance, only then is it time to resort to low-level
code. And there are always BASIC compilers; the QFASTMAN files need very
few changes to compile in PowerBASIC, for example. As a very rough general
rule, figure on a 4 times speed gain for compiling, over interpreting.
A PENNY (WELL, NOT QUITE A PENNY) FOR YOUR THOUGHTS
You may think of some speed tricks that I missed, since I could only put so
much time into this project. If you come up with something, I'd appreciate
hearing from you. Actually, any questions, corrections, or suggestions are
welcome. I hope to some day get back to this project and work out a
high-speed zoom!
Jim Phillips
6795 Lipscy Road jphillip@northernnet.com
Hill City, Minn 55748
FOR FURTHER STUDY
This is not by any means an exhaustive list. These are simply the sources
from which I borrowed shamelessly to create QFASTMAN.
* Scientific American magazine, August 1985, "Computer Recreations". It's
hard to say how many readers felt their imaginations soaring off when this
issue arrived in the mail with one of the magazine's most popular articles
ever. This is the article that awoke me to the Mandelbrot Set, and even
inspired me to write a science fiction story based on a take-off of the
concept (never published but not half bad, really). The article contains
some nice images, with a lucid explanation and introduction to the
Mandelbrot Set.
* FRACTINT. A fractal explorer program superb, with more bells and
whistles than Santa's workshop. Clearly a labor of love by a whole gang of
coding ninjas, it was the inspiration for QFASTMAN and is the standard
against which I've tested QFASTMAN's speed. It's compiled, written in C
with quite a bit of optimized assembly language in the heavily used
routines, and is claimed to be the fastest MSet program around; it's
certainly the fastest I've seen. FRACTINT has been available for free on
the Internet whenever I have checked; source code and documentation are
included. Just search on FRACTINT and download.
* The Zen of Code Optimization, by Michael Abrash. The code is written in
C and assembly, but it's the mind-set or attitude that's important (you
know that now!), and Abrash gets this across as well as anyone. If you are
serious about optimizing your BASIC code - particularly for speed - you
should consider this book.
* PC Magazine's BASIC Techniques, by Ethan Winer. Out of print, which is a
pity, but available on PowerBASIC's PB/Xtra Version 2 CD. You might also
find it on the Internet, since Ethan has very generously donated it to the
public. Loads of good BASIC programming tips. See Chapter 9 specifically
for many techniques on code optimization, though there are others sprinkled
throughout. Get your hands on this text if you can.
* And last but not least: Remember that your creative mind is still the
most powerful optimizer to be had!
Good luck in all your coding endeavors, and keep the "QBasic Qult" alive.
- Jim Phillips, June 1998