The QBNews Page 38 Volume 3, Number 1 March 29, 1992 ---------------------------------------------------------------------- A l g o r i t h m s ---------------------------------------------------------------------- QuickInsert Sort Algorithm - a push from B$ASSN by Larry Stone One of the fastest, all purpose sort algorithms, is QuickSort. QuickSort is one of the sort routines supplied with the Microsoft program, "DEMOSORT.BAS". Speed is achieved by picking a random pivot point then moving every element bigger to one side of the random point and every value smaller to the other side. QuickSort then recursively calls itself from within its two areas of code that deal with those elements greater or smaller than the pivot point. I like QuickSort but, alas, there are a couple of limitations that prevent me from incorporating into my code. The first is the fact that the pivot point is randomly selected. This means that the floating point library is loaded into the program. This is no small matter. The floating point library can add upwards of 10,000 bytes to one's final EXE size and, if the program is designed to avoid using floating point routines, this then becomes a prohibitive cost. The floating point library can be avoided by substituting a "homemade" random routine - one that uses LONG integers in lieu of real numbers. An outstanding replacement one could use is found in the book, "Microsoft QuickBASIC Programmer's Toolbox", by John Clark Craig. Craig's random generator is a QB version of two routines described in, "The Art of Computer Programming", Vol. 2, "Seminumerical Algorithms", by Donald Knuth. The random generator is an excellent replacement for QB's because it not only doesn't use floating point but also, it has a sequence length greater than 2^55 with a possible 256^83 possible sequences. That's an awful lot of random numbers! The real disadvantage to the QuickSort is the fact that it uses recursion. If there is a lot of data to sort, this recursive routine will blow the top of the stack. This can be overcome by setting a larger stack. Placing the statement, "CLEAR,,16384" at the top of the code will establish 16K of stack space. Conversely, "STACK 16384&" will do the same with PDS and the linker option, "/ST:16384", will do it if one compiles using PDQ. However, there is no free lunch. Increasing the stack size is accomplished by a comparable reduction in code and string space in DGROUP. Two other, quick sort routines may offer relief. The WAVE sort, developed by Richard F. Ashwell for GW-BASIC is somewhat slower than the QuickSort but on large sorts, it really shines. Another extremely fast sort that on average, is just a smidgen slower than the QuickSort, is the ripple sort algorithm that MicroHelp supplies with some of their tools. However, it seems logical that the absolute fastest sort should be an insertion sort - ie, sorting the data while it is gathered, intuitively, is faster than other techniques that requires two separate steps, a gathering followed by sorting. Unfortunately, BASIC insertion sorts are notoriously slow. They are The QBNews Page 39 Volume 3, Number 1 March 29, 1992 slow for two, related reasons. They require loops that must touch and compare array elements, one at a time, until the insertion point is located. This is generally accomplished by STEPping a FOR...NEXT loop in a negative direction, pushing each element "up the ladder" until the appropriate element is freed for insertion with new data. In three words, slow, slow, slow. "I-INSERT.BAS" overcomes the, up-to-now, inherent slowness of BASIC insertion sorts by incorporating two separate processes into its algorithm. The first part of the algorithm uses a modified binary search to quickly locate the appropriate insertion point. The binary search algorithm is not the generic binary search. Although, like any binary search, it looks for a match between an array elements data and the data to be inserted, it also looks for the appropriate spot where the existing array data has greater value on the up-side and lesser value on the down-side. The test data used by I-INSERT contains 150 random integers. On average, the binary search requires 5 steps into the array to locate the appropriate insertion point for each value inserted. The maximum number of steps required (for this data group) is seven and the minimum is one. The binary search is fast, not only because it minimizes the number of steps needed to locate the insertion point but also speeds along due to the nature of its architecture. If you look at the code, you will note that mathematics are limited to a single expression that adds the begin value to the ending value. Although the result of this is divided by two, the division is integer division which QB will compile as a shift instruction - "middle% = (begin% + ending%) \ 2". The math like instructions, "begin% = middle% + 1" and "ending% = middle% - 1" are compiled as INC and DEC instructions to the CPU - again, very fast. The search performs five tests. The first test is the conditional entry into the loop, "DO WHILE begin% <= ending%". Two of the tests are simple comparisons. One test isn't even a test but, rather, a fallthrough if the previous three tests within the loop are false. The slowest portion of the search is the first test. It tests two conditions and if the condition is met, then tests whether the insertion point is located. The second algorithm that comprises I-INSERT is the code that moves the existing data upwards in order to free the appropriate element for data insertion. It too, contains a single mathematic expression composed of a compiled SUB and IMUL instruction: "MoveBytes% = (LastElement% - middle%) * LEN(Arry%(Low%))" The actual moving of the array's elements is performed by a call to the routine, "QBMemCopy". This routine is an ALIAS for the QB procedure, "B$ASSN". B$ASSN copies a specified number of bytes from one memory location to another memory location in one, very quick move. B$ASSN requires six parameters: FromSegment, FromOffset, BytesToCopy, ToSegment, ToOffset, and BytesToTransfer. All are integers and all must be passed by value. Well, to speed things up just a wee bit more, the ALIAS for B$ASSN, "QBMemCopy", was declared using only four arguments: FromAddress, BytesFrom, ToAddress, and The QBNews Page 40 Volume 3, Number 1 March 29, 1992 BytesTo. Reduction in the number of arguments was accomplished by using the "SEG" directive which will push the segment and offset onto the stack. Although the stack still has the same number of arguments, execution time is again improved because the code does not require a VARSEG and VARPTR instruction (QB is already aware of the variable's address so a simple SEG directive suffices). BytesFrom and BytesTo are still passed by value. I-INSERT does have some limitations. The first is that B$ASSN can only move 64K bytes at a time. And, as presently written, it can only move 32K bytes. If you need to move more than 32K but less than 64K then make the following modification to the code that establishes MoveBytes: M& = 1& * (LastElement% - middle%) * LEN(Arry%(Low%)) MoveBytes% = M& - 65536 * (M& > 32767) If you need to move more than 64K bytes then you must do it in "chunks". Because B$ASSN performs its copy in a forward direction, you must DIM another array of equal size to the array you will be inserting into (or equal to your largest "chunk"). QBMemCopy, the ALIAS for B$ASSN, first copies a block of memory, bounded by the insertion point to the last existing array element, to the scratch buffer or working array. From there, it copies back to the original array but places the data up one element from their original positions. Once this is accomplished, the new value is then inserted. A speed improvement and memory reduction can be achieved using the MicroHelp routine, MhMove in lieu of B$ASSN. MhMove is smart enough to realize that data is being moved within the same segment. When MhMove "sees" this, it reverses the direction flag thereby, allowing for the copy to be performed only once. As a consequence, MhMove does not require the work array as a temporary scratch buffer which means that the memory allocated for B$ASSN to make its first copy to does not have to be allocated - hence, both a speed and memory improvement. Using I-INSERT's QuickInsert routine is as easy as eating apple pie. As your code reads in data, pass QuickInsert the value, the next array element that is available (LastElement%), and the array in which to insert the data. The work array or scratch buffer is DIM SHARED as TempSortArray%() and is not passed in the arguments list (If you use MhMove or a similar ASM routine then TempSortArray does not even need to be created.) That's all there is to it. It needs to be noted that the QuickInsert routine will not execute from within the QB or QBX environment. This is because neither environments will recognize the ALIAS for B$ASSN. You must compile it to run it. Also, if you desire to test its speed against other sort procedures, to be fair, these procedures need to be timed from the point in which data is first copied into an array because QuickInsert is sorting while the array gets filled. By-the-way, B$ASSN, among other things, is used by QB to copy data The QBNews Page 41 Volume 3, Number 1 March 29, 1992 from TYPE to TYPE and fixed-length strings to variable length strings. As a consequence, a lot of QB internal string handling code will be included with your program if you use this routine. This won't be very noticeable if your program already uses string handling routines. But if it doesn't use strings then you should be aware that your EXE size will substantially increase by declaring and using B$ASSN. Two final notes. I was originally going to name the routine the "Stone Insert Sort" but false modesty prevented that. The second is a challenge. I challenge you, the reader, to discover other QB internals that can be applied in ways to perform insert sorts on strings and string arrays. SOURCE CODE FOR THIS ARTICLE CAN BE FOUND IN I-INSERT.ZIP ====================================================================== Larry Stone is the author of the shareware file manager "SERVICES" and is also a frequent contributor to The QBNews. He can be contacted in care of this newsletter. ======================================================================