The QBNews Page 5 Volume 1, Number 2 February 2, 1990

The Speedy INSTR Function - Programming a Backwards INSTR Routine
by Larry Stone

One of the easiest traps to fall into is using a FOR...NEXT loop to search a string for a sub-string. I call it a trap because it seems the obvious choice when, oftentimes, it is neither the fastest nor the best of the available choices.

The FOR...NEXT loop seems the obvious choice because we use it on a variable whose length is known. So why not just step through it, character by character, and look for the sub-string? The drawback of the FOR...NEXT loop is, primarily, speed.

A typical use (as demonstrated by the correspondence in the Quik_BAS echo) is finding the starting point of a sub-string, e.g., locating a file name embedded within a complete path and file name. Let's look at what happens when you use a FOR...NEXT loop:

    fullName$ = "C:\UTILITY\DISK\PCTOOLS\PCTOOLS.EXE"
    length% = LEN(fullName$)
    FOR A% = 1 TO length%
        IF MID$(fullName$, A%, 1) = "\" THEN B% = A%
    NEXT

In the above example, we are looking for a sub-string composed of a single backslash character. The string to search is 35 characters long, so we must loop 35 times in order to find four occurrences of the sub-string (at positions 3, 11, 16, and 24). Pretty inefficient, isn't it? All we wanted was the last occurrence of the backslash, and we had to loop 35 times only to discover that it was at position 24 (in the above example, B% ends up equal to 24).

The INSTR function handles this problem in a much more elegant manner. The example below is a backwards INSTR routine. Although it does not actually step backwards through a string, it quickly reports the last occurrence of any sub-string.
    DECLARE FUNCTION BackInstr% (start%, searchString$, subString$)

    fullName$ = "C:\UTILITY\DISK\PCTOOLS\PCTOOLS.EXE"
    searchString$ = fullName$
    subString$ = "\"
    P% = BackInstr%(0, searchString$, subString$)
    PRINT "The last position of the backslash is"; P%

    FUNCTION BackInstr% (start%, searchString$, subString$)
        ' Returns the position of the last occurrence of subString$ in
        ' searchString$, searching from start% (0 means from the beginning).
        ' Returns 0 if there is no occurrence.
        S% = start%                  ' local copy: don't alter the caller's variable
        IF S% = 0 THEN S% = 1
        N% = INSTR(S%, searchString$, subString$)
        A% = N%                      ' last position found so far
        DO WHILE N%
            ' Resume the search one character past the previous match.
            N% = INSTR(N% + 1, searchString$, subString$)
            IF N% THEN A% = N%
        LOOP
        BackInstr% = A%
    END FUNCTION

The elegance of this function is that the DO...LOOP executes exactly four times, once for each occurrence of the backslash! Speed is gained because there are only four iterations of the loop instead of the 35 in the FOR...NEXT loop, and it is improved further by the way BASIC's INSTR function works.

The INSTR function does not do a "brute-force" evaluation, comparing every character of both strings at every position. Instead, it isolates the first character of the sub-string and scans for the first occurrence of a possible match. Scanning for a single character is much faster than comparing the whole sub-string at each position. If a match is found and subString$ has more characters, the next character is compared. If that character does not match, INSTR resumes scanning from the position just after where the first character matched, leaving that much less string to search. The actual assembly instruction used is SCASB. This technique is twice as fast as the Boyer-Moore algorithm, considered by many to be a benchmark. For a more exact description of the process, read the article "QuickBASIC's Fast String Searching Algorithm" by Ethan Winer of Crescent Software, in the Programmer's Journal, 7.6 (see above).

Would you like your programs, when they search for a sub-string, to distinguish exact matches from embedded matches? Simply swap the positions of the search string and the sub-string and search again.
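The loop counts claimed for BackInstr% and the FOR...NEXT scan, and the first-character scan attributed to INSTR, can be checked with a small Python model. This is a hedged sketch, not QuickBASIC's actual INSTR (which is assembly built around SCASB): `instr`, `back_instr` and `for_next_last` are hypothetical names, positions are 1-based with 0 meaning "not found" to mirror INSTR, the sub-string is assumed to be non-empty, and `back_instr` also returns its loop count purely for illustration.

```python
def instr(start: int, search: str, sub: str) -> int:
    """Return the 1-based position of sub in search at or after start, else 0.

    Models the first-character scan: only sub[0] is looked for, and the
    rest of sub is compared only where that first character matches.
    Assumes sub is non-empty and start >= 1.
    """
    first = sub[0]
    last_start = len(search) - len(sub)  # last 0-based index where sub fits
    i = start - 1                        # convert 1-based start to 0-based
    while i <= last_start:
        if search[i] == first:           # cheap single-character test
            if search[i + 1:i + len(sub)] == sub[1:]:
                return i + 1             # report a 1-based position
        i += 1                           # resume just past this position
    return 0


def back_instr(start: int, search: str, sub: str) -> tuple[int, int]:
    """Python port of BackInstr%; also returns how often the loop ran."""
    if start == 0:
        start = 1
    n = instr(start, search, sub)
    last = n
    iterations = 0
    while n:
        iterations += 1
        n = instr(n + 1, search, sub)    # search again one past the match
        if n:
            last = n
    return last, iterations


def for_next_last(search: str, ch: str) -> tuple[int, int]:
    """Model of the FOR...NEXT scan: test every position, counting passes."""
    last = 0
    passes = 0
    for a in range(1, len(search) + 1):
        passes += 1
        if search[a - 1] == ch:
            last = a
    return last, passes


full_name = r"C:\UTILITY\DISK\PCTOOLS\PCTOOLS.EXE"
print(for_next_last(full_name, "\\"))  # (24, 35): 35 passes
print(back_instr(0, full_name, "\\"))  # (24, 4): 4 loop iterations
```

The counts only cover the outer BASIC loops; each INSTR call still scans characters internally, so the saving comes from doing that scanning inside INSTR rather than with one MID$ comparison per pass of a BASIC loop.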
Here's how INSTR can work for you:

    searchString$ = "These"
    subString$ = "The"
    start% = 1
    '---- Check one string against the other.
    N% = INSTR(start%, searchString$, subString$)
    IF N% THEN
        '---- If a match was found, swap the strings and search again.
        '---- (M% is used so that N% keeps the position of the match.)
        M% = INSTR(subString$, searchString$)
        IF M% THEN
            '---- Each string contains the other, so it's an exact match.
            PRINT "Sub-String is Exact Match of Search String"
        ELSE
            '---- Otherwise, it's an embedded match.
            PRINT "Sub-String is Embedded in Search String"
        END IF
    ELSE
        '---- The sub-string was not located in the search string.
        PRINT "Sub-String is Not Found in Search String"
    END IF

In the example above, N% is equal to one because "The" is a subset of "These", while the swapped search returns zero, so the code reports an embedded match.

To decide whether the text matched inside a longer searchString$ is an exact word match, you need to find the end of the word that was matched (use a space, a period, and a hyphen as sub-strings). The end of the word is the first such delimiter at or after N%, which a forward INSTR(N%, searchString$, " ") reports; BackInstr% reports the last delimiter in the string, which marks the end of the word only when the match is in the final word. When you know the position P% where the word ends (use LEN(searchString$) + 1 if no delimiter is found), simply state: word$ = MID$(searchString$, N%, P% - N%). Remember that the third argument of MID$ is a length, not an end position. Then use INSTR(subString$, word$) to compare the two.

If you wish to move through searchString$ to the next occurrence of a match, place all but the first three lines of code within a DO WHILE start%...LOOP. At the bottom of the loop, if N% is some positive value, LET start% = N% + 1; otherwise, if N% is zero, LET start% = N% to end the loop.

When you are searching for more than one occurrence of a match, you should have an escape key assigned to get you out of the loop. This works quite well:

    A$ = INKEY$: IF A$ = CHR$(27) THEN EXIT DO

One word of caution: the INSTR function returns zero whenever a compiled .EXE uses the literals CHR$(1) or CHR$(2) in a string. This should not, however, cause any problems in QB's environment.
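The exact/embedded test, the word extraction, and the DO WHILE start%...LOOP walk through every match can be sketched in Python. This is an illustration under stated assumptions, not the author's code: INSTR is modeled with Python's `str.find` shifted to 1-based positions, `classify`, `word_at` and `find_matches` are hypothetical helper names, the delimiters are the space, period and hyphen mentioned above, and a match is assumed to begin at the start of a word. The escape-key check has no counterpart here.

```python
DELIMITERS = " .-"  # space, period and hyphen, as suggested in the text


def instr(start: int, search: str, sub: str) -> int:
    """INSTR-style search via str.find: 1-based result, 0 if not found."""
    return search.find(sub, start - 1) + 1


def classify(search: str, sub: str) -> str:
    """Search once, then swap the strings and search again."""
    if not instr(1, search, sub):
        return "not found"
    if instr(1, sub, search):   # each string contains the other: identical
        return "exact"
    return "embedded"


def word_at(search: str, n: int) -> str:
    """Word starting at 1-based position n, ending before the first delimiter
    at or after n (or at the end of the string). Assumes the match begins a word."""
    ends = [p for p in (instr(n, search, d) for d in DELIMITERS) if p]
    p = min(ends) if ends else len(search) + 1
    return search[n - 1:p - 1]  # same as MID$(search, n, p - n)


def find_matches(search: str, sub: str) -> list[tuple[int, str]]:
    """Walk through every occurrence, as the DO WHILE start%...LOOP does."""
    results = []
    start = 1
    while start:
        n = instr(start, search, sub)
        if n:
            kind = "exact" if instr(1, sub, word_at(search, n)) else "embedded"
            results.append((n, kind))
            start = n + 1   # continue just past this match
        else:
            start = 0       # no more matches: end the loop
    return results


print(classify("These", "The"))                     # embedded
print(find_matches("These are The words.", "The"))  # [(1, 'embedded'), (11, 'exact')]
```

Because `find_matches` advances `start` to one past each match, overlapping occurrences are also reported, just as with `start% = N% + 1`.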