QB ON ACID ISSUE #4 December 21st, 1999 _________________________________________________________________ Infix -> ASM convertor plan, by logiclrd For every sum-of-terms, reorder it so that the longest terms come first, e.g.: 1 + 2*3 -> 2*3 + 1 1 + (4/2 - 6*8/3) - 3*4 -> (-6*8/3 + 4/2) - 3*4 + 1 Then, switch the order of dividends with their divisors (the reason for this will become clear later), e.g.: 4/3 -> 3/4 (6+3)/(2*x) -> (2*x)/(6+3) Look for terms in which the non-variable factors (ie, the coefficients) can be simplified, e.g.: 2*x/4 -> x/2 3*6 -> 18 Now convert the string to an expression tree by recursively selecting the last operation (by order of operations) of the current substring to be the new subtree's head. e.g.: 4*2+6 -> + -> + / \ / \ 4*2 6 * 6 / \ 4 2 Swap the left and right children of any node whose left child is a numeric constant and whose right child is not, e.g.: + -> + / \ / \ 6 * * 6 / \ / \ 3 x x 3 Swap the left and right children of any node whose left child is a greater power of two than its right child, e.g.: + -> + / \ / \ 8 * * 8 / \ / \ 4 x x 4 Swap the left and right children of any division node whose _LEFT_ child is a power of two (since we changed 4/3 to 3/4), and take the logarithm base 2 of the new right side, changing the node into a "shift right" operation: ( x/8 -> 8/x -> ) '/' -> shift right / \ / \ 8 x x 8 Now, parse the tree in postfix order to generate an RPN expression, e.g.: + -> (x, 4, *, 8, +) / \ * 8 / \ x 4 We're halfway there! Replace any constant followed by a + or - operator with a special code "add ", e.g.: (x, 4, *, 8, +) -> (x, 4, *, add 8) Replace any constant followed by a shift or bitwise operator into a single list entry in a similar manner, e.g.: (x, 3, &, 2, <<) -> (x, and 3, shift left 2) Replace any power of two followed by a * operator with a special code "shift left" e.g.: (x, 4, *) -> (x, shift left 2) DO Replace any "shift left" operation followed by a "shift right" operation with the corresponding single shift, e.g.: (x, shift left 3, shift right 1, shift left 4) -> (x, shift left 2, shift left 4) Replace any "shift left/right" operation followed by another similar "shift left/right" operation with the corresponding single shift, e.g.: (x, shift left 2, shift left 3, shift right 1, shift right 2) -> (x, shift lef t 5, shift right 3) LOOP WHILE (changes made) Here's the part that'll most likely require the most code: parsing the RPN string into ASM, keeping track of what registers are which stack entries. Treat the registers as though they were the RPN stack, using MOV EAX,[x] instead of PUSH [x]. This way, all PUSHes and POPs are virtual (unless you run out of registers, in which case you are forced to PUSH one of your registers anyway -- this is unlikely, though, unless you're dealing with a maniac of a coder!), and must be kept track of within your conversion routine. If you have an item which is implicated in a subtraction or addition and there is a multiplication or a division in between the term and it's operator (i.e., the '*' between '2' and '+' in (4, 2, 3, *, +)), then avoid using EAX or EDX for that term. If there is an item which has only shifts, special additions and special subtractions (not '+' or '-', just 'add 3'-type items) between it and the next multiplication or division, place that term into EAX. Keep in mind that modulus values can only be placed into EDX, while the actual division to be performed implicates EAX. These 'in-between' checks can be performed by going through the RPN list, keeping track of the stack height at every point. Also, watch out for changes in the order of stack variables. These are pretty much guaranteed to be the same for every operation of a given type. I've documented them below, to give examples. For divisions, divide EAX by the second-to-topmost item on the stack (this is why we changed 4/3 to 3/4: 3/4 becomes 3,4,/, 4 goes into EAX, and we can do DIV EBX). E.g.: (x, y, z, add 4, *, +) would become: MOV EBX, [x] ; because there is a '*' between 'x' and its '+' MOV ECX, [y] ; 'z' is between 'y' and '*', and ECX is next available register MOV EAX, [z] ; only 'add 4' between 'z' and '*' ADD EAX, 4 ; here's the special 'add 4' routine MUL ECX ; before operation, stack is: EBX->ECX->EAX, after: EBX->EAX ADD EAX, EBX ; before operation, stack is: EBX->EAX, after: EAX And finally, if the resulting value is not in EAX, move it there (unless you have intelligent condition parsing code). If your last operation is a commutative operation which uses EAX as an operand but not as the target, switch the operands around, e.g.: ADD EBX,EAX MOV EAX,EBX would become: ADD EAX,EBX thereby avoiding copying the value back into EAX. You're done! (but don't forget to keep track of which registers were destroyed, and don't forget to PUSH all the registers with values you need when doing function calls) By this method, this expression: (x / 8) & 63 + 64 * ((y / 8) & 63 would become: MOV EAX,[y] SHR EAX,3 AND EAX,63 SHL EAX,6 MOV EBX,[x] SHR EBX,3 AND EBX,63 ADD EAX,EBX which, I'm sure you'll agree, is pretty damn good output. By logiclrd