Heuristics for a Self-Organizing Sequential File

[ Frame-Free Link ]



Disclaimer

The opinions expressed by RudeJohn do not necessarily reflect the opinions of the Basix Fanzine or its staff. In fact, they probably disagree with me entirely. Everyone else does. <sigh>




Introduction

The basic sequential search is fine if we intend to use it infrequently on a relatively small number of records. If our list of records grows considerably larger, or we find ourselves searching much more frequently, then we can try to improve our program's performance with a little heuristic twiddling. Eventually, we may have to abandon our sequential-search algorithm in favor of a more efficient process.
In the meantime, let's see what we can do.




Move to the Front

The Move to the Front heuristic is implemented by moving the last record accessed to the front of the list. This implies that the most frequently accessed records will be closer to the front of the list than records which are accessed much less frequently.
When a record is moved to the front, all records which precede it must be pushed back one position. If we use a linked list, the number of steps necessary to accomplish this rearrangement remains almost constant regardless of the size of the list. If we use an array, the number of steps required to accomplish our rearrangement will be proportional to the number of records shifted.
This heuristic is particularly effective if we are accessing a small percentage of our records most of the time. In other words, the performance of this heuristic is optimal when the access-frequency distribution is heavily skewed in favor of a small number of records. If the access-frequency distribution for our records is relatively flat, then this twiddle becomes a big waste of time. Like the saying goes, "Timing is everything!"




Interchange

This heuristic involves interchanging the most recently accessed record with the record immediately preceding it, i.e., the most recently accessed record is promoted one position toward the head of our list.
Interchanging is a good choice when working with arrays because we only need to interchange two records. Unless, of course, the most recently accessed record is already at the head of the list.




Relative Positioning

Relative positioning requires that each record maintain a hit-count, which is used to sort the records. Resolving the relative positioning of records with equal hit-counts may be accomplished in any number of ways. Use your imagination.     }:oP




Delayed Updating

Delayed-updating is a strategy for deciding when to rearrange our list of records. For example, if each record maintains a hit-counter then we may decide to rearrange our list after any one record is accessed at least N times. As a record is accessed and its hit-counter is incremented, e.g., Record( Index ).HitCount > 0 , then Record( Index ).HitCount mod N = 0 iff Record( Index ) has been accessed a number of times equal to an integral multiple of N, i.e., 1×N, 2×N, 3×N, etc.
Another alternative would be to maintain an access-counter for our list, and rearrange the list following every Nth access. Again, we could test with something similar to ListAccessCount mod N = 0 to determine when the list has been accessed a number of times equal to an integral multiple of N.
There are as many different methods for implementing delayed-updating as your sick, twisted imagination can produce. If you come up with something good, drop me a line.




Pitfalls

Things have a way of balancing out. Sequential files (self-organizing or otherwise) are useful when you are certain that every access will refer to an existing record. On the other hand, a sequential search for a record that doesn't exist means that your program will access every record in the file before it finally reaches the conclusion that it can't find what it's looking for. Under the right circumstances, you might be able to validate access requests before they are processed. Then again, it might be impractical to attempt any more than a partial validation. If the list of records is short, then the cost of processing the occasional bogus request may present an acceptable loss. So, when is a sequential list just too big to process requests without validating them? You decide; I'm gonna sack out.

C'ya,
RudeJohn
"I'm rude. It's a job."