Why 64K?




Disclaimer

The opinions expressed by RudeJohn do not necessarily reflect those of the Basix Fanzine or its staff. In fact, they probably disagree with me entirely. Everyone else does. <sigh>




Preface

My knowledge of assembly languages lies somewhere between "Huh?" and "DOH!" So, if I make any mistakes in that regard, please feel free to register your complaint with whoever happens to be standing around. I'm sure they'll care.
Most of the information in this article was gleaned from the documentation that accompanies MSVC++ 1.5 Professional Edition (circa 1995) for Windoze 3.x. If I'm not mistaken, this was the last version of VC capable of compiling 16-bit DOS applications written in C. According to the literature, the run-time library was "designed primarily for compatibility with the ANSI C standard." <smirk> Who says those guys at Microsoftie don't have a sense of humour!
Although QuickBasic does not address the subject of pointer variables directly (pardon the pun), pointers will invariably pop up in any long-winded, rambling discussion regarding the efficient use of memory. If you don't believe me, just keep reading. I will not shy away from talking about pointers in this article because QuickBasic hobbyists should at least be aware of their existence. While pointers may not have been made available to the programmer, QuickBasic does use them internally. I think. <grin>




Pointing is Rude

What makes a pointer variable different from any other type of variable is that it contains the address of a location in memory. In other words, it "points" to that location. A pointer variable that fully specifies a memory address needs 32 bits: 16 bits for the base address (i.e., the segment location) and another 16 bits for the offset within the segment.
An application is typically composed of at least two parts: the code and the data. As an application executes, it refers to elements of the code or data by their addresses. These addresses can be stored in pointer variables which fit into either 16 or 32 bits, depending on the "distance" of the object to which they refer.
The Intel 80x86 processors have a segmented architecture, which means they all have a mode that treats memory as a series of segments, each of which can occupy up to 64K of memory. Accessing information within a segment is accomplished by using an offset from the segment's base address. Additional machine code is required to access more than one segment at a time.
The limit of 64K for the size of a segment follows naturally from the processor's architecture because the offset into a segment is 16 bits (2 bytes) wide, matching the 16-bit registers of the 8086 through the 80286. (The 80386 and 80486 have 32-bit registers, but they still use 16-bit offsets when running in real mode under DOS.) The number of all possible values for a binary number which is 16 bits long is equal to 2×2×2×2×2×2×2×2×2×2×2×2×2×2×2×2 = 2^16 = 65,536 = 64 × 1024 = 64K. In other words, a 16-bit pointer variable can point to 64K different memory locations. One at a time, of course.
The 80x86 register CS holds the base (i.e., the location) for the code segment while the register DS holds the base for the data segment. There are other segment registers available, but for now all we need to realize is that CS and DS point to a program's code and data segments, respectively.
If an application requires only one code segment and one data segment, the addresses held in CS and DS remain constant for as long as the program runs. This means that both you and your compiler are able to use 16-bit pointers rather than 32-bit pointers in both the code and data segments. The comparison is not unlike looking at QuickBasic's INTEGER versus LONG data types. A LONG variable requires four bytes of memory while an INTEGER needs only two. You may also have noticed that the same program runs faster if LONGs are replaced by INTEGERs. Obviously, 16 bits of memory is less than 32 bits of memory, and applications may run noticeably faster when 16-bit pointers replace their larger cousins.
16-bit pointers to objects within a single 64K segment are called near pointers. Remember, it takes 32 bits to fully specify a memory address, so near pointers only hold the offset of an object within its segment. Accessing a near object is called near addressing.
If your program needs more than 64K for code or data, then at least some of the pointers must specify the memory segment, which requires 32 bits. These pointers, which can specify any location in memory, are called far pointers. Surprisingly, accessing a far object is called far addressing. Far pointers may be used to access up to 640K of memory under MS-DOS.
A third type of pointer (at least under VC++) is the huge pointer, which applies only to data pointers. Code pointers cannot be declared as huge. A huge address is similar to a far address in that it consists of two 16-bit values: one for the segment and another for the offset. Far and huge addresses differ only in the way that pointer arithmetic is performed.
For far pointers, it is assumed that code and data objects lie completely within the segment in which they start, so pointer arithmetic operates only on the offset portion of the address. Limiting the size of any single item to 64K makes pointer arithmetic faster. Huge pointers overcome this size limitation; pointer arithmetic is performed on all 32 bits of the data item's address, thus allowing data items referenced by huge pointers to span more than one segment. The huge pointer is incremented as a 32-bit value that represents the combined segment and offset. Extending the size of pointer arithmetic from 16 to 32 bits causes such arithmetic to execute more slowly. You gain the use of larger arrays but pay the price in processing time.




Standard Memory Models

The following table lists the maximum memory allowed for code, data, and data arrays under each of the six standard Microsoftie memory models.

Standard Microsoftie Memory Models
Model     Code             Data             Arrays
Tiny      less than 64K    less than 64K    less than 64K
Small     64K              64K              64K
Medium    no limit         64K              64K
Compact   64K              no limit         64K
Large     no limit         no limit         64K
Huge      no limit         no limit         no limit


No limit? Well, not exactly; QuickBasic programs suffer from the 640K limit of conventional memory under MS-DOS.
The tiny memory model provides a single segment -- which cannot exceed 64K -- for both code and data combined. The tiny memory model is the only model that permits the creation of command files, which use the COM extension. Command files are formatted differently than other executables, and may be used in ways that standard EXEs cannot.
The small memory model provides one code segment and one data segment, each limited to 64K. The total size of a small-model program cannot exceed 128K. Both code and data items in a small-model program are accessed with near addressing, which makes a small-model program faster than one which uses far addressing.
The medium memory model provides a single segment for data and multiple segments for code, each limited to 64K. Each source module is given its own code segment. By default, code items in medium-model programs are accessed with far addresses and data items are accessed with near addresses. The medium model provides a useful trade-off between speed and space for a program that refers more frequently to data items than to code.
The compact memory model provides only one segment for code but multiple segments for data, each limited to 64K. By default, code items in compact-model programs are accessed with near addresses and data items are accessed with far addresses. The compact model provides a useful trade-off between speed and space for a program that refers more frequently to code items than to data.
The large memory model provides multiple segments, as needed, for both code and data. Each segment is limited to 64K, so no single data item can exceed 64K. By default, both code and data items in large-model programs are accessed with far addresses.
The huge memory model is nearly identical to the large model, the only significant difference being that the huge model permits individual arrays to exceed 64K in size. Although the huge model lifts the limits on arrays, some size restrictions do apply. To maintain efficient addressing, no individual array element is allowed to cross a segment boundary. This has the following implications:




The Last Word

As you may have guessed, QuickBasic uses the medium memory model within conventional memory. Huge arrays are enabled by the /AH option, which permits both the IDE and the compiler to use far addressing.
While this article may be as dull as dry toast, I hope QuickBasic fans will find some benefit in slogging through it. Hey, at least now you know why 64K is such a popular number!


C'ya,
RudeJohn
"I'm rude. It's a job."

