QuickBASIC/QBASIC newsletter

 
Tricks of the Trade
 
File Compression
by Danny Gump
 
The following is a description of the VirtuaSoft Implode compression
algorithm, which will be used in VirtuaSoft's upcoming video format.  I'm still
working on the first video, so I can't yet say how this algorithm compares
with the ones used in MOV, AVI, and GIF files.
 
A compressed file is split into "chunks" of data, each containing a
header, the compressed data, and a trailer.  The compressed file itself also
has a header and a trailer.  The following shows the layout of a compressed
file:
 
Header
Chunk1
    Header
    Data
    Trailer
Chunk2
    Header
    Data
    Trailer
Trailer
 
This is how the data would be declared for the VS Implode format:
 
DIM Header AS STRING * 2
DIM ChunkHeader AS STRING * 12
DIM Trailer AS STRING * 2
 
The VS Implode header is "VS".  Each chunk header holds the 12-byte
name of the file stored in that chunk (enough room for a DOS 8.3 name).  The
trailer for both a chunk and the file is 0000h, or CHR$(0)+CHR$(0).  Here's
how the file will look with all the headers and trailers in place:
 
 
"VS"
    FileName.bas
    (Data)
    0000h
    FileName.exe
    (Data)
    0000h
0000h
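
As a rough illustration of that framing, here is a minimal sketch that writes
an archive holding one chunk with no data in it.  DEMO.VS and EMPTY.TXT are
made-up names, and padding a short name with spaces is only what QB does to
fixed-length strings, not something the format is known to require.

```basic
' Sketch only: write the headers and trailers for one empty chunk.
' DEMO.VS and EMPTY.TXT are hypothetical names used for illustration.
DIM Header AS STRING * 2
DIM ChunkHeader AS STRING * 12
DIM Trailer AS STRING * 2

Header = "VS"
ChunkHeader = "EMPTY.TXT"        ' QB pads fixed-length strings with spaces
Trailer = CHR$(0) + CHR$(0)      ' 0000h

OPEN "DEMO.VS" FOR BINARY AS #1
PUT #1, , Header                 ' file header
PUT #1, , ChunkHeader            ' chunk header: 12-byte file name
' (the compressed data for the chunk would be written here)
PUT #1, , Trailer                ' chunk trailer
PUT #1, , Trailer                ' file trailer
CLOSE #1
```

The result is 18 bytes long (2 + 12 + 2 + 2).  Note that OPEN ... FOR BINARY
does not truncate a file that already exists, so delete any old DEMO.VS first.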
 
Now for the tough part: the compression algorithm!
The compression algorithm basically has two checks:
 
1. Is the consecutive data the same?
2. Is the consecutive data different?
 
If the data is the same, it can be compressed; if it is not the
same, it has to stay the way it is.  A run of identical bytes is
compressed to two bytes: the number of consecutive bytes and the
byte value.  If the data differs, two bytes are needed to tell the
decompressor so.  These bytes are 00h and the number of differing
bytes, followed by a list of the values.  If the 00h is followed by
another 00h, that chunk is terminated.
 
Uncompressed Data                 Compressed Data
01h 01h 01h 01h 01h 01h 01h       07h 01h
01h 02h 03h 04h                   00h 04h 01h 02h 03h 04h
02h 02h 02h 02h 02h 02h [end]     06h 02h 00h 00h
 
If a chunk's 0000h is followed by another 0000h, the file ends.
Note that a sequence of more than 255 differing or like values is split into
separate sequences, because the count has to fit in one byte.  For example,
there is no 01FEh 01h.  Those 510 (01FEh) copies of 01h are compressed as
FFh 01h FFh 01h.
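
To make the rules above concrete, here is a minimal sketch of a packer for one
chunk's data, working on a string in memory.  It is not the VS Implode
compressor.  The names ImplodeData$, RunLength% and FlushLiterals are
made up for this example, and treating any run of two or more bytes as "the
same" is an assumption.

```basic
DECLARE FUNCTION ImplodeData$ (Src$)
DECLARE FUNCTION RunLength% (Src$, Start%)
DECLARE SUB FlushLiterals (Result$, Lit$)

' Test input: 01h 02h 03h 04h followed by six 02h bytes
Src$ = CHR$(1) + CHR$(2) + CHR$(3) + CHR$(4) + STRING$(6, 2)
Packed$ = ImplodeData$(Src$)
FOR i% = 1 TO LEN(Packed$)
    PRINT RIGHT$("0" + HEX$(ASC(MID$(Packed$, i%, 1))), 2); "h ";
NEXT
PRINT
' prints: 00h 04h 01h 02h 03h 04h 06h 02h 00h 00h
END

' Append the pending differing bytes as 00h + count + the bytes themselves.
SUB FlushLiterals (Result$, Lit$)
    IF LEN(Lit$) > 0 THEN
        Result$ = Result$ + CHR$(0) + CHR$(LEN(Lit$)) + Lit$
        Lit$ = ""
    END IF
END SUB

' Pack Src$ and add the 00h 00h that terminates the chunk.
FUNCTION ImplodeData$ (Src$)
    Result$ = ""
    Lit$ = ""                    ' differing bytes not yet written out
    Position% = 1
    DO WHILE Position% <= LEN(Src$)
        N% = RunLength%(Src$, Position%)
        IF N% >= 2 THEN          ' assumption: 2 or more counts as a run
            CALL FlushLiterals(Result$, Lit$)
            Result$ = Result$ + CHR$(N%) + MID$(Src$, Position%, 1)
            Position% = Position% + N%
        ELSE
            Lit$ = Lit$ + MID$(Src$, Position%, 1)
            Position% = Position% + 1
            ' a count must fit in one byte, so split at 255
            IF LEN(Lit$) = 255 THEN CALL FlushLiterals(Result$, Lit$)
        END IF
    LOOP
    CALL FlushLiterals(Result$, Lit$)
    ImplodeData$ = Result$ + CHR$(0) + CHR$(0)
END FUNCTION

' Count how many times the byte at Start% repeats, up to 255.
FUNCTION RunLength% (Src$, Start%)
    N% = 1
    DO WHILE N% <= LEN(Src$) - Start% AND N% < 255
        IF MID$(Src$, Start% + N%, 1) <> MID$(Src$, Start%, 1) THEN EXIT DO
        N% = N% + 1
    LOOP
    RunLength% = N%
END FUNCTION
```

Because RunLength% stops at 255, a longer run simply comes out as several
count/value pairs.  Since the sketch works on a QB string, it only suits
small inputs.  A real packer would read the source file in pieces.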
 
Below is the code for decompressing.  I'll eventually release
the source for the compression, but it will not be for a few more weeks.
Look for VS Implode on my site.
 
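DIM Value AS STRING * 1, Value2 AS STRING * 1
DIM Header AS STRING * 2
DIM ChunkHeader AS STRING * 12
DIM Trailer AS STRING * 2

PRINT
PRINT "VirtuaSoft Exploder v1.0a"
IF COMMAND$ = "" THEN
    PRINT "     Syntax:  exploder filename.vs"
    PRINT
    END
END IF
OPEN COMMAND$ FOR BINARY AS #1
GET #1, , Header
IF Header = "VS" THEN
    DO
        ' Each chunk starts with a 12-byte file name.  Finding 0000h
        ' here instead of a name means we hit the file trailer.
        GET #1, , ChunkHeader
        IF MID$(ChunkHeader, 1, 2) = CHR$(0) + CHR$(0) THEN EXIT DO
        PRINT "Exploding "; ChunkHeader
        ' RTRIM$ in case a name shorter than 12 bytes was space-padded
        OPEN RTRIM$(ChunkHeader) FOR BINARY AS #2
        DO
            GET #1, , Value
            IF Value = CHR$(0) THEN
                ' 00h + count: that many bytes are stored as they are
                GET #1, , Value
                IF Value = CHR$(0) THEN EXIT DO     ' 00h 00h ends the chunk
                Count% = ASC(Value)
                FOR temp% = 1 TO Count%
                    GET #1, , Value
                    PUT #2, , Value
                NEXT
            ELSE
                ' count + value: write the value that many times
                GET #1, , Value2
                FOR temp% = 1 TO ASC(Value)
                    PUT #2, , Value2
                NEXT
            END IF
        LOOP
        CLOSE #2
    LOOP
ELSE
    PRINT "Not a VS Implode file."
END IF
CLOSE #1
PRINT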

 
 