Trouble reading CSV files with QBASIC/FreeBasic
Posted: Tue Apr 12, 2005 8:02 pm
I am in the middle of a large project involving simple manipulation of a large number of files (related to stock market ticker symbols). I first read the raw CSV files and extract summary information and which I write to new CSV files. I have a massive amount of data to read in and process. Order of magnitude: 2,000,000 files / 15 GB of data in the form of small CSV files. The first set of QBASIC programs read the data and created summary CSV files. Much of the code is not shown but I used the following lines to write the summary files. I believe that I have set up correct DIM statements for all variables and am pretty good about using data type operators (%,$,# etc) on all my variables in the code:
OPEN FILENAMESAVE$ FOR APPEND AS #2
WRITE #2, "DATA", "INDEX", "CS", "MKTCAP", "MKTCAPMA", "CUMCALLOI", "CUMPUTOI", "PutCall", "PC SD OI", "PC MA OI"
WRITE #2, "DATE", "DATE", "PRICE", "999", "10", "CUMCALL", "CUMPUT", "1", "2", "10"
WRITE #2, ""
Above code writes a 3 line header to a new file. Code below adds a line for summary data from each file for a given stock ticker:
WRITE #2, DATADATE$, INDEXDATE$, CSPRICE$, MKTCAP$, "1", FINALCALLOI$, FINALPUTOI$, FINALPUTCALLOI$, "2", "3"
If I open the summary file in Windows Notepad I see the following:
"DATA","INDEX","CS","MKTCAP","MKTCAPMA","CUMCALLOI","CUMPUTOI","PutCall OI","PC SD OI","PC MA OI"
"DATE","DATE","PRICE","0.086778","10","CUMCALLOI","CUMPUTOI","1","2","10"
""
"2/5/2002 4:00","20040205","6.23","0.540626941655159","1","3122.825043682009","377.5000035464764","0.12088414","2","3"
"2/6/2002 4:00","20040206","6.1","0.5293457917242049","1","2792.900016859174","398.7999981790781","0.14279065","2","3"
etc.
etc.
etc.
etc.
My problem occurs in the next phase of the overall project - taking these summary files and doing something with them. I am trying to read the files (using a completely different QBASIC program) with the following code snippet:
FOR f = 1 TO TOTALFILECOUNT%
OPEN FILENAMELIST$(f) FOR INPUT AS #1
FOR i = 1 TO 800
LINE INPUT #1, WorkingLineArray$(i)
If i<4 Then Goto SKIPEND
CHAR1$ = LEFT$(WorkingLineArray$(i),1)
IF CHAR1$ = "" THEN EXIT FOR
SKIPEND:
NEXT
CLOSE #1
FINALROWNUMBER% = i-1
OPEN FILENAMELIST$(f) FOR INPUT AS #1
FOR i = 1 TO FINALROWNUMBER%
FOR j = 1 TO 10
INPUT #1, WorkingArray$(i, j)
NEXT
NEXT
CLOSE #1
FOR i = 1 TO FINALROWNUMBER%
FOR j = 1 TO 10
PRINT "i ";i;" j ";j;" cell= ";WorkingArray$(i, j)
sleep 250
NEXT
NEXT
The first loop determines the last row of FILENAMELIST(f) - for some reason DO WHILE NOT EOF(1) does not work - probably for same reason that I am doing this posting. The second loop reads data into WorkingArray$, and third displays data as a debugging test. What I found is that my program reads the next field every two (i,j) reads. By this I mean that in reading my CSV dump above I would get the following results in my print for debugging loop:
1 j value
-- -- ------
1 1 DATA
1 2
1 3 INDEX
1 4
1 5 CS
1 6
1 7 MKTCAP
1 8
1 9 MKTCAPMA
However, when I edited the CSV input file in Windows Notepad and took out all the quotes (") the file ran like it should. I tried to change my data generation process to eliminate the quotes from the program that generated the Data CSV file but QBASIC balked about "expected expression" - it mistook my unquoted strings as variables without definition.
My data files are so large and numerous that I have written the original code in QBASIC and then compiled and ran in FreeBasic to get around the data array size limitations. These results are all from the FreeBasic compiled runs. It will take me several hours to create special data and recode the program to test whether QBASIC can handle it correctly (and thus determine if the problem is FreeBasic or my coding). I am hoping that one of the forum members has enough experience with both QBASIC and FreeBasic to give me some help.
I could manually fix the quotes in my summary data files (3500 total) but as new files are created on an ongoing basis this is not a workable solution. I cannot figure out why it skips every other field with quotes and works with no quotes. For what it is worth, the LINE INPUT loop works fine even with the data with quotes.
Any ideas??
Thanks
dbish
OPEN FILENAMESAVE$ FOR APPEND AS #2
WRITE #2, "DATA", "INDEX", "CS", "MKTCAP", "MKTCAPMA", "CUMCALLOI", "CUMPUTOI", "PutCall", "PC SD OI", "PC MA OI"
WRITE #2, "DATE", "DATE", "PRICE", "999", "10", "CUMCALL", "CUMPUT", "1", "2", "10"
WRITE #2, ""
Above code writes a 3 line header to a new file. Code below adds a line for summary data from each file for a given stock ticker:
WRITE #2, DATADATE$, INDEXDATE$, CSPRICE$, MKTCAP$, "1", FINALCALLOI$, FINALPUTOI$, FINALPUTCALLOI$, "2", "3"
If I open the summary file in Windows Notepad I see the following:
"DATA","INDEX","CS","MKTCAP","MKTCAPMA","CUMCALLOI","CUMPUTOI","PutCall OI","PC SD OI","PC MA OI"
"DATE","DATE","PRICE","0.086778","10","CUMCALLOI","CUMPUTOI","1","2","10"
""
"2/5/2002 4:00","20040205","6.23","0.540626941655159","1","3122.825043682009","377.5000035464764","0.12088414","2","3"
"2/6/2002 4:00","20040206","6.1","0.5293457917242049","1","2792.900016859174","398.7999981790781","0.14279065","2","3"
etc.
etc.
etc.
etc.
My problem occurs in the next phase of the overall project - taking these summary files and doing something with them. I am trying to read the files (using a completely different QBASIC program) with the following code snippet:
FOR f = 1 TO TOTALFILECOUNT%
OPEN FILENAMELIST$(f) FOR INPUT AS #1
FOR i = 1 TO 800
LINE INPUT #1, WorkingLineArray$(i)
If i<4 Then Goto SKIPEND
CHAR1$ = LEFT$(WorkingLineArray$(i),1)
IF CHAR1$ = "" THEN EXIT FOR
SKIPEND:
NEXT
CLOSE #1
FINALROWNUMBER% = i-1
OPEN FILENAMELIST$(f) FOR INPUT AS #1
FOR i = 1 TO FINALROWNUMBER%
FOR j = 1 TO 10
INPUT #1, WorkingArray$(i, j)
NEXT
NEXT
CLOSE #1
FOR i = 1 TO FINALROWNUMBER%
FOR j = 1 TO 10
PRINT "i ";i;" j ";j;" cell= ";WorkingArray$(i, j)
sleep 250
NEXT
NEXT
The first loop determines the last row of FILENAMELIST(f) - for some reason DO WHILE NOT EOF(1) does not work - probably for same reason that I am doing this posting. The second loop reads data into WorkingArray$, and third displays data as a debugging test. What I found is that my program reads the next field every two (i,j) reads. By this I mean that in reading my CSV dump above I would get the following results in my print for debugging loop:
1 j value
-- -- ------
1 1 DATA
1 2
1 3 INDEX
1 4
1 5 CS
1 6
1 7 MKTCAP
1 8
1 9 MKTCAPMA
However, when I edited the CSV input file in Windows Notepad and took out all the quotes (") the file ran like it should. I tried to change my data generation process to eliminate the quotes from the program that generated the Data CSV file but QBASIC balked about "expected expression" - it mistook my unquoted strings as variables without definition.
My data files are so large and numerous that I have written the original code in QBASIC and then compiled and ran in FreeBasic to get around the data array size limitations. These results are all from the FreeBasic compiled runs. It will take me several hours to create special data and recode the program to test whether QBASIC can handle it correctly (and thus determine if the problem is FreeBasic or my coding). I am hoping that one of the forum members has enough experience with both QBASIC and FreeBasic to give me some help.
I could manually fix the quotes in my summary data files (3500 total) but as new files are created on an ongoing basis this is not a workable solution. I cannot figure out why it skips every other field with quotes and works with no quotes. For what it is worth, the LINE INPUT loop works fine even with the data with quotes.
Any ideas??
Thanks
dbish