Reading a Huge CSV File

Runtime_error
Active member | Joined Aug 5, 2003 | Messages: 38
:mad: Hi everyone

I have a 90 MB CSV file which the application needs to read. I can do it using the OleDb or ODBC data classes, but I am stuck on deciding what the best way is.

For example, when we read a large amount of data into a DataSet, doesn't it just use up a lot of memory?

What is the best way to process a large amount of data from a source such as text files or CSV files? Do we read the records line by line, or do we read the entire file into a DataSet or some buffer and then process it?


Please advise as to what the best approach would be. Thanks.
 
May vary. :p

Well... it depends on what you have to do. Loading the whole file into memory will make future access to it much faster, while reading line by line can cause some slowdown because of the "slow" HD (a HD takes MUCH more time to access a file than accessing it in RAM). But the DataSet solution isn't the best in every situation...

You'll have to explain to us a little bit what you are loading... and how you will use it.
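In the meantime, if you do end up going line by line, the usual pattern is a StreamReader loop so that only the current row is ever in memory. A rough sketch (the path and the per-row work are just placeholders):

C#:
	using System;
	using System.IO;

	class CsvLineReader
	{
		static void Main()
		{
			// Read one row at a time; only the current line is held in memory.
			using (StreamReader reader = new StreamReader(@"c:\data.csv"))
			{
				string line;
				while ((line = reader.ReadLine()) != null)
				{
					// Naive split; fine as long as no field contains an embedded comma.
					string[] fields = line.Split(',');

					// Placeholder for whatever per-row processing you need.
					Console.WriteLine(fields.Length + " fields read");
				}
			}
		}
	}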
 
;) It's a standard CSV file; the size is around 90 MB and will soon be going up and up.

File Structure
"column1","column2","column3","column4"

So, if I go with the line-by-line option, it will be slower.
Loading the entire file into a DataSet or any other object is not efficient, as a lot of memory will be used at once.

So are there any other ways? What about file streams? :confused:

As for what it's for, well, each line of the file will be processed and then written to a SQL database.
 
I think along the lines of Arch4ngel: export to a SQL database and let it do the hard work... something that large you really shouldn't be trying to handle with file I/O; SQL is designed for that kind of load.
 
The ideal would be to change the file-creating program into a console app that writes to standard output, and have your app be a console app that reads from standard input. Pipe the output from the former into the latter.

Barring that, follow the example here:
[mshelp="ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/cpguide/html/cpconreadingtextfromfile.htm"]Reading Text from a File[/mshelp]
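A minimal sketch of the reading end of such a pipe, assuming the producer just writes one CSV row per line to standard output:

C#:
	using System;

	class PipeReader
	{
		static void Main()
		{
			// Reads whatever the upstream console app pipes in, one line at a time.
			string line;
			while ((line = Console.In.ReadLine()) != null)
			{
				// Placeholder: parse the row and insert it into the database here.
			}
		}
	}

You would run it as something like "producer.exe | consumer.exe" (both names made up), so the data never has to land on disk as a 90 MB file at all.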
 
To follow up on my point of view... you should really consider doing it in SQL-S... something that big is considered "huge" in file I/O.

SQL-S is able to make it faster. It supports SQL: SELECT, UPDATE, DELETE...
Don't reinvent the wheel, buddy... if you don't use SQL-S... at least use MS Access or MySQL if you don't have anything else.

I'm against the use of a 90 MB CSV file that is still growing...
 
lol sorry, I gave the wrong impression. The CSV file comes from another source which we don't have any control over.

Our system is SQL Server 2000 based.

The client gets huge CSV files that the system needs to read and process. The manual process will be limited to just moving the files to a chosen directory, and then BizTalk Server and our application will do the rest.

Basically, the data that the client gets from another company needs to be processed and entered into our SQL database. If it were just data from one source and in one standard format, then it wouldn't be bad for someone to manually import the CSV files into the tables.

But the requirement is to read the data, process it (merge it with data from other data sources), and enter it into our SQL database.
 
my suggestion is to create a temporary table (a table whose name is prefixed with #) with the layout of your data file, execute BULK INSERT against it, and do your processing against the temp table.

for example. . .

c:\Data.txt contains:

0,"Value0",0
1,"Value1",0.000283687943262411
2,"Value2",0.000567375886524823
3,"Value3",0.000851063829787234

my code would look something like:
C#:
	cmd.CommandText = "create table #importTable (i int, label varchar(255), n float)";
	cmd.ExecuteNonQuery();
 
	cmd.CommandText = "BULK INSERT #importTable from d:\data.txt " +
				" WITH (FIELDTERMINATOR  = , , KEEPNULLS)"
	cmd.ExecuteNonQuery()

Then I would do whatever I needed to do with #importTable.

Remember, #importTable will be dropped when the connection closes because it is temporary.
 
Same problem

Hi all,

I am trying to accomplish the same thing, except the file isn't in CSV format.

Here's my case:
I have a 25 MB file with 3000 employee records. Each record takes up exactly 5001 characters in the file, with information such as an employee's name, address, phone number, salary, paycheck information, etc. I get a file like that bi-weekly, which I need to import into an Oracle database through the ASP.NET app.

I will need to read the entire file, and from each 5001 characters pick out the information I need, then call a stored procedure to save the information. The information will eventually be broken down into smaller chunks and stored in different tables (I have a table for just personal information, another table for tax information, a table for retirement plan information, a table for paychecks, etc.).

Right now I'm doing the following (rough code sketch after the list):
1. Have an array that stores the whole text file, with each element holding exactly 5001 characters, then
2. use a StringBuilder object to hold one block of the array, then
3. use smaller strings to pull out the information I need from the StringBuilder, and
4. call stored procedures to save that information, and
5. go back to step 2 and repeat 3000 times.
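In code, my loop looks roughly like this (the file path and field positions are simplified):

C#:
	using System;
	using System.IO;
	using System.Text;

	class PayrollImport
	{
		const int RecordLength = 5001;

		static void Main()
		{
			// 1. Pull the entire file into memory and carve it into 5001-character blocks.
			string wholeFile;
			using (StreamReader reader = new StreamReader(@"c:\payroll.txt"))
				wholeFile = reader.ReadToEnd();              // the whole 25 MB sits in RAM

			int recordCount = wholeFile.Length / RecordLength;
			string[] blocks = new string[recordCount];
			for (int i = 0; i < recordCount; i++)
				blocks[i] = wholeFile.Substring(i * RecordLength, RecordLength);

			// 2-5. For each block, slice out fields by position and call the stored procedures.
			foreach (string block in blocks)
			{
				StringBuilder sb = new StringBuilder(block);
				string name    = sb.ToString(0, 40);         // positions are examples only
				string address = sb.ToString(40, 120);
				// ... pass name, address, etc. to the stored procedures ...
			}
		}
	}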

This eats up a lot of RAM and sometimes even crashes my computer. I am sure the above method is a very stupid one... but I'm a beginner programmer and this is the first time I have worked on an app with such intensive text parsing.

For now I have had to ask users to open the text file in Excel, save it as a spreadsheet, and have my application read it as a database so it reads one record at a time. This has caused inconvenience for them, and I really need an alternative solution...

The information in that text file isn't separated by any delimiters, so I am breaking it down solely by character position. Should I still use a bulk insert, put it into a temp table in Oracle, and break down the chunks on the database side?

And if I do break it down on the database side, how can I tell (from the application) that the whole process is finished?

Thank you...
 
Loading text files to SQL Server using DTS

Being basically lazy myself, and averse to writing any more "grunt" code than absolutely necessary, I suggest you use DTS and load the data into a "work-in-progress" (WIP) table in SQL Server.

DTS gives you for free all the stuff you would otherwise write, such as the ability to define the file source, format, and layout, select the fields to keep, and create and load the data.

I might suggest (for fun and experience) that you write a handler that uses the DTS system programmatically and then does any post-processing you want.
Here's how:
http://sqldev.net/DTS/DotNETCookBook.htm
http://www.c-sharpcorner.com/Code/2002/Sept/DTSnCS.asp
http://www.15seconds.com/issue/030909.htm
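If you'd rather not dive into the DTS object model right away, a lower-tech alternative is to save the package on the server and have your handler shell out to the dtsrun command-line tool, then do the post-processing once it returns. A rough sketch (the server and package names are made up):

C#:
	using System;
	using System.Diagnostics;

	class RunDtsPackage
	{
		static void Main()
		{
			// Run a server-stored DTS package via dtsrun.exe and wait for it to finish.
			// /S = server, /E = trusted connection, /N = package name (all placeholders here).
			ProcessStartInfo psi = new ProcessStartInfo(
				"dtsrun", "/S MYSERVER /E /N LoadEmployeeFile");
			psi.UseShellExecute = false;

			using (Process p = Process.Start(psi))
			{
				p.WaitForExit();
				Console.WriteLine("dtsrun exited with code " + p.ExitCode);
			}
		}
	}

Waiting for the process to exit also gives the application a simple way to know the load step has finished before it starts the post-processing.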
 