Is there a faster way

kcwallace

Well-known member
Joined
May 27, 2004
Messages
172
Location
Irvine, CA
I have a flat file that can contain up to 20,000 lines. Each line represents a record that needs to be parsed and added to a database. I have code that works; however, I was wondering if there was a faster way to add the data to a database.

Currently I am looping through each line and adding each row's data by executing a command. I have also tried updating the database using a DataSet and DataAdapter. Both work.

Is there a better way that I am missing?
 
Going faster can mean lots of things. Let us know what you're currently doing so we can offer more help. Here are just a few things that may come up:
1. DataSets with DataAdapters will be slower than straight Command objects (using ExecuteNonQuery, for example).
2. Parsing the file with Substring will be faster than using Regular Expressions.
3. If you're using SQL Server 2000, maybe you can use bcp or DTS (Bulk Copy Program and Data Transformation Services).
4. If you're using SQL Server 2005, maybe you can use Integration Services (the new version of DTS that lets you write a Visual Studio application hosted inside of SQL Server).
5. If you're using .NET 2.0 with SQL Server, you may be able to use the new SqlBulkCopy object.

If you just want tweaks to what you currently have, let us see some of the code you're using so we can offer specific help.

-ner
 
The Current Code:

I know I can tweak it more by using parameterized update statements, but that is just a tweak.

What are "Bulk Copy Program and Data Transformation Services" and "Parsing the file with Substring"?

I am using VS2005 with SQL Server 2000

Code:
While Not input Is Nothing
    PB.Maximum = RowCnt
    Cnt += 1
    strCnt = 1
    WriteToLog(FileLog.Current, "Evaluating the Length of the Line.  Len(input)=" & Len(input))
    If Len(input) > 15 Then
        SQL = "INSERT INTO badges (AssmDate, Acc, Loc, WearNum, WearDate, UID, PID, BT, BR, BP, Freq, WearPeriod, Daily, WearerName, BU, BadgeID, WhoAssm, FName) VALUES ("
        ' FileTmplt(X) is an array containing the structure of the data file, i.e. AssmDate = characters 1-8
        For X = 1 To UBound(FileTmplt)
            If DTypeArr(X) = 1 Then ' string field
                TmpVal = Mid(input, strCnt, FileTmplt(X))
                TmpVal = ADO.RepSngQut(TmpVal) ' removes all single quotes from the field data
                SQL = SQL & "'" & TmpVal & "', "
            ElseIf DTypeArr(X) = 3 Then ' date field
                TmpVal = Mid(input, strCnt, FileTmplt(X))
                TmpVal = "'" & Mid(TmpVal, 5, 2) & "/" & Mid(TmpVal, 7, 2) & "/" & Mid(TmpVal, 1, 4) & "'"
                SQL = SQL & TmpVal & ", "
            Else ' number
                SQL = SQL & Mid(input, strCnt, FileTmplt(X)) & ", "
            End If
            strCnt += FileTmplt(X)
        Next X
        SQL = SQL & "'" & FileSrch & "')"
        ADO.ActionSQL(SQL, Cnn) ' creates a SqlCommand and performs an ExecuteNonQuery
        WriteToLog(FileLog.Current, "Reading Line")
    End If
    input = sr.ReadLine()
    If Not input Is Nothing Then
        WriteToLog(FileLog.Current, input)
    Else
        WriteToLog(FileLog.Current, "End of File")
    End If
    Label2.Text = "Loaded " & Cnt & " of " & RowCnt & " Records"
    PB.Value = Cnt
    Me.Refresh()
    Application.DoEvents()
End While
 
Ah, code! First, the questions you asked about...
You are already doing "substring" parsing (you use Mid(), a VB6 leftover). Some people define a line from their flat file in terms of Regular Expressions (using the Regex object) and let that object do the parsing. It's very readable code (maybe not the regular expression itself), but slower than finding chunks of a string using Mid.
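Just to give a flavor of the Regex approach, a sketch of parsing one line might look like this (the group names and field widths here are made up - yours would come from FileTmplt):
Code:
' Hypothetical layout: an 8-character date followed by a 4-character account code.
' Named groups make the parsing self-documenting, but this runs slower than Mid().
Dim re As New System.Text.RegularExpressions.Regex("^(?<AssmDate>.{8})(?<Acc>.{4})")
Dim m As System.Text.RegularExpressions.Match = re.Match(input)
If m.Success Then
    Dim assmDate As String = m.Groups("AssmDate").Value
    Dim acc As String = m.Groups("Acc").Value
End If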

BCP is the fastest way to get data into the database. bcp is a command line tool for SQL Server that takes an input file and a layout file (defines the fields, the delimiter, whether strings have quotes on them, etc.) and sucks the file into the table - VERY fast.
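To give a flavor of bcp, an invocation might look like this (the server, file, and format-file names here are made up - substitute your own):
Code:
bcp MyDatabase.dbo.badges in badges.dat -f badges.fmt -S MyServer -T

The -f switch points at the layout (format) file and -T uses Windows authentication (use -U/-P for a SQL login instead).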

DTS is the next best thing. In fact, if you design DTS correctly it will use bcp behind the scenes. It does allow you to "transform" the data, by sticking a piece of VBScript in the middle. In other words, it will generate a default script for you along the lines of:
Code:
Sub Main()
    DTSDestination("Field1") = DTSSource("Field1")
    DTSDestination("Field2") = DTSSource("Field2")
End Sub

In there you can write your own logic, including subroutines and more. DTS is also very fast and allows you to tweak the reading/writing. For small- to medium-sized complexity, I'd use DTS over a custom EXE.

Since you're using VS2005, you may be able to use the new SqlBulkCopy object. I admit I haven't had a chance to try it out yet, but it supposedly lets you write your VB.NET application as you want and take advantage of BCP for speed.
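For what it's worth, a minimal SqlBulkCopy sketch might look like this (untested on my end - it assumes you fill a DataTable whose columns match the badges table, and that Cnn is your open SqlConnection):
Code:
' Fill a DataTable with the parsed rows, then push it to the server in one shot.
Dim table As New DataTable()
table.Columns.Add("AssmDate", GetType(String))
table.Columns.Add("Acc", GetType(String))
' ... add the remaining badges columns, then table.Rows.Add(...) per parsed line ...
Using bulk As New SqlClient.SqlBulkCopy(Cnn)
    bulk.DestinationTableName = "badges"
    bulk.WriteToServer(table)
End Using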



Now, for your sample code:
One of the first things I would do is change all the string variables to use the StringBuilder object. Every time you concatenate a string, it must allocate space for the new string and then copy the old string plus the new to the new location. Here's your code with some "dings" for each string copy:
Code:
' Ding 1 for the ReadLine() method
If Len(input) > 15 Then
    ' Ding 2
    SQL = "INSERT INTO badges (..."
    For X = 1 To UBound(FileTmplt)
        If DTypeArr(X) = 1 Then
            ' Ding 3
            TmpVal = Mid(input, strCnt, FileTmplt(X))
            ' Ding 4 (maybe more, inside of RepSngQut)
            TmpVal = ADO.RepSngQut(TmpVal)
            ' Ding 5 (I think only one ding, although multiple strings are concatenated)
            SQL = SQL & "'" & TmpVal & "', "
        Else ' ...
        End If
    Next X
    ' Ding 6
    SQL = SQL & "'" & FileSrch & "')"
    ADO.ActionSQL(SQL, Cnn)
    WriteToLog(FileLog.Current, "Reading Line")
End If

As the example shows, there are 6 dings (string copies) for each line in the file. If your file is 10 MB (relatively small for some jobs), that's like reading and copying around 60 MB of data.
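The same loop with a StringBuilder avoids most of those copies. A sketch of just the string-building part (reusing your own variables, with the date and number branches elided):
Code:
Dim sb As New System.Text.StringBuilder(512)
sb.Append("INSERT INTO badges (...) VALUES (")
For X = 1 To UBound(FileTmplt)
    If DTypeArr(X) = 1 Then ' string field
        TmpVal = ADO.RepSngQut(Mid(input, strCnt, FileTmplt(X)))
        sb.Append("'").Append(TmpVal).Append("', ") ' appends in place, no full copy
    End If
    ' ... handle the date and number cases the same way ...
    strCnt += FileTmplt(X)
Next X
sb.Append("'").Append(FileSrch).Append("')")
SQL = sb.ToString() ' one final copy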

Before making any changes, I'd do some profiling to see how things look before/after. Some changes may help, some may not.

As you pointed out, you may also try using SqlParameter objects - you only have to build the collection of parameters one time, then reset the values on each loop.
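Roughly like this (a sketch - only two of the eighteen columns are shown, and parsedDate/parsedAcc stand in for whatever your parsing produces):
Code:
Dim cmd As New SqlClient.SqlCommand( _
    "INSERT INTO badges (AssmDate, Acc) VALUES (@AssmDate, @Acc)", Cnn)
cmd.Parameters.Add("@AssmDate", SqlDbType.DateTime)
cmd.Parameters.Add("@Acc", SqlDbType.VarChar, 4)
' Then, for each line in the file, just reset the values:
cmd.Parameters("@AssmDate").Value = parsedDate
cmd.Parameters("@Acc").Value = parsedAcc
cmd.ExecuteNonQuery()

A nice side effect: with parameters the single-quote scrubbing (RepSngQut) becomes unnecessary, since the values are never spliced into the SQL text.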

You may also try the good old do-it-yourself batching. That is, save up 20 to 50 of the final SQL statements into a string and execute them all at once, so the string will look like:
Code:
INSERT INTO badges (AssmDate, Acc, Loc, WearNum...
INSERT INTO badges (AssmDate, Acc, Loc, WearNum...
INSERT INTO badges (AssmDate, Acc, Loc, WearNum...

Then one call to ADO.ActionSQL(SQL, Cnn) commits them all. It may take a little extra work for logging, to handle failed inserts (like a separate routine that loops through the big string and commits each INSERT one at a time in case of bad data).
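In loop form, the batching might look like this (a sketch using your existing ADO.ActionSQL; the batch size of 50 is arbitrary):
Code:
Dim batch As New System.Text.StringBuilder()
Dim pending As Integer = 0
' Inside the read loop, after SQL holds one complete INSERT statement:
batch.AppendLine(SQL)
pending += 1
If pending = 50 Then
    ADO.ActionSQL(batch.ToString(), Cnn) ' one round trip for 50 inserts
    batch.Length = 0
    pending = 0
End If
' After the loop ends, flush whatever is left:
If pending > 0 Then ADO.ActionSQL(batch.ToString(), Cnn)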

-ner
 