Binary File Reading Code Optimization

booler

New member
Joined
Aug 19, 2005
Messages
4
Binary File Reading Code Optimization

Hi guys,

I am having a performance problem with my binary file reading, and wonder if anybody knows a better way to achieve what I am after. I am trying to read a binary database file record by record. Each record is split into fields of differing data types (which are only known at runtime). I have to cast each field to an appropriate .NET type and perform a calculation on it. So far, so good, except the performance is not what I had hoped for.

In the database there are around 50 million records, and each is 32 bytes. I need to complete the full read in less than 45 seconds, and so far I cannot get it to run in less than 150 seconds.
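To give a flavour of the per-field work, here is a simplified sketch (the real offsets and types are only known at runtime, so the ones below are invented):

Code:
' Hypothetical illustration of casting fields out of one 32-byte record.
Private Sub ProcessRecord(ByVal record As Byte())
    Dim id As Integer = BitConverter.ToInt32(record, 0)       ' bytes 0-3
    Dim amount As Double = BitConverter.ToDouble(record, 4)   ' bytes 4-11
    Dim flag As Short = BitConverter.ToInt16(record, 12)      ' bytes 12-13
    ' ... perform the calculation on each field ...
End Sub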

I am reading the records like this (the BinaryReader is already assigned):

Code:
Public Function Read() As Boolean
    If Me.cursor >= Me.recordcount Then
        Return False
    End If

    Try
        ' Instantiate the custom structure that holds the byte array for a record
        Me.currentRecord = New FoxproDataRecord(Me.recordsize)

        ' Me.buffer is a System.Collections.Queue holding the next 100 records
        If Me.buffer.Count = 0 Then
            Me.RefillBuffer(Me.buffer)
        End If

        ' Assign the byte array inside the custom structure to the current
        ' record by pulling the next byte array from the queue
        Me.currentRecord.data = CType(Me.buffer.Dequeue(), Byte())

        ' Increment the record counter
        Me.cursor += 1
        Return True
    Catch ex As Exception
        Throw New System.Data.DataException("File is not accessible.", ex)
    End Try
End Function

So the idea is that a custom structure points to the current record, while a queue reads ahead and holds 100 records; the queue is dequeued incrementally and then refilled. This is the code for refilling the queue:

Code:
Public Sub RefillBuffer(ByVal buffer As Queue)
    For i As Integer = 0 To 99
        ' Add a record to the queue if any records remain
        If Me.currentfillpointer < Me.recordcount Then
            ' ReadBytes already returns Byte(), so no cast is needed
            buffer.Enqueue(Me.dbfReader.ReadBytes(Me.recordsize))
            Me.currentfillpointer += 1
        Else
            Exit For
        End If
    Next
End Sub

And finally, this is the code for the custom structure that holds the data for individual records:

Code:
Public Structure FoxproDataRecord
    Public data As Byte()
    Private length As Integer

    ' Constructor to pass in the record length
    Public Sub New(ByVal dataLength As Integer)
        length = dataLength
        ' VB array bounds are inclusive, so the upper bound is (dataLength - 1)
        data = New Byte(dataLength - 1) {}
    End Sub
End Structure

The actual data casts are running reasonably quickly, but the data reading is just not fast enough. Does anyone have any ideas on how I can speed this up?

Thanks,

Adam
 
Check out Binary Serialization. It's probably faster and will do most of the work for you.

Best thing since random access files.
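Very roughly, something like this for the writing side (untested sketch; the record class and file name are made up):

Code:
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

' Hypothetical serialisable record type
<Serializable()> Public Class MyRecord
    Public Id As Integer
    Public Value As Double
End Class

Module SerialiseSketch
    Sub Main()
        Dim bf As New BinaryFormatter()
        Dim fs As New FileStream("records.bin", FileMode.Create)
        ' Each Serialize call appends one object to the stream
        bf.Serialize(fs, New MyRecord())
        fs.Close()
    End Sub
End Module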
 
Diesel said:
Check out Binary Serialization. It's probably faster and will do most of the work for you.

Best thing since random access files.

Hi!

Thanks for the reply.

I have had a look at the BinaryFormatter class; is this what you mean?

As far as I can see, it has one Deserialize method to which you pass a FileStream object. However, I cannot deserialize the whole file without using some kind of buffer, because the file is 2 GB. Do you know of any way to deserialize a file in smaller pieces?

I can see that this approach could be quick if I was able to create something like a custom structure to cast the returned data to. My other problem with this is that, although the data structure is known at runtime, it is not known at design time, so this limits my options in terms of constructing a custom container for the data. Do you have any ideas how I might get around this?

Thanks for your help,

Adam
 
The Deserialize method accepts a stream as a parameter and will deserialise the next object at the current file location; it doesn't attempt to deserialise the entire file in one go.
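For example (sketch only, reusing the hypothetical MyRecord and Imports from the earlier snippet):

Code:
Dim bf As New BinaryFormatter()
Dim fs As New FileStream("records.bin", FileMode.Open, FileAccess.Read)
Do While fs.Position < fs.Length
    ' Each call reads one object and leaves the stream at the next one
    Dim r As MyRecord = CType(bf.Deserialize(fs), MyRecord)
    ' ... process r ...
Loop
fs.Close()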
If the structures are at known boundaries (which seems to be the case if they are all 32 bytes long), you could read a chunk of the file into a byte array and process that, then read the next chunk, and so forth.
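A rough sketch of that idea (chunk size is arbitrary, error handling is omitted, and the file name and field offsets are made up):

Code:
Imports System.IO

Module ChunkReadSketch
    Sub Main()
        Const recordSize As Integer = 32
        Const recordsPerChunk As Integer = 65536   ' arbitrary; tune to taste
        Dim chunk(recordSize * recordsPerChunk - 1) As Byte
        Dim fs As New FileStream("data.dbf", FileMode.Open, FileAccess.Read)

        ' Read the file in large chunks and slice records out of each chunk,
        ' avoiding one small ReadBytes call (and one allocation) per record.
        Dim bytesRead As Integer = fs.Read(chunk, 0, chunk.Length)
        Do While bytesRead > 0
            For offset As Integer = 0 To bytesRead - recordSize Step recordSize
                ' Cast fields straight out of the chunk; offsets are invented
                Dim firstField As Integer = BitConverter.ToInt32(chunk, offset)
                ' ... remaining fields and the per-record calculation ...
            Next
            bytesRead = fs.Read(chunk, 0, chunk.Length)
        Loop
        fs.Close()
    End Sub
End Module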
 