EDN Admin
I'm processing large binary files. These are PCL files, and I'm looking for page boundaries. I want to store the position of each Form Feed, which in PCL is decimal 12, hex 0C. However, that byte can also occur as part of a raster or other binary structure.
So I loop through the file, and when I find a byte with value 12, I read ahead 14 bytes and compare them to a known string. If I get a match, I know the 12 was a real Form Feed, and I store its position in an ArrayList.
This works fine using a FileStream object with its ReadByte() method and Position property. The problem is that it is very slow. I'd like to use a StreamReader to take advantage of buffering. However, when I use a StreamReader, the FileStream's Position property reflects how much has been read into the buffer, not the position of the byte I am currently examining.
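To make the mismatch concrete, here is a minimal sketch that reads a single character through a StreamReader and then reports where the underlying FileStream ended up. `PositionDemo` and `PositionAfterOneChar` are hypothetical names used only for illustration. The exact number returned depends on the reader's buffer size, so none is claimed here.

```csharp
using System;
using System.IO;

static class PositionDemo
{
    // Reads one character through a StreamReader, then returns the
    // Position of the FileStream underneath it.
    public static long PositionAfterOneChar(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (StreamReader reader = new StreamReader(fs))
        {
            reader.Read();       // consume a single character
            return fs.Position;  // how far the reader has pulled from the file, not 1
        }
    }
}
```

On a file larger than the reader's buffer, the returned value is the amount buffered so far rather than 1. Separately, a StreamReader decodes bytes into characters, so on binary data a count of characters read is not a reliable byte offset either.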
So my question is: how can I have the speed of StreamReader but still maintain an accurate file position?
Here is the sample code, the StreamReader version. Hopefully someone can suggest a change that would report the "virtual" file position of the "current byte", rather than the file position reached through buffering.
    using System;
    using System.IO;
    using System.Text;
    using System.Collections;

    namespace pcl_proc
    {
        /// <summary>
        /// Summary description for Class1.
        /// </summary>
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                ArrayList page_positions = new ArrayList();
                ArrayList page_type = new ArrayList();
                string asciiString;
                string bgn_of_page = " &l8c1E *p0x0Y";
                string header;
                long curr_pos;
                int pcl_char;
                char[] test;
                string filename = @"C:Statements-05-03-05.pcl";

                FileStream infile = new FileStream(filename, FileMode.Open, FileAccess.Read);
                StreamReader input = new StreamReader(infile);

                // need to initialize header and position of first page.
                test = new char[1024];
                input.Read(test, 0, test.Length);
                asciiString = new String(test);
                header = asciiString.Substring(0, asciiString.IndexOf("*b0M") + 4);
                page_positions.Add(header.Length);
                page_type.Add("B");

                while (input.Peek() >= 0)
                {
                    pcl_char = input.Read();
                    if (pcl_char == 12)
                    {
                        test = new char[14];
                        // this next line doesn't record the accurate position
                        // of the "12" found by the input.Read().
                        // How can I get the actual position?
                        curr_pos = infile.Position;
                        input.Read(test, 0, test.Length);
                        asciiString = new string(test);
                        if (asciiString == bgn_of_page)
                        {
                            page_positions.Add(curr_pos);
                        } // if (new string(test) == bgn_of_page)
                    } // if (pcl_char == 12)
                } // while (sr.Peek >= 0)

                infile.Close();
            }
        }
    }
Note: the "bgn_of_page" string is actually 14 bytes; the forum stripped out the two "escape" characters. I mention this in case anyone wonders why I'm reading 14 bytes and comparing them to a 12-byte string.
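Since the marker is really 14 raw bytes, the comparison can be done on bytes rather than on decoded text. The sketch below is one possible way to do that while keeping exact offsets, and it is not from the original thread. It reads the stream in large blocks into a byte array and tracks the file offset of the block itself, so the position of any byte is the block offset plus its index. `FormFeedScanner`, `FindPageBreaks`, `MatchesAt`, and `bufferSize` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FormFeedScanner
{
    // Returns the offset of every 0x0C byte that is immediately followed by `marker`.
    public static List<long> FindPageBreaks(Stream input, byte[] marker, int bufferSize = 64 * 1024)
    {
        var positions = new List<long>();
        int need = marker.Length + 1;                   // form feed plus the marker after it
        byte[] buf = new byte[Math.Max(bufferSize, need)];
        long bufStart = 0;                              // stream offset of buf[0]
        int filled = 0;                                 // number of valid bytes in buf

        while (true)
        {
            int n = input.Read(buf, filled, buf.Length - filled);
            if (n == 0) break;                          // end of stream; a partial tail cannot match
            filled += n;

            // Scan every index that has a complete candidate (0x0C + marker) in the buffer.
            int limit = filled - need;
            int i = 0;
            for (; i <= limit; i++)
            {
                if (buf[i] == 0x0C && MatchesAt(buf, i + 1, marker))
                    positions.Add(bufStart + i);        // exact offset, independent of buffering
            }

            // Carry the unscanned tail to the front so a match can span two reads.
            int keep = filled - i;
            Array.Copy(buf, i, buf, 0, keep);
            bufStart += i;
            filled = keep;
        }
        return positions;
    }

    // True if buf[start .. start + marker.Length - 1] equals marker.
    private static bool MatchesAt(byte[] buf, int start, byte[] marker)
    {
        for (int k = 0; k < marker.Length; k++)
            if (buf[start + k] != marker[k]) return false;
        return true;
    }
}
```

A call might look like `FormFeedScanner.FindPageBreaks(infile, Encoding.ASCII.GetBytes("\u001B&l8c1E\u001B*p0x0Y"))`. That marker assumes the two stripped characters are ESC (0x1B), the byte that starts a PCL command. Unlike the loop in the sample code, this checks every 0x0C, including any that fall inside the 14 bytes after a failed comparison.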
Note: the MSDN page at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemiostreamreaderclassbasestreamtopic.asp contains this enigmatic statement:
"StreamReader might buffer input such that the position of the underlying stream will not match the StreamReader position." Yes, that's right. But they offer no method or example for dealing with that situation. Also, they refer to "the StreamReader position". Well, what is the StreamReader position? How do I find it? What method or property returns it?