Extremely Puzzling - Raw text file reading

  • Thread starter Captain Kernel

Captain Kernel

Guest
I'm working on processing CSV files. Some of these are UTF-8, but others are plain "ASCII" with no byte order mark prefix or anything.
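As a rough way to tell the two kinds of file apart at the byte level, one can check for the 3-byte UTF-8 byte order mark (EF BB BF). The minimal Python sketch below assumes that is the only difference being described; `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names. Note that a file without a BOM may still be UTF-8, and pure ASCII is itself valid UTF-8.

```python
# Sketch only: detect a leading UTF-8 byte order mark in raw file bytes.
# Assumption: the "UTF-8" files start with EF BB BF and the "ASCII" ones do not.
UTF8_BOM = b"\xef\xbb\xbf"

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(UTF8_BOM)

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading BOM, if present, so both kinds of file can be scanned alike."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data

if __name__ == "__main__":
    print(has_utf8_bom(b"\xef\xbb\xbfA,B\r\n"))  # True
    print(has_utf8_bom(b"A,B\r\n"))              # False
```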

If I read the data as a stream and read the raw bytes (ReadByte), then for the UTF-8 files I "see" every CR and LF as a distinct character.

But if I read the plain ASCII file, I do see the CR LF characters at the end of each line; however, if I insert a CR LF in the middle of a record (e.g. in Notepad++, put the cursor partway along a record and press ENTER), these inserted CR LF characters are not seen.

Somehow the reading mechanism (Stream.ReadByte) seems to know that the originally present CR LF pairs are real and returns them, but that the inserted CR LF pairs are to be ignored.

I can clearly see these CR LF bytes in Neo Hex Editor, and the byte sequence looks as expected, but I cannot read them: the "real" CR LF characters are seen but the ones I inserted are not.
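One way to cross-check the hex editor view is a small hex dump of the same bytes. The Python sketch below is only an illustration (it is not Neo Hex Editor's format, and `hex_dump_bytes` / `hex_dump_file` are hypothetical names); it opens the file in binary mode ("rb"), so no decoding or newline translation happens and every CR (0D) and LF (0A) on disk shows up exactly once.

```python
import sys

def hex_dump_bytes(data: bytes, width: int = 16) -> list[str]:
    """Format raw bytes as hex-editor-style lines: offset, hex bytes, ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; CR, LF and other bytes become '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}: {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines

def hex_dump_file(path: str) -> list[str]:
    """Read the whole file in binary mode (no decoding, no newline translation)."""
    with open(path, "rb") as f:
        return hex_dump_bytes(f.read())

if __name__ == "__main__" and len(sys.argv) > 1:
    for line in hex_dump_file(sys.argv[1]):
        print(line)
```

Running it as `python hexdump.py data.csv` (file names assumed) prints one line per 16 bytes, which can be compared directly with what ReadByte returns.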

So, for example, if I begin with this simple text:

AAAAAAAAAA\r\n (we can see and read the \r and \n characters)

If I edit this to be:

AA\r\n

AAAAAAAA\r\n

(and save the file)

Then we see a sequence of 10 'A's followed by \r and \n; we never see the inserted \r\n that precedes them, yet in the hex editor I see no reason at all for this.
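For comparison, here is what a plain byte-at-a-time read of the two example contents would be expected to return. This is a Python sketch, not .NET code: `read(1)` on an in-memory stream stands in for `Stream.ReadByte` (which returns the next byte as an int, or -1 at end of stream), and `read_bytes_one_at_a_time` and `crlf_offsets` are hypothetical helpers.

```python
import io

def read_bytes_one_at_a_time(stream) -> list[int]:
    """Read a binary stream one byte at a time, in the spirit of Stream.ReadByte.

    Python's read(1) returns b"" at end of stream where .NET returns -1.
    """
    result = []
    while True:
        b = stream.read(1)
        if not b:  # end of stream
            break
        result.append(b[0])
    return result

def crlf_offsets(values: list[int]) -> list[int]:
    """Offsets i where values[i] is CR (13) and values[i + 1] is LF (10)."""
    return [i for i in range(len(values) - 1) if values[i] == 13 and values[i + 1] == 10]

before = b"AAAAAAAAAA\r\n"     # 10 'A's + CR LF = 12 bytes
after = b"AA\r\nAAAAAAAA\r\n"  # CR LF inserted after the 2nd 'A' = 14 bytes

for label, data in (("before", before), ("after", after)):
    values = read_bytes_one_at_a_time(io.BytesIO(data))
    print(label, len(values), crlf_offsets(values))
# before 12 [10]
# after 14 [2, 12]
```

If ReadByte on the edited file yields 12 values rather than 14, the bytes it is reading differ from the ones shown in the hex editor.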

Can anyone explain this?
