EDN Admin
Well-known member
<P>What is the correct way to write the UTF-8 byte order mark (BOM) to a UTF-8 file with WriteFile?</P>
<P>I tried this...</P>
<PRE>TCHAR* sbuffer = TEXT("Some text in a utf8 file");

char * smarker = (char *) malloc(4);
smarker[0] = 0xEF;
smarker[1] = 0xBB;
smarker[2] = 0xBF;
smarker[3] = 0x00;
WriteFile(hFile, smarker, 3, &dwBytesWritten, NULL); // write the bom
free(smarker);
WriteFile(hFile, sbuffer, (_tcslen(sbuffer) + 1) * sizeof(TCHAR), &dwBytesWritten, NULL); // write the data + null</PRE>
<P>When the file is opened in Notepad, the BOM is not displayed (good), but each character has a space showing after it, i.e.:</P>
<P>"S o m e T e x t ...."</P>
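<P>A likely explanation for the spacing, assuming a Unicode build where TCHAR is a 16-bit character: the buffer holds UTF-16, so every ASCII character is followed by a 0x00 byte, and a reader that trusts the UTF-8 BOM shows those bytes as gaps. The sketch below only illustrates that byte layout; <code>utf16le_bytes</code> is a hypothetical helper, not a Windows API.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lays out `len` UTF-16 code units as little-endian
   bytes, which is how a Unicode-build TCHAR buffer sits in memory on
   x86/x64 Windows. `dst` must have room for 2 * len bytes.
   Returns the number of bytes produced. */
size_t utf16le_bytes(const uint16_t *src, size_t len, unsigned char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        dst[2 * i]     = (unsigned char)(src[i] & 0xFF); /* low byte */
        dst[2 * i + 1] = (unsigned char)(src[i] >> 8);   /* high byte, 0x00 for ASCII */
    }
    return 2 * len;
}
```

<P>For the four characters "Some" this yields the eight bytes 53 00 6F 00 6D 00 65 00, which matches the "S o m e" pattern seen in Notepad.</P>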
<P>My original reason for doing this: if I don't write the BOM, the file can be read fine with Scripting.FileSystemObject, but when I try to read it with the .NET StreamReader, the framework attempts to detect the Unicode encoding from the BOM, and without one a valid Unicode file gets read into a string buffer as a garbled "x 0 x 0 x 0" ASCII string.</P>
<P>My only objective here is to write a standard UTF-8 encoded Unicode file from a TCHAR string. There must be an easy way?</P>
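<P>One way to reach that objective, sketched under the assumption that TCHAR is UTF-16 (a Unicode build): convert the text to UTF-8 bytes first, then write the BOM followed by those bytes, without the terminating null. On Windows the conversion is normally done with <code>WideCharToMultiByte</code> and <code>CP_UTF8</code>, and the resulting buffer is what gets passed to <code>WriteFile</code>. The function below is a hypothetical, portable stand-in that shows what the converted buffer should contain; its name and signature are illustrative only.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Windows API): encodes `len` UTF-16 code units
   from `src` as UTF-8 into `dst`, preceded by the 3-byte BOM EF BB BF.
   Returns the number of bytes written, or 0 if `cap` is too small or the
   input contains an unpaired surrogate. No terminating null is written. */
size_t utf16_to_utf8_with_bom(const uint16_t *src, size_t len,
                              unsigned char *dst, size_t cap)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    size_t n = 0, i = 0;

    assert(src != NULL && dst != NULL);
    if (cap < sizeof bom) return 0;
    memcpy(dst, bom, sizeof bom);
    n = sizeof bom;

    while (i < len) {
        uint32_t cp = src[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i >= len || src[i] < 0xDC00 || src[i] > 0xDFFF) return 0;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return 0; /* stray low surrogate */
        }

        if (cp < 0x80) {                       /* 1 byte: plain ASCII */
            if (cap - n < 1) return 0;
            dst[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {               /* 2 bytes */
            if (cap - n < 2) return 0;
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             /* 3 bytes */
            if (cap - n < 3) return 0;
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                               /* 4 bytes */
            if (cap - n < 4) return 0;
            dst[n++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```

<P>With a buffer produced this way, a single write of the returned byte count should give a file that Notepad and a BOM-detecting reader both treat as UTF-8. In an ANSI build, where TCHAR is char, the text would first need converting from the ANSI code page to UTF-16.</P>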