CString convert from UTF-8 to Unicode

  • Thread starter: tempc

Hi,

I have a UTF-8 file (File1) that contains the following bytes when viewed in the UltraEdit hex editor:

0x20 0xE2 0x80 0x93 0x3C 0x2F

However, my program is Unicode-based, so I need to read the contents and convert them into a Unicode string.

I use the following code:

CFile File;
CByteArray Data;

if (File.Open(strFileName, CFile::modeRead | CFile::shareDenyWrite | CFile::typeBinary))
{
    // Prepare the data buffer
    Data.SetSize((INT_PTR)File.GetLength());
    ::memset(Data.GetData(), 0, Data.GetSize());

    // Read all data to the buffer
    File.SeekToBegin();
    File.Read(Data.GetData(), Data.GetSize());
    File.Close();

    // Convert the buffer to the string
    strData = CString((LPCSTR)Data.GetData(), Data.GetSize());

    return TRUE;
}
else
    return FALSE;

After getting strData, to check whether the data is correct, I save it to a new Unicode file (File2), as below:

CFile File;

if (File.Open(strNewFileName, CFile::modeCreate | CFile::modeWrite | CFile::shareExclusive | CFile::typeBinary))
{
    // Write the strData to the file
    File.Write((LPCTSTR)strData, strData.GetLength() * sizeof(TCHAR));
    File.Close();
}

After opening the new Unicode file, I see the following bytes:

0x20 0x00 0x25 0x92 0x3F 0x00 0x2F 0x00

However, if I open File1 in UltraEdit directly and use its "Conversion" -> "UTF8->Unicode" menu command, I get a new file (File3) whose bytes are:

0x20 0x00 0x13 0x20 0x3C 0x00 0x2F 0x00

File2 and File3 differ in some bytes, even though both are Unicode files. Is there a problem with my conversion?

Thanks
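
For reference, here is a rough sketch of what a UTF-8 to UTF-16 conversion of these bytes should produce. It is offered as a likely explanation, not a confirmed diagnosis: in a Unicode build, constructing a CString from an LPCSTR converts the bytes using the ANSI code page rather than UTF-8, which would explain why File2 differs. On Windows the usual route is MultiByteToWideChar with CP_UTF8; the portable C++ below only illustrates the decoding itself. The function names DecodeUtf8ToUtf16 and ToUtf16LeBytes are hypothetical, and the decoder is simplified (it does not reject overlong forms or encoded surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Malformed or truncated sequences become U+FFFD. Simplified sketch:
// overlong encodings and encoded surrogates are not rejected.
std::u16string DecodeUtf8ToUtf16(const std::vector<std::uint8_t>& bytes)
{
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t lead = bytes[i];
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        // Every continuation byte must exist and look like 10xxxxxx.
        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const std::uint8_t b = bytes[i + k];
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(kReplacement); ++i; continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP need a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// the layout a hex editor shows for a Windows "Unicode" file (no BOM added).
std::vector<std::uint8_t> ToUtf16LeBytes(const std::u16string& s)
{
    std::vector<std::uint8_t> out;
    out.reserve(s.size() * 2);
    for (char16_t u : s) {
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));
        out.push_back(static_cast<std::uint8_t>(u >> 8));
    }
    return out;
}
```

Feeding the six bytes from File1 through DecodeUtf8ToUtf16 gives the code units U+0020, U+2013, U+003C, U+002F. The three bytes 0xE2 0x80 0x93 form a single character, U+2013 (en dash). Serialized little-endian, that matches the File3 bytes, which suggests File3 is the correct conversion and File2 is not.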
