tempc
Guest
Hi,
I have a UTF-8 file (File1) which contains the following bytes when viewed in the hexadecimal editor UltraEdit:
0x20 0xE2 0x80 0x93 0x3C 0x2F
However, my program is Unicode-based (UTF-16), so I need to read the contents and convert them into a Unicode string.
I use the following code:
CFile File;
CByteArray Data;
if (File.Open(strFileName, CFile::modeRead | CFile::shareDenyWrite | CFile::typeBinary))
{
    // Prepare the data buffer
    Data.SetSize((INT_PTR)File.GetLength());
    ::memset(Data.GetData(), 0, Data.GetSize());
    // Read all data to the buffer
    File.SeekToBegin();
    File.Read(Data.GetData(), Data.GetSize());
    File.Close();
    // Convert the buffer to the string
    strData = CString((LPCSTR)Data.GetData(), Data.GetSize());
    return TRUE;
}
else
    return FALSE;
After getting strData, to check whether the data is correct, I save it to a new Unicode file (File2), as below:
CFile File;
if (File.Open(strNewFileName, CFile::modeCreate | CFile::modeWrite | CFile::shareExclusive | CFile::typeBinary))
{
    // Write the strData to the file
    File.Write((LPCTSTR)strData, strData.GetLength() * sizeof(TCHAR));
    File.Close();
}
After opening the new Unicode file (File2) in the hex editor, I see the following bytes:
0x20 0x00 0x25 0x92 0x3F 0x00 0x2F 0x00
However, if I open File1 in UltraEdit directly and use its menu function "Conversion" -> "UTF8->Unicode", I get a new file (File3). In File3, the bytes are:
0x20 0x00 0x13 0x20 0x3C 0x00 0x2F 0x00
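As a cross-check on which result is the expected one, here is a minimal, portable sketch of a UTF-8 to UTF-16 decoder. It is not the code from my program and not necessarily how UltraEdit converts; the names Utf8ToUtf16 and ToUtf16LEBytes are hypothetical helpers made up for this illustration, and the decoder is simplified (it does not reject overlong encodings or encoded surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Simplified sketch: malformed or truncated sequences become U+FFFD;
// overlong forms and encoded surrogates are NOT rejected.
std::u16string Utf8ToUtf16(const std::vector<std::uint8_t>& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t b = in[i];
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte

        if (i + extra >= in.size()) {  // sequence runs past the end
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t c = in[i + k];
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }

        if (cp >= 0x10000) {  // needs a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: serialize code units as little-endian bytes,
// which is the byte order a hex editor shows for a UTF-16LE file.
std::vector<std::uint8_t> ToUtf16LEBytes(const std::u16string& s)
{
    std::vector<std::uint8_t> bytes;
    for (char16_t u : s) {
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
    }
    return bytes;
}
```

Feeding the File1 bytes 0x20 0xE2 0x80 0x93 0x3C 0x2F through Utf8ToUtf16 and then ToUtf16LEBytes yields 0x20 0x00 0x13 0x20 0x3C 0x00 0x2F 0x00: the three bytes 0xE2 0x80 0x93 decode to U+2013 (en dash), stored little-endian as 0x13 0x20. That matches File3. As far as I understand, in a Unicode build the CString(LPCSTR, int) constructor converts through the ANSI code page rather than UTF-8, which would be consistent with the File2 bytes; on Windows, MultiByteToWideChar with CP_UTF8 is the usual API for a UTF-8 conversion.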
File2 and File3 contain different bytes, even though both are Unicode files. Is there a problem with my conversion?
Thanks