When is a file binary?

gdocter

Is there a way to determine whether a file was written in binary format? All I can find is information on detecting text encodings, not on detecting the absence of one.
 
A file is always binary. If the file is ANSI-, Unicode-, or UTF-encoded, it's still binary; it is simply interpreted as text.
 
re: binary

Thanks, I knew that ;) but .. let me rephrase:

How do you determine, then, that no encoding was used whatsoever?
 
Generally files of a certain type contain a header. For example, a .GIF file contains a header that says "GIF87a" or "GIF89a" so that programs can determine that it is a GIF.

All files can be interpreted as text or interpreted as binary.
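
As a rough illustration of the header check described above, here is a minimal Python sketch. The function name looks_like_gif is made up for this example; the two signatures are the ones quoted in the post. A match only shows that the file starts with those bytes, not that the rest of it is a valid GIF.

```python
# The two GIF signatures mentioned above, as raw bytes.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(data: bytes) -> bool:
    """Return True if the data begins with one of the GIF signatures."""
    # bytes.startswith accepts a tuple of candidate prefixes.
    return data.startswith(GIF_SIGNATURES)
```

To check a file on disk, read its first six bytes in binary mode (open(path, "rb").read(6)) and pass them to the function.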
 
You can't for sure; that's the whole point. Each encoding has its own byte-level signature, and some offer byte order marks, but nothing restricts those signatures to text files. This is part of the reason file extensions exist: to indicate what type of file is being dealt with.

For example:

  • ASCII uses 7 bits to represent one character.
  • UTF-8 uses one to four octets per character (the original specification allowed up to six), with the initial octet serving as both an indicator of the number of subsequent octets and a portion of the character value. A UTF-8 file may also open with the optional byte sequence EF BB BF, its byte order mark.
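
To make the role of the initial octet concrete, here is a small Python sketch. The names utf8_sequence_length and has_utf8_bom are invented for this example. It looks only at the high bits of the lead octet and assumes the modern limit of four octets per character; it does not validate the continuation octets or reject overlong forms.

```python
# The optional UTF-8 signature (byte order mark): EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_sequence_length(lead: int) -> int:
    """Return how many octets a sequence starting with this octet claims
    to have, or 0 if the octet cannot start a sequence."""
    if lead < 0x80:            # 0xxxxxxx: single octet, same as ASCII
        return 1
    if 0xC0 <= lead < 0xE0:    # 110xxxxx: two octets
        return 2
    if 0xE0 <= lead < 0xF0:    # 1110xxxx: three octets
        return 3
    if 0xF0 <= lead < 0xF8:    # 11110xxx: four octets
        return 4
    return 0                   # 10xxxxxx continuation octet, or invalid


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data opens with EF BB BF."""
    return data.startswith(UTF8_BOM)
```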

And while UTF-8 is relatively easy to spot, there's absolutely nothing to distinguish ASCII by other than checking whether the file contains bytes that don't map to characters. Any byte above 7F falls outside ASCII, for example, and a null value, 00, although technically an ASCII control code, almost never appears in text, so it is a strong indication that the file is not ASCII-encoded text.
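
Putting these hints together, a best-guess classifier might look like the Python sketch below. The function name guess_encoding and its return labels are invented for this example, and the result is only ever a guess, for the reasons given in this thread. In particular, treating any null byte as "binary" will misclassify UTF-16 text, which routinely contains null bytes.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes) -> str:
    """Heuristically label data as 'utf-8 (BOM)', 'ascii', 'utf-8', or 'binary'."""
    if data.startswith(UTF8_BOM):
        return "utf-8 (BOM)"
    if b"\x00" in data:
        # Null bytes almost never occur in plain text files.
        return "binary"
    if all(byte < 0x80 for byte in data):
        # Every byte fits in 7 bits, so the data is consistent with ASCII.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences.
        return "binary"
    return "utf-8"
```

For large files it is usually enough to run this on the first few kilobytes rather than the whole file, though cutting the sample in the middle of a multi-byte character can make valid UTF-8 fail to decode.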
 