Generally, files of a given type begin with a header (often called a "magic number"). For example, a .GIF file starts with the six bytes "GIF87a" or "GIF89a" so that programs can determine that it is a GIF.
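A minimal sketch of that kind of header check, in Python. The function name looks_like_gif is made up for illustration; the two signatures are the ones mentioned above.

```python
# The two GIF signatures, as raw bytes.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

def looks_like_gif(data: bytes) -> bool:
    """Return True if the data starts with a GIF signature.

    This only inspects the first six bytes; it does not prove that the
    rest of the file is a valid GIF.
    """
    return data[:6] in GIF_SIGNATURES
```

In practice you would read just the first few bytes of the file in binary mode (open(path, "rb").read(6)) and pass them in.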
Any file can be interpreted as text or as binary; on disk it is all just bytes.
You can't know for sure; that's the whole point. Each encoding has its own byte-level signature, and some offer byte order marks, but nothing restricts those signatures to text files. This is part of the reason file extensions exist: to indicate what type of file is being dealt with.
For example:
ASCII uses 7 bits to represent one character.
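UTF-8 uses one to four octets per character (the original design allowed up to six, but RFC 3629 limits it to four), with the initial octet serving as both an indicator of the number of subsequently used octets and a portion of the character value. A UTF-8 file may also begin with the byte sequence EF BB BF (the byte order mark), but that marker is optional and many UTF-8 files omit it.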
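As a rough illustration of how the initial octet encodes the sequence length, here is a small Python sketch. It assumes the modern four-octet limit from RFC 3629; the names utf8_sequence_length and has_utf8_bom are hypothetical helpers, not part of any library.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # EF BB BF

def utf8_sequence_length(lead: int) -> int:
    """Return the total number of octets in a UTF-8 sequence that starts
    with the given lead octet, or 0 if the octet cannot start a sequence.
    """
    if lead < 0x80:            # 0xxxxxxx: plain ASCII, one octet
        return 1
    if 0xC2 <= lead <= 0xDF:   # 110xxxxx: two octets (C0/C1 are never valid)
        return 2
    if 0xE0 <= lead <= 0xEF:   # 1110xxxx: three octets
        return 3
    if 0xF0 <= lead <= 0xF4:   # 11110xxx: four octets (F5 and up are never valid)
        return 4
    return 0                   # continuation octet (10xxxxxx) or invalid

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the optional UTF-8 marker."""
    return data.startswith(UTF8_BOM)
```

Note that a missing marker says nothing either way: plenty of valid UTF-8 files have no BOM.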
And while UTF-8 is relatively easy to spot, there's absolutely nothing to distinguish ASCII other than checking whether the file contains bytes that don't map to ASCII characters. Any byte above 7F rules out ASCII outright. A null byte, 00, is technically the ASCII NUL character, but it almost never appears in real text, so it is a strong hint that the file is binary rather than ASCII text.
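Putting those checks together, a heuristic guess might look like the Python sketch below. This is only a guess under the assumptions stated in the comments (in particular, treating any null byte as "binary" is a convention, not a rule), and guess_text_encoding is a made-up name.

```python
def guess_text_encoding(data: bytes) -> str:
    """Heuristically classify a byte string.

    Returns one of "utf-8-sig", "binary", "ascii", "utf-8", or "unknown".
    None of these answers is certain; they are best guesses.
    """
    # An opening EF BB BF strongly suggests UTF-8 with a byte order mark.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # Assumption: real text almost never contains a null byte.
    if b"\x00" in data:
        return "binary"
    # Every byte below 0x80 means the data is consistent with ASCII.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Otherwise see whether the bytes form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

A result of "ascii" only means no byte contradicted ASCII; the same bytes are also valid UTF-8 and valid in most single-byte encodings.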