In article <address@hidden>, Jason Rumney <address@hidden> writes:
Dave Love <address@hidden> writes:
Yes. Perhaps someone knows exactly what Windows does (assuming the
only significant use of it is in Windows)?
I would guess that the presence of a BOM is a sufficient
heuristic. Detecting 0 or other low byte values in every second
byte would work for Latin-script-based languages, but I don't think
any heuristic like that would work on Asian text unless you could
assume a specific language and use a dictionary.
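
To make the two heuristics above concrete, here is a minimal Python sketch. It is not what Windows or Emacs actually does; the function name guess_utf16 and the constants SAMPLE_SIZE and ZERO_THRESHOLD are hypothetical, and the threshold values are arbitrary guesses. It checks for a BOM first, then falls back to counting zero bytes at alternating offsets.

```python
import codecs

# Hypothetical tuning values, chosen only for illustration.
SAMPLE_SIZE = 512       # how many leading bytes to inspect
ZERO_THRESHOLD = 0.8    # fraction of zero bytes needed on one side


def guess_utf16(data: bytes):
    """Return 'utf-16-be', 'utf-16-le', or None if neither heuristic fires."""
    # Heuristic 1: a byte order mark at the very start of the data.
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return "utf-16-le"

    # Heuristic 2: for mostly-ASCII text, one byte of every pair is zero.
    sample = data[:SAMPLE_SIZE]
    if len(sample) < 4:
        return None
    even, odd = sample[0::2], sample[1::2]
    even_zero = even.count(0) / len(even)
    odd_zero = odd.count(0) / len(odd)
    # Big-endian puts the (zero) high byte first, i.e. at even offsets.
    if even_zero >= ZERO_THRESHOLD and odd_zero < 0.1:
        return "utf-16-be"
    if odd_zero >= ZERO_THRESHOLD and even_zero < 0.1:
        return "utf-16-le"
    return None
```

With this sketch, "Hello".encode("utf-16-le") is recognised without a BOM, while BOM-less Japanese text such as "日本語のテキスト".encode("utf-16-le") contains no zero bytes at all and yields None, which is the limitation described above.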
I think the BOM is not that safe, because there are many charsets
that have normal letters at 0xFE and 0xFF.
But what are those characters, and are they likely to appear as a pair
at the beginning of the file, and nowhere else?
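
For reference, a short Python sketch that shows what the two BOM bytes decode to in a few single-byte charsets. The selection of charsets is only an illustrative assumption, not a survey, and the helper name bom_bytes_as_text is hypothetical.

```python
# The two bytes that make up a UTF-16 BOM, in big-endian order.
BOM_BYTES = bytes([0xFE, 0xFF])

# Illustrative selection of single-byte charsets (codec names as used by Python).
CHARSETS = ["latin-1", "koi8_r", "iso8859_5"]


def bom_bytes_as_text(charset: str) -> str:
    """Decode the bytes 0xFE 0xFF as if they were text in the given charset."""
    return BOM_BYTES.decode(charset)


if __name__ == "__main__":
    for name in CHARSETS:
        text = bom_bytes_as_text(name)
        # Report the decoded characters and whether they are letters.
        print(name, repr(text), all(ch.isalpha() for ch in text))
```

In Latin-1 the pair is a lowercase thorn followed by y with diaeresis; in the two Cyrillic charsets both bytes are also letters. Whether any of these pairs is likely to open a real file is the question raised above, and the sketch does not settle it.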