Re: Compiling Reverse Polish Calculator Example
From: Hans Aberg
Subject: Re: Compiling Reverse Polish Calculator Example
Date: Fri, 2 Nov 2012 00:28:29 +0100
On 1 Nov 2012, at 18:34, Akim Demaille wrote:
> Hi Hans,
>
> On 31 Oct 2012, at 15:47, Hans Aberg wrote:
>
>> It is pointless in UTF-8, and accepting it encourages a number of other
>> problems.
>> https://en.wikipedia.org/wiki/Byte_order_mark
>
> You are right that Bison wants at least to be able to read
> the ASCII part of the 8 bits, so that sort-of means UTF-8,
> if we consider that Latin 1 and the like are dead.
>
> If we were to ignore the BOM, then at least we should check
> that they match UTF-8, and reject the file otherwise?
>
> FWIW, the D compilers for instance obey these BOM, including for
> other codings than UTF-8.
Note that the OP had a UTF-16 BOM, which is just an error in UTF-8. The UTF-8
BOM is a 3-byte sequence; there is a FAQ here:
http://www.unicode.org/faq/utf_bom.html#BOM
I think it says that if one wants to recognize it in the middle of a file, just
ignore it.
But point three of the last issue says not to use it in streams that are
expected to start with ASCII, such as UNIX scripts beginning with '#!', and
point four says one must not use it for the non-UTF-8 encodings if the
endianness is being declared. So it seems inconsistent to admit it in UTF-8.
Perhaps it is some legacy from those stray editors.
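To make the check Akim suggests concrete (accept a UTF-8 BOM, reject a BOM of any other encoding), here is a minimal Python sketch. It is only an illustration, not what Bison does: the names `detect_bom` and `strip_utf8_bom_or_reject` are hypothetical, and the only library facility used is the set of BOM constants in Python's standard `codecs` module.

```python
import codecs

# Longer signatures are tested first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),        # the 3-byte sequence EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def strip_utf8_bom_or_reject(data: bytes) -> bytes:
    """Hypothetical policy: drop a leading UTF-8 BOM, reject any other BOM."""
    name, length = detect_bom(data)
    if name is None:
        return data              # no BOM: pass the input through unchanged
    if name == "UTF-8":
        return data[length:]     # skip the 3-byte UTF-8 BOM
    raise ValueError(f"input starts with a {name} BOM; expected UTF-8")
```

Under this policy, a grammar file saved with a UTF-16 BOM, as in the OP's case, would be rejected with an explicit message instead of producing a confusing syntax error on the first bytes.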
Hans