help-bison

Re: Compiling Reverse Polish Calculator Example


From: Hans Aberg
Subject: Re: Compiling Reverse Polish Calculator Example
Date: Thu, 1 Nov 2012 20:54:26 +0100

On 1 Nov 2012, at 18:34, Akim Demaille wrote:

> Hi Hans,

Hi Akim,

> On 31 Oct 2012, at 15:47, Hans Aberg wrote:
> 
>> It is pointless in UTF-8, and accepting it encourages a number of other 
>> problems.
>> https://en.wikipedia.org/wiki/Byte_order_mark
> 
> You are right that Bison wants at least to be able to read
> the ASCII part of the 8 bits, so that sort-of means UTF-8,
> if we consider that Latin 1 and the like are dead.

Isn't Bison, strictly speaking, just reading 8-bit ASCII and forwarding
whatever is in strings unchanged, given that not all byte (octet) sequences
are valid UTF-8? Modulo diagnostics in foreign languages.
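
To illustrate the point that not every byte sequence is UTF-8, here is a
minimal Python sketch. The helper name is_valid_utf8 and the sample inputs are
made up for illustration; this is not how Bison reads its input, only a way to
see which byte strings a strict UTF-8 decoder accepts.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is always valid UTF-8, byte for byte.
ascii_only = b"%token NUM"

# The same text in two encodings: Latin-1 uses the single byte 0xE9 for
# "e with acute accent", UTF-8 uses the two bytes 0xC3 0xA9.
latin1_text = "caf\u00e9".encode("latin-1")
utf8_text = "caf\u00e9".encode("utf-8")

print(is_valid_utf8(ascii_only))   # ASCII subset
print(is_valid_utf8(utf8_text))    # well-formed multi-byte sequence
print(is_valid_utf8(latin1_text))  # lone 0xE9 is not well-formed UTF-8
```

A tool that only interprets the ASCII bytes and copies everything else through
untouched never has to make this distinction.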

> If we were to ignore the BOM, then at least we should check
> that they match UTF-8, and reject the file otherwise?


Perhaps issue a better error message saying that this is a BOM. It shows up
(as mentioned in the URL above) with some stray editors that add it by
default, so it might be better to prohibit it, to encourage users to set
their editors right.

> FWIW, the D compilers for instance obey these BOM, including for
> other codings than UTF-8.

As for the BOM in UTF-8 on POSIX/UNIX, I think its use is strongly
discouraged, for various reasons. A philosophical one is that UTF-8 was
originally invented to be a Unix encoding, and Unix largely handles text
streams; a BOM is a contextual marker (it must come first in the stream),
which breaks that idea.

Hans




