help-gnu-emacs

Re: those funny non-ASCII characters


From: rusi
Subject: Re: those funny non-ASCII characters
Date: Fri, 1 Jun 2012 20:17:35 -0700 (PDT)
User-agent: G2/1.0

On Jun 2, 2:06 am, Xah Lee <xah...@gmail.com> wrote:
> Xah wrote
>
> > > 〈Unicode BOM Byte Order Mark Hack〉
> > > http://xahlee.org/comp/unicode_BOM_byte_orde_mark.html
>
> > > http://www.unicode.org/faq/utf_bom.html#bom1
>
> On Jun 1, 9:26 am, rusi <rustompm...@gmail.com> wrote:
>
> > See http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
> > (pg 36) "Use of a BOM is neither required nor recommended for UTF-8,
> > but may be encountered in contexts where UTF-8 data is converted from
> > other encoding forms..."
>
> > More specifically, the non-recommendation of the BOM:
> > http://www.unicode.org/faq/utf_bom.html
> > "Note that some recipients of UTF-8 encoded data do not expect a BOM.
> > Where UTF-8 is used transparently in 8-bit environments, the use of a
> > BOM will interfere with any protocol or file format that expects
> > specific ASCII characters at the beginning, such as the use of "#!"
> > at the beginning of Unix shell scripts."
>
> didn't i mention these 2 points exactly in the link i gave??

Yeah, your own link says this (as you know, I often use and quote your
Unicode pages :-) ):

- In Unix-like OSes, a BOM for UTF-8 conflicts with the Shebang (Unix)
hack.
- Much Windows software adds a BOM to UTF-8 files, e.g. Notepad.
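
To make the first point concrete, here is a minimal Python sketch (not
from the linked page). It assumes Python's standard "utf-8-sig" codec as
a stand-in for an editor that writes a BOM; SCRIPT and
starts_with_shebang are made-up names for illustration. With the BOM in
front, the file no longer begins with the two bytes "#!" that a shebang
check looks for.

```python
import codecs

# Hypothetical script contents, used only for illustration.
SCRIPT = "#!/bin/sh\necho hello\n"


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the raw bytes begin with the two ASCII bytes '#!'."""
    return data.startswith(b"#!")


# Plain UTF-8: no BOM is written.
plain = SCRIPT.encode("utf-8")

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) when encoding,
# which is roughly what a BOM-writing editor does on save.
with_bom = SCRIPT.encode("utf-8-sig")

print(starts_with_shebang(plain))       # shebang is at byte 0
print(starts_with_shebang(with_bom))    # BOM bytes come first instead
print(with_bom[:3] == codecs.BOM_UTF8)  # the three leading bytes are the BOM
```

The only difference between the two byte strings is the three-byte
prefix; everything after it is identical.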

But you also say

> If your lang spec says unicode, you have to support BOM mark

So I am not clear what your stand is...

Let me make my own position clear:
The de jure Unicode standard is set by the Unicode Consortium (or
whatever it's called).
The de facto standard is set by Microsoft and Java.
The two conflict.
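
One common way to live with that conflict, sketched in Python purely as
an illustration: accept a BOM on input, but never write one. The
function names below are hypothetical; the only library feature relied
on is the standard "utf-8-sig" codec, which drops one leading BOM when
decoding and is otherwise plain UTF-8.

```python
import codecs


def decode_utf8_tolerant(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if one is present."""
    # "utf-8-sig" skips the BOM on decode and behaves like "utf-8" otherwise.
    return data.decode("utf-8-sig")


def encode_utf8_no_bom(text: str) -> bytes:
    """Encode text as UTF-8 without prepending a BOM."""
    return text.encode("utf-8")


# Both inputs decode to the same text, with or without the BOM.
print(decode_utf8_tolerant(codecs.BOM_UTF8 + b"#!/bin/sh\n"))
print(decode_utf8_tolerant(b"#!/bin/sh\n"))
```

This is a sketch of one policy under those assumptions, not something
either standard mandates.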

