Re: [Lynx-dev] Unicode-marking, &c
From: Thorsten Glaser
Subject: Re: [Lynx-dev] Unicode-marking, &c
Date: Fri, 27 Feb 2009 09:24:00 +0000 (UTC)
David Woolley dixit:
>> Here under Windows there are constant references to the character that
>> begins a 16-bit-wide-character file (FF FE) or UTF-8 file (EF BB BF).
>
> These are all valid printable characters in ISO 8859/x. Although somewhat
> unlikely combinations, they are not reserved sequences.
We are talking about a file that does _begin_ with these byte sequences
here, not a file that solely consists of them.
For UCS-*, things are quite clear: you get <\0h\0t\0m\0l\0>, so it
obviously is not any 8-bit encoding.
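As a rough illustration of the byte-order-mark and NUL-byte observations above, here is a minimal Python sketch. The function name `sniff_wide_encoding` and the 64-byte sample size are assumptions made for this example; nothing here is taken from Lynx itself.

```python
import codecs

def sniff_wide_encoding(data: bytes):
    """Guess a BOM-marked or 16-bit encoding; return None if nothing matches.

    Hypothetical helper for illustration only.
    """
    # UTF-32 BOMs are tested first because the UTF-32-LE mark
    # (FF FE 00 00) starts with the UTF-16-LE mark (FF FE).
    if data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"

    # No BOM: ASCII markup stored in 16-bit units has a NUL in every
    # other byte, which no 8-bit text encoding would produce.
    head = data[:64]                 # assumed sample size
    head = head[: len(head) // 2 * 2]  # trim to an even length
    if len(head) < 2:
        return None
    even, odd = head[0::2], head[1::2]
    if not any(odd) and all(even):
        return "utf-16-le"           # h\0t\0m\0l\0
    if not any(even) and all(odd):
        return "utf-16-be"           # \0h\0t\0m\0l
    return None
```

The NUL-pattern test only covers text whose leading characters are all ASCII, which is the `<html>` case described above; other 16-bit content would need a different check.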
For UTF-8, it’s not that easy, but:
• If the file is UTF-8 and uses any nōn-ASCII characters, it will
almost always contain an octet from the [0x80‥0x9F] range, where
latin1 has no printable characters, which practically rules out
latin1 as its encoding
• In case of doubt: If the file contains only valid UTF-8 with no
encoding errors (invalid multibyte sequences), lean towards it,
as it’s the current standard replacing the 8-bit character sets
• If the file only contains ASCII characters, the first point above
no longer applies, but the difference is moot anyway
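The three points above can be sketched in Python roughly as follows. The names `has_c1_octet` and `guess_8bit_or_utf8`, and the choice of `iso-8859-1` as the fallback label, are assumptions made for this illustration; this is not how Lynx decides.

```python
def has_c1_octet(data: bytes) -> bool:
    """True if any byte is in 0x80..0x9F (non-printable in latin1)."""
    return any(0x80 <= b <= 0x9F for b in data)

def guess_8bit_or_utf8(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical heuristic: prefer UTF-8 whenever the bytes are valid UTF-8."""
    # Third point: pure ASCII reads the same either way, so the
    # distinction does not matter.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Second point: strict decoding rejects invalid multibyte
    # sequences; if none are found, lean towards UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

For example, `guess_8bit_or_utf8("naïve".encode("latin-1"))` falls back to `iso-8859-1`, because the lone byte 0xEF is not a complete UTF-8 sequence. Note that the 0x80‥0x9F test is a hint, not a proof: "’" encodes to E2 80 99 and trips it, while "é" encodes to C3 A9 and does not, which is why the sketch uses strict validation as the deciding test.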
bye,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2