Re: Running script from directory with UTF-8 characters
From: David Kastrup
Subject: Re: Running script from directory with UTF-8 characters
Date: Wed, 23 Dec 2015 22:53:14 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)
Eli Zaretskii <address@hidden> writes:
> From: Marko Rauhamaa <address@hidden>
>
>> Why don't you tell me already what emacs does?
>
> I did, you elided that. It represents text as superset of UTF-8, and
> uses high codepoints above the Unicode space for raw bytes.
Incorrect. It uses overlong encodings of 0x00-0x7f for raw bytes in the
0x80-0xff range (0x00-0x7f are always represented as themselves). Those
are not allowed in properly encoded UTF-8 and take only two bytes (byte
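patterns 0xc0 0x80–0xbf and 0xc1 0x80–0xbf). Half of all byte values
lie in 0x80-0xff, and each of them doubles in size unless it happens to
be part of a sequence allowed in properly encoded UTF-8 (which is left
unchanged, of course), so random byte patterns get inflated by somewhat
less than 50% on average.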
That's more economical than Python's method, which uses the encodings of
surrogate code points (U+DC80–U+DCFF) not allowed in properly encoded
UTF-8, taking 3 bytes rather than the 2 Emacs makes do with. Using high
codepoints above the Unicode space would take at least 4 bytes.
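The per-byte costs being compared can be checked with a short Python sketch. Python's `surrogateescape` error handler (PEP 383) maps an undecodable byte b to the lone surrogate U+DC00 + b; writing that out as UTF-8 needs the `surrogatepass` handler and yields 3 bytes. The Emacs-style form is the overlong two-byte encoding described above. For code points beyond Unicode, `utf8_len` is an illustrative helper (a hypothetical name) that assumes the original 31-bit UTF-8 scheme of RFC 2279, since strict UTF-8 cannot encode them at all.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed for cp under the original 31-bit UTF-8 scheme (RFC 2279)."""
    for limit, n in ((0x80, 1), (0x800, 2), (0x10000, 3),
                     (0x200000, 4), (0x4000000, 5), (0x80000000, 6)):
        if cp < limit:
            return n
    raise ValueError("code point needs more than 31 bits")


raw = 0xFF  # an arbitrary byte that is invalid in UTF-8

# Python (PEP 383): the byte becomes the lone surrogate U+DC00 + raw,
# whose UTF-8 form (only producible with "surrogatepass") is 3 bytes.
python_str = bytes([raw]).decode("utf-8", "surrogateescape")
python_form = python_str.encode("utf-8", "surrogatepass")

# Emacs-style: overlong 2-byte encoding of raw - 0x80.
v = raw - 0x80
emacs_form = bytes([0xC0 | (v >> 6), 0x80 | (v & 0x3F)])

print(hex(ord(python_str)), python_form.hex(" "), len(python_form))
print(emacs_form.hex(" "), len(emacs_form))
# The first code point past Unicode (0x110000) needs 4 bytes; Emacs's own
# raw-byte characters (0x3FFF80-0x3FFFFF) would need 5 in this scheme.
print(utf8_len(0x110000), utf8_len(0x3FFFFF))
```

The 3-byte surrogate form is itself rejected by a strict UTF-8 decoder, just as the overlong form is, so both schemes keep escaped bytes distinguishable from real text; they differ only in cost.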
--
David Kastrup