emacs-devel

Re: Dynamic loading progress


From: David Kastrup
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 20:12:10 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Philipp Stephani <address@hidden> writes:

> Eli Zaretskii <address@hidden> schrieb am So., 22. Nov. 2015 um 18:35 Uhr:
>
>> > From: Philipp Stephani <address@hidden>
>> > Date: Sun, 22 Nov 2015 09:25:08 +0000
>> > Cc: address@hidden, address@hidden, address@hidden
>> >
>> >     > Fine with me, but how would we then represent Emacs strings
>> >     > that are not valid Unicode strings? Just raise an error?
>> >
>> >     No need to raise an error. Strings that are returned to modules
>> >     should be encoded into UTF-8. That encoding already takes care of
>> >     these situations: it either produces the UTF-8 encoding of the
>> >     equivalent Unicode characters, or outputs raw bytes.
>> >
>> > Then we should document such a situation and give module authors a way to
>> > detect them.
>>
>> I already suggested what we should say in the documentation: that
>> these interfaces accept and produce UTF-8 encoded non-ASCII text.
>>
>
> If the interface accepts UTF-8, then it must signal an error for invalid
> sequences; the Unicode standard mandates this.
> If the interface produces UTF-8, then it must only ever produce valid
> sequences; this is again required by the Unicode standard.

The Unicode standard does not mandate Emacs' internal string encodings.
Emacs 22 had an entirely different internal string encoding that was
less convenient for a module interface.

Emacs' internal encoding has the property that valid UTF-8 sequences are
represented by themselves.

Which is a convenience to the programmer.  Not more, not less.

> That's why I propose to not encode raw bytes as bytes, but as the
> Emacs integer codes used to represent them.

So UCS-16 instead of UTF-8?  With conversions for every call?  That
sounds like shuffling the problem around, and shuffling does not come
for free.

It's OK to provide checking routines that verify that something is a
valid internal Emacs string (which includes more than Unicode) and flag
an error if it isn't.  The same goes for verifying that something is a
valid UTF-8 string, if that is desired (in case of error, such a check
would optionally either convert to the Emacs-internal representation
or flag an error).

But making each call gate automatically verify every string for UTF-8
compliance would be wasteful, and it would also make it impossible to
process generic Emacs strings in an external module.
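
To make the opt-in idea concrete, here is a minimal sketch of a strict
UTF-8 check that a module could call only where it actually needs
well-formed UTF-8, rather than having it run at every call gate.  The
name is_valid_utf8 is hypothetical; it is not part of Emacs or of the
module interface, and the handling of a failed check (convert or flag
an error) is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Emacs API.
   Return true if the LEN bytes at S are strictly well-formed UTF-8:
   no overlong forms, no surrogates, nothing above U+10FFFF.  */
bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;        /* number of continuation bytes expected */
      unsigned long min;  /* smallest code point legal for this length */
      unsigned long cp;   /* decoded code point */

      if (c < 0x80)
        {
          i++;            /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; min = 0x80;    cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; min = 0x800;   cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i <= need)
        return false;     /* sequence is truncated */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */

      i += need + 1;
    }

  return true;
}
```

A string that fails this check is not necessarily garbage: it may be a
perfectly good Emacs string that merely contains more than Unicode,
which is why the check belongs in the caller's hands.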

-- 
David Kastrup