emacs-devel

Re: Dynamic loading progress


From: Eli Zaretskii
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 21:43:25 +0200

> From: Philipp Stephani <address@hidden>
> Date: Sun, 22 Nov 2015 19:10:44 +0000
> Cc: address@hidden, address@hidden, address@hidden
> 
>     It is only used in one place: the internal representation of
>     characters in buffers and strings. Emacs _never_ lets this internal
>     representation leak outside.
> 
> If I run in scratch:
> 
> (with-temp-buffer
>   (insert #x3fff40)
>   (describe-char (point-min)))

Emacs will never find such a "byte" in any text.  So this feature is not
really relevant to the issue at hand.

> Then the resulting help buffer says "buffer code: #xF8 #x8F #xBF #xBD #x80";
> is that not considered a leak?

No.  You created this yourself, and got what you asked for.
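The five bytes in that describe-char output are the multibyte form Emacs uses internally for #x3FFF40, a character above the Unicode range. As a minimal sketch, assuming that the `utf-8-emacs` coding system (which can encode every Emacs character, not only Unicode ones) reproduces this internal form, the same bytes can be obtained without a buffer. `my-internal-bytes` is a hypothetical helper, not part of Emacs:

```elisp
;; Hypothetical helper (not part of Emacs): return the bytes of CHAR as
;; produced by the `utf-8-emacs' coding system, which is assumed here to
;; match the internal representation shown by `describe-char'.
(defun my-internal-bytes (char)
  (let ((encoded (encode-coding-string (string char) 'utf-8-emacs)))
    ;; ENCODED is a unibyte string, so each element is a byte 0..255.
    (mapconcat (lambda (b) (format "#x%02X" b)) encoded " ")))

;; Expected to match the "buffer code" line quoted above.
(my-internal-bytes #x3fff40)
```

An ASCII character such as ?A comes back as a single byte, so the extended form only appears for characters the caller deliberately creates outside Unicode.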

More generally, can you imagine a real-life situation where a string
with such "bytes" could be received from a module, as part of a C
'char *' string?

>     You are suggesting to expose the internal representation to outside
>     application code, which predictably will cause that representation to
>     leak into Lisp. That'd be a disaster. We had something like that
>     back in the Emacs 20 era, and it took many years to plug those leaks.
>     We would be making a grave mistake to go back there.
> 
> I don't suggest leaking anything that isn't already leaked. The extension of
> the codespace to 22 bits is well documented.

I don't think it's reasonable to request that module authors read all
that stuff and understand it, before they can write a simple module
that manipulates non-ASCII text.  Writing such modules should not be that
hard.
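The 22-bit codespace under discussion can be inspected from Lisp. A short sketch, assuming `max-char` accepts its optional UNICODE argument (as in current Emacs): the largest Emacs character is #x3FFFFF, i.e. 2^22 - 1, while Unicode stops at #x10FFFF.

```elisp
;; `max-char' returns the largest valid Emacs character code; with a
;; non-nil argument it returns the largest Unicode code point instead.
(list (format "#x%X" (max-char))      ; top of the Emacs codespace
      (format "#x%X" (max-char t))    ; top of the Unicode codespace
      ;; #x3FFF40, inserted in the describe-char example earlier in the
      ;; thread, is a valid Emacs character although it is not Unicode.
      (characterp #x3fff40)
      ;; One past the maximum is no longer a character.
      (characterp (1+ (max-char))))
```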

> Returning raw bytes means that encoding and decoding isn't a perfect
> roundtrip:
> 
> (decode-coding-string
>  (encode-coding-string (string #x3fffc2 #x3fffbb) 'utf-8-unix)
>  'utf-8-unix)
> "»"

If you start with raw bytes, not large integers, then the roundtrip
will be perfect.
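This can be checked by starting from the two bytes #xC2 #xBB (the UTF-8 encoding of U+00BB) instead of the character codes #x3FFFC2 #x3FFFBB. A minimal sketch; `my-utf8-roundtrip` is a hypothetical name used only for illustration:

```elisp
;; Hypothetical helper (not part of Emacs): decode the unibyte string
;; BYTES as UTF-8, then encode the result back to UTF-8.
(defun my-utf8-roundtrip (bytes)
  (encode-coding-string (decode-coding-string bytes 'utf-8-unix)
                        'utf-8-unix))

;; Start from raw bytes, not from large character codes.
(let ((bytes (unibyte-string #xc2 #xbb)))
  ;; Decoding yields the single character U+00BB, and re-encoding
  ;; yields the original two bytes again.
  (list (equal (decode-coding-string bytes 'utf-8-unix) (string #xbb))
        (equal (my-utf8-roundtrip bytes) bytes)))
```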

> What are the exact differences between the approaches? As far as I can see,
> differences exist only for the following points:
> - Accepting invalid sequences. I consider that a bug in general-purpose APIs,
>   including decode-coding-string. However, given that Emacs already extends
>   the Unicode codespace and therefore has to accept some invalid sequences
>   anyway, it might be OK if it's clearly documented.
> - Emitting raw bytes instead of extended sequences. Though I'm not a fan of
>   this, it might be unavoidable in order to treat strings transparently
>   (which is desirable).
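Both behaviours in that list show up with a byte that can never occur in valid UTF-8. A minimal sketch, relying on the Emacs convention that an undecodable byte B becomes the raw-byte character #x3FFF00 + B:

```elisp
;; #xFF is never valid in UTF-8, yet decoding does not signal an error.
(let* ((bytes (unibyte-string #xff ?A))
       (text (decode-coding-string bytes 'utf-8-unix)))
  (list
   ;; Accepting invalid sequences: the byte becomes a raw-byte character.
   (aref text 0)
   (= (aref text 0) (+ #x3fff00 #xff))
   ;; Emitting raw bytes: encoding writes that character back as the
   ;; single byte #xFF, so the original bytes are reproduced exactly.
   (equal (encode-coding-string text 'utf-8-unix) bytes)))
```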

Then I think we agree after all.

Thanks.



