emacs-devel

Re: Dynamic loading progress


From: Eli Zaretskii
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 13:10:13 +0200

> From: Philipp Stephani <address@hidden>
> Date: Sat, 21 Nov 2015 10:31:24 +0000
> Cc: address@hidden, address@hidden, address@hidden
> 
>     If you agree, then in both cases the strings these functions return
>     should be in the internal representation of strings used by Emacs, not
>     in some encoding like UTF-8 or ISO-8859-1. (We could also use encoded
>     strings, but that would require Lisp programs using module functions
>     to always decode any strings they receive, which is less efficient and
>     more error-prone.)
> 
> Yes. Just for understanding: there are two types of strings: unibyte (just a
> sequence of chars), and multibyte (sequence of chars interpreted in the
> internal Emacs encoding), right?

Yes.  However, unibyte strings are just streams of bytes; Emacs cannot
interpret them, and they generally appear on display as octal escapes.
They should never be presented to the user, except if the user
explicitly requested that, e.g. by a command such as
find-file-literally.
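
To illustrate what "octal escapes" means here, the following is a rough
sketch in C, not Emacs's actual display code. It renders every byte at
or above 0x80 as a backslash followed by three octal digits. The
function name render_raw_bytes is made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Emacs.  Copy the LEN bytes at BYTES
   to OUT, rendering every byte >= 0x80 as a backslash followed by three
   octal digits.  Return the number of characters written (excluding
   the terminating NUL), or -1 if OUT is too small.  */
static int
render_raw_bytes (const unsigned char *bytes, size_t len,
                  char *out, size_t out_size)
{
  size_t used = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (bytes[i] < 0x80)
        {
          /* One character now, plus room for the final NUL.  */
          if (used + 2 > out_size)
            return -1;
          out[used++] = (char) bytes[i];
        }
      else
        {
          /* Four characters ("\NNN") plus the NUL snprintf appends.  */
          if (used + 5 > out_size)
            return -1;
          snprintf (out + used, 5, "\\%03o", (unsigned) bytes[i]);
          used += 4;
        }
    }

  if (used + 1 > out_size)
    return -1;
  out[used] = '\0';
  return (int) used;
}
```

For example, the four Latin-1 bytes 'c' 'a' 'f' 0xE9 come out as the
seven characters caf\351.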

>     (Btw, I don't think we should worry about changing the internal
>     representation of characters in Emacs, because make_multibyte_string
>     will be updated as needed.)
> 
> This is a crucial point. If the internal encoding never changes, then we can
> declare that those string parameters are expected to be in the internal
> encoding.

No, we cannot, or rather should not.  It is unreasonable to expect
external modules to know the intricacies of the internal
representation.  Most Emacs hackers don't.

> But see the discussion in
> https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
> mule-conf.el seems to indicate that the internal encoding is not stable.

That discussion is about zero-copy access to Emacs buffer text and
Emacs strings inside module code.  Such access is indeed impossible
without either knowing _something_ about the internal representation,
or having additional APIs in emacs-module.c that allow modules such
access while hiding the details of the internal representation.  We
could discuss extending the module functionality to include this.

But that is a separate issue from what module_make_function and
module_make_string do.  These two functions are basic, and don't need
to know about the internal representation or use it.  While direct
access to Emacs buffer text will be needed by only some modules,
module_make_function will be used by all of them, and
module_make_string by many.

So I think we shouldn't conflate these two issues; they are separate.

>     This is what my comments were about. I think that you, by contrast,
>     are talking about the encoding of the _input_ strings, in this case
>     the 'documentation' argument to module_make_function and 'str'
>     argument to module_make_string. My assumption was that these
>     arguments will always have to be in UTF-8 encoding; if that assumption
>     is true, then no decoding via code_convert_string_norecord is
>     necessary, since make_multibyte_string will DTRT. We can (and
>     probably should) document the fact that all non-ASCII strings must be
>     UTF-8 encoded as a requirement of the emacs-module interface.
> 
> Or rather, an extension to UTF-8 capable of encoding surrogate code points and
> numbers that are not code points, as described in
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.

No, I meant strict UTF-8, not its Emacs extension.
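
As a sketch of what "strict" would rule out, a module could run a check
along the following lines before handing a string to the API. This is
an illustration only, not code from emacs-module.c, and the name
is_strict_utf8 is hypothetical. It rejects surrogates, overlong forms,
and anything above U+10FFFF, which are the kinds of sequences an
extended encoding would otherwise admit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the emacs-module API.  Return true
   if the LEN bytes at S are strict UTF-8: shortest-form sequences
   only, no surrogates (U+D800..U+DFFF), nothing above U+10FFFF.  */
static bool
is_strict_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = s[i];
      size_t extra;            /* number of continuation bytes expected */
      unsigned long cp, min;   /* decoded value and smallest legal value */

      if (b < 0x80)
        {
          i++;
          continue;
        }
      else if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false;          /* stray continuation byte, or 0xF8..0xFF */

      if (len - i <= extra)
        return false;          /* sequence is truncated */

      for (size_t k = 1; k <= extra; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;      /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;          /* overlong, out of range, or surrogate */

      i += extra + 1;
    }
  return true;
}
```

With this check, "caf\xC3\xA9" passes, while the encoded surrogate
"\xED\xA0\x80" and the overlong form "\xC0\x80" are rejected.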

>     If you are thinking about accepting strings encoded in other
>     encodings, I'd consider this an extension, to be added later if
>     needed. After all, a module can easily convert to UTF-8 by itself,
>     using facilities such as iconv.
> 
> Yes, provided the internal Emacs encoding is stable.

That's not what I meant.  (AFAIK, iconv doesn't know about the Emacs
internal representation.)  I meant that a module could convert from
any encoding to UTF-8, and then pass the resulting UTF-8 string to the
emacs-module API.
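
Something like the following is what I have in mind on the module side.
It is a minimal sketch using POSIX iconv. The helper name to_utf8 and
the buffer-size bound are assumptions made for the example, and error
handling is reduced to returning NULL.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper.  Convert the LEN bytes at SRC from the encoding
   named FROM to UTF-8.  Return a malloc'ed, NUL-terminated buffer that
   the caller must free, storing its length in *OUT_LEN; return NULL on
   any failure.  */
static char *
to_utf8 (const char *from, const char *src, size_t len, size_t *out_len)
{
  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;

  /* Assumed bound: four output bytes per input byte, plus the NUL.
     If it ever proves too small, iconv fails and we return NULL.  */
  size_t cap = len * 4 + 1;
  char *buf = malloc (cap);
  if (buf == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in = (char *) src;     /* iconv's prototype is not const-correct */
  size_t in_left = len;
  char *out = buf;
  size_t out_left = cap - 1;

  size_t rc = iconv (cd, &in, &in_left, &out, &out_left);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &out, &out_left);  /* flush shift state */
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (buf);
      return NULL;
    }

  *out_len = (size_t) (out - buf);
  buf[*out_len] = '\0';
  return buf;
}
```

The buffer returned by, say, to_utf8 ("ISO-8859-1", text, len, &n) is
what the module would then pass as the 'str' argument to
module_make_string.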

>     In any case, code_convert_string_norecord cannot be the complete
>     solution, because it accepts Lisp string objects, not C strings. You
>     still need to create a Lisp string (but this time using
>     make_unibyte_string). The point is to always use either
>     make_unibyte_string or make_multibyte_string, and never build_string
>     or make_string; the latter 2 should only be used for fixed ASCII-only
>     strings.
> 
> Yes, that's fine, the question is about whether the internal encoding is
> stable.

With my suggestion, the stability of the internal representation is
not an issue.

> If it's stable, we can use make_multibyte_string; if not, we can
> only use make_unibyte_string.

If the argument strings are in strict UTF-8, then
make_multibyte_string will DTRT automagically, no matter what the
internal representation is.  That is its contract.


