bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#23750: 25.0.95; bug in url-retrieve or json.el


From: Dmitry Gutov
Subject: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date: Mon, 20 Jun 2016 17:54:23 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2

On 06/20/2016 05:38 PM, Eli Zaretskii wrote:

> This all sounds like my response is not welcome, but in that case why
> did you ask the question?

I was kind of hoping for "yes, let's get it into 25.1!"? :)

No, the bug is where the invalid input is generated in the first
place.  Each API has its contract; if you violate the contract, you
invoke undefined behavior.

It's a bug in the API, or bad API, if you will. It needs stricter contract, and the submitted patch added it.

Or to look at it another way, the current contract allows url-http-data to be multibyte, because the requirement to the contrary is not documented anywhere that I can see. The variable is simply undocumented.

    If this is what you need, why not simply test the payload for being a
    unibyte string?  There a function, multibyte-string-p, for that.

There are a lot of variables to test (see the comment above the mapconcat call).

Looks like mapc will be able to deal with that.  Or just use concat,
and test the result with multibyte-string-p before sending.  Or encode
it with UTF-8, if it is not unibyte already.

I don't know if we want to be that permissive that we'll encode to UTF-8 silently.

Btw, I don't think the comment which explains why we started using
mapconcat is accurate these days.  It was written before the move to
Unicode in Emacs 23, but we stopped converting raw bytes into Latin-1
characters in Emacs 23 and later.  So maybe we should just go back to
using concat (with erroring out, if the result is multibyte, and/or
maybe with replacing 'length' with 'string-bytes').

Better error out: the payload's encoding is something only the caller should be concerned with. Unless we're fine with the users assuming that Emacs's internal encoding is close enough to UTF-8.

Bottom line: like I said, there should be no reason to use
string-*-unibyte in modern Emacs code on the url-http level or higher
(maybe not at all).  Its use is a sign of some basic misunderstanding,
or a bug elsewhere, or remnant of old problems that no longer exist.
So I think we should reconsider the solution on master as well.

I don't mind. Would you advocate for having this fix on emacs-25 if I implement it the way you described?

And you'll have to come up with the error message(s).

Are you saying you like the error message from string-to-unibyte?

  Cannot convert 123th character to unibyte

It's an order of magnitude better than what was before (no error and silent corruption), but yes, there is space for improvement.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]