emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: emacs-25 b6b47AF: Properly encode/decode base64Binary data in SOAP


From: Thomas Fitzsimmons
Subject: Re: emacs-25 b6b47AF: Properly encode/decode base64Binary data in SOAP
Date: Sun, 13 Mar 2016 17:17:04 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: Thomas Fitzsimmons <address@hidden>
>> Cc: address@hidden,  address@hidden
>> Date: Sun, 13 Mar 2016 15:54:34 -0400
>> 
>> Eli Zaretskii <address@hidden> writes:
>> 
>> >> From: Thomas Fitzsimmons <address@hidden>
>> >> Cc: address@hidden,  address@hidden
>> >> Date: Sun, 13 Mar 2016 13:57:32 -0400
>> >> 
>> >>    (defun soap-parse-server-response ()
>> >>      "Error-check and parse the XML contents of the current buffer."
>> >>      (let ((mime-part (mm-dissect-buffer t t)))
>> >>        (unless mime-part
>> >>          (error "Failed to decode response from server"))
>> >>        (unless (equal (car (mm-handle-type mime-part)) "text/xml")
>> >>          (error "Server response is not an XML document"))
>> >>        (with-temp-buffer
>> >>          (mm-insert-part mime-part)
>> >>          (prog1
>> >>              (car (xml-parse-region (point-min) (point-max)))
>> >>            (kill-buffer)
>> >>            (mm-destroy-part mime-part)))))
>> >> 
>> >> mm-insert-part does:
>> >> 
>> >>    (string-to-multibyte (mm-get-part handle no-cache))
>> >
>> > Why does it do that?  string-to-multibyte is one of those functions
>> > that should never be used.
>> 
>> I don't know.  This is the first I've looked at the mm code.  I'll have
>> to do more investigation here, apparently.
>
> IME, mm-decode has a lot of baggage from distant past, when it had to
> deal with bugs in Emacs, with incompatibilities between Emacs and
> XEmacs, and from its own misconceptions in the area of encoding and
> decoding text.  Its code should be carefully reviewed for correctness.

OK.

>> Upthread I was saying only that xsd:base64Binary values should be
>> returned undecoded.
>
> But it currently doesn't, AFAIU, since this:
>
>   (cdr (assq 'severity (car (debbugs-get-status 22285))))
>
> is returned as a multibyte string.

"severity" is an xsd:string.

> So I guess you are saying that soap-client needs some changes to
> return xsd:base64Binary data as unibyte strings?

Yes, because Andreas's patch changed the behavior.  But the "severity"
example doesn't demonstrate this.  This does, because originator is an
xsd:base64Binary:

(cdr (assq 'originator (car (debbugs-get-status 22285))))

Before Andreas's patch that was unibyte, now, with Andreas's patch, it's
multibyte.

> Or is "severity" an xsd:string?

Yes.

>> I wasn't commenting on how other XSD string values (xsd:string, etc.)
>> should be returned.
>
> In general, if it's known to be a text string, and its encoding is
> specified by the document, or can be deduced otherwise with a 100%
> certainty, I'd recommend to return decoded strings.

OK.  That's what soap-client currently tries to do.  But as you've
pointed out, we need to review what it does for correctness.

>> Maybe I can give an example with XML fragments returned by the server,
>> to show how I think soap-client should handle xsd:base64Binary values.
>> 
>> The debbugs server will respond with:
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> [...]
>> <severity xsi:type="xsd:string">normal</severity>
>> [...]
>> <originator 
>> xsi:type="xsd:base64Binary">Q2zDqW1lbnQgUGl0LS1DbGF1ZGVsIDxjbGVtZW50LnBpdGNsYXVkZWxAbGl2ZS5jb20+</originator>
>> [...]
>> 
>> soap-client will parse those results into a structure that it returns to
>> the caller:
>> 
>> ([...]
>>  (severity . "<string1>")
>>  [...]
>>  (originator . "<string2>")
>>  [...])
>> 
>> I think <string2> should be unibyte, because xsd:base64Binary represents
>> binary data, not necessarily a string.
>
> Btw, why is "originator" not a string? why xsd:base64Binary?  It's a
> name of a human (or some other entity), so it's clearly text, no?

Good question, I don't know.  Maybe Michael could comment here, since
this was a Debbugs decision.

>> What <string1> should be (unibyte or multibyte) and how it should be
>> produced (decoded) is the broader discussion.  I don't know enough to
>> have an opinion on that yet, other than it seems to have been working to
>> treat it as multibyte up until now.  Again, I'll have to talk to Alex
>> about this.
>
> If you can reliably decode it, then multibyte and decoded is better.
> I'd also say that if it's known that xsd:base64Binary is a string in
> disguise, it should also be decoded.

OK.  But there is no way for soap-client itself to know what the content
of an xsd:base64Binary value is.  The caller of soap-client will know
what it is, and so it should be up to the caller to interpret the bytes.
That's what Debbugs does with "originator"; it decodes them assuming
they represent a UTF-8 string.

> IOW, whenever the data is a string, it is better to decode it, I agree
> with Andreas here: unibyte strings that represent text are a PITA
> without a good justification.

I agree with you and Andreas in general.  However an xsd:base64Binary
value is a special case because it does not always represent text.
Therefore, I disagree with Andreas's patch; I've been trying to provide
good justification for not assuming that xsd:base64Binary values
represent text, because they don't always represent text, and there's no
way for soap-client to know when they do.

Thomas



reply via email to

[Prev in Thread] Current Thread [Next in Thread]