Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

From: Alex Shinn
Date: Wed, 16 Jan 2013 00:39:16 +0900

On Tue, Jan 15, 2013 at 7:48 PM, Peter Bex <address@hidden> wrote:

These special characters are called "reserved" in the BNF. As you can
see, the question mark, equals sign and ampersand are in there.
For urlencoded query strings, these *cannot* be decoded, because
then you can't distinguish between

http://calc.example.com?bool-expr=x%26y%3D1
and
http://calc.example.com?bool-expr=x&y=1

The former should be decoded in uri-common to the alist
((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")).
By fully decoding all reserved characters in uri-generic, you drop
important information.
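A minimal sketch of Peter's point, using Python's standard `urllib.parse` rather than uri-common (an assumption made purely for illustration): `parse_qsl` splits the query on `&` and `=` first and only then percent-decodes each name and value, so an escaped `%26` or `%3D` survives as data. The query strings below assume the encoded value was meant to decode to `x&y=1`, matching the alist above.

```python
from urllib.parse import parse_qsl

# parse_qsl splits on "&" and "=" first, then percent-decodes each piece.
encoded = "bool-expr=x%26y%3D1"  # "&" and "=" escaped, so they are data
plain = "bool-expr=x&y=1"        # "&" and "=" act as delimiters

encoded_pairs = parse_qsl(encoded)
plain_pairs = parse_qsl(plain)

print(encoded_pairs)  # [('bool-expr', 'x&y=1')]
print(plain_pairs)    # [('bool-expr', 'x'), ('y', '1')]
```

Because splitting happens before decoding, the two URLs stay distinguishable; decoding the whole query first would erase the difference.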

The internal representation is either decoded, or it is encoded.

Either can be made to work.

In this case, the decoded uri-common representation of the former is:

((bool-expr . "x&y=1"))

and the decoded representation of the latter is:

((bool-expr . "x") (y . "1"))

just as you say, so this is how they are stored in the URI object.

In uri-generic, both get parsed to:

((bool-expr . "x&y=1"))

As RFC 3986 states:

  Because the percent ("%") character serves as the indicator for
  percent-encoded octets, it must be percent-encoded as "%25" for that
  octet to be used as data within a URI.

Therefore, if you intended the raw URI data to include a "%", then the
correct representation (for either uri-common or uri-generic) would
have been:

http://calc.example.com?bool-expr=x%2526y%253D1
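To make the "%25" rule concrete, here is a sketch using Python's `urllib.parse` (not the Chicken eggs; an illustrative assumption). Percent-encoding a value that already contains `%26` and `%3D` turns each `%` into `%25`. A single decoding pass then recovers the literal `%26`/`%3D` text rather than `&` and `=`. The trailing `1` is assumed so the value lines up with the alist above.

```python
from urllib.parse import quote, unquote

# The value contains literal "%" characters that must survive as data.
raw_value = "x%26y%3D1"

# With safe="", every reserved character, including "%", is percent-encoded.
wire_value = quote(raw_value, safe="")
print(wire_value)  # x%2526y%253D1

# One decoding pass turns each "%25" back into "%" and goes no further.
decoded = unquote(wire_value)
print(decoded)     # x%26y%3D1
```

Only a second, separate decoding step would turn `%26` and `%3D` into `&` and `=`.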

So, assuming & is _not_ special to the query (as is the case with
uri-generic), escaping & as %26 or not produces the same result.
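A sketch of that last point, assuming the query is treated as one opaque string and fully decoded in a single pass. This roughly matches the uri-generic behaviour described here, but it is plain Python `urllib.parse.unquote`, not uri-generic's API. Once `&` and `=` carry no special meaning, the escaped and unescaped spellings decode to the same text.

```python
from urllib.parse import unquote

# Treat the whole query as opaque text and decode it in one pass.
escaped_query = "bool-expr=x%26y%3D1"
unescaped_query = "bool-expr=x&y=1"

# With "&" and "=" not treated as delimiters, both spellings decode identically.
same = unquote(escaped_query) == unquote(unescaped_query)
print(unquote(escaped_query))  # bool-expr=x&y=1
print(same)                    # True
```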

Alex