
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.


From: Ivan Raikov
Subject: Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
Date: Tue, 15 Jan 2013 13:22:30 +0900

Hi all,

I realized that I replied only to Sungjin and forgot to include the mailing list, so I am repeating my reply here.

Section 3.1 of RFC 3987 defines a mapping from IRIs to URIs in which non-ASCII characters are converted to UTF-8 and the resulting bytes are percent-encoded.
So I implemented a procedure, iri->uri, that percent-encodes the UTF-8 sequences in a string and passes the result to the usual URI constructor in uri-generic.
It is intended to work as follows:

(iri->uri "http://example.com/삼계탕") =>
#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
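For reference, here is a minimal sketch of the RFC 3987 encoding step in portable R7RS Scheme. It is not the actual iri->uri code: the name iri-percent-encode is hypothetical, and the sketch only produces the percent-encoded string (it does not build a URI object). Following section 3.1, ASCII bytes pass through unchanged and every byte of a non-ASCII character's UTF-8 encoding becomes a %XX triplet with uppercase hex digits.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper (not part of uri-generic or uri-common):
;; encode S as UTF-8, then percent-encode every byte >= #x80,
;; leaving ASCII bytes as they are.
(define (iri-percent-encode s)
  (let ((bytes (string->utf8 s))
        (hex "0123456789ABCDEF"))
    (let loop ((i 0) (acc '()))
      (if (= i (bytevector-length bytes))
          (list->string (reverse acc))
          (let ((b (bytevector-u8-ref bytes i)))
            (if (< b #x80)
                ;; ASCII byte: copy it through unchanged.
                (loop (+ i 1) (cons (integer->char b) acc))
                ;; Non-ASCII byte: emit "%", high nibble, low nibble
                ;; (pushed in order onto a list that is reversed at the end).
                (loop (+ i 1)
                      (cons (string-ref hex (remainder b 16))
                            (cons (string-ref hex (quotient b 16))
                                  (cons #\% acc))))))))))

(display (iri-percent-encode "http://example.com/삼계탕"))
(newline)
;; prints http://example.com/%EC%82%BC%EA%B3%84%ED%83%95
```

The output path segment matches the one in the intended result above: 삼, 계 and 탕 are each three UTF-8 bytes long.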

However, the uri-generic constructor tries to normalize every URI by percent-decoding it, so the URL above currently results in this:

#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "�%82%BC�%B3%84�%83%95") query=#f fragment=#f)


In other words, the leading byte of each percent-encoded UTF-8 sequence is decoded back into a raw, unprintable non-ASCII byte.
So a better solution may indeed be to change iri->uri so that it passes the percent-encoded sequences directly to make-uri, without attempting percent-decoding normalization.
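The damage comes from decoding octets that do not stand for unreserved characters. For comparison, RFC 3986 (section 6.2.2.2) recommends that normalization decode only octets corresponding to unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~"), and section 6.2.2.1 that percent-encoding hex digits be uppercased. The sketch below applies just those two rules, so a sequence such as %EC%82%BC survives intact. It is an illustration of the RFC rules, not uri-generic's normalizer, and normalize-percent-encoding is a hypothetical name.

```scheme
(import (scheme base) (scheme char) (scheme write))

;; #t if byte B encodes an RFC 3986 unreserved character.
(define (unreserved-byte? b)
  (or (<= #x41 b #x5A)                  ; A-Z
      (<= #x61 b #x7A)                  ; a-z
      (<= #x30 b #x39)                  ; 0-9
      (memv b '(#x2D #x2E #x5F #x7E)))) ; - . _ ~

;; Value of hex digit C, or #f if C is not a hex digit.
(define (hex-value c)
  (cond ((char<=? #\0 c #\9) (- (char->integer c) 48))
        ((char<=? #\A (char-upcase c) #\F)
         (- (char->integer (char-upcase c)) 55))
        (else #f)))

;; Hypothetical normalizer: decode %XX only for unreserved octets,
;; re-emit every other triplet with uppercase hex digits.
(define (normalize-percent-encoding s)
  (let ((n (string-length s))
        (hex "0123456789ABCDEF"))
    (let loop ((i 0) (acc '()))
      (cond
        ((= i n) (list->string (reverse acc)))
        ((and (char=? (string-ref s i) #\%)
              (< (+ i 2) n)
              (hex-value (string-ref s (+ i 1)))
              (hex-value (string-ref s (+ i 2))))
         (let ((b (+ (* 16 (hex-value (string-ref s (+ i 1))))
                     (hex-value (string-ref s (+ i 2))))))
           (if (unreserved-byte? b)
               ;; Safe to decode: the octet is an unreserved ASCII character.
               (loop (+ i 3) (cons (integer->char b) acc))
               ;; Keep it encoded (e.g. UTF-8 lead/continuation bytes).
               (loop (+ i 3)
                     (cons (string-ref hex (remainder b 16))
                           (cons (string-ref hex (quotient b 16))
                                 (cons #\% acc)))))))
        (else (loop (+ i 1) (cons (string-ref s i) acc)))))))

(display (normalize-percent-encoding "/%ec%82%bc/%41%7e"))
(newline)
;; prints /%EC%82%BC/A~
```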

Sungjin's modification to the definition of 'unstructured' is in line with the IRI RFC (except, of course, that we would need to add all the other character sets besides Hangul).
However, as Peter and Alex already pointed out, URIs containing raw UTF-8 sequences might result in invalid URLs being sent to systems that do not understand IRIs or UTF-8.
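To make "all other character sets" concrete: RFC 3987 defines the non-ASCII characters allowed in IRIs through its ucschar production (plus iprivate, which is only permitted in the query component). The sketch below turns the ucschar ranges into a predicate over code points; the name ucschar? is hypothetical, and how such a predicate would plug into uri-generic's definition of 'unstructured' is not shown. Hangul syllables (U+AC00 to U+D7A3) fall inside the first range.

```scheme
(import (scheme base) (scheme write))

;; ucschar ranges from the RFC 3987 ABNF, as inclusive (low . high) pairs.
(define ucschar-ranges
  '((#xA0 . #xD7FF)     (#xF900 . #xFDCF)   (#xFDF0 . #xFFEF)
    (#x10000 . #x1FFFD) (#x20000 . #x2FFFD) (#x30000 . #x3FFFD)
    (#x40000 . #x4FFFD) (#x50000 . #x5FFFD) (#x60000 . #x6FFFD)
    (#x70000 . #x7FFFD) (#x80000 . #x8FFFD) (#x90000 . #x9FFFD)
    (#xA0000 . #xAFFFD) (#xB0000 . #xBFFFD) (#xC0000 . #xCFFFD)
    (#xD0000 . #xDFFFD) (#xE1000 . #xEFFFD)))

;; Hypothetical predicate: #t if code point CP is a ucschar.
(define (ucschar? cp)
  (let loop ((rs ucschar-ranges))
    (cond ((null? rs) #f)
          ((<= (caar rs) cp (cdar rs)) #t)
          (else (loop (cdr rs))))))

(display (ucschar? #xC0BC)) ; U+C0BC is 삼
(newline)
;; prints #t
```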

I will modify iri->uri to avoid normalization and see whether this produces acceptable results.

  Ivan

On Tue, Jan 15, 2013 at 12:20 PM, Alex Shinn <address@hidden> wrote:
=삼계탕&start=0&rows=10