
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.


From: Peter Bex
Subject: Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
Date: Mon, 14 Jan 2013 09:18:52 +0100
User-agent: Mutt/1.4.2.3i

On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <address@hidden> wrote:
> > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > encoding. Am I wrong here?
> 
> Thus you can't use raw non-ASCII bytes in a URI - they must
> be encoded, and interpretation is up to the origin (and is overwhelmingly
> utf8 these days).

Wow, thanks for doing the research!  I was a bit lazy in not doing
that in the first place.  It's not the first time though that people
think something's wrong in uri-generic whereas on closer reading of
the RFC it turns out to be correct :)

There is a very common misconception among programmers that you
only need to encode a URI when the link doesn't work in a browser.
This is a source of vulnerabilities and subtle bugs: many browsers
simply try to cope with broken HTML, and apparently even with broken
URI strings, so a link that works in a browser is not necessarily
a valid URI.
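
To make the encoding rule concrete, here is a minimal sketch in Python
(not uri-generic or uri-common, whose API is not shown here). It assumes
the origin interprets encoded bytes as UTF-8, and the helper names
encode_path and is_ascii_only are made up for illustration. Only the
standard library's urllib.parse is used.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URI path (hypothetical helper for illustration).

    Non-ASCII characters are converted to their UTF-8 bytes and each
    byte is written as %XX; "/" is kept as the path separator.
    """
    # quote() encodes as UTF-8 by default; safe="/" leaves slashes alone.
    return quote(path, safe="/")


def is_ascii_only(uri: str) -> bool:
    """Cheap sanity check: a valid URI never contains raw non-ASCII."""
    return uri.isascii()


raw = "/wiki/café"            # not usable as-is: contains a raw non-ASCII char
encoded = encode_path(raw)    # what should actually go on the wire

print(is_ascii_only(raw))         # False
print(encoded)                    # /wiki/caf%C3%A9
print(is_ascii_only(encoded))     # True
print(unquote(encoded) == raw)    # True: decoding as UTF-8 round-trips
```

The point is that the encoded form is plain ASCII, and decoding it back
to text only works because both sides agree on UTF-8.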

> It would of course be possible for any tool or webserver to
> accept URIs with non-ASCII bytes, but I don't know of any
> browsers which would _send_ such a request, because in
> general it would be rejected.

We've decided to make uri-generic follow the RFC as closely as
possible.  To our knowledge, this library is the most RFC-compliant
URI library available for *any* language.

Cheers,
Peter
-- 
http://sjamaan.ath.cx


