Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
From: .alyn.post.
Date: Mon, 14 Jan 2013 14:40:15 -0700
On Mon, Jan 14, 2013 at 09:18:52AM +0100, Peter Bex wrote:
> On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> > On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <address@hidden> wrote:
> > > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > > encoding. Am I wrong here?
> >
> > Thus you can't use raw non-ASCII bytes in a URI - they must
> > be encoded, and interpretation is up to the origin (and is overwhelmingly
> > utf8 these days).
>
> Wow, thanks for doing the research! I was a bit lazy in not doing
> that in the first place. It's not the first time though that people
> think something's wrong in uri-generic whereas on closer reading of
> the RFC it turns out to be correct :)
>
> There is a very common misconception held by many programmers that you
> only need to encode a URI whenever the link doesn't work in a browser.
> However, this is a source of vulnerabilities and subtle bugs.
> A lot of browsers simply try to cope with broken HTML and even broken
> URI strings, apparently.
>
> > It would of course be possible for any tool or webserver to
> > accept URIs with non-ASCII bytes, but I don't know of any
> > browsers which would _send_ such a request, because in
> > general it would be rejected.
>
> We've decided to make uri-generic follow the RFC as closely as
> possible. To our knowledge, this library is the most RFC-compliant
> URI library available for *any* language.
>
I worked on an FTP program years ago that operated in an ecosystem
where lots of technically incorrect URLs were pasted around, and
we got a bug report that they weren't working in our client.
To 'fix' it, we had to remove support for correct URLs to handle
this more common use case. I regret having to do that to this day, so
thank you very much for RFC-compliant parsing.
-Alan
--
my personal website: http://c0redump.org/
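The quoted point above (raw non-ASCII bytes are not allowed in a URI; they must be percent-encoded, and the origin decides how to interpret them, almost always as UTF-8) can be illustrated with a short sketch. This is Python's standard library, not Chicken's uri-generic or uri-common API, and the helper names `encode_path` and `is_strict_uri_text` are hypothetical. The strictness check is only a rough character-level filter in the spirit of RFC 3986, not a full URI parser.

```python
import re
from urllib.parse import quote, unquote

# Characters RFC 3986 permits to appear literally in a URI: unreserved,
# gen-delims, sub-delims, plus "%" used to introduce a percent-encoding.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A "%" that is NOT followed by exactly two hex digits is malformed.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def encode_path(path: str) -> str:
    """Hypothetical helper: percent-encode a path, turning each non-ASCII
    character into its UTF-8 octets written as %XX (slashes are kept)."""
    return quote(path, safe="/", encoding="utf-8")


def is_strict_uri_text(s: str) -> bool:
    """Hypothetical helper: reject strings containing raw non-ASCII or other
    disallowed characters, or malformed percent-escapes."""
    return _URI_CHARS.fullmatch(s) is not None and _BAD_PCT.search(s) is None


if __name__ == "__main__":
    raw = "http://example.com/caf\u00e9"              # raw non-ASCII "é"
    fixed = "http://example.com" + encode_path("/caf\u00e9")
    print(fixed)                                       # http://example.com/caf%C3%A9
    print(is_strict_uri_text(raw), is_strict_uri_text(fixed))  # False True
    print(unquote(fixed))                              # decodes back as UTF-8
```

A strict checker like this rejects the "pasted around" form that browsers silently tolerate, which is the trade-off described in the message: accepting such input means guessing at what the sender meant rather than following the RFC.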