From: Chicken Trac
Subject: Re: [Chicken-janitors] #998: uri->string / make-uri path encoding inconsistencies
Date: Thu, 14 Mar 2013 21:12:33 -0000

#998: uri->string / make-uri path encoding inconsistencies
----------------------+-----------------------------------------------------
  Reporter:  andyjpb  |       Owner:  sjamaan 
      Type:  defect   |      Status:  accepted
  Priority:  major    |   Milestone:  someday 
 Component:  unknown  |     Version:  4.8.x   
Resolution:           |    Keywords:          
----------------------+-----------------------------------------------------

Comment(by sjamaan):

 I'm unsure but this appears to be correct.  The fact that the original
 string is read/write invariant is a feature specifically made so that non-
 HTTP URIs keep their exact encoding, which makes it easier for
 applications to extract the original "generic" URI from the object in
 unmodified form.

 When generating a URI (or updating a component), these characters get encoded.
 It is *extremely* unclear from the spec what should happen in this case.

 According to RFC3986 (URI), section 2.2:
 "URI producing applications should percent-encode data octets that
  correspond to characters in the reserved set unless these characters
  are specifically allowed by the URI scheme to represent data in that
  component."

 and section 2.3:
 "URIs that differ in the replacement of an unreserved character with
  its corresponding percent-encoded US-ASCII octet are equivalent:
  they identify the same resource."
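
 To make the section 2.3 rule concrete, here is a minimal Python sketch
 (Python is used purely for illustration; this is not how uri-common or
 uri-generic work internally).  The helper name decode_unreserved is
 hypothetical.  It decodes only percent-escapes of unreserved characters
 and leaves every other escape encoded, uppercasing its hex digits:

```python
import re
import string

# RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def decode_unreserved(component: str) -> str:
    """Decode percent-escapes of unreserved characters only.

    Hypothetical helper for illustration. Escapes of any other octet
    stay encoded; only their hex digits are uppercased.
    """
    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return match.group(0).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, component)
```

 Under this sketch "%7Euser" and "~user" compare equal after
 normalization, while "5%3A123" stays as it is because ":" is reserved.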

 Coupled with RFC2616 (HTTP/1.1) section 3.2.3:

 "Characters other than those in the "reserved" and "unsafe" sets (see
  RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding."

 Besides the fact that "unsafe" is not even defined in RFC 2396 (the
 predecessor of RFC 3986), I interpret this to mean that special characters
 are to be treated as special, and that implementations should be as
 conservative as possible and percent-encode all these other characters.
 This means that "./5:123", "5%3A123" and "./5%3A123" are all distinct URIs
 which should be differentiated on the server-side.  There's no sane choice
 to be made except to just encode everything that isn't 100% safe.
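
 As a rough illustration of the "encode everything that isn't 100% safe"
 policy, the following Python sketch uses urllib.parse.quote with an
 empty safe set.  This is an assumption-laden stand-in, not the uri-common
 implementation; the function name is hypothetical.

```python
from urllib.parse import quote, unquote

def encode_segment_conservatively(segment: str) -> str:
    """Percent-encode every character of a path segment that is not
    a letter, digit, or one of "-", ".", "_", "~".

    Hypothetical helper; safe="" stops quote() from leaving "/" alone.
    """
    return quote(segment, safe="")

# The three spellings from the ticket are different strings on the wire...
paths = ["./5:123", "5%3A123", "./5%3A123"]
all_distinct = len(set(paths)) == len(paths)

# ...even though blindly percent-decoding one yields another.
decoded_collides = unquote("./5%3A123") == "./5:123"
```

 Here encode_segment_conservatively("5:123") gives "5%3A123", so a colon
 supplied as data never appears literally in the generated path.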

 uri-generic, on the other hand, does _not_ encode anything except the
 slash, because it explicitly puts more control into the user's hands and
 lets the user choose which of the three paths "./5:123", "5%3A123" and
 "./5%3A123" they want.  In that sense, uri-generic is more low-level and
 therefore allows more fine-grained control.
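
 The "encode only the slash" behaviour can be sketched in Python as
 follows.  This only mirrors the policy described above; it is not
 uri-generic's code, and join_segments_minimal is a hypothetical name.

```python
def join_segments_minimal(segments):
    """Join path segments, escaping only a literal "/" inside a segment.

    Hypothetical helper: every other character, including ":" and "%",
    is passed through exactly as the caller supplied it.
    """
    return "/".join(seg.replace("/", "%2F") for seg in segments)
```

 With this policy the caller decides the spelling: passing [".", "5:123"]
 yields "./5:123", while passing ["5%3A123"] yields "5%3A123" unchanged.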

-- 
Ticket URL: <http://bugs.call-cc.org/ticket/998#comment:2>
Chicken Scheme <http://www.call-with-current-continuation.org/>
Chicken Scheme is a compiler for the Scheme programming language.
