[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986
From: |
Maxime Devos |
Subject: |
Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2. |
Date: |
Mon, 25 Sep 2023 22:46:15 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 |
Op 25-09-2023 om 18:48 schreef Vivien Kraus:
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
---
Dear Guile developers,
When you request https://example.com/resource an URI and get redirected
to "here", you end up with 2 URI references:
- https://example.com/resource
- here
What should you request next? The answer is,
"https://example.com/here". It seems evident how we go from one to the
other.
However, there are more subtle cases. What if you get redirected to
"../here", for instance?
RFC 3986 has you covered, in section 5.2. It explains how we go from a
base URI and a URI reference to the new URI.
What do you think?
Best regards,
Sounds very useful. However, there are also some dangers on doing this
thing -- the ‘external’ page https://example.com/data.json could
redirect to
http://localhost/unsecured-secret-but-its-localhost-only-so-it-is-safe.
I forgot the name of this attack, but there is probably a page somewhere
that documents the danger and how to mitigate it (I think I read some
Firefox documentation somewhere?). Perhaps there exists an informative
RFC about it? I think it put a warning about this somewhere in the
documentation.
(Another related problem is that example.com could have IP address ::1,
but that's a different problem.)
Vivien
NEWS | 7 ++
module/web/uri.scm | 152 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 68 +++++++++++++++
You forgot modifying the non-docstring documentation to properly
document the new procedure.
3 files changed, 226 insertions(+), 1 deletion(-)
Given the existence of resolve-relative-reference, it is easy to expect
http-request to do redirection. I think it would be best to adjust to
the documentation of http-request / http-get / ... to mention whether
automatic redirection is done or not.
+(define (resolve-relative-reference base relative)
+ "Resolve @var{relative} on top of @var{base}, as RFC3986, section 5.2."
I don't like the mutation, but it's a completely deterministic procedure
(ignoring memory allocation) so it can't cause problems and hence I
suppose it's not worth rewriting (unless you or someone else really
wants to rewrite it, I suppose).
I suppose it also avoids the risk of accidentally deviating from the RFC
it is supposed to implement, which is major advantage of sticking with
the mutation.
I like that you say _which_ resolution method you are using instead of
saying or implying that this is the always the _right_ way of resolving
relative references, because some URI schemes are rather quirky.
(I don't know any quirkiness w.r.t. relative references, but wouldn't be
surprised if it exists.)
Also I think it's worth stating that both base and relative are URIs --
with the current docstring, I find (resolve-... uri "./whatever") a
reasonable thing to do.
IIUC, there currently is nothing preventing
(resolve-... (uri object for "https://example.com/a")
(uri object for "https://example.com/b")).
IIUC, this is supposed to return (uri object for
"https://example.com/b"), but that could be more explicit with a change
of variable name.
(define (resolve-... base maybe-relative)
[...])
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The test suite checks for ';' characters, but Guile escapes
+ ;; them in URIs. Same for '='.
IIUC, that's a bug!
6.2.2.2. Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the case
normalization issue noted above, some URI producers percent-encode
octets that do not require percent-encoding, resulting in URIs that
are equivalent to their non-encoded counterparts. __These URIs
__should be normalized by decoding any percent-encoded octet that
corresponds to an unreserved character, as described in
Section 2.3.__
(Emphasis added.)
I am assuming here that ; is an unreserved character, if it isn't, there
isn't a bug here.
However, I sense a lack of normalisation in resolve-relative-reference,
so unless Guile already does normalisation elsewhere (perhaps it does
during the URI object construction?), there might be a bug here -- ok,
technically perhaps not a bug because the docstring only mentions
‘implements section 5.2 and 5.2. doesn't seem to mention section 6 (and
section 6 says ‘should’, not ‘shall/must’, but some people use ‘should’
as ‘shall/must’, so dunno), but in that case that's still a potential
pitfall that could be mentioned in the docstring. Could be as simple as
"No normalisation is performed.".
I guess it shouldn't do normalisation, but guesswork seems better to be
avoided/confirmed or disconfirmed when possible.
Best regards,
Maxime Devos.
OpenPGP_0x49E3EE22191725EE.asc
Description: OpenPGP public key
OpenPGP_signature
Description: OpenPGP digital signature