Re: [Bug-wget] URL normalisation: consecutive forward slashes


From: Cillian Sharkey
Subject: Re: [Bug-wget] URL normalisation: consecutive forward slashes
Date: Thu, 10 Jun 2010 15:41:10 +0100
User-agent: mutt-ng/devel-r804 (Linux)

Hi,

On Thu, 03 Jun 2010 11:51:58 -0700, Micah Cowan wrote:
> FWIW, I think RFC 3986 is the current authority on this; but I didn't
> see anything there either.
> 
> On 06/03/2010 05:32 AM, Giuseppe Scrivano wrote:
> > thanks for your report.  I am not sure that the URL normalisation should
> > collapse multiple consecutive forward slashes, I don't see anything
> > about it in RFC 1808.  We can't assume that "foo//bar" is the same as
> > "foo/bar", it could be handled differently by the server, for example it
> > may be part of PATH_INFO.
> > 
> > AFAICS, Firefox and Chromium don't normalize consecutive forward slashes
> > too.

You're right: RFC 3986 allows empty path segments ("//"), and there may
well be applications that assign meaning to them.

However, web crawlers commonly normalise URLs to avoid loops and
duplicate fetches:

http://en.wikipedia.org/wiki/URL_normalization
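
To make the crawler case concrete, here is a minimal Python sketch of
the kind of normalisation I mean. It is only an illustration, not wget
code: collapse_slashes() is a hypothetical helper, example.com is a
placeholder host, and only the path is touched (the query string is
left alone).

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Hypothetical helper: collapse runs of '/' in the path component only."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


# A crawler's "already visited" set, keyed on the normalised URL.
seen = set()
for url in ["http://example.com/foo//bar.html",
            "http://example.com/foo/bar.html"]:
    key = collapse_slashes(url)
    if key in seen:
        continue  # treated as a duplicate, so not fetched again
    seen.add(key)
```

With this, both URLs map to the same key and only one of them is
fetched, which is exactly the behaviour that would be wrong for a
server that distinguishes the two.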

Furthermore, when wget saves a URL locally, it translates the path
component into a filesystem path. Under POSIX, multiple successive
slashes in a pathname are treated as a single slash:

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_266

As such, if there were an application that treated /foo//bar.html and
/foo/bar.html differently and returned separate content for each, both
responses would be saved to the same local file by wget.
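
The filesystem side is easy to check. The following is a small Python
sketch, assuming a POSIX system; it does not reproduce wget's own
file-naming logic, and save_both() and the file contents are made up
for the demonstration. It writes two different bodies using the two
path spellings and then looks at what ended up on disk.

```python
import os
import tempfile


def save_both():
    """Write via 'foo//bar.html' and 'foo/bar.html'; report what exists."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "foo"))
        # Two distinct "responses", saved under the two URL path spellings.
        with open(root + "/foo//bar.html", "w") as f:
            f.write("first")
        with open(root + "/foo/bar.html", "w") as f:
            f.write("second")
        names = os.listdir(os.path.join(root, "foo"))
        with open(os.path.join(root, "foo", names[0])) as f:
            content = f.read()
        return names, content


names, content = save_both()
```

Both spellings resolve to the same file, so only one file exists
afterwards. In this sketch the second write simply replaces the first;
what wget does on such a collision depends on its own options.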

Perhaps the removal of duplicate slashes could be controlled with the
addition of a new wget option?

Regards,

-- 
Cillian Sharkey             Managed Network Services
t: +353-1-660-9040          HEAnet Limited - http://www.heanet.ie/
f: +353-1-660-3666          5 George's Dock, I.F.S.C., Dublin 1.
PGP: E1B98B66               Registered in Ireland, no. 275301


