
Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?


From: Tim Rühsen
Subject: Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?
Date: Wed, 14 Jun 2017 21:54:07 +0200
User-agent: KMail/5.2.3 (Linux/4.9.0-3-amd64; KDE/5.28.0; x86_64; ; )

On Wednesday, 14 June 2017 11:49:59 CEST, L A Walsh wrote:
> Dale R. Worley wrote:
> >  But of course, no [RFC3986-conforming] URL
> >  contains an embedded space because that's what it
> >  says in RFC 3986, which is "what *defines* what a
> >  URL *is*"[sic; should read "is one definition of
> 
> a URL.
> ---
>     Right, just like speed limit signs define
> what the maximum speed is.
> 
> There is the "model" and there is reality.  To believe that
> the model replaces and/or dictates reality is not
> realistic and bordering on some mental pathology.
> 
> I understand what you are saying Dale.  My dad was a lawyer,
> and life would be so much easier if specs, RFCs or other
> models of reality were the only thing we had to pay attention
> to.  But... to do so generally creates various levels of
> discomfort and/or headaches.
> 
> >  Now, someone can provide a string that contains spaces and claim
> >  it's a URL, but it isn't. The question is, What to do with it?  My
> >  preference is to barf and tell the user that what they provided
> >  wasn't a proper URL.
> 
> ---
>     I.e.: not doing what you can to give them some output
> that is your _best_ _attempt_ to give them what they wanted
> (excluding dangerous interpretations).
> 
>     A friendly user-interface attempts to help the user get
> what they want despite their not asking for it according to
> regulation or with poor syntax or spelling.
> 
> >  Beyond that, one might do some simple tidying up, such as removing
> >  leading and trailing spaces.  That fix, by the way, is known to be
> >  safe, *because a URL can't contain a space*, and so any trailing
> >  space can't actually be part of the URL.
> 
> ----
>     One might argue that leading and trailing space, since they
> are not "internal" to the URL, aren't really a part of the URL.
> 
> >  It gets uglier when there are invalid characters in the middle of
> >  the URL, because simply deleting them is unlikely to produce the
> >  results the user expected.
> 
> ---
>     Yup.  Thus my original post thinking that they should be
> removed since they can't really be part of a URL and as "characters
> non gratis", should be removed before sending them to a remote
> website.

In short, there are two 'realities' here:
1. The RFC, which defines (part of) a protocol between client and server.
Clients and servers have to follow this standard; if they deviate, they are
out. This is 'reality' one.

2. User input... well, every (web) client interprets user input
differently. But every client tries hard to be 'WYGIWYM' (What You Get Is What
You Mean).
Basically, the problem is solved (or should be) by browsers, so why not do as
they do? Well, we can do something similar, but we should not forget that
'wget' is a 'power user' tool while a browser is used by everyone.
People also use 'wget' for very special tasks, e.g. downloading a file whose
name consists of a single space. Wget would become useless for these people
(count me in) if they couldn't comfortably enter a URL with a
trailing space (wget knows how to escape that, following the RFC).

Example:
  wget 'https://example.com/ '
Should wget download this space-named file, or (silently) strip the
space and download index.html?
Two answers here; which one carries more weight? Maybe the one that does not
disturb backward compatibility!?
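
To make the two candidate behaviours concrete, here is a minimal sketch in Python. It is not wget's actual code; the function names `escape_spaces` and `strip_outer_spaces` are hypothetical and exist only for illustration. The first keeps the space and percent-encodes it so that the request conforms to RFC 3986; the second tidies the input by dropping leading and trailing spaces.

```python
def escape_spaces(url: str) -> str:
    """Keep every space the user typed, percent-encoded as %20.

    Hypothetical helper: models the 'download the space-named file' answer.
    """
    return url.replace(" ", "%20")


def strip_outer_spaces(url: str) -> str:
    """Drop leading and trailing spaces, leaving the rest untouched.

    Hypothetical helper: models the 'tidy up the input' answer.
    """
    return url.strip(" ")


if __name__ == "__main__":
    user_input = "https://example.com/ "
    # The two policies yield different request targets for the same input.
    print(escape_spaces(user_input))
    print(strip_outer_spaces(user_input))
```

For the input above, the first policy requests a resource whose name is a single space, while the second requests the directory itself, so the choice visibly changes which file is fetched.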

> 
> -linda

With Best Regards, Tim
