Re: [Bug-wget] Support non-ASCII URLs
From: Eli Zaretskii
Subject: Re: [Bug-wget] Support non-ASCII URLs
Date: Sun, 20 Dec 2015 19:23:05 +0200
> From: Tim Rühsen <address@hidden>
> Date: Sun, 20 Dec 2015 16:26:20 +0100
>
> > Tim sent me the tarball and the log off-list (thanks!). I didn't yet
> > try to build Wget, but just looking at the test, I guess I don't
> > understand its idea. It has an index.html page that's encoded in
> > ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
> > and the URLs themselves in "my %urls" are all encoded in UTF-8. How's
> > this supposed to work?
>
> According to the wget man page, --remote-encoding just sets the *default*
> server encoding. This only comes into play when the HTTP header does not
> contain a Content-type with charset set *and* the HTML page does not contain
> a <meta http-equiv="Content-Type" with 'content=... charset=...'.
Makes sense.
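
The fallback rule described above can be sketched roughly as follows. This is only an illustration, not Wget's actual code: the function name pick_charset and both regular expressions are made up for the example, and checking the HTTP header before the <meta> tag is an assumption, since the message above does not say which of the two wins when both are present.

```python
import re

# Hypothetical helpers for illustration; not taken from Wget's sources.
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
META_RE = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE,
)

def pick_charset(content_type_header, html_bytes, remote_encoding=None):
    """Return the charset to assume for a page, or the default if none is declared."""
    # 1. charset parameter of the HTTP Content-Type header (assumed to be checked first)
    if content_type_header:
        m = CHARSET_RE.search(content_type_header)
        if m:
            return m.group(1).lower()
    # 2. <meta http-equiv="Content-Type" content="...; charset=..."> in the page
    meta = META_RE.search(html_bytes)
    if meta:
        m = CHARSET_RE.search(meta.group(0).decode("ascii", "replace"))
        if m:
            return m.group(1).lower()
    # 3. Neither source declares a charset: fall back to the --remote-encoding default
    return remote_encoding
```

With a header of "text/html; charset=ISO-8859-15" and --remote-encoding=iso-8859-1, this sketch returns iso-8859-15, i.e. the default is never consulted.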
> 'index.html' in this test correctly has a meta tag with charset=utf-8
> and the URLs encoded in utf-8.
That's not what I see: index.html says
"Content-type" => "text/html; charset=ISO-8859-15"
and its contents indeed have URLs encoded in ISO-8859-15.
> > Also, I'm not following the logic of overriding Content-type by the
> > remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
> > includes a link encoded in ISO-8859-1, and the test seems to expect
> > Wget to use the remote encoding in preference to what "charset=" says.
>
> Either the test is wrong here or the man page. I would say the man page
> should be correct here - it makes the most sense to me. In this case the
> test is wrong, and so is the comment.
OK.
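
To make the mismatch concrete, here is a small sketch of how the same file name percent-encodes under the two charsets, and why a link encoded in ISO-8859-1 cannot be read as UTF-8. The helper name decodes_as is invented for the example; the file name is the one from the test.

```python
from urllib.parse import quote, unquote_to_bytes

name = "p1_français.html"

# The same name percent-encoded from its UTF-8 and ISO-8859-1 byte sequences.
utf8_url = quote(name, encoding="utf-8")          # 'ç' becomes %C3%A7
latin1_url = quote(name, encoding="iso-8859-1")   # 'ç' becomes %E7

def decodes_as(url, charset):
    """Hypothetical helper: can the percent-decoded bytes be read in this charset?"""
    raw = unquote_to_bytes(url)
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False

print(utf8_url)                            # p1_fran%C3%A7ais.html
print(latin1_url)                          # p1_fran%E7ais.html
print(decodes_as(latin1_url, "utf-8"))     # False: a lone 0xE7 is not valid UTF-8
print(decodes_as(utf8_url, "iso-8859-1"))  # True, but the text comes out as mojibake
```

The last line shows the asymmetry: UTF-8 bytes always decode as ISO-8859-1 without error, just to the wrong characters, so a wrong charset guess in that direction goes unnoticed.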
> > Does the remote encoding override the encoding for the _contents_ of
> > the URL, not just for the URL itself? That seems to make little sense
> > to me: the contents and the name can legitimately be encoded
> > differently, I think.
>
> The filenames in %expected_downloaded_files depend on --local-encoding.
> Since this is not given on the command line, this test will behave
> differently with different settings for LC_ALL ('make check' uses LC_ALL=C;
> contrib/check-hard will also run 'make check' with a Turkish UTF-8 locale).
>
> To fix the test, we should set --local-encoding to some kind of UTF-8 locale
> (or something else, but then we have to fix the filenames for that
> locale).
But then what would be the point of repeating the test with the
Turkish locale? To verify that when given --local-encoding the locale
is ignored?