Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
From: Tim Rühsen
Subject: Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
Date: Mon, 14 Dec 2015 20:22:41 +0100
User-agent: KMail/4.14.10 (Linux/4.2.0-1-amd64; KDE/4.14.14; x86_64; ; )
On Monday, 14 December 2015, 18:33:38, Eli Zaretskii wrote:
> > Date: Sun, 13 Dec 2015 20:04:31 +0100
> > From: "Andries E. Brouwer" <address@hidden>
> > Cc: "Andries E. Brouwer" <address@hidden>, address@hidden
> >
> > On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> > > If no one is going to pick up the gauntlet, I will sit down and do it
> > > myself, although I'm terribly busy with Emacs 25.1 release.
> >
> > Good!
>
> While working on this, I bumped into 2 related issues:
>
> 1. The functions that call 'iconv' (in iri.c) don't make a point of
> flushing the last portion of the converted URL after 'iconv'
> returns successfully having converted the input string in its
> entirety. IME, you need then to call 'iconv' one last time with
> either the 2nd or the 3rd argument set to NULL, otherwise
> sometimes the last converted character doesn't get output. In my
> case, some URLs converted from CP1255 to UTF-8 lost their last
> character. It sounds like no one has actually used this
> conversion in iri.c, except for trivially converting UTF-8 to
> itself. Is that possible/reasonable?
Possibly.
Could you please give an example string? I would like to test it on
GNU/Linux, BSD and Solaris to see if the output is always the same.
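To illustrate the final flush described in point 1, here is a minimal sketch. It is not the code in iri.c; the helper name convert_with_flush and the output buffer sizing are assumptions made for illustration. The only point it demonstrates is the extra 'iconv' call with a NULL input argument after the whole input has been converted.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not wget's actual code: convert the NUL-terminated
   string IN from encoding FROM to encoding TO.  Returns a malloc'ed,
   NUL-terminated string, or NULL on any failure.  */
char *
convert_with_flush (const char *from, const char *to, const char *in)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (in);
  /* Assumption for brevity: a fixed, generous output size instead of
     growing the buffer when iconv reports E2BIG.  */
  size_t outsize = 4 * inleft + 16;
  char *out = malloc (outsize + 1);
  if (!out)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) in;
  char *outp = out;
  size_t outleft = outsize;

  /* First call converts the input.  Second call, with a NULL input
     argument, asks the converter to write out anything it still holds
     and return to its initial state.  */
  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }

  *outp = '\0';
  iconv_close (cd);
  return out;
}
```

A caller would use it as convert_with_flush ("CP1255", "UTF-8", url) and free the result. For converters that keep no pending state the second call writes nothing, so adding it should be harmless.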
> 2. Wget assumes that the URL given on its command line is encoded in
> the locale's encoding. This is a good assumption when the user
> herself types the URL at the shell prompt, but not when the URL is
> copy-pasted from a browser's address bar. In the latter case, the
> URL tends to be in UTF-8 (sometimes hex-encoded). At least that's
> what I get from Firefox. We don't seem to have in wget any
> facilities to specify a separate (3rd) encoding for the URLs on
> the command line, do we?
I stumbled upon this a while ago when thinking about the design of wget2, and
wget2 already has a working --input-encoding option for such cases.
AFAIK, nobody has asked for such an option in the last few years, so I assume
it is a somewhat 'expert' or 'fancy' option, or at least a low-priority one.
It is an optional goodie.
Tim
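
As a side note on the "hex-encoded" case mentioned in point 2: a URL pasted from a browser may carry its UTF-8 bytes as %XX escapes, which have to be decoded before any charset conversion makes sense. The following is a minimal sketch of such decoding; the function names are hypothetical and this is not taken from wget's sources.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of hex digit C, or -1 if C is not a hex digit.  */
static int
hexval (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  c = (unsigned char) tolower (c);
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Hypothetical helper: return a malloc'ed copy of S with every valid
   %XX escape replaced by the byte it encodes.  Malformed escapes are
   copied through unchanged.  Note that a decoded %00 would end the
   resulting C string early; a real implementation should reject it.  */
char *
percent_decode (const char *s)
{
  char *out = malloc (strlen (s) + 1);   /* result is never longer than S */
  if (!out)
    return NULL;

  char *p = out;
  while (*s)
    {
      int hi, lo;
      if (s[0] == '%'
          && (hi = hexval ((unsigned char) s[1])) >= 0
          && (lo = hexval ((unsigned char) s[2])) >= 0)
        {
          *p++ = (char) (hi * 16 + lo);
          s += 3;
        }
      else
        *p++ = *s++;
    }
  *p = '\0';
  return out;
}
```

The decoded bytes are still in whatever encoding the browser used (typically UTF-8), which is why a separate input-encoding setting matters when that differs from the locale's encoding.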