Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
From: Tim Ruehsen
Subject: Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
Date: Tue, 15 Dec 2015 11:02:21 +0100
User-agent: KMail/4.14.10 (Linux/4.3.0-1-amd64; KDE/4.14.14; x86_64; ; )

I pushed a conversion fix to master.

There is another bug in wget that shows up with

  wget -d --local-encoding=cp1255 'http://he.wikipedia.org/wiki/%F9._%F9%F4%F8%E4'

Wget escapes/converts the URL to UTF-8 twice. Maybe you can address this
while you are working on the code?

Tim
On Tuesday 15 December 2015 10:33:10 Tim Ruehsen wrote:
> On Monday 14 December 2015 18:33:38 Eli Zaretskii wrote:
> > > Date: Sun, 13 Dec 2015 20:04:31 +0100
> > > From: "Andries E. Brouwer" <address@hidden>
> > > Cc: "Andries E. Brouwer" <address@hidden>, address@hidden
> > >
> > > On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> > > > If no one is going to pick up the gauntlet, I will sit down and do it
> > > > myself, although I'm terribly busy with Emacs 25.1 release.
> > >
> > > Good!
> >
> > While working on this, I bumped into 2 related issues:
> > 1. The functions that call 'iconv' (in iri.c) don't make a point of
> >    flushing the last portion of the converted URL after 'iconv'
> >    returns successfully having converted the input string in its
> >    entirety. IME, you need then to call 'iconv' one last time with
> >    either the 2nd or the 3rd argument set to NULL, otherwise
> >    sometimes the last converted character doesn't get output. In my
> >    case, some URLs converted from CP1255 to UTF-8 lost their last
> >    character. It sounds like no one has actually used this
> >    conversion in iri.c, except for trivially converting UTF-8 to
> >    itself. Is that possible/reasonable?
>
> You are absolutely right.
>
> Attached is a small test C code that shows (and fixes) the problem.
>
> Regards, Tim
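
For readers without the attachment, here is a minimal sketch of the flush
step described in item 1 of the quoted text. It is not the attached test
program and not the code in iri.c. The helper name convert_with_flush and
its buffer-growth policy are assumptions made for illustration, and it
assumes a byte-oriented target encoding such as UTF-8 so that the result
can be NUL-terminated. The essential part is the final iconv() call with a
NULL input buffer, which lets the converter emit anything it is still
holding back.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not wget's actual code): convert the NUL-terminated
   string IN from charset FROM to charset TO.  Returns a malloc'ed,
   NUL-terminated string, or NULL on failure.  The caller frees the result. */
char *convert_with_flush(const char *from, const char *to, const char *in)
{
    assert(from != NULL && to != NULL && in != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;                      /* unsupported charset pair */

    size_t inleft = strlen(in);
    size_t cap = inleft * 4 + 16;         /* initial guess, grown on E2BIG */
    char *out = malloc(cap + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;               /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t outleft = cap;
    int flushing = 0;                     /* 0: converting input, 1: final flush */

    for (;;) {
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)     /* flush pending output */
            : iconv(cd, &inp, &inleft, &outp, &outleft); /* convert the input */

        if (rc != (size_t)-1) {
            if (flushing)
                break;                    /* input converted and flushed */
            flushing = 1;                 /* input consumed, now flush */
            continue;
        }

        if (errno != E2BIG) {             /* EILSEQ, EINVAL, ...: give up */
            free(out);
            iconv_close(cd);
            return NULL;
        }

        /* Output buffer is full: enlarge it and retry the same step. */
        size_t used = (size_t)(outp - out);
        cap *= 2;
        char *tmp = realloc(out, cap + 1);
        if (tmp == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = tmp;
        outp = out + used;
        outleft = cap - used;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A call such as convert_with_flush("CP1255", "UTF-8", bytes) would exercise
the case reported above, provided the local iconv implementation supports
CP1255. Without the flushing pass, the same loop can return a string that
is missing its last character.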