From: anonymous
Subject: [Bug-wget] [bug #47701] wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)
Date: Fri, 15 Apr 2016 04:31:09 +0000
User-agent: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
URL:
<http://savannah.gnu.org/bugs/?47701>
Summary: wget 1.17.1 fails to convert from percent encoding
to unicode correctly (mingw32)
Project: GNU Wget
Submitted by: None
Submitted on: Fri 15 Apr 2016 04:31:08 AM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Anonymous Coward
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 1.17.1
Operating System: Microsoft Windows
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
The version of GNU Wget you were using
1.17.1 (mingw32)
How you invoked wget
wget -d -r -np -nH --cut-dirs=4
"https://leoandpeto.com/Music/Peto/Non/o-z/R%C3%B6yksopp%20-%20The%20Understanding/"
In case this is some sort of mingw32 compiler bug:
I obtained my mingw32 wget binary from
https://eternallybored.org/misc/wget/ as recommended by
http://wget.addictivecode.org/FrequentlyAskedQuestions#download
What you expected wget to do
Recursively download the requested open directory.
What wget did (include output messages).
It didn't download the directory.
I tried running with debug output on (-d). Judging from the
output, wget seems to have converted the percent-encoded URL
to w32's native Unicode encoding incorrectly.
(A full copy of my wget run, input and output, is
included as an attachment named "wget-output.txt".)
Specifically, the output includes exactly this line:
converted
'https://leoandpeto.com/Music/Peto/Non/o-z/R%C3%B6yksopp%20-%20The%20Understanding/'
(ASCII) -> 'https://leoandpeto.com/Music/Peto/Non/o-z/RC6yksopp - The
Understanding/' (UTF-8)
wget changed the word from "R%C3%B6yksopp" to "RC6yksopp"
with no percent signs.
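For comparison, here is a minimal sketch of the decoding I expected,
written with Python's standard urllib.parse rather than wget's own code.
It only illustrates what a correct percent-decode of this path segment
yields; it says nothing about how wget implements the conversion.

```python
from urllib.parse import unquote, unquote_to_bytes

# Last path segment of the URL from this report.
encoded = "R%C3%B6yksopp%20-%20The%20Understanding"

# Step 1: percent-decoding yields raw bytes; %C3 %B6 become 0xC3 0xB6.
raw = unquote_to_bytes(encoded)

# Step 2: interpreting those bytes as UTF-8 yields the intended name.
decoded = unquote(encoded, encoding="utf-8")

print(raw)      # the two high bytes are preserved as \xc3\xb6
print(decoded)  # the o-umlaut is restored as a single character
```

The byte pair 0xC3 0xB6 is the UTF-8 encoding of U+00F6, so the percent
signs should disappear only because the two bytes merge into one character.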
It seems to be stripping the leading (most significant) bit
from each of the bytes %C3 (11000011) and %B6 (10110110),
thereby converting them into 7-bit ASCII characters:
,-----.----------.-------.
| HEX | BIN | ASCII |
+-----+----------+-------+
| C3 | 11000011 | -- |
| 43 | 1000011 | C |
+-----+----------+-------+
| B6 | 10110110 | -- |
| 36 | 0110110 | 6 |
`-----'----------'-------'
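The table above can be checked with a short sketch. This is only a
test of the hypothesis, not a confirmed diagnosis: strip_high_bit is a
hypothetical helper invented here, and nothing in this report shows
that wget actually contains such a mask.

```python
from urllib.parse import unquote_to_bytes

def strip_high_bit(data: bytes) -> str:
    """Hypothetical helper: clear bit 7 of every byte, then read as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")

# Percent-decode the affected word into raw bytes (0xC3 0xB6 included).
raw = unquote_to_bytes("R%C3%B6yksopp")

# Masking with 0x7F maps 0xC3 -> 0x43 ('C') and 0xB6 -> 0x36 ('6').
mangled = strip_high_bit(raw)

print(mangled)  # RC6yksopp
```

The result matches the string in the debug line, which is consistent
with the high bit being dropped somewhere in the conversion, although
other causes could produce the same output.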
Also, I'm not 100% sure that this isn't a duplicate of
http://savannah.gnu.org/bugs/index.php?47689
but I figured it's best to let you developers decide
rather than failing to file a bug-report.
Thank you for making/working on wget, and
please CONTINUE BEING AWESOME! :-D
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Fri 15 Apr 2016 04:31:08 AM UTC Name: wget-output.txt Size: 5kB By:
None
<http://savannah.gnu.org/bugs/download.php?file_id=36931>
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?47701>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/