bug-wget

Re: [Bug-wget] Save 3 byte utf8 url


From: L Walsh
Subject: Re: [Bug-wget] Save 3 byte utf8 url
Date: Fri, 15 Feb 2013 17:50:15 -0800
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666




Ángel González wrote:
On 07/02/13 15:06, bes wrote:
Hi,

I found a bug in how wget interprets and saves percent-encoded 3-byte
UTF-8 URLs.

Example:
1. Create a URL containing "—". This is U+2014 (EM DASH); its percent-encoded
UTF-8 form is "%E2%80%94".
2. Try to fetch it with wget "http://example.com/abc—d", or directly with
wget "http://example.com/abc%E2%80%94d".
3. Wget saves this URL to the file "abc\342%80%94d". The expected name is
"abc%E2%80%94d". This is a bug.
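To make step 1 concrete, here is a short sketch in Python (an editorial choice; the thread itself contains no code) using the standard library's urllib.parse. It shows that the em dash percent-encodes to "%E2%80%94" and that decoding that string yields three raw bytes, 0xE2 0x80 0x94.

```python
from urllib.parse import quote, unquote_to_bytes

em_dash = "\u2014"  # U+2014 EM DASH

# quote() encodes non-ASCII text as UTF-8 and percent-escapes each byte.
encoded = quote(em_dash)
print(encoded)  # %E2%80%94

# Decoding the percent-escapes gives back the three raw UTF-8 bytes.
raw = unquote_to_bytes(encoded)
print(raw)  # b'\xe2\x80\x94'

# The full URL from step 2, keeping ':' and '/' unescaped.
url = quote("http://example.com/abc\u2014d", safe=":/")
print(url)  # http://example.com/abc%E2%80%94d
```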

The problem is that it checks whether each byte is a printable character in
Latin-1. There is a bug report for this at
https://savannah.gnu.org/bugs/index.php?37564
One option is to use --restrict-file-names=nocontrol to get the em dash
itself in the filename, instead of the percent-encoded version.
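The name "abc\342%80%94d" can be reproduced with a simplified model. The wget manual documents that the default --restrict-file-names=unix mode escapes '/' and the control characters in the ranges 0-31 and 128-159, and that "nocontrol" switches off the control-character escaping. In Latin-1, 128-159 are the C1 control codes, while 0xE2 is the printable "â". The sketch below assumes wget percent-decodes a path component and then re-escapes it byte by byte; escape_filename is a hypothetical helper, not wget's actual code.

```python
from urllib.parse import unquote_to_bytes


def escape_filename(component: bytes, nocontrol: bool = False) -> bytes:
    """Hypothetical, simplified model of --restrict-file-names=unix.

    Escapes '/' and, unless nocontrol is set, bytes 0-31 and 128-159
    as %XX. Every other byte is kept raw.
    """
    out = bytearray()
    for b in component:
        is_control = b <= 31 or 128 <= b <= 159
        if b == ord("/") or (is_control and not nocontrol):
            out += b"%%%02X" % b  # e.g. 0x80 -> b"%80"
        else:
            out.append(b)
    return bytes(out)


component = unquote_to_bytes("abc%E2%80%94d")    # b'abc\xe2\x80\x94d'
print(escape_filename(component))                  # b'abc\xe2%80%94d'
print(escape_filename(component, nocontrol=True))  # b'abc\xe2\x80\x94d'
```

Under this model the lead byte 0xE2 (octal 342) survives as a raw byte while the two continuation bytes 0x80 and 0x94 are escaped, which matches the mixed name in the report; with nocontrol the bytes stay intact and decode as UTF-8 to "abc—d".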
---
        Do you mean a printable character in the current locale?
        Or can it not handle UTF-8 at all?


Latin-1 is going the way of the dodo... Most sites still use it, but
HTML5 is supposed to be UTF-8.

If it found "González" in a file name, would it be able to save it correctly?
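Under the same documented byte-range rule, the question can be checked by listing which UTF-8 bytes of a name fall in 128-159. This is a minimal sketch, assuming that range is the only escaping that applies; escaped_bytes is a hypothetical helper.

```python
from typing import List


def escaped_bytes(name: str) -> List[int]:
    """Return the UTF-8 bytes of `name` in 128-159 (hypothetical helper)."""
    return [b for b in name.encode("utf-8") if 128 <= b <= 159]


print(escaped_bytes("González"))    # []  ('á' is 0xC3 0xA1)
print(escaped_bytes("abc\u2014d"))  # [128, 148]  (0x80, 0x94)
```

Under this model "González" would pass through unescaped, while any character whose UTF-8 form contains a byte in 0x80-0x9F, such as the em dash, would be partly escaped.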



