Re: uuencode: multi-bytes char in remote file name contains bytes >0x80


From: Bruno Haible
Subject: Re: uuencode: multi-bytes char in remote file name contains bytes >0x80
Date: Sun, 3 Jul 2011 22:43:55 +0200
User-agent: KMail/1.9.9

Referring to
<http://lists.gnu.org/archive/html/bug-gnu-utils/2011-07/msg00000.html>:

An obvious problem with the patch is that it considers a file name to be a
byte sequence. But different users may work in different locales, with
different encodings. If a Chinese user with file names in GB18030 encoding
sends a file to a user whose file names are UTF-8 encoded, or vice versa, the
file name needs to be converted. The usual approach for such cases is to use
UTF-8 as a "pivot" encoding. For example, in 'pax' [1] file names are
transferred in UTF-8 encoding.
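
To illustrate the pivot idea, here is a minimal Python sketch. It is not part
of the patch or of any existing tool; the helper names to_pivot and from_pivot
are made up for this example, and it assumes that both sides know their own
locale encoding. The sender converts from its encoding to UTF-8, and the
receiver converts from UTF-8 to its own encoding.

```python
def to_pivot(name_bytes: bytes, sender_encoding: str) -> bytes:
    """Sender side: convert a file name from the local encoding to UTF-8."""
    return name_bytes.decode(sender_encoding).encode("utf-8")


def from_pivot(pivot_bytes: bytes, receiver_encoding: str) -> bytes:
    """Receiver side: convert a UTF-8 file name to the local encoding."""
    return pivot_bytes.decode("utf-8").encode(receiver_encoding)


# Hypothetical file name "中文.txt", written with escapes to keep the source ASCII.
name = "\u4e2d\u6587.txt"

# Bytes as stored on a file system that uses GB18030 file names.
gb_bytes = name.encode("gb18030")

# Bytes that travel over the wire: always UTF-8, whatever the sender's locale.
wire_bytes = to_pivot(gb_bytes, "gb18030")

# A receiver working in a UTF-8 locale keeps the bytes unchanged;
# a receiver working in GB18030 converts them back.
utf8_receiver = from_pivot(wire_bytes, "utf-8")
gb_receiver = from_pivot(wire_bytes, "gb18030")
```

Treating the name as an opaque byte sequence would instead hand gb_bytes to
the UTF-8 user unchanged, and those bytes are not valid UTF-8.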

But actually, what's the point of the patch? The most frequently used
archive programs for interchange are probably 'tar'/'pax', 'zip', and '7-zip'.
- 'pax' has support for Unicode file names [1]; the biggest problem is that
  the 'pax' format is not the default one for GNU 'tar'.
- 'zip' has support for Unicode file names [2][3].
- '7-zip' supports Unicode file names as well [4].

Users who really want to transfer files with non-ASCII names can use one
of these three archive formats and send a uuencoded archive.
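
As a rough sketch of that workflow, the following Python code packs a file
with a non-ASCII name into a pax-format tar archive and uuencodes the archive
under a plain ASCII name. The function names pax_archive and uuencode and the
file names are invented for this example; it uses only the standard tarfile
and binascii modules and is a stand-in for running 'tar' and 'uuencode' by
hand.

```python
import binascii
import io
import tarfile


def pax_archive(name: str, data: bytes) -> bytes:
    """Build an in-memory pax-format tar archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def uuencode(data: bytes, ascii_name: str, mode: int = 0o644) -> str:
    """Uuencode data; the name on the 'begin' line is assumed to be ASCII."""
    out = [f"begin {mode:o} {ascii_name}\n"]
    # uuencode carries at most 45 input bytes per output line.
    for i in range(0, len(data), 45):
        out.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii"))
    out.append(" \n")   # zero-length line marks the end of the data
    out.append("end\n")
    return "".join(out)


# Hypothetical member name "文件.txt"; it is stored inside the archive,
# so the uuencode header only ever sees the ASCII name "archive.tar".
text = uuencode(pax_archive("\u6587\u4ef6.txt", b"hello\n"), "archive.tar")
```

The non-ASCII name never appears in the uuencode header, so the uuencode
format itself does not need to change.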

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
[2] http://www.info-zip.org/UnZip.html
[3] http://info.michael-simons.eu/2010/01/05/create-zip-archives-containing-unicode-filenames-with-java/
[4] http://www.7-zip.org/7z.html
-- 
In memoriam Yuri Shchekochikhin 
<http://en.wikipedia.org/wiki/Yuri_Shchekochikhin>
