qemu-devel

Re: [Qemu-devel] [PATCH v10] Support vhd type VHD_DIFFERENCING


From: Philipp Hahn
Subject: Re: [Qemu-devel] [PATCH v10] Support vhd type VHD_DIFFERENCING
Date: Sun, 08 Mar 2015 11:53:35 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.4.0

Hello,

On 08.03.2015 02:53, Xiaodong Gong wrote:
> the encoding type of parent location is must be utf 8,utf16e,according
> to the draft

Yes, the VPC/VHD spec specifies the character encoding to use, which
is good for portability.

> ascii is the encoding type to store the string of parent location in
> memery and to use fopen()

No: For the (Linux) kernel the filename is a sequence of 8 bit bytes,
where only '\0'=end_of_string and '/'=path_separator are handled
specially. All other bytes have no special meaning and are passed in and
out as is.
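
To illustrate the byte-transparent behaviour, here is a minimal Python
sketch (Python only because it is quick to try). The helper name
roundtrip_name is made up for this example, and it assumes a Linux file
system that stores names as opaque bytes:

```python
import os
import tempfile


def roundtrip_name(raw_name: bytes) -> list:
    """Create an empty file named raw_name and return the directory listing as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        # A bytes path is handed to the kernel unchanged, without any encoding step.
        fd = os.open(os.path.join(tmp_b, raw_name), os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
        # Passing bytes to listdir() returns the raw names as bytes as well.
        return os.listdir(tmp_b)


if __name__ == "__main__":
    print(roundtrip_name(b"\xc3\xa4"))
```

The name that comes back is the same two bytes that went in; nothing in
between interpreted them as characters.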

Only the applications are doing the character encoding. Normally this is
not a problem as you set up your system once with one encoding (nowadays
UTF-8) and use that consistently: If you enter ä on the keyboard,
the kernel's input layer returns \u00E4 as the two-byte UTF-8 sequence
> $ echo -n ä | xxd -g 1
> 0000000: c3 a4
Any application can either just pass the byte sequence around as an
opaque blob or use any other encoding internally (but then it must know
that the input encoding is UTF-8). When it later does another system
call, it will again pass that same byte sequence as the file name, which
the kernel will store on disk.
If you take that disk to another computer, which does NOT use Unicode,
you have a problem: If, for example, that one is still using the old
ISO-8859-1 encoding used in western Europe, your file will be named
differently:
> $ echo -n ä | iconv -f ISO-8859-1 -t UTF-8
> Ã¤
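
The same misreading can be reproduced without iconv. This is a small
Python sketch; the function name is invented for the example:

```python
def misread_as_latin1(utf8_bytes: bytes) -> str:
    """Interpret bytes written by a UTF-8 system the way an ISO-8859-1 system would."""
    return utf8_bytes.decode("iso-8859-1")


name_on_disk = "\u00e4".encode("utf-8")      # the two bytes c3 a4
shown = misread_as_latin1(name_on_disk)      # two characters: U+00C3 and U+00A4

if __name__ == "__main__":
    print(name_on_disk.hex(), [hex(ord(c)) for c in shown])
```

Each of the two bytes becomes a character of its own, so the one-letter
name turns into a two-letter name on the ISO-8859-1 system.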

(The reverse is even more painful, as not every ISO-8859-1 byte
sequence is a valid UTF-8 byte sequence - several years back, when I
moved from my old ISO-8859-1 setup to a more modern UTF-8 one, I had to
rename lots of files to make them readable again.)
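
Why the reverse direction fails can be checked with a few lines of
Python (a sketch; is_valid_utf8 is a name chosen for this example):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw can be decoded as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_name = "\u00e4".encode("iso-8859-1")   # the single byte e4

if __name__ == "__main__":
    print(latin1_name.hex(), is_valid_utf8(latin1_name))
```

In UTF-8 the byte e4 starts a three-byte sequence, so standing alone it
cannot be decoded, while the same byte is a perfectly good ä in
ISO-8859-1.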

You can even test that locally on one system by creating a file
with an umlaut in its name and then displaying it in a non-UTF-8
terminal / environment:
> $ touch ä
> $ LANG=C ls -NQ
> "\303\244"
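
The escapes shown by ls are simply the two UTF-8 bytes written in octal.
A Python sketch of that notation (unlike ls, this hypothetical helper
escapes every byte, not only the non-printable ones):

```python
def c_octal_escape(raw: bytes) -> str:
    """Render every byte as a backslash followed by three octal digits."""
    return "".join("\\%03o" % b for b in raw)


if __name__ == "__main__":
    print(c_octal_escape("\u00e4".encode("utf-8")))
```

0xc3 is 303 in octal and 0xa4 is 244, which matches the listing above.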

> ascii need to translate to other encoding type according to LANG when to
> show the information of the vhd file using the qemu-info and so on

No: your assumption that ASCII is used is IMHO wrong: ASCII is only 7
bit, but the kernel interface is 8 bit. The terminal input and output
layers nowadays use UTF-8, so as long as you're working on the console
everything is fine. If you mix in GUIs and libraries doing their own
encoding/decoding, things get more interesting.

But when you do explicit character conversion like you do for VHD, you
must honor the user-configured character encoding of the environment
yourself, that is, use LC_CTYPE for any conversion of input and of
output, which includes file names.
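
As a rough illustration of what honoring LC_CTYPE means, here is a Python
sketch. In C the equivalent calls are setlocale(LC_CTYPE, "") followed by
nl_langinfo(CODESET); the two function names below are invented for this
example and are not taken from QEMU or libvhd:

```python
import locale


def locale_codeset() -> str:
    """Return the character set selected by LC_ALL, LC_CTYPE or LANG."""
    try:
        # "" means: take the setting from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unknown or uninstalled locale: fall back to the C locale.
        locale.setlocale(locale.LC_CTYPE, "C")
    return locale.nl_langinfo(locale.CODESET)


def to_filename_bytes(name: str, codeset: str) -> bytes:
    """Encode a decoded name for a system call.

    Raises UnicodeEncodeError if the name cannot be represented in codeset.
    """
    return name.encode(codeset)


if __name__ == "__main__":
    codeset = locale_codeset()
    print(codeset, to_filename_bytes("snap.vhd", codeset))
```

The same decoded name yields different bytes depending on the codeset,
which is why the conversion target must come from the environment and
must not be hard-coded.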

I checked xen/tools/blktap2/vpc/lib/libvhd.c #
vhd_initialize_header_parent_name()
which also (wrongly) assumes ASCII. Because of that, creating a snapshot
using vhd-utils is also broken:

> $ /usr/bin/vhd-util create -n ä.vhd -s 1
> $ /usr/bin/vhd-util snapshot -n snap.vhd -p ä.vhd ; echo $?
> 84
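
A side note on that exit status: on Linux, 84 is the numeric value of
EILSEQ, the error iconv(3) reports for an input sequence it cannot
convert. That vhd-util passes this errno through as its exit status is
only an assumption here. The number can be looked up with a short Python
sketch (describe_errno is a made-up helper):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the system message for an errno value."""
    name = errno.errorcode.get(code, "unknown")
    return "%s: %s" % (name, os.strerror(code))


if __name__ == "__main__":
    # The numeric value of EILSEQ is platform-specific.
    print(errno.EILSEQ, describe_errno(errno.EILSEQ))
```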

Next I followed
<https://technet.microsoft.com/de-de/library/gg318052%28v=ws.10%29.aspx>
to create a VHD with umlauts in its name on Windows 7:

> cmd # as Admin
> diskpart
> create vdisk file="C:\ä.vhd" maximum=2000 type=expandable
> create vdisk file="C:\snap.vhd" parent="C:\ä.vhd"

But vhd-utils from Xen fails to read it:

> $ /usr/bin/vhd-util read -n snap.vhd -p
> VHD Header Summary:
...
> Parent name         : failed to read name
...
> VHD Parent Locators:
> --------------------
> locator:            : 0
....
> failed to read parent name

With the attached patch it works:

> VHD Header Summary:
> -------------------
...
> Parent name         : /ä.vhd
...
> VHD Parent Locators:
> --------------------
> locator:            : 0
>        code         : PLAT_CODE_W2KU
...
>        decoded name : /ä.vhd
> 
> locator:            : 1
>        code         : PLAT_CODE_W2RU
...
>        decoded name : ./ä.vhd
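
The conversion behind the "decoded name" lines can be sketched in Python.
This is not the attached patch, only an illustration. It assumes that
W2KU and W2RU locators hold UTF-16LE Windows paths and that the drive
letter is dropped and backslashes become slashes, as the output above
suggests; the function name is invented:

```python
import re


def decode_windows_locator(raw: bytes) -> str:
    """Turn raw locator data into a POSIX-style path string."""
    # Assumption: the data is UTF-16LE without a byte order mark.
    path = raw.decode("utf-16-le")
    # Drop a leading drive letter such as "C:".
    path = re.sub(r"^[A-Za-z]:", "", path)
    return path.replace("\\", "/")


if __name__ == "__main__":
    print(decode_windows_locator("C:\\\u00e4.vhd".encode("utf-16-le")))
    print(decode_windows_locator(".\\\u00e4.vhd".encode("utf-16-le")))
```

The resulting string still has to be encoded into the codeset selected by
LC_CTYPE before it is used as a file name in a system call.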

Hope that clarified things.

Philipp

Attachment: snap.vhd
Description: Text Data

Attachment: 0001-VHD-Fix-locale-aware-character-encoding-handling.patch
Description: Text Data
