
Re: [Qemu-devel] [RFC] qid path collision issues in 9pfs


From: Antonios Motakis
Subject: Re: [Qemu-devel] [RFC] qid path collision issues in 9pfs
Date: Wed, 24 Jan 2018 17:40:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2



On 01/24/2018 02:30 PM, Greg Kurz wrote:
Thanks Emilio for providing these valuable suggestions! :)

On Sat, 20 Jan 2018 17:03:49 -0500
"Emilio G. Cota" <address@hidden> wrote:

On Fri, Jan 19, 2018 at 19:05:06 -0500, Emilio G. Cota wrote:
On Fri, 12 Jan 2018 19:32:10 +0800, Antonios Motakis <address@hidden> wrote:
Since inodes are not completely random, and we usually have a handful of device IDs, we get a much smaller number of entries to track in the hash table.

So what this would give:
(1) Would be faster and take less memory than mapping the full inode_nr,dev_id tuple to unique QID paths
(2) Guaranteed not to run out of bits when inode numbers stay below the lowest 54 bits and we have fewer than 1024 devices.
(3) When we get beyond this limit, there is a chance we run out of bits to allocate new QID paths, but we can detect this and refuse to serve the offending files instead of allowing a collision.

We could tweak the prefix size to match the scenarios that we consider more likely, but I think close to 10-16 bits sounds reasonable enough. What do you think?
Assuming (2) is very likely to hold, I'd suggest dropping the intermediate hash table altogether and simply refusing to work with any files that do not meet (2).

That said, the naive solution of having a large hash table with all entries in it might be worth a shot.
Hmm, but that would still take a lot of memory.

Given assumption (2), a good compromise would be the following, taking into account that the total number of qids is unlikely to reach even close to 2**64:
- bit 63: 0/1 determines "fast" or "slow" encoding
- bits 62-0:
   - fast (trivial) encoding: when assumption (2) is met
     - 62-53: device id (it fits because of (2))
     - 52-0: inode (it fits because of (2))
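
A minimal C sketch of that fast encoding, for illustration; the macro and function names here are made up, not taken from any QEMU tree:

#include <stdbool.h>
#include <stdint.h>

#define QPATH_SLOW_FLAG (1ULL << 63)  /* bit 63: 0 = fast, 1 = slow encoding */
#define QPATH_DEV_BITS  10            /* bits 62-53: device id */
#define QPATH_INO_BITS  53            /* bits 52-0: inode number */

/*
 * Fast (trivial) encoding, only valid while assumption (2) holds,
 * i.e. the device id fits in 10 bits and the inode in 53 bits.
 * Returns false when the tuple needs the slow path instead.
 */
static bool qpath_encode_fast(uint64_t dev, uint64_t ino, uint64_t *qpath)
{
    if (dev >> QPATH_DEV_BITS || ino >> QPATH_INO_BITS) {
        return false; /* doesn't fit: fall back to the slow path */
    }
    *qpath = (dev << QPATH_INO_BITS) | ino; /* bit 63 stays clear */
    return true;
}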
And as pointed out by Eduard, we may have to take the mount id into account as well if we want to support the case where we have bind mounts in the exported directory... My understanding is that mount ids are incremental and reused when the associated fs gets unmounted: if we assume that the host doesn't have more than 1024 mounts, we would need 10 bits to encode it.

The fast encoding could be something like:

62-53: mount id
52-43: device id
42-0: inode

I don't agree that we should take the mount id into account, though.
TL;DR: I think the bind mount issue is distinct from the QID path issue, and just happens to be worked around when we (falsely) advertise to the guest that two files are not the same (even though they are). Making two files distinct when they shouldn't be will cause other issues.

The kernel's 9p client documentation states that with fscache enabled, there is no support for coherency when multiple users (i.e. guest and host) are reading and writing to the share. If this limitation is not taken into account, there are multiple issues with stale caches in the guest.

Disambiguating files using the mount id might work around fscache limitations in this case, but will introduce a host of other bugs. For example:
(1) The user starts two containers sharing a directory with data (via host bind mounts)
(2) Container 1 writes something to a file in the data dir
(3) Container 2 reads from the file
(4) The guest kernel doesn't know that the file is one and the same, so it is cached twice. Container 2 might get stale data

The user wrote the code running in containers 1 and 2 assuming they can share a file when running on the same system; for example, one container generates the configuration file for the other. It doesn't matter whether the user wrote the applications correctly, syncing data when needed. It breaks only because we lied to the guest 9p client, telling it that these are distinct files. 9p is supposed to support this.

This is why I think including the mount id in the QID path would be another bug, this time in the opposite direction.

In contrast, the QID path issues:
(1) can be triggered without touching any files on the host after the guest has already mounted the share;
(2) come down to the guest assuming that two or more distinct files are actually the same.

The bind mount issue:
(1) requires bind mounts to be changed on the host after the guest has mounted the share, which is already a no-no for fscache, and can be explained by stale caches in the guest;
(2) involves the guest correctly identifying that two paths refer to the same file. There is no collision here.


   - slow path: when assumption (2) isn't met, assign incremental IDs in the [0, 2**63-1] range and track them in a hash table.
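
A rough sketch of that slow path, using glib (which QEMU already links against); again, every name here is invented for illustration:

#include <glib.h>
#include <stdint.h>

#define QPATH_SLOW_FLAG (1ULL << 63)  /* bit 63 set marks the slow encoding */

typedef struct {
    uint64_t dev;
    uint64_t ino;
} QpathKey;

static guint qpath_key_hash(gconstpointer p)
{
    const QpathKey *k = p;
    /* cheap mix of both fields; good enough for a sketch */
    return (guint)(k->dev * 31 + k->ino);
}

static gboolean qpath_key_equal(gconstpointer a, gconstpointer b)
{
    const QpathKey *ka = a;
    const QpathKey *kb = b;
    return ka->dev == kb->dev && ka->ino == kb->ino;
}

static GHashTable *qpath_table;  /* QpathKey -> incremental id */
static uint64_t qpath_next_id;   /* next free id in [0, 2^63 - 1] */

/* Slow path: look up or assign an incremental id, with bit 63 set. */
static uint64_t qpath_encode_slow(uint64_t dev, uint64_t ino)
{
    QpathKey probe = { dev, ino };
    gpointer val;

    if (!qpath_table) {
        qpath_table = g_hash_table_new_full(qpath_key_hash, qpath_key_equal,
                                            g_free, NULL);
    }
    if (g_hash_table_lookup_extended(qpath_table, &probe, NULL, &val)) {
        return QPATH_SLOW_FLAG | (uint64_t)(uintptr_t)val;
    }
    QpathKey *key = g_new(QpathKey, 1);
    *key = probe;
    /* ids are stuffed into the value pointer for brevity; a real
     * implementation would need a 64-bit-safe value on 32-bit hosts */
    g_hash_table_insert(qpath_table, key, (gpointer)(uintptr_t)qpath_next_id);
    return QPATH_SLOW_FLAG | qpath_next_id++;
}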

Choosing 10 or whatever else bits for the device id is of course TBD,
as Antonios you pointed out.

This is a best effort to have a fallback in QEMU. The right way to
address the issue would really be to extend the protocol to have
bigger qids (e.g. 64 bits for the inode, 32 for the device id and 32 for the mount id).
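
Purely to illustrate the suggestion, such an extended qid could look like this; the struct is hypothetical and not part of any 9p spec or patch:

#include <stdint.h>

/*
 * Hypothetical extended qid. The on-the-wire 9p2000 qid today is only
 * 13 bytes: type[1], version[4], path[8].
 */
typedef struct QidExt {
    uint8_t  type;     /* QTDIR, QTFILE, ... unchanged */
    uint32_t version;  /* cache validity, unchanged */
    uint64_t ino;      /* 64 bits for the host inode number */
    uint32_t dev;      /* 32 bits for the host device id */
    uint32_t mnt;      /* 32 bits for the host mount id */
} QidExt;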

Does this mean we don't need the slow path for the fallback case? I have tested a glib hash table implementation of the "fast path"; I will look into porting it to the QEMU hash table and will send it to this list.

Keep in mind that we still need a hash table for the device id: it is 32 bits wide, but we will try to reserve only 10-16 bits for it in the QID path.
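
For illustration, that device id to prefix mapping could look like the following glib sketch (hypothetical names, 10 prefix bits assumed):

#include <glib.h>
#include <stdbool.h>
#include <stdint.h>

#define QPATH_DEV_BITS 10  /* prefix width; anything around 10-16 works */

static GHashTable *dev_prefix_table;  /* 32-bit device id -> short prefix */
static uint32_t next_prefix;

/*
 * Map a 32-bit host device id to a QPATH_DEV_BITS-wide prefix.
 * Returns false once all 2^QPATH_DEV_BITS prefixes are used up, so the
 * caller can refuse to serve the offending file instead of colliding.
 */
static bool qpath_dev_prefix(uint32_t dev, uint32_t *prefix)
{
    gpointer val;

    if (!dev_prefix_table) {
        dev_prefix_table = g_hash_table_new(g_direct_hash, g_direct_equal);
    }
    if (g_hash_table_lookup_extended(dev_prefix_table,
                                     GUINT_TO_POINTER(dev), NULL, &val)) {
        *prefix = GPOINTER_TO_UINT(val);
        return true;
    }
    if (next_prefix >> QPATH_DEV_BITS) {
        return false; /* out of prefixes: refuse to serve the file */
    }
    g_hash_table_insert(dev_prefix_table, GUINT_TO_POINTER(dev),
                        GUINT_TO_POINTER(next_prefix));
    *prefix = next_prefix++;
    return true;
}

When qpath_dev_prefix() fails, the server can refuse to serve the offending file, matching point (3) above.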

Cheers,
Tony


