qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to q


From: Antonios Motakis
Subject: Re: [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to qid.path
Date: Fri, 16 Feb 2018 11:20:33 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2



On 02/09/2018 06:57 PM, Greg Kurz wrote:
On Fri, 9 Feb 2018 10:06:05 -0600
Eric Blake <address@hidden> wrote:

On 02/09/2018 09:13 AM, Greg Kurz wrote:
On Thu, 8 Feb 2018 19:00:18 +0100
<address@hidden> wrote:
From: Antonios Motakis <address@hidden>

To support multiple devices on the 9p share, and avoid
qid path collisions we take the device id as input
to generate a unique QID path. The lowest 48 bits of
the path will be set equal to the file inode, and the
top bits will be uniquely assigned based on the top
16 bits of the inode and the device id.
+/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
+ * to a unique QID path (64 bits). To avoid having to map and keep track
+ * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
+ * the device id to the 16 highest bits of the QID path. The 48 lowest bits
+ * of the QID path equal to the lowest bits of the inode number.
+ *
+ * This takes advantage of the fact that inode number are usually not
+ * random but allocated sequentially, so we have fewer items to keep
+ * track of.
+ */
+static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
+                                uint64_t *path)
+{
+    QppEntry lookup = {
+        .dev = stbuf->st_dev,
+        .ino_prefix = (uint16_t) (stbuf->st_ino >> 48)
+    }, *val;
+    uint32_t hash = qpp_hash(lookup);
+
+    val = qht_lookup(&pdu->s->qpp_table, qpp_lookup_func, &lookup, hash);
+
+    if (!val) {
+        if (pdu->s->qp_prefix_next == 0) {
+            /* we ran out of prefixes */
+            return -ENFILE;
Not sure this errno would make sense for guest syscalls that don't open
file descriptors... Maybe ENOENT ?

Cc'ing Eric for insights.
Actually, it makes sense to me:

ENFILE 23 Too many open files in system

You only get here if the hash table filled up, which means there are
indeed too many open files (even if this syscall wasn't opening a file).
   In fact, ENFILE is usually associated with running into ulimit
restrictions, and come to think of it, you are more likely to hit your
ulimit than you are to run into a hash collision (so this code may be
very hard to reach in practice, but if you do reach it, it behaves
similarly to what you were more likely to hit in the first place).

ENOENT implies the file doesn't exist - but here, the file exists but we
can't open it because we're out of resources for tracking it.

Only the host knows that the file exists actually. If stat_to_qid() for
this path returns ENOENT, then the file shouldn't be visible in the
guest.

I say "shouldn't" instead of "isn't" because I now realize that
v9fs_do_readdir(), which is used to implement getdents() in the
guest, sends a degraded QID, hand-crafted without calling
stat_to_qid() at all... :-\

         /*
          * Fill up just the path field of qid because the client uses
          * only that. To fill the entire qid structure we will have
          * to stat each dirent found, which is expensive
          */
         size = MIN(sizeof(dent->d_ino), sizeof(qid.path));
         memcpy(&qid.path, &dent->d_ino, size);
         /* Fill the other fields with dummy values */
         qid.type = 0;
         qid.version = 0;


Antonios, your patchset should probably also address this.

The problem is that the dirent we get from v9fs_co_readdir()
only has the inode number, so I guess we must build a dummy
struct stat with:

     stbuf.st_ino = dent->d_ino
     stbuf.st_dev = st_dev of the parent directory

The st_dev of the parent directory could be obtained in
v9fs_readdir() and passed to v9fs_do_readdir(). Another
solution could be to cache the QID in the V9fsFidState
of the parent directory when it is open.

Also, if we hit a collision while reading the directory, I'm
afraid the remaining entries won't be read at all. I'm not
sure this is really what we want.
Good catch!

I'm not that sure that we can make the assumption that all entries in a dir will share the st_dev,
I think we have to check it for each entry...


Also, POSIX permits returning specific errno codes that aren't otherwise
listed for a syscall if the usual semantics of that errno code are
indeed the reason for the failure (you should prefer to fail with errno
codes documented by POSIX where possible, but POSIX does not limit you
to just that set).

Ok, then ENFILE wouldn't be that bad in the end.

Thanks for your POSIX expertise :)
Will keep that one then!


Cheers,

--
Greg




reply via email to

[Prev in Thread] Current Thread [Next in Thread]