qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 2/4] nvdimm: warn if the backend is not a DAX


From: Haozhong Zhang
Subject: Re: [Qemu-devel] [PATCH v2 2/4] nvdimm: warn if the backend is not a DAX device
Date: Mon, 12 Jun 2017 11:18:21 +0800
User-agent: NeoMutt/20170428 (1.8.2)

On 06/08/17 15:51 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 06, 2017 at 03:22:27PM +0800, Haozhong Zhang wrote:
> > Applications in Linux guest that use device-dax never trigger flush
> > that can be trapped by KVM/QEMU. Meanwhile, if the host backend is not
> > device-dax, QEMU cannot guarantee the persistence of guest writes.
> > Before solving this flushing problem, QEMU should warn users if the
> > host backend is not device-dax.
> 
> No one reads warnings unless things fail but they can be a debugging aid
> if they do. But they have to be simple and robust to be helpful for
> that. This one seems to me too complex and fragile.
>
> 
> > Signed-off-by: Haozhong Zhang <address@hidden>
> > Message-id: address@hidden
> 
> Don't include this line in commit log please. Put it after --- if you
> must.
> 
> > ---
> >  hw/mem/nvdimm.c      |  6 ++++++
> >  include/qemu/osdep.h |  9 ++++++++
> >  util/osdep.c         | 61 
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 76 insertions(+)
> > 
> > diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> > index a9b0863f20..b23542fbdf 100644
> > --- a/hw/mem/nvdimm.c
> > +++ b/hw/mem/nvdimm.c
> > @@ -26,6 +26,7 @@
> >  #include "qapi/error.h"
> >  #include "qapi/visitor.h"
> >  #include "hw/mem/nvdimm.h"
> > +#include "qemu/error-report.h"
> >  
> >  static void nvdimm_get_label_size(Object *obj, Visitor *v, const char 
> > *name,
> >                                    void *opaque, Error **errp)
> > @@ -84,6 +85,11 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error 
> > **errp)
> >      NVDIMMDevice *nvdimm = NVDIMM(dimm);
> >      uint64_t align, pmem_size, size = memory_region_size(mr);
> >  
> > +    if (!qemu_fd_is_dev_dax(memory_region_get_fd(mr))) {
> > +        error_report("warning: nvdimm backend does not look like a DAX 
> > device, "
> > +                     "unable to guarantee persistence of guest writes");
> > +    }
> > +
> >      align = memory_region_get_alignment(mr);
> >  
> >      pmem_size = size - nvdimm->label_size;
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 1c9f5e260c..7f26af371e 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -470,4 +470,13 @@ char *qemu_get_pid_name(pid_t pid);
> >   */
> >  pid_t qemu_fork(Error **errp);
> >  
> > +/**
> > + * qemu_fd_is_dev_dax:
> > + *
> > + * Check whether @fd describes a DAX device.
> > + *
> > + * Returns true if it is; otherwise, return false.
> > + */
> > +bool qemu_fd_is_dev_dax(int fd);
> > +
> >  #endif
> > diff --git a/util/osdep.c b/util/osdep.c
> > index a2863c8e53..02881f96bc 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -471,3 +471,64 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
> >      return readv_writev(fd, iov, iov_cnt, true);
> >  }
> >  #endif
> > +
> > +#ifdef __linux__
> > +static ssize_t qemu_dev_dax_sysfs_read(int fd, const char *entry,
> > +                                       char *buf, size_t len)
> > +{
> > +    ssize_t read_bytes;
> > +    struct stat st;
> > +    unsigned int major, minor;
> > +    char *path, *pos;
> > +    int sysfs_fd;
> > +
> > +    if (fstat(fd, &st)) {
> > +        return 0;
> > +    }
> > +
> > +    major = major(st.st_rdev);
> > +    minor = minor(st.st_rdev);
> > +    path = g_strdup_printf("/sys/dev/char/%u:%u/%s", major, minor, entry);
> > +
> > +    sysfs_fd = open(path, O_RDONLY);
> > +    g_free(path);
> > +    if (sysfs_fd == -1) {
> > +        return 0;
> > +    }
> > +
> > +    read_bytes = read(sysfs_fd, buf, len - 1);
> > +    close(sysfs_fd);
> > +    if (read_bytes > 0) {
> > +        buf[read_bytes] = '\0';
> > +        pos = g_strstr_len(buf, read_bytes, "\n");
> > +        if (pos) {
> > +            *pos = '\0';
> > +        }
> > +    }
> > +
> > +    return read_bytes;
> > +}
> > +#endif /* __linux__ */
> > +
> > +bool qemu_fd_is_dev_dax(int fd)
> > +{
> > +    bool is_dax = false;
> > +
> > +#ifdef __linux__
> > +    char devtype[7];
> > +    ssize_t len;
> > +
> > +    if (fd == -1) {
> > +        return false;
> > +    }
> > +
> > +    len = qemu_dev_dax_sysfs_read(fd, "device/devtype",
> > +                                  devtype, sizeof(devtype));
> > +    if (len <= 0) {
> > +        return false;
> > +    }
> 
> This can easily trigger false positives e.g. if sysfs access
> is blocked by selinux.
>

A part of userland interface of nvdimm is exposed via sysfs, so QEMU
has to access to certain sysfs entries in order to, e.g. perform the
DAX check in this patch and get alignment requirement in patch 4.

I agree it's not robust if the permission is not properly granted. We
may either
1) require users to grant the permissions to QEMU and document the
   requirements
 or
2) get the information from sysfs from the outside of QEMU
   (e.g. libvirt) which has the permission and then pass to QEMU.

Which one do you think is better?

> 
> > +    is_dax = !strncmp(devtype, "nd_dax", len);
> 
> E.g. will return true on any string starting with nd_dax.
> And it's not clear non-dax devices can't guarantee safety.

This check can be done by checking whether /sys/dev/char/MAJ:MIN/subsystem
points to /sys/class/dax, as Dan suggested, if we decide to access
sysfs in QEMU.

Thanks,
Haozhong

> 
> > +#endif /* __linux__ */
> > +
> > +    return is_dax;
> > +}
> > -- 
> > 2.11.0



reply via email to

[Prev in Thread] Current Thread [Next in Thread]