qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
Date: Thu, 10 Sep 2015 11:47:58 +0200

On Tue, 8 Sep 2015 22:03:01 +0800
Xiao Guangrong <address@hidden> wrote:

> 
> 
> On 09/07/2015 09:40 PM, Igor Mammedov wrote:
> > On Sun, 6 Sep 2015 14:07:21 +0800
> > Xiao Guangrong <address@hidden> wrote:
> >
> >>
> >>
> >> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> >>> On Wed, 2 Sep 2015 18:36:43 +0800
> >>> Xiao Guangrong <address@hidden> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> >>>>> On Fri, 14 Aug 2015 22:51:59 +0800
> >>>>> Xiao Guangrong <address@hidden> wrote:
> >>>>>
> >>>>>> Introduce "pc-nvdimm" device and it has two parameters:
> >>>>> Why do you use prefix "pc-", I suppose we potentially
> >>>>> could use this device not only with x86 targets but with
> >>>>> other targets as well.
> >>>>> I'd just drop 'pc' prefix through out patchset.
> >>>>
> >>>> Yeah, the prefix is stolen from pc-dimm, will drop this
> >>>> prefix as your suggestion.
> >>>>
> >>>>>
> >>>>>> - @file, which is the backed memory file for NVDIMM device
> >>>>> Could you try to split device into backend/frontend parts,
> >>>>> like it's done with pc-dimm. As I understand it's preferred
> >>>>> way to implement this kind of devices.
> >>>>> Then you could reuse memory backends that we already have
> >>>>> including file backend.
> >>>>
> >>>> I considered it too and Stefan, Paolo got the some idea in
> >>>> V1's review, however:
> >>>>
> >>>> | However, file-based memory used by NVDIMM is special, it divides the 
> >>>> file
> >>>> | to two parts, one part is used as PMEM and another part is used to 
> >>>> store
> >>>> | NVDIMM's configure data.
> >>>> |
> >>>> | Maybe we can introduce "end-reserved" property to reserve specified 
> >>>> size
> >>>> | at the end of the file. Or create a new class type based on
> >>>> | memory-backend-file (named nvdimm-backend-file) class to hide this 
> >>>> magic
> >>>> | thing?
> >>> I'd go with separate backend/frontend idea.
> >>>
> >>> Question is if this config area is part backend or frontend?
> >>
> >> Configdata area is used to store nvdimm device's configuration, normally, 
> >> it's
> >> namespace info.
> >>
> >> Currently, we chosen configdata located at the end of nvdimm's 
> >> backend-memory
> >> as it's easy to configure / use and configdata is naturally non-volatile 
> >> and it
> >> is like the layout on physical device.
> >>
> >> However, using two separated backed-memory is okay, for example:
> >> -object memory-backend-file,id=mem0,file=/storage/foo
> >> -object memory-backend-file,id=mem1,file=/storage/bar
> >> -device nvdimm,memdev=mem0,configdata=mem1
> >> then configdata is written to a single backend.
> >>
> >> Which one is better for you? :)
> >>
> >>> If we pass-through NVDIMM device do we need to set configdata=true
> >>> and QEMU would skip building config structures and use structures
> >>> that are already present on passed-through device in that place?
> >>>
> >>
> >> The file specified by @file is something like a normal disk, like 
> >> /dev/sda/,
> >> host process can use whole space on it. If we want to directly pass it to 
> >> guest,
> >> we can specify 'configdata=false'. If we allow guest to 'partition' (create
> >> namespace on) it then we use 'configdata=true' to reserve some space to 
> >> store
> >> its partition info (namesapce info).
> > As far as I understand currently linux provides to userspace only one 
> > interface
> > which is block device i.e. /dev/sdX and on top of it userspace can put
> > PM/DAX aware filesystem and use files from it. In either cases kernel
> > just provides access to separate namespaces and not to a whole NVDIMM which
> > includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
> > so we could consider it as just a file/storage that could be used by 
> > userspace.
> >
> 
> Yes, it is.
> 
> > Lets assume that NVDIMM should always have 'labels area'.
> > In that case I'd always reserve space for it and
> >   * format it (build a new one) if backend doesn't have a
> >     valid labels area dropping configdata parameter along the way
On a second glance, qemu probably shouldn't 'format/build' labels
in this case and only supply non configured NVDIMM to guest.
It probably should be guest responsibility to configure namespaces
on bare NVDIMM. Also it would allow to seamlessly introduce separate labels
backend as described below to cover 'configdata = false' case.

> >   * or if backing-file already has valid labels area I'd just use it.
> 
> Yes.
> 
> >
> > If you need to make labels area readonly you can introduce 
> > 'NVDIMM.readonly_labels'
> > option and just use labels backend's without allowing changes writeback.
> > IT would be better to make it another series on top of basic NVDIMM 
> > implementation
> > if there is an actual usecase for it.
> 
> I'd prefer the way that discards not only its label data but also the whole 
> nvdimm device,
> that is, open(, RDONLY) + mmap(, MAP_PRIVATE), the idea was raised by Stefan.
Yes, we can do this but it's not related to NVDIMM at all, it's backend's 
feature
that also could be (re)used by pc-dimm with file backend.
I think it could be done on top.

> 
> The 'configdata = false' in this patchset does not aim at making label 
> readonly, it provides
> a way to make the file in a single partition. For example, you create a image 
> in /dev/pmem0,
> pass it to guest, then the whole file will appear at /dev/pmem0 in guest, and 
> guest can directly
> use the image in that device. Under this case, no file region is reserved and 
> the label data is
> build in memory which can not be updated by guest.
If we will have 'configdata = false' as in this patchset, one will have
to take special care to keep it bug compatible with previous versions
as implementation changes/fixes might affect compatibility.
I think it is simpler and more robust to always have metadata in backend.
That's also would be migration friendly since labels area will stay
consistent with data it describes, regardless of QEMU version when we
do cross version migration.

Later we could introduce a separate option for a separate labels area
to cover above described usecase.

> 
> >
> > PS:
> > Also when you write commit messages, comment and name variables try to use 
> > terms from
> > relevant spec and mention specs where you describe data structures from 
> > them.
> 
> Parts of the names/definitions were stolen from Kernel NVDIMM driver, i will 
> update it to
> let them reflect the specs. Thanks, Igor.
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to address@hidden
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




reply via email to

[Prev in Thread] Current Thread [Next in Thread]