From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot
Date: Thu, 10 May 2018 09:26:59 +0100
User-agent: Mutt/1.9.3 (2018-01-21)
On Wed, May 09, 2018 at 07:54:31PM +0200, Max Reitz wrote:
> On 2018-05-09 12:16, Stefan Hajnoczi wrote:
> > On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> >> Am 08.05.2018 um 16:41 hat Eric Blake geschrieben:
> >>> On 12/25/2017 01:33 AM, He Junyan wrote:
> >> 2. Make the nvdimm device use the QEMU block layer so that it is backed
> >> by a non-raw disk image (such as a qcow2 file representing the
> >> content of the nvdimm) that supports snapshots.
> >>
> >> This part is hard because it requires some completely new
> >> infrastructure such as mapping clusters of the image file to guest
> >> pages, and doing cluster allocation (including the copy on write
> >> logic) by handling guest page faults.
> >>
> >> I think it makes sense to invest some effort into such interfaces, but
> >> be prepared for a long journey.
> >
> > I like the suggestion but it needs to be followed up with a concrete
> > design that is feasible and fair for Junyan and others to implement.
> > Otherwise the "long journey" is really just a way of rejecting this
> > feature.
> >
> > Let's discuss the details of using the block layer for NVDIMM and try to
> > come up with a plan.
> >
> > The biggest issue with using the block layer is that persistent memory
> > applications use load/store instructions to directly access data. This
> > is fundamentally different from the block layer, which transfers blocks
> > of data to and from the device.
> >
> > Because of block DMA, QEMU is able to perform processing at each block
> > driver graph node. This doesn't exist for persistent memory because
> > software does not trap I/O. Therefore the concept of filter nodes
> > doesn't make sense for persistent memory - we certainly do not want to
> > trap every I/O because performance would be terrible.
> >
> > Another difference is that persistent memory I/O is synchronous.
> > Load/store instructions execute quickly. Perhaps we could use KVM async
> > page faults in cases where QEMU needs to perform processing, but again
> > the performance would be bad.
>
> Let me first say that I have no idea how the interface to NVDIMM looks.
> I just assume it works pretty much like normal RAM (so the interface is
> just that it’s a part of the physical address space).
>
> Also, it sounds a bit like you are already discarding my idea, but here
> goes anyway.
>
> Would it be possible to introduce a buffering block driver that presents
> the guest an area of RAM/NVDIMM through an NVDIMM interface (so I
> suppose as part of the guest address space)? For writing, we’d keep a
> dirty bitmap on it, and then we’d asynchronously move the dirty areas
> through the block layer, so basically like mirror. On flushing, we’d
> block until everything is clean.
>
> For reading, we’d follow a COR/stream model, basically, where everything
> is unpopulated in the beginning and everything is loaded through the
> block layer both asynchronously all the time and on-demand whenever the
> guest needs something that has not been loaded yet.
>
> Now I notice that that looks pretty much like a backing file model where
> we constantly run both a stream and a commit job at the same time.
>
> The user could decide how much memory to use for the buffer, so it could
> either hold everything or be partially unallocated.
>
> You’d probably want to back the buffer by NVDIMM normally, so that
> nothing is lost on crashes (though this would imply that for partial
> allocation the buffering block driver would need to know the mapping
> between the area in real NVDIMM and its virtual representation of it).
>
> Just my two cents while scanning through qemu-block to find emails that
> don’t actually concern me...
The guest kernel already implements this - it's the page cache and the
block layer!
Doing it in QEMU with dirty memory logging enabled is less efficient
than doing it in the guest.
That's why I said it's better to just use block devices than to
implement buffering.
I'm saying that persistent memory emulation on top of the iscsi:// block
driver (for example) does not make sense. It could be implemented but
the performance wouldn't be better than block I/O and the
complexity/code size in QEMU isn't justified IMO.
Stefan
> > Most protocol drivers do not support direct memory access. iscsi, curl,
> > etc just don't fit the model. One might be tempted to implement
> > buffering but at that point it's better to just use block devices.
> >
> > I have CCed Pankaj, who is working on the virtio-pmem device. I need to
> > be clear that emulated NVDIMM cannot be supported with the block layer
> > since it lacks a guest flush mechanism. There is no way for
> > applications to let the hypervisor know the file needs to be fsynced.
> > That's what virtio-pmem addresses.
> >
> > Summary:
> > A subset of the block layer could be used to back virtio-pmem. This
> > requires a new block driver API and the KVM async page fault mechanism
> > for trapping and mapping pages. Actual emulated NVDIMM devices cannot
> > be supported unless the hardware specification is extended with a
> > virtualization-friendly interface in the future.
> >
> > Please let me know your thoughts.
> >
> > Stefan
> >
>
>
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Eric Blake, 2018/05/08
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Kevin Wolf, 2018/05/08
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Stefan Hajnoczi, 2018/05/09
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Max Reitz, 2018/05/09
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Stefan Hajnoczi <=
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Kevin Wolf, 2018/05/11
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Stefan Hajnoczi, 2018/05/14
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, He, Junyan, 2018/05/28
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Stefan Hajnoczi, 2018/05/30
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Kevin Wolf, 2018/05/30
- Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 incremental snapshot, Stefan Hajnoczi, 2018/05/31