Re: [Qemu-devel] Adding a persistent writeback cache to qemu


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Adding a persistent writeback cache to qemu
Date: Fri, 21 Jun 2013 14:55:20 +0200

On Thu, Jun 20, 2013 at 4:25 PM, Alex Bligh <address@hidden> wrote:
> Stefan,
>
>
> --On 20 June 2013 11:46:18 +0200 Stefan Hajnoczi <address@hidden> wrote:
>
>>> The concrete problem here is that flashcache/dm-cache/bcache don't
>>> work with the rbd (librbd) driver, as flashcache/dm-cache/bcache
>>> cache access to block devices (in the host layer), and with rbd
>>> (for instance) there is no access to a block device at all. block/rbd.c
>>> simply calls librbd which calls librados etc.
>>>
>>> So the context switches etc. I am avoiding are the ones that would
>>> be introduced by using kernel rbd devices rather than librbd.
>>
>>
>> I understand the limitations with kernel block devices - their
>> setup/teardown is an extra step outside QEMU and privileges need to be
>> managed.  That basically means you need to use a management tool like
>> libvirt to make it usable.
>
>
> It's not just the management tool (we have one of those). Kernel
> devices are a pain. As a trivial example, duplication of UUIDs, LVM IDs
> etc. by hostile guests can cause issues.

If you have those problems then something is wrong:

LVM definitely shouldn't be scanning guest devices.

As for disk UUIDs, they come from the SCSI target which is under your
control, right?  In fact, you can assign different serial numbers to
drives attached in QEMU, so the host serial number will not be used.
Therefore, there is a clean separation there and guests do not control
host UUIDs.

The one true weakness here is that Linux reads partition tables
automatically.  Not sure if there's a way to turn it off or how hard
it would be to add that.
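
As an aside, the userspace path Alex describes above (block/rbd.c ->
librbd -> librados) looks roughly like the sketch below.  This is a
minimal illustration using the public librados/librbd C API; the pool
and image names are made up and error handling is omitted, but the
point is that no kernel block device ever appears:

    /* Sketch of the userspace rbd path: QEMU links against librbd,
     * which talks to the OSDs via librados.  No kernel block device
     * is involved.  Illustrative only; error handling omitted. */
    #include <rados/librados.h>
    #include <rbd/librbd.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        rbd_image_t image;
        char buf[4096];

        rados_create(&cluster, NULL);            /* NULL = default client id */
        rados_conf_read_file(cluster, NULL);     /* default ceph.conf search path */
        rados_connect(cluster);

        rados_ioctx_create(cluster, "rbd", &io); /* "rbd" pool, for example */
        rbd_open(io, "myimage", &image, NULL);   /* hypothetical image name */

        rbd_read(image, 0, sizeof(buf), buf);    /* I/O goes straight to librados */

        rbd_close(image);
        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }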

>> But I don't understand the performance angle here.  Do you have profiles
>> that show kernel rbd is a bottleneck due to context switching?
>
>
> I don't have test figures - perhaps this is just received wisdom, but I'd
> understood that's why they were faster.
>
>
>> We use the kernel page cache for -drive file=test.img,cache=writeback
>> and no one has suggested reimplementing the page cache inside QEMU for
>> better performance.
>
>
> That's true, but I'd argue that is a little different because nothing
> blocks on the page cache (it being in RAM). You don't get the situation
> where the task sleeps awaiting data (from the page cache), the data
> arrives, and the task then needs to be scheduled in. I will admit
> to a degree of handwaving here as I hadn't realised the claim qemu+rbd
> was more efficient than qemu+blockdevice+kernelrbd was controversial.

It may or may not be more efficient; without performance analysis we
don't know how big the difference is or why.
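
To make the comparison concrete: with cache=writeback QEMU does
buffered I/O on the host, so writes complete as soon as they hit the
page cache and are only forced out when the guest issues a flush.
Conceptually it boils down to something like this (a simplified sketch
of the host-side I/O pattern, not QEMU's actual code):

    /* cache=writeback, conceptually: no O_DIRECT, so the page cache
     * absorbs writes; a guest flush maps to fdatasync().  Simplified
     * sketch only, not QEMU code. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        memset(buf, 0xab, sizeof(buf));

        int fd = open("test.img", O_RDWR);  /* buffered: page cache in the path */
        if (fd < 0)
            return 1;

        pwrite(fd, buf, sizeof(buf), 0);    /* returns once the data is cached */
        fdatasync(fd);                      /* guest flush: data reaches the disk */

        close(fd);
        return 0;
    }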

>> but if there's really a case for it with performance profiles then I
>> guess it would be necessary.  But we should definitely get feedback from
>> the Ceph folks too.
>
>
> The specific problem we are trying to solve (in case that's not
> obvious) is the non-locality of data read/written by ceph. Whilst
> you can use placement to localise data down to the rack level, even if
> one of your OSDs is on the local machine you still end up waiting on
> network traffic. That is apparently hard to solve inside Ceph.

I'm not up to speed on the Ceph architecture: is this because you need
to visit a metadata server before you access the storage?  Even when
the data is colocated on the same machine, do you still need to ask
the metadata server first?

Stefan


