qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Date: Fri, 2 Sep 2016 16:05:17 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0


On 02.09.16 14:16, Ola Liljedahl wrote:
> 
> 
> On 02/09/2016, 12:26, "Gonglei (Arei)" <address@hidden> wrote:
> 
>>
>>> -----Original Message-----
>>> From: Alexander Graf [mailto:address@hidden
>>> Sent: Friday, September 02, 2016 4:07 PM
>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>>>
>>>
>>>
>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>>>> Hi Alex,
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Alexander Graf [mailto:address@hidden
>>>>> Sent: Thursday, September 01, 2016 9:37 PM
>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>>>>>
>>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>>>>> crypto accelerator card). The virtio crypto device can provide
>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>>>>>
>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>>>>
>>>>> I have mostly a few high level comments.
>>>>>
>>>>> For starters, a lot of the structs rely on the compiler to pad them
>>> to
>>>>> natural alignment. That may get us into trouble when trying to
>>> emulate a
>>>>> virtio device on a host with different guest architecture (like arm
>>> on
>>>>> x86). It'd be a good idea to manually pad everything to be 64bit
>>> aligned
>>>>> - then all fields are always at the same spot.
>>>>>
>>>> Good point! I'll do this in the next version. Thanks!
>>>>
>>>>> I also have a hard time getting my head around the final flow of
>>>>> everything. Do I always have to create a session to be able to emit a
>>>>> command? In that case, doesn't that slow down everything, since a
>>>>> request would then need to wait for the host reply to receive its
>>>>> session id? There should be some way to fire off a simple non-iv
>>>>> operation without any session set up imho.
>>>>>
>>>> For symmetric algorithms, we'd better create a session before
>>> executing
>>> encryption
>>>> Or decryption, because the session usually be kept for a specific
>>>> algorithm with specific key in the production environment. And if we
>>> only
>>> change the iv,
>>>> we don't need to re-create the session.
>>>
>>> I think we have a slight misunderstanding here :)
>>>
>>> The current workflow is
>>>
>>>   -> create session
>>>   <- session key
>>>   -> data in
>>>   <- data out
>>>   ...
>>>   <- close session
>>>   -> ack
>>>
>>> That means that at least for the first packet you have at least one full
>>> round-trip cost from guest to host to guest to be able to send any data.
>>>
>> Yes, that's true...
>>
>>> That sounds pretty expensive to me on the latency side. There are
>>> multiple ways to mitigate that. One idea was to have a separate path in
>>> parallel to the create session + data + close session dance that would
>>> combine them all into a single command. You would still have the session
>>> based version, but accelerate the one-blob case.
>>>
>>> Another idea would be to make the guest be the session id janitor. Then
>>> you could just do
>>>
>>>   -> create session with key X
>>>   -> data in
>>>   <- data out
>>>   ...
>>>
>>> so you save the round trip, if you combine command and data queues, as
>>> then the create and data bits are serialized by their position in the
>>> queue.
>>>
>> ... But for lots of crypto requests, the exhaust is low:
>>
>> -> create session with key X
>> <- session id
>>    //do something in the guest if you like
>> -> data in with session_id
>> -> data in with session_id
>> -> data in with session_id
>>    ... ...    (fill out data virtqueue)
>> <-data out
>> <-data out
>>    <-data out
>>
>> And this is what the production environment do currently.
>>
>>>>
>>>> For the asymmetric algorithms, we don't need create a session IIRC.
>>>>
>>>> So, I don't think this is a performance degradation, but a performance
>>> enhancement.
>>>>
>>>>> Also, I don't fully understand the split between control and data
>>>>> queues. As far as I read things, the control queue is used to create
>>>>> session ids and the data queues can then be used to push data. Is
>>> there
>>>>> any particular reason for the split? One thing that seems natural to
>>> me
>>>>> would be to have sessions be per-queue, so you would create a
>>> session on
>>>>> a particular queue and only have it ever be available there. That way
>>>>> you get rid of any locking for sessions.
>>>>>
>>>> We want to keep a unify request type (structure) for data queue, so
>>> we can
>>>> keep the session operation in the control queue. Of course the
>>> control queue
>>>> only used to create sessions currently, but we can extend its
>>> functions if
>>> needed
>>>> in the future.
>>>
>>> I still don't understand. With separate control+data queues you just get
>>> yourself into synchronization troubles. Both struct
>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>>> have an opcode as first le32 field. You can easily use that to determine
>>> both length of the payload as well as command (control vs data).
>>>
>>
>> There is a big problem that the control handle logic is synchronization,
>> but the data queue
>> handling logic is asynchronization. We can't combine them into one queue.
>> It will decrease the performance because you need indentify each packet
>> if we do this forcedly.
> Are you saying that control and data operations are handled by separate
> "blocksĀ²?
> If you combined control and data queues, there would have to be a (SW)
> demultiplexer
> that would add overhead (and potentially decrease throughout) especially
> for the data
> operations?

Uh, the multiplexer is as simple as a switch() statement on the opcode,
no? It might stall that one particular queue, but that sounds like
something you can improve by increasing the number of queues (and invent
something smart to ease polling of activity on more queues).

I think the main point Gonglei is bringing up here is that session
creation needs to finish before we can start firing off packets to that
session. That's another point that I dislike about the approach. I can't
think of any reason why session creation should be synchronous if you
allow the guest to handle host<->guest session tokens.

Data ordering needs to be consistent either way if you handle IVs in
hardware, so you naturally order session creation and data transmission
for a particular session.

But again, I really might just be missing something basic.

> And in order to truly benefit from (HW) crypto acceleration (especially
> for smaller
> data sizes), you want to minimise the amount of SW processing in the data
> path.

I think we're on full agreement on that one :). That's the whole point.
If you have 256 cores, you don't want all of your crypto sessions go
over a single bottleneck control queue. Creating sessions is a semi-hot
path.


Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]