qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart
Date: Thu, 21 Mar 2013 08:03:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

mdroth <address@hidden> writes:

> On Wed, Mar 20, 2013 at 02:38:35PM -0400, Luiz Capitulino wrote:
>> On Wed, 20 Mar 2013 13:14:21 -0500
>> mdroth <address@hidden> wrote:
>> 
>> > > > > > > > +    handle = s->pstate.fd_counter++;
>> > > > > > > > +    if (s->pstate.fd_counter < 0) {
>> > > > > > > > +        s->pstate.fd_counter = 0;
>> > > > > > > > +    }
>> > > > > > > 
>> > > > > > > Is this handling the overflow case? Can't fd 0 be in use already?
>> > > > > > 
>> > > > > > It could, but it's very unlikely that an overflow/counter
>> > > > > > reset would
>> > > > > > result in issuing still-in-use handle, since
>> > > > > > guest-file-open would need
>> > > > > > to be called 2^63 times in the meantime.
>> > > > > 
>> > > > > Agreed, but as you do check for that case and as the right
>> > > > > fix is simple
>> > > > > and I think it's worth it. I can send a patch myself.
>> > > > > 
>> > > > 
>> > > > A patch to switch to tracking a list of issued handles in the keystore,
>> > > > or to return an error on overflow?
>> > > 
>> > > To return an error on overflow. Minor, but if we do handle it let's do it
>> > > right. Or, we could just add an assert like:
>> > > 
>> > > assert(s->pstate.fd_counter >= 0);
>> > 
>> > Ah, well, I'm not sure I understand then. You mean dropping:
>> > 
>> >     if (s->pstate.fd_counter < 0) {
>> >         s->pstate.fd_counter = 0;
>> >     }
>> > 
>> > And replacing it with an error or assertion?
>> 
>> Yes, because I had understood you meant that this is very unlikely to be
>> triggered because we'd need guest-file-open to be called 2^63. This is
>> what I agreed above, although thinking more about it there's also the
>> possibility of a malicious client doing this on purpose.
>> 
>> But now I see that what you really meant is that it's unlikely for fd 0
>> to be in use after 2^63 guest-file-open calls. Am I right? If yes, then
>
> Yup, that's the scenario I was referring to.
>
>> I disagree because there's no way to guarantee when a certain fd will be
>> in use or not, unless we allow fds to be returned.
>
> Yes, it's a real scenario that can indeed occur under "normal" usage, but in
> my head it's similar to assumptions we make about clocks not overflowing 
> within
> a timeframe that anyone cares about in terms of severity. To quantify it
> more:
>
> Given RPC latency there's really no way to run up the fd counter faster than
> once per nanosecond (more like millisecond atm), so you'd have to keep a 
> handle
> open and constantly called guest-file-open for a period of 292 years
> before a duplicate handle gets issued.

In other words, it's a complete non-issue :)

>> > If so, the overflow is actually expected: once we dish out handle 
>> > MAX_INT64,
>> > we should restart at 0. I initially made fd_counter a uint64_t so
>> > overflow/reset would happen more naturally, but since we issue handles as
>> > int64_t this would've caused other complications.
>> > 
>> > Something like this might be more clear about the intent though:
>> > 
>> >     handle = s->pstate.fd_counter;
>> >     if (s->pstate.fd_counter == MAX_INT64) {
>> >         s->pstate.fd_counter = 0;
>> >     } else {
>> >         s->pstate.fd_counter++;
>> >     }
>> 
>> I disagree about restarting to zero as I have explained above. You seem to
>> not like returning an error, is it because we'll make guest-file-open
>> useless after the limit is reached?
>
> Yes. But, on second thought, given the above I guess I don't mind returning an
> error as a failsafe for now. Although I think it should be an assert()
> with a FIXME:, since it really should be fixed before anyone ever manages
> to trigger it.
>
>> 
>> Let's review our options:
>> 
>>  1. When fd_count reaches MAX_INT64 we reset it to zero
>> 
>>     Pros: simple and guest-file-open always work
>>     Cons: fd 0 might be in use by a client
>> 
>>  2. When fd_count reaches MAX_INT64 we return an error
>> 
>>     Pros: simple and we fix 'cons' from item 1
>>     Cons: guest-file-open will have a usage count limit

             error path unreachable in practice

Such errors require special hackery to test.  The test needs to include
the error's consumer.

Waste of time if you ask me.

>>  3. Allow fds to be returned by clients on guest-file-close and do 2 on top
>> 
>>     Pros: solve problems discussed in items 1 and 2
>>     Cons: not trivial and the usage limit problem from item 2 can still
>>           happen if the client ends up not calling guest-file-close
>>           (although I do think we'll reach the OS limit here)

The OS limit on file descriptors is many orders of magnitude lower than
2^63.

>> Do you see other options? Am I overcomplicating?
>> 
>
> No, I think that about sums it up. Doing 3, paired with a limit on the
> number of outstanding FDs as in 2 is the full solution. The only
> complication that is that the higher the limit we impose, the more I/O
> and look-up overhead will be incured for every
> guest-file-open/guest-file-close, because those operations will become
> O(fd_limit).

Cure much worse than the disease.

> So if we do it we'll need to impose a reasonable limit like 16k outstanding
> FDs at a time or something (maybe even that's too much, but I'm thinking an
> extra 64k read/write with every fopen()/fclose() isn't that big a deal).

I guess the limit should be in the order of the OS's limit on open fds.

Instead of a purely theoretical limit, we get a real limit.  Stupid.

> But to complicate things somewhat more:
>
> This whole reason for this keystore thing is to patch up the existing
> interfaces so they continue functioning without clients needing to
> update. So if we're considering something that requires imposing a fairly
> small limit on the number of outstanding handles, they'd need to update
> their implementations eventually anyway to be correct.
>
> So I wonder if, rather than pursuing option 3, we just introduce an
> interface that does what we really want and returns handles as UUIDs,
> then mark the existing interfaces as deprecated (and then remove them
> within the next 300 years so our assert never gets hit :)

Looks like you guys have no *practical* problems to solve.  Congrats!
Take a vacation!  Please report back no later than 275 years from now,
to make sure this 64 bit fd counter overflow problem gets taken care of
in time.  ;-P



reply via email to

[Prev in Thread] Current Thread [Next in Thread]