
From: mdroth
Subject: Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart
Date: Thu, 21 Mar 2013 11:13:13 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Mar 21, 2013 at 08:03:11AM +0100, Markus Armbruster wrote:
> mdroth <address@hidden> writes:
> 
> > On Wed, Mar 20, 2013 at 02:38:35PM -0400, Luiz Capitulino wrote:
> >> On Wed, 20 Mar 2013 13:14:21 -0500
> >> mdroth <address@hidden> wrote:
> >> 
> >> > > > > > > > +    handle = s->pstate.fd_counter++;
> >> > > > > > > > +    if (s->pstate.fd_counter < 0) {
> >> > > > > > > > +        s->pstate.fd_counter = 0;
> >> > > > > > > > +    }
> >> > > > > > > 
> >> > > > > > > Is this handling the overflow case? Can't fd 0 be in use 
> >> > > > > > > already?
> >> > > > > > 
> >> > > > > > It could, but it's very unlikely that an overflow/counter
> >> > > > > > reset would
> >> > > > > > result in issuing still-in-use handle, since
> >> > > > > > guest-file-open would need
> >> > > > > > to be called 2^63 times in the meantime.
> >> > > > > 
> >> > > > > Agreed, but as you do check for that case and the right fix is
> >> > > > > simple, I think it's worth it. I can send a patch myself.
> >> > > > > 
> >> > > > 
> >> > > > A patch to switch to tracking a list of issued handles in the 
> >> > > > keystore,
> >> > > > or to return an error on overflow?
> >> > > 
> >> > > To return an error on overflow. Minor, but if we do handle it let's
> >> > > do it right. Or, we could just add an assert like:
> >> > > 
> >> > > assert(s->pstate.fd_counter >= 0);
> >> > 
> >> > Ah, well, I'm not sure I understand then. You mean dropping:
> >> > 
> >> >     if (s->pstate.fd_counter < 0) {
> >> >         s->pstate.fd_counter = 0;
> >> >     }
> >> > 
> >> > And replacing it with an error or assertion?
> >> 
> >> Yes, because I had understood you meant that this is very unlikely to be
> >> triggered because we'd need guest-file-open to be called 2^63 times. This is
> >> what I agreed above, although thinking more about it there's also the
> >> possibility of a malicious client doing this on purpose.
> >> 
> >> But now I see that what you really meant is that it's unlikely for fd 0
> >> to be in use after 2^63 guest-file-open calls. Am I right? If yes, then
> >
> > Yup, that's the scenario I was referring to.
> >
> >> I disagree because there's no way to guarantee when a certain fd will be
> >> in use or not, unless we allow fds to be returned.
> >
> > Yes, it's a real scenario that can indeed occur under "normal" usage, but
> > in my head it's similar to assumptions we make about clocks not overflowing
> > within a timeframe that anyone cares about in terms of severity. To
> > quantify it more:
> >
> > Given RPC latency there's really no way to run up the fd counter faster
> > than once per nanosecond (more like millisecond atm), so you'd have to keep
> > a handle open and constantly call guest-file-open for a period of 292 years
> > before a duplicate handle gets issued.
> 
> In other words, it's a complete non-issue :)
> 
> >> > If so, the overflow is actually expected: once we dish out handle
> >> > INT64_MAX, we should restart at 0. I initially made fd_counter a
> >> > uint64_t so overflow/reset would happen more naturally, but since we
> >> > issue handles as int64_t this would've caused other complications.
> >> > 
> >> > Something like this might be more clear about the intent though:
> >> > 
> >> >     handle = s->pstate.fd_counter;
> >> >     if (s->pstate.fd_counter == INT64_MAX) {
> >> >         s->pstate.fd_counter = 0;
> >> >     } else {
> >> >         s->pstate.fd_counter++;
> >> >     }
> >> 
> >> I disagree about restarting to zero, as I have explained above. You seem
> >> not to like returning an error; is it because we'll make guest-file-open
> >> useless after the limit is reached?
> >
> > Yes. But, on second thought, given the above I guess I don't mind
> > returning an error as a failsafe for now. Although I think it should be an
> > assert()
> > with a FIXME:, since it really should be fixed before anyone ever manages
> > to trigger it.
> >
> >> 
> >> Let's review our options:
> >> 
> >>  1. When fd_counter reaches INT64_MAX we reset it to zero
> >> 
> >>     Pros: simple and guest-file-open always works
> >>     Cons: fd 0 might be in use by a client
> >> 
> >>  2. When fd_counter reaches INT64_MAX we return an error
> >> 
> >>     Pros: simple and we fix 'cons' from item 1
> >>     Cons: guest-file-open will have a usage count limit
> 
>              error path unreachable in practice
> 
> Such errors require special hackery to test.  The test needs to include
> the error's consumer.
> 
> Waste of time if you ask me.
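For reference, option 2 could be sketched roughly like this (the struct and field names follow the snippets quoted above; the function name and simplified error path are hypothetical):

```c
/* Rough sketch of option 2: stop issuing handles at INT64_MAX instead
 * of wrapping back to zero. The pstate/fd_counter names follow the
 * quoted patch; the error plumbing here is simplified. */
#include <stdint.h>

typedef struct GuestAgentPersistentState {
    int64_t fd_counter;
} GuestAgentPersistentState;

/* Returns a fresh handle, or -1 once the counter space is exhausted
 * (the caller would then report an error to the client). */
int64_t issue_handle(GuestAgentPersistentState *pstate)
{
    if (pstate->fd_counter == INT64_MAX) {
        return -1;
    }
    return pstate->fd_counter++;
}
```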
> 
> >>  3. Allow fds to be returned by clients on guest-file-close and do 2 on top
> >> 
> >>     Pros: solve problems discussed in items 1 and 2
> >>     Cons: not trivial and the usage limit problem from item 2 can still
> >>           happen if the client ends up not calling guest-file-close
> >>           (although I do think we'll reach the OS limit here)
> 
> The OS limit on file descriptors is many orders of magnitude lower than
> 2^63.
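A minimal sketch of option 3's bookkeeping, assuming a small fixed cap and an in-memory array (a real implementation would persist this through the keystore; all names here are made up for illustration):

```c
/* Illustrative sketch of option 3: track outstanding handles so they
 * can be returned on guest-file-close and safely reissued, with a cap
 * on how many may be open at once. The linear scans are the O(fd_limit)
 * look-up overhead discussed in the thread. */
#include <stdbool.h>
#include <stdint.h>

#define FD_LIMIT 1024  /* assumed cap, on the order of the OS fd limit */

typedef struct HandleStore {
    int64_t handles[FD_LIMIT];  /* issued, still-open handles */
    int count;
    int64_t next;               /* next candidate handle value */
} HandleStore;

bool handle_in_use(const HandleStore *hs, int64_t h)
{
    for (int i = 0; i < hs->count; i++) {
        if (hs->handles[i] == h) {
            return true;
        }
    }
    return false;
}

/* Issue a handle, skipping any still in use; -1 if the cap is hit. */
int64_t handle_issue(HandleStore *hs)
{
    if (hs->count == FD_LIMIT) {
        return -1;
    }
    while (handle_in_use(hs, hs->next)) {
        hs->next = (hs->next == INT64_MAX) ? 0 : hs->next + 1;
    }
    int64_t h = hs->next;
    hs->next = (hs->next == INT64_MAX) ? 0 : hs->next + 1;
    hs->handles[hs->count++] = h;
    return h;
}

/* Return a handle on guest-file-close so it can be reissued later. */
void handle_release(HandleStore *hs, int64_t h)
{
    for (int i = 0; i < hs->count; i++) {
        if (hs->handles[i] == h) {
            hs->handles[i] = hs->handles[--hs->count];
            return;
        }
    }
}
```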
> 
> >> Do you see other options? Am I overcomplicating?
> >> 
> >
> > No, I think that about sums it up. Doing 3, paired with a limit on the
> > number of outstanding FDs as in 2, is the full solution. The only
> > complication is that the higher the limit we impose, the more I/O and
> > look-up overhead will be incurred for every
> > guest-file-open/guest-file-close, because those operations will become
> > O(fd_limit).
> 
> Cure much worse than the disease.
> 
> > So if we do it we'll need to impose a reasonable limit like 16k outstanding
> > FDs at a time or something (maybe even that's too much, but I'm thinking an
> > extra 64k read/write with every fopen()/fclose() isn't that big a deal).
> 
> I guess the limit should be in the order of the OS's limit on open fds.
> 
> Instead of a purely theoretical limit, we get a real limit.  Stupid.
> 
> > But to complicate things somewhat more:
> >
> > The whole reason for this keystore thing is to patch up the existing
> > interfaces so they continue functioning without clients needing to
> > update. So if we're considering something that requires imposing a fairly
> > small limit on the number of outstanding handles, they'd need to update
> > their implementations eventually anyway to be correct.
> >
> > So I wonder if, rather than pursuing option 3, we just introduce an
> > interface that does what we really want and returns handles as UUIDs,
> > then mark the existing interfaces as deprecated (and then remove them
> > within the next 300 years so our assert never gets hit :)
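The UUID idea could look something like the following (illustrative only: a real implementation would use a proper randomness source such as /dev/urandom or GLib's g_uuid_string_random(); the function name and use of rand() here are just for the sketch):

```c
/* Hypothetical sketch of UUID-style handles: opaque strings with no
 * counter to overflow or recycle. rand() stands in for a real RNG. */
#include <stdio.h>
#include <stdlib.h>

/* Fill buf (at least 37 bytes) with a version-4-style UUID string. */
void handle_uuid(char *buf)
{
    unsigned char b[16];
    for (int i = 0; i < 16; i++) {
        b[i] = (unsigned char)(rand() & 0xff);
    }
    b[6] = (b[6] & 0x0f) | 0x40;  /* version 4 */
    b[8] = (b[8] & 0x3f) | 0x80;  /* RFC 4122 variant */
    snprintf(buf, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```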
> 
> Looks like you guys have no *practical* problems to solve.  Congrats!
> Take a vacation!  Please report back no later than 275 years from now,
> to make sure this 64 bit fd counter overflow problem gets taken care of
> in time.  ;-P
> 

Haha, well, I didn't want to be that one lazy developer who brings about
the downfall of future human civilization... but if it's a really big
deal they'll probably send someone back from the future to let me know,
so maybe I'm jumping the gun a bit :)

I just didn't want to introduce a new interface that relied on
interfaces that were planned for deprecation in the *long* term, but I
think you're right: it's too much hassle for current users for too
little gain, and there's plenty of time to do it in the future so I'll
hold off on it for now.


