
Re: [really patch] Re: HashMap putAll/putAllInternal bug


From: Stuart Ballard
Subject: Re: [really patch] Re: HashMap putAll/putAllInternal bug
Date: Mon, 29 Sep 2003 10:57:40 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030527 Debian/1.3.1-2

Bryce McKinlay wrote:
> size() is used here because, obviously, it is generally more efficient to call it once rather than calling hasNext() many times. I believe that the current implementation is within spec according to the collections documentation. If your collections are returning an inaccurate size() then I'd argue they are not valid implementations of Map.

Sure: as I noted, my argument is that Sun's implementation can handle such invalid implementations of Map, so people might rely on it, as I did.

I could fix up my implementation of Map to guarantee that size() is correct, but that would make this operation much *slower* than using hasNext() would. To get an accurate size() out of my data structure would require iterating across it fully every time size() is called.
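To give a rough idea of why (this is only a sketch of the general shape, not nrdo's actual code): imagine the entries living in something like a linked chain that keeps no count, so the only way to report an accurate size is to walk every record:

// Sketch only - not nrdo's actual code. The chain keeps no count, so
// an accurate size() means traversing every record, every time.
class RecordChain {
  static class Node {
    Object key, value;
    Node next;
  }

  Node head; // first record, or null if the chain is empty

  // A Map wrapper's size() would have to delegate to this: a full walk
  // of the chain on every call, which is exactly the cost I can't afford.
  int size() {
    int count = 0;
    for (Node pos = head; pos != null; pos = pos.next)
      count++;
    return count;
  }
}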

Furthermore, I'd argue that the very existence of a hasNext() method suggests that Sun didn't intend people to make this "optimization". If they expected that calling size() and maintaining your own counter would always be more efficient, why didn't they just leave hasNext() out and recommend coding that way?

Note too that our current implementations leave the collections in an invalid state if next() causes a ConcurrentModificationException (or other RuntimeException) part way through, because they set the size variable in advance without waiting to ensure that all those elements could in fact be added. It would be possible to fix this problem without using hasNext(), and even if my patch isn't accepted I still think we should at least fix that.
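Roughly, the pattern I mean looks like this (just a sketch with hypothetical names like addEntry - not the literal Classpath code):

import java.util.Iterator;
import java.util.Map;

// Sketch of the problem only; size and addEntry stand in for the real
// internals, they aren't the literal Classpath names.
abstract class FragileMapSketch {
  int size; // the collection's cached size field

  // stands in for "link this entry into the buckets"
  abstract void addEntry(Object key, Object value);

  void putAllInternal(Map m) {
    Iterator itr = m.entrySet().iterator();
    // size is committed before a single entry has been copied...
    size = m.size();
    for (int n = size; n > 0; n--) {
      // ...so if next() throws part way through (a
      // ConcurrentModificationException, say), we unwind with size
      // claiming entries that never made it into the table.
      Map.Entry e = (Map.Entry) itr.next();
      addEntry(e.getKey(), e.getValue());
    }
  }
}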

> Of course, if there are real applications out there that rely on the way Sun implements it, then we may have to change to using hasNext(). But we should consider this carefully. If we must change it, then the addAll/putAll implementations should change throughout the collections classes for consistency - not just HashMap/Hashtable. As you noticed, in some cases this could make things significantly less efficient.

Firstly, there is at least one real application that relies on this :) If I could see a viable workaround, I'd modify nrdo to provide accurate size() information so that Classpath wouldn't need to change (although I still think that it's important to be bug-for-bug compatible with Sun's implementation, so I'd still be in favor of this change even if I weren't personally relying on it). But in the case of my particular data structure it's impossible to do that without slowing everything down, and that isn't acceptable for my application.

I think it's perfectly possible to implement all our collections to give the correct behavior without making things less efficient in the case where size() is correct - at least, the only difference should be the cost of repeated calls to hasNext() versus a single call to size(), and the cost of hasNext() should *always* be small enough that that difference falls into the realm of micro-optimization.

The example I gave was ArrayList: the optimization is to pre-allocate space for size() worth of elements using ensureCapacity(), and then copy elements directly into the backing array. It would be perfectly possible to implement it something like this, which optimizes for when size() is correct but is robust if it's not:

int csize = c.size();
ensureCapacity(size() + csize);
Iterator i = c.iterator();
for ( ; csize-- > 0 && i.hasNext(); ) {
  // put i.next() directly into the array and increment size,
  // exactly as it's done now.
}
while (i.hasNext()) {
  add(i.next()); // use the standard add method, which will
                 // grow the array further if needed.
}

In this version, the for() loop checks i.hasNext() as well as csize so that it's robust against the actual size being smaller than csize. The while() loop uses a slower path for extra elements in case the actual size is larger than csize. And the only additional cost of this algorithm is extra calls to hasNext(), which should always be cheap.

I don't think there are any ways where size() could be used as an optimization that aren't susceptible to a similar approach - optimize for the case where size() is correct, but still be prepared in case it's not.
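For instance, a putAll-style loop could treat size() purely as a capacity hint and let hasNext() decide when it's actually done (again only a sketch with hypothetical internals, not a drop-in patch):

import java.util.Iterator;
import java.util.Map;

// Sketch only; size, addEntry and ensureCapacityFor are hypothetical
// stand-ins for the real internals.
abstract class RobustMapSketch {
  int size;

  abstract void addEntry(Object key, Object value); // fast internal insert
  abstract void ensureCapacityFor(int expected);    // one-shot grow/rehash

  void putAllInternal(Map m) {
    // size() is only a hint: if it's accurate we grow the table exactly
    // once, if it's wrong we merely grow again later.
    ensureCapacityFor(size + m.size());
    Iterator itr = m.entrySet().iterator();
    while (itr.hasNext()) {
      Map.Entry e = (Map.Entry) itr.next();
      addEntry(e.getKey(), e.getValue());
      size++; // only count entries that were really added
    }
  }
}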

Stuart.

--
Stuart Ballard, Senior Web Developer
FASTNET - Web Solutions
(215) 283-2300, ext. 126
www.fast.net




