classpath
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

String.equals optimisation


From: Simon Kitching
Subject: String.equals optimisation
Date: Tue, 12 Jul 2005 11:02:55 +1200

Hi,

I think the performance of String.equals could be improved.

Currently:
  public boolean equals(Object anObject)
  {
    if (! (anObject instanceof String))
      return false;
    String str2 = (String) anObject;
    if (count != str2.count)
      return false;
    if (value == str2.value && offset == str2.offset)
      return true;
    ...

I would suggest testing for object identity first:
  public boolean equals(Object anObject)
  {
    if (this == anObject)
      return true;

  }

In the case of something like:
  if (someClass.getName().equals("example.MyClass")
the current code does:
  check instanceof
  verify cast
  access count members of both objects [1]
  access value field of both objects
  access offset field of both objects
  return true [2]
when all it actually needs to do is: [3]
  compare references
  return true

[1] affecting CPU cache, and possibly triggering a memory fault
[2] at least the value arrays aren't scanned...
[3] both strings have been interned as described below

Yes, this does assume that comparing two strings with the same identity
occurs reasonably frequently. However I believe this is indeed the case.
Many strings are interned automatically:
* the java spec requires that all literals in classes
  be interned (See javadoc for String.intern and section 3.10.5 of the
  java language spec)
* Class.getName returns strings that have been interned. I don't
  think this is explicitly required by the java specs but is
  certainly true for Sun's JVM and seems likely to be done by
  any sensible JVM.

And strings can be explicitly interned in code via String.intern.

And passing around references to strings then ending up comparing the
string to its original value is fairly common, eg
  if (theValue.equals(Foo.SOME_CONSTANT)) {...}

It would certainly be nice to know that collection methods will
automatically work more efficiently when the objects being manipulated
are String objects that have been interned (of course String.intern has
to be used appropriately).


----

Another possible optimisation would be:
  public boolean equals(Object anObject)
  {
    if (! (anObject instanceof String))
      return false;
    String str2 = (String) anObject;

    // new test
    if ((cachedHashCode != 0) &&
        (str2.cachedHashCode != 0) &&
        (cachedHashCode != str2.cachedHashCode))
      return false;
    ...

For strings that ARE equal this does add up to 4 GETFIELD and 3
comparison ops overhead - but such tests are going to compare every char
in the strings anyway so that isn't such a bad overhead. For strings
that differ within the first 2 characters, this is likely to be a draw;
the overhead of the hashcode comparisons is equal to the loop setup. For
strings that have the same length and prefix but differ towards the
tail, this could be a big win (example: java class names). Note also
that both this and str2 objects are already present in memory. Avoiding
accessing the value array *might* prevent a memory fault depending on
where it is allocated in memory. 

Note that I'm not as keen on this optimisation as the identity-check
proposal. I'm just putting it forward for consideration.

----

I agree that the value of these optimisations does depend upon usage
patterns, and should really be done on the basis of empirical evidence.
If people are interested in this, and such studies haven't been done
before, I might be willing to run some existing performance test suites
with and without these optimisations to see what the actual effects are.
Suggestions as to which performance suites are suitable would be
welcome.

Or maybe you'll tell me that these studies have already been done and
that these optimisations have been proved inappropriate...

Regards,

Simon





reply via email to

[Prev in Thread] Current Thread [Next in Thread]