[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Texmacs-dev] Efficiency of C++ switch-statements
From: |
Gabriel Dos Reis |
Subject: |
Re: [Texmacs-dev] Efficiency of C++ switch-statements |
Date: |
11 Nov 2003 03:34:30 +0100 |
"Dan Martens" <address@hidden> writes:
| --------- Original Message ---------
|
| DATE: Mon, 10 Nov 2003 22:58:15
| From: Joris van der Hoeven <address@hidden>
| To: <address@hidden>
| Cc:
|
| >> Yes, it does use a table.
| >>
|
| This is somewhat misleading.
Yours is at least equally misleading. See below.
| The compiler does not use a real "table" (ie, an array with
| entries), rather the binary search is implemented using compare
| instructions. Ie.
Not always. The compiler (g++ or gcc) can use a table with entries or a
series of compares and jumps, that depends on the density of the labels.
See Message-ID: <address@hidden>
[...]
| So, no jump table is created, just a series of "smart" compare
| instructions to mimic a binary search and narrow decisions. This
| means that any search is O(log n). Much better than cascading if
| statements.
|
| I have made a very small test program:
|
| #include <stdlib.h>
|
| int main(){
|
| int test;
|
| test = rand();
|
| switch(test){
| case 1:
| case 10:
| case 100:
| case 1000:
| case 10000:
| case 100000:
| case 1000000: test++; break;
| case 2:
| case 20:
| case 200:
| case 2000:
| case 20000:
| case 200000:
| case 2000000: test++; break;
| case 3:
| case 30:
| case 300:
| case 3000:
| case 30000:
| case 300000:
| case 3000000: test++; break;
| case 4: test++; break;
| case 40: test++; break;
| default: break;
| }
| }
In your program, the low case is 1 and the highest is 3000000; you
have 24 labels. And they aren't dense in the range 1..3000000. So
according to the comment
A branch table
is used if there are more than a few labels and the labels are dense
within the range between the smallest and largest case value.
GCC will not use a branch table. Rather it will generate a series of
compares and jumps, following
The alternative to the use of a branch table is to generate a series
of compare and jump insns. When that is done, we use the LEFT, RIGHT,
and PARENT fields to hold a binary tree. Initially the tree is
totally unbalanced, with everything on the right. We balance the tree
with nodes on the left having lower case values than the parent
and nodes on the right having higher values. We then output the tree
in order.
which is what you're experimenting.
In short, your testcase does not testify that the compiler does not
generate real branch table tables. It just concurs on the fact that
when the case labels are not dense then, the compiler sort the labels
and generate compares and jumps.
[...]
Now, consider the following:
enum E {
zero, one, two, three, four, five,
six, seven, eight, nine, ten
};
extern E e;
int main()
{
switch(e)
{
case zero:
e = one;
break;
case one:
e = zero;
break;
case two:
e = ten;
break;
case three:
e = zero;
break;
case four:
break;
case five:
e = four;
break;
case six:
e = two;
break;
case seven:
e = one;
break;
case eight:
e = nine;
break;
case nine:
case ten:
break;
}
}
On a Sparc target, here is the output (with some comments) I get
.file "t.C"
.section ".text"
.align 4
.global main
.type main, #function
.proc 04
main:
.LLFB3:
!#PROLOGUE# 0
!#PROLOGUE# 1
sethi %hi(e), %o3
ld [%o3+%lo(e)], %g1
cmp %g1, 8 ! if e is greater than eight
bgu .LL2 ! we're done
sll %g1, 2, %g1
sethi %hi(.LL14), %o5 ! else set the jump table .LL14
or %o5, %lo(.LL14), %o5
ld [%o5+%g1], %o4 ! and load the corresponding entry
jmp %o4 ! jump to that address
nop
.LL5:
mov 10, %g1
b .LL2
st %g1, [%o3+%lo(e)]
.LL10:
mov 1, %g1
b .LL2
st %g1, [%o3+%lo(e)]
.LL6:
b .LL2
st %g0, [%o3+%lo(e)]
.LL11:
mov 9, %g1
b .LL2
st %g1, [%o3+%lo(e)]
.LL9:
mov 2, %g1
b .LL2
st %g1, [%o3+%lo(e)]
.LL8:
mov 4, %g1
st %g1, [%o3+%lo(e)]
.LL2:
retl
mov 0, %o0
.align 4
.subsection -1
.align 4
.LL14:
.word .LL10 ! branch table starts here...
.word .LL6
.word .LL5
.word .LL6
.word .LL2
.word .LL8
.word .LL9
.word .LL10
.word .LL11 ! ...and ends here
.previous
.LLFE3:
.size main, .-main
.ident "GCC: (GNU) 3.4 20030330 (experimental)"
As you can see, the compiler does generate actual lookup table with entries
that correspond to appropriate codes.
| >> > Another question: if I do not put the cases in the same order
| >> > as the enumeration, does this alter the efficiency?
|
| No, and the above explains why. If an array of jump instructions
| were used, then order would matter because the entries of the array
| would be incremental.
Not really. See above.
| >> Also consider using the STL hash_map object, which gives you O(1)
| >> lookups and does not waste space when your table is sparse.
| >
|
| This is way more overhead than a switch statement.
I agree that a hash_map is an overkill approach to this kind of
things. C++ provides for far better and more efficiency way to handle
this kind of things.
| Much more code
| is executed in the map object. You have to remember that a switch
| just does assembly compare instructions. No method calls are made
| (which are expensive), also the cmp instruction is fairly atomic and
| is not a CISC feature (on RISC also), so no microcode is used as is
| the mul and div instructions which are not atomic.
But, a virtual function call may be cheaper than the series of
compares and jumps. And in case the compiler would have used a branch
table, again the virtual function call may be cheaper.
In the end there is a simple, efficient, much more maintainable
solution but nobody will use it because of persistent urban legend
;-).
[...]
| As you can see above, holes in the cases make no difference. A
| switch is by far the most efficent method to use,
compared to a virtual function call, a switch is simpler onnly in a
very limited cases.
-- Gaby
Re: [Texmacs-dev] Efficiency of C++ switch-statements, Dan Martens, 2003/11/11
Re: [Texmacs-dev] Efficiency of C++ switch-statements, Gabriel Dos Reis, 2003/11/11