[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c"
From: |
Julian Seward |
Subject: |
Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c" |
Date: |
Tue, 22 Apr 2008 11:50:15 +0200 |
User-agent: |
KMail/1.9.5 |
> On Sun, Apr 20, 2008 at 3:58 AM, Julian Seward <address@hidden> wrote:
> > > for (j = N; j > 0 && j--;)
> On Monday 21 April 2008 03:22, Andrew W. Steiner wrote:
> In case it helps, see
> http://www.gnu.org/software/gsl/design/gsl-design.html#SEC31
> for why the loop is constructed that way.
Ok. So it's clear that is what the original authors intended.
And I fully sympathise with wanting to avoid unsigned integer
wraparound problems in such loops.
However, regarding
http://www.gnu.org/software/gsl/design/gsl-design.html#SEC31
not only is the clever version
for (i = N; i > 0 && i--;) { ... }
confusing, it also generates worse code than the simple
version:
for (i = 0; i < N; i++) { j = N - i; ... }
For the following test program:
void clever ( double* a, unsigned long N ) {
unsigned long i;
for (i = N; i > 0 && i--;) {
a[i] = 0.0;
}
}
void simple ( double* a, unsigned long N ) {
unsigned long i, j;
for (i = 0; i < N; i++) {
j = N - i;
a[j] = 0.0;
}
}
when compiled on x86_64 with gcc-4.1.2 -O2, the inner loop
of "clever" is:
.L10:
subq $1, %rax
subq $8, %rdx
cmpq $-1, %rax
je .L7
.L5:
testq %rax, %rax
movq $0, -8(%rdx)
jne .L10
(7 instructions, 2 conditional branches) as opposed to this for
"straightforward":
.L14:
addq $1, %rdx
movq $0, (%rax)
subq $8, %rax
cmpq %rdx, %rsi
jne .L14
(5 instructions, 1 conditional branch). Although to be fair it
does require 3 integer registers whereas "simple" version only
requires 2, which might have been relevant in the heyday of x86,
but less so now.
If you kick gcc enough to get it to unroll the loops
(gcc -O3 -funroll-loops) the differences are larger.
Then "straightforward" produces this:
.L55:
addq $8, %rcx
movq $0, (%rax)
movq $0, -8(%rax)
movq $0, -16(%rax)
movq $0, -24(%rax)
movq $0, -32(%rax)
movq $0, -40(%rax)
movq $0, -48(%rax)
movq $0, -56(%rax)
subq $64, %rax
cmpq %rcx, %rsi
jne .L55
(nice! one conditional branch per 8 array elements processed)
whereas "clever" still requires one conditional branch per element,
even after unrolling:
.L5:
testq %rax, %rax
movq $0, -8(%rdx)
je .L7
testq %rax, %rax
leaq -8(%rdx), %rcx
je .L7
cmpq $1, %rax
movq $0, -8(%rcx)
leaq -16(%rdx), %rcx
je .L7
cmpq $2, %rax
movq $0, -8(%rcx)
leaq -24(%rdx), %rcx
je .L7
cmpq $3, %rax
movq $0, -8(%rcx)
leaq -32(%rdx), %rcx
je .L7
cmpq $4, %rax
movq $0, -8(%rcx)
leaq -40(%rdx), %rcx
je .L7
cmpq $5, %rax
movq $0, -8(%rcx)
leaq -48(%rdx), %rcx
je .L7
cmpq $6, %rax
movq $0, -8(%rcx)
leaq -56(%rdx), %rcx
je .L7
subq $8, %rax
subq $64, %rdx
movq $0, -8(%rcx)
cmpq $-1, %rax
jne .L5
which speaks for itself.
At a guess I'd also say that the extra test in the clever version
likely makes it unvectorizable. To vectorize a loop effectively,
compilers need to be able to compute a trip count for the loop
before the loop starts. Which is trivial for the simple version
(trip count = N) but less than obvious for the clever version.
J
- [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Kort Travis, 2008/04/20
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Julian Seward, 2008/04/20
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Andrew W. Steiner, 2008/04/20
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c",
Julian Seward <=
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Brian Gough, 2008/04/23
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Gerard Jungman, 2008/04/23
- Re: [Bug-gsl] array, vector indices out-of-bounds in "linalg/bidiag.c", Julian Seward, 2008/04/24