From: Daniel Llorens
Subject: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
Date: Mon, 1 Apr 2013 19:15:31 +0200
> Message: 5
> Date: Mon, 1 Apr 2013 15:40:48 +0800
> From: Daniel Hartwig <address@hidden>
> To: address@hidden
> Subject: Re: Extremly slow for format & string-join
> On 1 April 2013 14:59, Daniel Llorens <address@hidden> wrote:
>> How can it be slower to allocate the result at once?
>
> Shrug. I do not know much of array internals. You probably have much
> more experience there than I.
Not much with the implementation, no :-/
> Except for the curious profile output, I suspect the overhead is due
> to such factors as repeated application of MAPFUNC and consequent
> arithmetic to access the shared arrays contents
mapfunc is used only to compute the strides and bounds; it isn't kept beyond
make-shared-array.
But I hadn't considered that the profile might be wrong. Indeed, the slow part
is not make-typed-array but array-copy!.
scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.718000s real time, 1.710000s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.673000s real time, 1.670000s run time. 0.000000s spent in GC.
scheme@(guile-user)>
Since I have no intuition for these numbers, I thought maybe it's really this
slow, or a cache problem, who knows:
scheme@(guile-user)> (import (rnrs bytevectors))
scheme@(guile-user)> (define x (make-bytevector 40000000))
scheme@(guile-user)> ,time (define y (bytevector-copy x))
;; 0.018000s real time, 0.020000s run time. 0.000000s spent in GC.
In NumPy (using doubles):
In [11]: %time a = np.zeros([1000000, 10])
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s
In [12]: %time b = a+1
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s
So it's really bad. I had a look at libguile/array-map.c. There are three parts
in there:
[1] scm_ramapc(). This is a general array traversal function. It does
linearization, so the (array-copy! a b) call above should resolve to a single
call to racp().
[2] array-copy!, array-fill!, array-map!, array-for-each, etc. These all use
scm_ramapc().
[3] a bunch of specializations: scm_ra_sum, scm_ra_difference, and so on.
First, I think that all of [3] should go; it's dead code. This is the first
patch.
Second, array-map! and array-for-each cons on each application of proc. The
quick & dirty solution is to add 1-arg, 2-arg, etc. cases to ramap() and rafe().
array-index-map! does its own traversal and can't be linearized, so it can't
be fixed as easily. There are weirdo cases. For example, array-equal? calls
array_compare, which recurses on itself down to the last rank. This means
there's a function call for each and every array element.
I don't know whether fixing these problems is worthwhile or whether the whole
thing should be rewritten, maybe with a different approach: either go to
Scheme, where we have macros and can inline the inner loops, or use a code
generator to produce fixed-rank cases, etc.
Third, none of the above causes the slowness of array-copy!. I noticed that
there's a double indirection in racp(); the second patch removes it. Actually,
this double indirection appears all over array-map.c, and I don't understand
why it's needed...
It's only a bit faster than before, though:
scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.187000s real time, 1.190000s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.107000s real time, 1.110000s run time. 0.000000s spent in GC.
scheme@(guile-user)>
There's still the overhead of impl->, etc. I'm thinking that one could do a
direct memory copy when the array types are the same, or even call memcpy()
when the strides allow. These should be relatively common cases.
Regards,
Daniel
0001-Remove-dead-code-in-array-map.c.patch
Description: Binary data
0002-Remove-double-indirection-in-element-access-in-array.patch
Description: Binary data