[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Changesets: Re: Parallel access to data created with Octave
From: |
Jaroslav Hajek |
Subject: |
Re: Changesets: Re: Parallel access to data created with Octave |
Date: |
Tue, 25 May 2010 09:43:19 +0200 |
On Tue, May 25, 2010 at 6:16 AM, Jarno Rajahalme
<address@hidden> wrote:
>
> On May 3, 2010, at 10:33 PM, ext Jaroslav Hajek wrote:
>
>> On Mon, May 3, 2010 at 6:53 PM, Jarno Rajahalme <address@hidden> wrote:
>>
>>> For the case where I test for scalar type, it would suffice if numel() on a
>>> scalar would not depend on a static initialization of a "static dim_vector
>>> dv(1,1); return dv;" within a function dims(). If octave_base_scalar
>>> implemented a numel() always returning 1, I would not have to care whether
>>> a specific cell is a scalar or not. And it would be faster, too :-).
>>>
>>
>> Feel free to do this, but it's just the tip of an iceberg. You simply
>> shouldn't depend on internals like this.
>>
>
>
> I did this (see the attached 2 changesets):
>
>
>
>
> (Please apply both sets, there is minor back and forth between them).
>
> All tests pass (except for the svds, which has one, non-related, fail either
> way).
>
> liboctave:
>
> - Followed the Array.h practice of returning const ref from dims() (Sparse.h)
I'm not sure this is a good idea. It is a bit redundant to store a
dim_vector in Sparse, because it can only have 2 dimensions, and we
may want to optimize the storage in the future.
> - Added ov_ndims(), const methods declared as such (dim_vector.h)
What for? I don't understand what it does. The number of dimensions
are queried by dim_vector::length. Trailing singletons should be
removed on construction where suitable (for instance, Array always
does it). dim_vector::length is always >= 2.
Maybe you got confused by the existing octave_base_value::ndims
fallback. It makes no sense in current sources, I cleaned it up.
> - Changed idx_vector::orig_rows() and idx_vector::orig_columns() to call new
> internal rep virtual functions (idx-vector.h)
Hmm, no problem, but these methods aren't used anywhere anyway. Maybe
it's better to just delete them.
> - Changed dim_vector local variable to const ref when applicable (Array.cc,
> CNDArray.cc, CSparse.cc, Sparse.cc, dNDArray.cc, dSparse.cc, fCNDArray.cc,
> fNDArray.cc, idx-vector.cc, lo-specfun.cc, mx-inlines.cc)
This is something I don't really like much. The practice of
return-by-value using reference-counted classes is ubiquitous
throughout octave sources, and I'm used to it a lot. I don't think I
want to relinquish this convenient coding style because of partial
thread safety. This will also make things harder to modify if the
dim_vector is constructed rather than stored.
>
> src:
>
> - Made rows() and columns() virtual, changed default implementations of
> ndims, rows, columns, numel, and capacity to be thread-safe, subclasses need
> to override. (ov-base.h)
I have no problem with overriding numel () and capacity () (though
that is very seldom used) in subclasses, likewise I'm OK with making
rows() and columns() virtual (although I think there's little need,
especially if we simplify the existing implementations), but I insist
that the base methods should implement the fallback computations.
Requiring all derived classes to override five methods instead of just
one is a bad idea.
> - Added thread-safe (virtual) ndims(), numel(), capacity(), rows() and
> columns() (ov-base-diag.h, ov-base-mat.h, ov-base-scalar.h, ov-base-sparse.h,
> ov-class.h, ov-cs-list.h, ov-fcn-handle.h, ov-lazy-idx.h, ov-perm.h,
> ov-range.h, ov-struct.h)
See the comment above. I think it makes sense to override, say,
ndims() for scalars for performance reasons, but I think that
explicitly implementing octave_fcn_handle::ndims is a plain nonsense.
> - Followed the Array.h practice of returning const ref from dims() (oct-map.h)
OK
> - Changed dim_vector local variable to const ref when available (bitfcns.cc,
> data.cc, oct-map.cc, ov-base-diag.cc, ov-base-int.cc, ov-base-mat.cc,
> ov-base-sparse.cc, ov-bool-mat.cc, ov-bool-sparse.cc, ov-cell.cc,
> ov-cx-mat.cc, ov-cx-sparse.cc, ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h,
> ov-perm.cc, ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc,
> pr-output.cc, pt-mat.cc, strfns.cc, xpow.cc)
See above. The return-by-value style is idiomatic and convenient
(because it doesn't matter whether the function returns a value or a
reference), even if we make this change now, there's no guarantee
people will keep using this style. I probably won't.
> - Changed dim_vector local variable to const when possible (data.cc,
> gl-render.cc, graphics.cc, ls-mat5.cc, mex.cc, ov-base.cc, ov.cc, pt-mat.cc)
Yes, this is always a good idea.
> - Changed to avoid calling dims() or numel() via the virtual members
> (ov-base-diag.cc, ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc,
> ov-bool-mat.h, ov-bool-sparse.cc, ov-cell.cc, ov-cx-mat.cc, ov-cx-sparse.cc,
> ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, ov-range.cc,
> ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc)
>
Except in supposedly performance-critical places, I'd rather avoid
doing this. Pieces of code are copied often between subclass
implementations, and this change makes it harder.
>
> With these changes the octave_(base)_value functions ndims, rows, columns,
> numel and capacity are now thread safe. My own OpenMP code used to crash
> calling numel, due to the unprotected reference counting, but does not crash
> any more, as these now do not need to touch any reference counters.
Although parts of this work are useful per se, it feels more like a
workaround than a solution. What you really need is a thread-safe
reference-counting.
>
> These, and dependent functions are now also marginally faster. However, when
> timing these I noticed some variation due to the changed branch prediction
> behavior, are the patterns of calling virtual functions have changed. From
> this viewpoint it would be nice if all numeric data types had the real
> dim_vector, so that they could share the same implementation for all these
> functions. This would require a common base class for numeric types. Also,
> the I had some earlier crashes related to the initialization of static
> members in threaded code, I believe, so this would take some more testing.
>
I think most of the speed-ups can be achieved by much simpler changes,
mostly cleaning up old code. See below.
> dims() itself cannot be made thread safe without protecting the reference
> counting, as for some types the dim_vector is created dynamically.
Yes. The point is that dim_vector is just one class that is reference
counted, there are many more - idx_vector, Array, Sparse, SparseQR,
octave_shlib, octave_mutex, octave_value, graphics properties ...
> There are minor efficiency increases all over the place, due to not creating
> a local instance of dim_vector, where it is not needed.
The difference between const dim_vector and const dim_vector& is just
the counter increment/decrement, and I doubt that is visible in most
places.
> Anyway, in the following you will see that the time it takes for cellfun ()
> to complete depends on the order of the data.
I checked in some simple changes (10651:10654) and tested. What I
don't understand about your benchmark code is why you use mean() which
has a nontrivial overhead, so I hacked together my own benchmark:
N = 1e6;
function ticcellfun (rep, c, fun, type)
tic;
for i = 1:rep
cellfun (fun, c);
endfor
t = toc;
printf ("%s %s: %e\n", type, fun, t / rep);
endfunction
function ticall (c, type)
ticcellfun (20, c, "numel", type);
ticcellfun (20, c, "ndims", type);
ticcellfun (20, c, "length", type);
ticcellfun (20, c, "isempty", type);
endfunction
c = num2cell (rand (N, 1));
ticall (c, "all scalars");
c = num2cell (rand (N, 5), 2);
ticall (c, "all arrays");
c(1:N/2) = num2cell (rand (N/2, 1));
c(N/2+1:N) = num2cell (rand (N/2, 5), 2);
ticall (c, "scalars/arrays");
c(1:2:N) = num2cell (rand (N/2, 1));
c(2:2:N) = num2cell (rand (N/2, 5), 2);
ticall (c, "mixed scalars/arrays");
with previous tip, I get:
all scalars numel: 1.651729e-02
all scalars ndims: 1.707314e-02
all scalars length: 7.021310e-02
all scalars isempty: 1.209819e-02
all arrays numel: 1.612484e-02
all arrays ndims: 1.942660e-02
all arrays length: 6.965815e-02
all arrays isempty: 1.190956e-02
scalars/arrays numel: 1.640530e-02
scalars/arrays ndims: 1.824336e-02
scalars/arrays length: 6.993690e-02
scalars/arrays isempty: 1.205795e-02
mixed scalars/arrays numel: 2.249750e-02
mixed scalars/arrays ndims: 2.329570e-02
mixed scalars/arrays length: 9.565995e-02
mixed scalars/arrays isempty: 1.770819e-02
with the new changes, I get:
all scalars numel: 1.062530e-02
all scalars ndims: 1.069200e-02
all scalars length: 1.968530e-02
all scalars isempty: 6.249142e-03
all arrays numel: 1.658055e-02
all arrays ndims: 1.633325e-02
all arrays length: 2.177579e-02
all arrays isempty: 1.202180e-02
scalars/arrays numel: 1.374600e-02
scalars/arrays ndims: 1.344680e-02
scalars/arrays length: 2.040290e-02
scalars/arrays isempty: 9.098101e-03
mixed scalars/arrays numel: 2.409670e-02
mixed scalars/arrays ndims: 2.170360e-02
mixed scalars/arrays length: 4.485935e-02
mixed scalars/arrays isempty: 1.441931e-02
so especially length() has been improved. But of course, compared to
the "normal" speed of cellfun with a function handle, all cases were
blazingly fast even before. Cellfun is a special example; in most
other cases calls to dims() et al. are parts of larger code bodies and
hence their time is drowned in the total time.
regards
--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
- Re: Parallel access to data created with Octave, Jaroslav Hajek, 2010/05/03
- Re: Parallel access to data created with Octave, Jarno Rajahalme, 2010/05/03
- Re: Parallel access to data created with Octave, Jaroslav Hajek, 2010/05/04
- Changesets: Re: Parallel access to data created with Octave, Jarno Rajahalme, 2010/05/25
- Re: Changesets: Re: Parallel access to data created with Octave,
Jaroslav Hajek <=
- Re: Changesets: Re: Parallel access to data created with Octave, Michael D. Godfrey, 2010/05/25
- Re: Changesets: Re: Parallel access to data created with Octave, Jaroslav Hajek, 2010/05/25
- Re: Changesets: Re: Parallel access to data created with Octave, Michael D. Godfrey, 2010/05/25
- Re: Changesets: Re: Parallel access to data created with Octave, Jarno Rajahalme, 2010/05/25
- Re: Changesets: Re: Parallel access to data created with Octave, Jarno Rajahalme, 2010/05/25
- Thread-safe reference counting (modified dim-vector.h), Jarno Rajahalme, 2010/05/26
- Re: Thread-safe reference counting (modified dim-vector.h), Jaroslav Hajek, 2010/05/26
- Re: Thread-safe reference counting (modified dim-vector.h), Jarno Rajahalme, 2010/05/26
- Re: Thread-safe reference counting (modified dim-vector.h), Jaroslav Hajek, 2010/05/26
- Re: Changesets: Re: Parallel access to data created with Octave, Jarno Rajahalme, 2010/05/25