Re: Changesets: Re: Parallel access to data created with Octave

octave-maintainers
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Changesets: Re: Parallel access to data created with Octave

From:	Jarno Rajahalme
Subject:	Re: Changesets: Re: Parallel access to data created with Octave
Date:	Tue, 25 May 2010 12:05:02 -0700
On May 25, 2010, at 12:43 AM, ext Jaroslav Hajek wrote:

> On Tue, May 25, 2010 at 6:16 AM, Jarno Rajahalme
> <address@hidden> wrote:
>> 
>> liboctave:
>> 
>> - Followed the Array.h practice of returning const ref from dims() (Sparse.h)
> 
> I'm not sure this is a good idea. It is a bit redundant to store a
> dim_vector in Sparse, because it can only have 2 dimensions, and we
> may want to optimize the storage in the future.
> 

Please note that I only changed the existing Sparse::dims () to return a const 
reference instead of the full object. I did not add any storage requirement 
anywhere! Also, the modified Sparse::dims () can still be used for copying the 
object itself, but now the copy is done after the dims() returns, not before, 
so the caller has a choice. I.e.:

// Create a local copy (implies reference counting, copy-on-write, etc.)

  dim_vector dv = dims();

// Create a const reference (no reference counting, cannot be modified)

  const dim_vector& dv = dims();

Without this change the latter use will ALSO create a temporary copy, so it has 
the same overhead and thread-unsafeness as the former one.

Again, this exact change has been made in Array.h earlier, there is even a 
comment about it :-)


> 
>> - Added ov_ndims(), const methods declared as such (dim_vector.h)
> 
> What for? I don't understand what it does. The number of dimensions
> are queried by dim_vector::length. Trailing singletons should be
> removed on construction where suitable (for instance, Array always
> does it). dim_vector::length is always >= 2.
> Maybe you got confused by the existing octave_base_value::ndims
> fallback. It makes no sense in current sources, I cleaned it up.
> 

OK. I just noted that the dims as used by the interpreter expects trailing 
singletons to be removed. The ndims () code in ov-base.cc was actually used for 
all the ov derived types, so it was not a fallback only. Glad to hear that in 
practice there will not be any trailing singletons, though :-)

>> - Changed idx_vector::orig_rows() and idx_vector::orig_columns() to call new 
>> internal rep virtual functions (idx-vector.h)
> 
> Hmm, no problem, but these methods aren't used anywhere anyway. Maybe
> it's better to just delete them.

I changed the relevant octave_value derived classes to use these. That is the 
only way to assure they are thread safe at this moment. So I'd suggest not 
deleting these.

> 
>> - Changed dim_vector local variable to const ref when applicable (Array.cc, 
>> CNDArray.cc, CSparse.cc, Sparse.cc, dNDArray.cc, dSparse.cc, fCNDArray.cc, 
>> fNDArray.cc, idx-vector.cc, lo-specfun.cc, mx-inlines.cc)
> 
> This is something I don't really like much. The practice of
> return-by-value using reference-counted classes is ubiquitous
> throughout octave sources, and I'm used to it a lot. I don't think I
> want to relinquish this convenient coding style because of partial
> thread safety. This will also make things harder to modify if the
> dim_vector is constructed rather than stored.
> 

Please note that Array.h already defined dims () as returning a const 
reference, so I only changed the code to use it. The issue you mention about 
constructing the dim_vector is exactly why code using dims () of the 
octave_base_value cannot be treated this way. The changes referred to above 
pertain only to use of dims () of classes derived from Array.h, which always 
has a stored dim_vector.

Maybe changing more code to use dims () directly (e.g. dims ().length () ), 
i.e. without using a local copy would be better in cases where the dv is 
actually used only once or twice.


>> 
>> src:
>> 
>> - Made rows() and columns() virtual, changed default implementations of 
>> ndims, rows, columns, numel, and capacity to be thread-safe, subclasses need 
>> to override. (ov-base.h)
> 
> I have no problem with overriding numel () and capacity () (though
> that is very seldom used) in subclasses, likewise I'm OK with making
> rows() and columns() virtual (although I think there's little need,
> especially if we simplify the existing implementations), but I insist
> that the base methods should implement the fallback computations.
> Requiring all derived classes to override five methods instead of just
> one is a bad idea.

I have nothing against returning the fallback computations back to the base 
implementation. My point was just that with the current practice of some 
derived types constructing the dim_vector for dims (), it is not simple to make 
the fallback computations thread safe. However, I'd like to keep the overridden 
implementations at least on the numeric subclasses, so that they are 
thread-safe.

> 
>> - Added thread-safe (virtual) ndims(), numel(), capacity(), rows() and 
>> columns() (ov-base-diag.h, ov-base-mat.h, ov-base-scalar.h, 
>> ov-base-sparse.h, ov-class.h, ov-cs-list.h, ov-fcn-handle.h, ov-lazy-idx.h, 
>> ov-perm.h, ov-range.h, ov-struct.h)
> 
> See the comment above. I think it makes sense to override, say,
> ndims() for scalars for performance reasons, but I think that
> explicitly implementing octave_fcn_handle::ndims is a plain nonsense.

I believe the octave_base_value::ndims() is good for octave_fcn_handle as is, 
it is not overridden, so why complain ;-) But I get your point regarding 
octave_fcn_handle::rows etc. 

> 
>> - Followed the Array.h practice of returning const ref from dims() 
>> (oct-map.h)
> 
> OK
> 
>> - Changed dim_vector local variable to const ref when available (bitfcns.cc, 
>> data.cc, oct-map.cc, ov-base-diag.cc, ov-base-int.cc, ov-base-mat.cc, 
>> ov-base-sparse.cc, ov-bool-mat.cc, ov-bool-sparse.cc, ov-cell.cc, 
>> ov-cx-mat.cc, ov-cx-sparse.cc, ov-flt-cx-mat.cc, ov-flt-re-mat.cc, 
>> ov-intx.h, ov-perm.cc, ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, 
>> ov-struct.cc, pr-output.cc, pt-mat.cc, strfns.cc, xpow.cc)
> 
> See above. The return-by-value style is idiomatic and convenient
> (because it doesn't matter whether the function returns a value or a
> reference), even if we make this change now, there's no guarantee
> people will keep using this style. I probably won't.
> 

First of all, the reference idiom also works regardless whether the function 
returns a value or a reference, and it has no penalty in either case.
So, people don't need to, the code works either way. I did change all the 
applicable cases, though, so as long as people cut & paste within the codeset, 
the practice might as well stick, who knows. Anyway, it IS more efficient and 
safe. If the caller tries to modify the dv, the const ref has to be changed to 
a non-const local instance and everything is OK.

>> - Changed dim_vector local variable to const when possible (data.cc, 
>> gl-render.cc, graphics.cc, ls-mat5.cc, mex.cc, ov-base.cc, ov.cc, pt-mat.cc)
> 
> Yes, this is always a good idea.
> 
>> - Changed to avoid calling dims() or numel() via the virtual members 
>> (ov-base-diag.cc, ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, 
>> ov-bool-mat.h, ov-bool-sparse.cc, ov-cell.cc, ov-cx-mat.cc, ov-cx-sparse.cc, 
>> ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, ov-range.cc, 
>> ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc)
>> 
> 
> Except in supposedly performance-critical places, I'd rather avoid
> doing this. Pieces of code are copied often between subclass
> implementations, and this change makes it harder.

In my (very recent) experience with octave code, most of the subclasses use the 
"matrix" member, so there should be a problem. Also, I only followed the 
practice already used in parts of the code.

Let me elaborate the reasoning here:

- Classes derived from Array have access to a const reference to the dim_vector 
via Array::dims (). The subclasses can use the dims () method for accessing the 
dim_vector without any performance penalty, and in a thread safe manner. All 
use of dims () can be inlined, etc.
- Classes derived from octave_base have dims () method that returns dim_vector 
object, that in some cases is newly constructed, and always copied, involving 
(in many cases unnecessary) reference counting. This dims () being virtual, it 
can never be inlined, as the compiler will not know at the compile time which 
implementation will be used.

So the latter has performance penalties via:
1) dim_vector constructor/destructor
2) dim_vector::rep reference counting
3) no inlining
4) mandatory dispatch via vtable, many times canceling the benefit of branch 
prediction, which can double the execution time
5) not thread safe


> 
>> 
>> With these changes the octave_(base)_value functions ndims, rows, columns, 
>> numel and capacity are now thread safe. My own OpenMP code used to crash 
>> calling numel, due to the unprotected reference counting, but does not crash 
>> any more, as these now do not need to touch any reference counters.
> 
> Although parts of this work are useful per se, it feels more like a
> workaround than a solution. What you really need is a thread-safe
> reference-counting.

Thread-safe reference counting would be nice, but that creates huge performance 
penalties. With a coarse grained lock all threads will stall, and with 
fine-grained locking the locks will take more storage. Also the lock/unlock 
will take time as well.

So, I prefer implementing thread-safe reference counting, but also avoiding 
reference counting when it is not necessary. The changes I have proposed do 
exactly that, providing a base level of thread safeness for code that inspects 
(reads) octave data structures without modifying them.

> 
>> 
>> These, and dependent functions are now also marginally faster. However, when 
>> timing these I noticed some variation due to the changed branch prediction 
>> behavior, are the patterns of calling virtual functions have changed. From 
>> this viewpoint it would be nice if all numeric data types had the real 
>> dim_vector, so that they could share the same implementation for all these 
>> functions. This would require a common base class for numeric types. Also, 
>> the I had some earlier crashes related to the initialization of static 
>> members in threaded code, I believe, so this would take some more testing.
>> 
> 
> I think most of the speed-ups can be achieved by much simpler changes,
> mostly cleaning up old code. See below.
> 

I agree here.

>> dims() itself cannot be made thread safe without protecting the reference 
>> counting, as for some types the dim_vector is created dynamically.
> 
> Yes. The point is that dim_vector is just one class that is reference
> counted, there are many more - idx_vector, Array, Sparse, SparseQR,
> octave_shlib, octave_mutex, octave_value, graphics properties ...
> 

There is a subtle, but important difference: none of the above are needed for 
just *reading* the octave data, while in practice either numel(), columns(), 
rows() or capacity() is needed to know how much data there is. If these are 
implemented via octave_base_value::dims (), reference counting gets involved. 
Also, dim_vector is used by most of the classes you refer to, so there arguably 
are more reference counting actions being taken at the dim_vector level, than 
on the level of the classes using it.

Please do not get me wrong: I'm all for thread-safe reference counting in 
Octave. That would allow also multi-threaded creation of Octave data types, 
which would be awesome indeed. But on the other hand I would like Octave to be 
as fast as possible, and avoid the (thread-safe) reference counting overheads 
when just reading the data (within a function that is known not to release 
control so that the data in question could be modified or deleted).


>> There are minor efficiency increases all over the place, due to not creating 
>> a local instance of dim_vector, where it is not needed.
> 
> The difference between const dim_vector and const dim_vector& is just
> the counter increment/decrement, and I doubt that is visible in most
> places.
> 

Right. I totally agree that when printing, saving, etc. the overhead is totally 
absorbed by orders of magnitude bigger overheads. But maybe keeping the changes 
I proposed would entice others later using the same idioms, when creating new 
code that might be more performance critical (cut & paste)?

>> Anyway, in the following you will see that the time it takes for cellfun () 
>> to complete depends on the order of the data.
> 
> 
> I checked in some simple changes (10651:10654) and tested. What I
> don't understand about your benchmark code is why you use mean() which
> has a nontrivial overhead, so I hacked together my own benchmark:
> 

I had used it to make sure each call actually processes the same data. The code 
being correct it is obviously not needed -- I decided to include the code at 
the last minute, so I overlooked this.

Btw. I also noticed that cellfun() itself has some overhead. My own 
"celltest.cc", which copied only the relevant parts of cellfun() was faster.

> [test results omitted]

> so especially length() has been improved. But of course, compared to
> the "normal" speed of cellfun with a function handle, all cases were
> blazingly fast even before. Cellfun is a special example; in most
> other cases calls to dims() et al. are parts of larger code bodies and
> hence their time is drowned in the total time.
> 

As I said, my aim was not speed per se, but to allow thread safe access to 
octave_base_value derived data. With the changes I proposed Octave 
documentation could list these functions as "thread safe for reading", which is 
exactly what I need.

Having said that, I would like for Octave to be as fast as possible :-)

Regards,

  Jarno
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Changesets: Re: Parallel access to data created with Octave, (continued)
Prev by Date: Re: Timeline for 3.4 release?
Next by Date: Re: Changesets: Re: Parallel access to data created with Octave
Previous by thread: Re: Thread-safe reference counting (modified dim-vector.h)
Next by thread: Re: Changesets: Re: Parallel access to data created with Octave
Index(es):
- Date
- Thread