
Re: Changesets: Re: Parallel access to data created with Octave


From: Jaroslav Hajek
Subject: Re: Changesets: Re: Parallel access to data created with Octave
Date: Wed, 26 May 2010 08:55:46 +0200

On Tue, May 25, 2010 at 9:05 PM, Jarno Rajahalme
<address@hidden> wrote:
>
> On May 25, 2010, at 12:43 AM, ext Jaroslav Hajek wrote:
>
>> On Tue, May 25, 2010 at 6:16 AM, Jarno Rajahalme
>> <address@hidden> wrote:
>>>
>>> liboctave:
>>>
>>> - Followed the Array.h practice of returning const ref from dims() 
>>> (Sparse.h)
>>
>> I'm not sure this is a good idea. It is a bit redundant to store a
>> dim_vector in Sparse, because it can only have 2 dimensions, and we
>> may want to optimize the storage in the future.
>>
>
> Please note that I only changed the existing Sparse::dims () to return a 
> const reference instead of the full object. I did not add any storage 
> requirement anywhere! Also, the modified Sparse::dims () can still be used 
> for copying the object itself, but now the copy is done after the dims() 
> returns, not before, so the caller has a choice. I.e.:
>
> // Create a local copy (implies reference counting, copy-on-write, etc.)
>
>  dim_vector dv = dims();
>
> // Create a const reference (no reference counting, cannot be modified)
>
>  const dim_vector& dv = dims();
>
> Without this change the latter use will ALSO create a temporary copy, so it
> has the same overhead and the same thread-safety problem as the former one.

The storage requirement is implied. Suppose we decide, instead of
storing a dim_vector in Sparse<T>, to store just the row dimension in
order to conserve space (column dim can be inquired from SparseRep).
What will dims() return a reference to? It will need to be changed
back.
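
That is, with a hypothetical layout like the one sketched below (the nrows
member and the rep accessor are made up for illustration), dims () would have
to build its result on the fly and hence return it by value again:

// hypothetical Sparse<T> storing only the row count itself
octave_idx_type nrows;

dim_vector dims (void) const
{
  // the result is a temporary constructed here, so a returned const
  // reference would dangle; returning by value is the only option
  return dim_vector (nrows, rep->columns ());
}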

>
> Again, this exact change has been made in Array.h earlier, there is even a 
> comment about it :-)
>

Yes, but still it was disputable. dim_vector was created for Array, so
it will probably always be stored directly. In any case, I made this
change relatively recently, and I may still change my mind about it.

>
>>
>>> - Added ov_ndims(), const methods declared as such (dim_vector.h)
>>
>> What for? I don't understand what it does. The number of dimensions
>> are queried by dim_vector::length. Trailing singletons should be
>> removed on construction where suitable (for instance, Array always
>> does it). dim_vector::length is always >= 2.
>> Maybe you got confused by the existing octave_base_value::ndims
>> fallback. It makes no sense in current sources, I cleaned it up.
>>
>
> OK. I just noted that the interpreter expects dims () to have trailing
> singletons removed. The ndims () code in ov-base.cc was actually used for all
> the ov-derived types, so it was not just a fallback. Glad to hear that in
> practice there will not be any trailing singletons, though :-)
>
>>> - Changed idx_vector::orig_rows() and idx_vector::orig_columns() to call 
>>> new internal rep virtual functions (idx-vector.h)
>>
>> Hmm, no problem, but these methods aren't used anywhere anyway. Maybe
>> it's better to just delete them.
>
> I changed the relevant octave_value derived classes to use these. That is the 
> only way to assure they are thread safe at this moment. So I'd suggest not 
> deleting these.
>
>>
>>> - Changed dim_vector local variable to const ref when applicable (Array.cc, 
>>> CNDArray.cc, CSparse.cc, Sparse.cc, dNDArray.cc, dSparse.cc, fCNDArray.cc, 
>>> fNDArray.cc, idx-vector.cc, lo-specfun.cc, mx-inlines.cc)
>>
>> This is something I don't really like much. The practice of
>> return-by-value using reference-counted classes is ubiquitous
>> throughout octave sources, and I'm used to it a lot. I don't think I
>> want to relinquish this convenient coding style because of partial
>> thread safety. This will also make things harder to modify if the
>> dim_vector is constructed rather than stored.
>>
>
> Please note that Array.h already defined dims () as returning a const 
> reference, so I only changed the code to use it. The issue you mention about 
> constructing the dim_vector is exactly why code using dims () of the 
> octave_base_value cannot be treated this way. The changes referred to above 
> pertain only to use of dims () of classes derived from Array.h, which always 
> has a stored dim_vector.
>
> Maybe changing more code to use dims () directly (e.g. dims ().length () ), 
> i.e. without using a local copy would be better in cases where the dv is 
> actually used only once or twice.

Unless it's a performance critical place, people should use whatever
approach seems more convenient.

>
>
>>>
>>> src:
>>>
>>> - Made rows() and columns() virtual, changed default implementations of 
>>> ndims, rows, columns, numel, and capacity to be thread-safe, subclasses 
>>> need to override. (ov-base.h)
>>
>> I have no problem with overriding numel () and capacity () (though
>> that is very seldom used) in subclasses, likewise I'm OK with making
>> rows() and columns() virtual (although I think there's little need,
>> especially if we simplify the existing implementations), but I insist
>> that the base methods should implement the fallback computations.
>> Requiring all derived classes to override five methods instead of just
>> one is a bad idea.
>
> I have nothing against moving the fallback computations back into the base 
> implementation. My point was just that with the current practice of some 
> derived types constructing the dim_vector for dims (), it is not simple to 
> make the fallback computations thread safe. However, I'd like to keep the 
> overridden implementations at least on the numeric subclasses, so that they 
> are thread-safe.
>
>>
>>> - Added thread-safe (virtual) ndims(), numel(), capacity(), rows() and 
>>> columns() (ov-base-diag.h, ov-base-mat.h, ov-base-scalar.h, 
>>> ov-base-sparse.h, ov-class.h, ov-cs-list.h, ov-fcn-handle.h, ov-lazy-idx.h, 
>>> ov-perm.h, ov-range.h, ov-struct.h)
>>
>> See the comment above. I think it makes sense to override, say,
>> ndims() for scalars for performance reasons, but I think that
>> explicitly implementing octave_fcn_handle::ndims is plain nonsense.
>
> I believe the octave_base_value::ndims() is good for octave_fcn_handle as is, 
> it is not overridden, so why complain ;-) But I get your point regarding 
> octave_fcn_handle::rows etc.
>
>>
>>> - Followed the Array.h practice of returning const ref from dims() 
>>> (oct-map.h)
>>
>> OK
>>
>>> - Changed dim_vector local variable to const ref when available 
>>> (bitfcns.cc, data.cc, oct-map.cc, ov-base-diag.cc, ov-base-int.cc, 
>>> ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, ov-bool-sparse.cc, 
>>> ov-cell.cc, ov-cx-mat.cc, ov-cx-sparse.cc, ov-flt-cx-mat.cc, 
>>> ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, ov-re-mat.cc, ov-re-sparse.cc, 
>>> ov-str-mat.cc, ov-struct.cc, pr-output.cc, pt-mat.cc, strfns.cc, xpow.cc)
>>
>> See above. The return-by-value style is idiomatic and convenient
>> (because it doesn't matter whether the function returns a value or a
>> reference), even if we make this change now, there's no guarantee
>> people will keep using this style. I probably won't.
>>
>
> First of all, the reference idiom also works regardless whether the function 
> returns a value or a reference, and it has no penalty in either case.

No, it doesn't. It's OK if you return a const ref and assign it to a
variable (because you just assign the referenced object) and it's OK
if you return an object and make a const reference to it (because C++
holds the temporary for the duration of the const reference, but some
people don't know this). But, whenever a reference is returned from a
function (or, for that matter, passed to it), it is solely the
programmer's responsibility to ensure that the reference remains valid
as long as it is used. The compiler won't guarantee this because it
has no way to do so. Consider this snippet:

const dim_vector& orig_dims = matrix.dims (); // matrix is Array<T> subclass
...
matrix.resize (new_dims); // resize to new dimensions
...
matrix.resize (orig_dims); // gotcha!

the problem is that at the first resize, the dim_vector object
referenced by orig_dims silently changes (it now holds new_dims), so
the final resize does something unexpected: nothing at all.
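
The fix is trivial, of course: take a copy (still cheap, thanks to the
reference counting), which keeps its value across the resize:

dim_vector orig_dims = matrix.dims (); // copy; one refcount increment
...
matrix.resize (new_dims);
...
matrix.resize (orig_dims); // restores the original dimensions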

Another simple instance is (disregarding the fact that it makes little sense):

octave_value val = ...
const dim_vector& dv = val.array_value ().dims ();

here, val.array_value () creates a temporary NDArray object, whose
dims () method is then called. This method returns a reference, so no
temporary is kept alive: C++ has no way of knowing that it needs to
keep the intermediate expressions, too; it can only keep the final
one. So after this statement, the temporary NDArray object is
destroyed, but the reference still points into it. Ouch.
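
Again, a copy (or binding the temporary array to a named variable) avoids
the problem:

dim_vector dv = val.array_value ().dims ();  // copied before the
                                             // temporary NDArray dies

NDArray tmp = val.array_value ();            // or keep the array alive
const dim_vector& dv2 = tmp.dims ();         // reference into a live object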

Yes, both examples seem somewhat artificial and probably can be
avoided easily, but there may be less obvious cases. And with the
other reference counted classes, it may be even easier to shoot
yourself in the foot.


> So people don't need to use it; the code works either way. I did change all
> the applicable cases, though, so as long as people cut & paste within the
> codebase, the practice might as well stick, who knows. Anyway, it IS more
> efficient and safe.

More efficient, yes (at least usually). Safe, no. I have no problem
with using this, especially where performance seems critical, but I'm
against making the changes globally.

"usually" is exact because a reference is ultimately just a pointer
that is usually optimized away; usually, but not always. It can only
be optimized away when the compiler can determine that the
const-referenced object does not change (by means other than the const
reference). The compiler has to assume the worst here. Fortran is
better(?) in this regard because it simply declares all such programs
illegal and so the compiler needs not care :)
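
A small illustration (do_something is a hypothetical function taking the
array by reference):

const dim_vector& dv = a.dims ();  // a is some Array<T>
octave_idx_type n = dv.length ();
do_something (a);                  // might resize a, and thus change the
                                   // object dv refers to
octave_idx_type m = dv.length ();  // must be re-read; the compiler cannot
                                   // assume m == n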

If the caller tries to modify the dv, the const ref has to be changed
to a non-const local instance and everything is OK.
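I.e. something like:

dim_vector dv = a.dims ();  // non-const local copy
dv(0) = 1;                  // modifies only the local copy; the array's
                            // own dimensions are untouched
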
>
>>> - Changed dim_vector local variable to const when possible (data.cc, 
>>> gl-render.cc, graphics.cc, ls-mat5.cc, mex.cc, ov-base.cc, ov.cc, pt-mat.cc)
>>
>> Yes, this is always a good idea.
>>
>>> - Changed to avoid calling dims() or numel() via the virtual members 
>>> (ov-base-diag.cc, ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, 
>>> ov-bool-mat.h, ov-bool-sparse.cc, ov-cell.cc, ov-cx-mat.cc, 
>>> ov-cx-sparse.cc, ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, 
>>> ov-range.cc, ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc)
>>>
>>
>> Except in supposedly performance-critical places, I'd rather avoid
>> doing this. Pieces of code are copied often between subclass
>> implementations, and this change makes it harder.
>
> In my (very recent) experience with octave code, most of the subclasses use 
> the "matrix" member directly, so there should not be a problem. Also, I only 
> followed the practice already used in parts of the code.

I don't think so. I used to change the virtual calls to direct calls
in certain places where I thought it could help, but there's a number
of, say, octave_matrix:: methods that just call dims() or numel ().
Check for yourself.
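
To be clear, the difference being discussed is only this (a sketch, inside
a method of octave_base_matrix<MT>, e.g. octave_matrix):

octave_idx_type n1 = numel ();        // virtual call through the
                                      // octave_base_value interface
octave_idx_type n2 = matrix.numel (); // direct call on the Array member;
                                      // inlinable, no dim_vector copy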

>
> Let me elaborate the reasoning here:
>
> - Classes derived from Array have access to a const reference to the 
> dim_vector via Array::dims (). The subclasses can use the dims () method for 
> accessing the dim_vector without any performance penalty, and in a thread 
> safe manner. All use of dims () can be inlined, etc.

Yes, this was the reason why I changed Array<T>::dims. Originally it
returned a value.

> - Classes derived from octave_base_value have a dims () method that returns
> a dim_vector object, which in some cases is newly constructed, and always
> copied, involving (in many cases unnecessary) reference counting. This dims
> () being virtual, it can never be inlined, as the compiler will not know at
> compile time which implementation will be used.
>
> So the latter has performance penalties via:
> 1) dim_vector constructor/destructor
> 2) dim_vector::rep reference counting
> 3) no inlining
> 4) mandatory dispatch via vtable, many times canceling the benefit of branch 
> prediction, which can double the execution time
> 5) not thread safe
>

Yes, but you normally can't avoid the virtual methods, and I suppose
their cost more or less drowns the reference-counting penalty. Maybe
not, but then it needs to be demonstrated. Also, the virtual dims()
method call is typically followed by other virtual method calls (such
as matrix_value), so it doesn't normally matter - Amdahl's law. The
cellfun calls were a notable exception, but significant changes for
the purpose of speed-ups should be tested, measured, and then discussed.

>
>>
>>>
>>> With these changes the octave_(base)_value functions ndims, rows, columns, 
>>> numel and capacity are now thread safe. My own OpenMP code used to crash 
>>> calling numel, due to the unprotected reference counting, but does not 
>>> crash any more, as these now do not need to touch any reference counters.
>>
>> Although parts of this work are useful per se, it feels more like a
>> workaround than a solution. What you really need is a thread-safe
>> reference-counting.
>
> Thread-safe reference counting would be nice, but it creates huge 
> performance penalties. With a coarse-grained lock all threads will stall, and 
> with fine-grained locking the locks will take more storage; the lock/unlock 
> operations take time as well.
>
> So, I prefer implementing thread-safe reference counting, but also avoiding 
> reference counting when it is not necessary. The changes I have proposed do 
> exactly that, providing a base level of thread safeness for code that 
> inspects (reads) octave data structures without modifying them.

I think you've gone much farther than suitable with your changes. I'm
against implementing octave_fcn_handle::rows et al just for the sake
of preventing octave_value::numel doing reference counting behind the
hood. As I said, I'm OK having the overrides for scalars and matrices,
but not everywhere.
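
For the matrix types the overrides in question boil down to something like
this (a sketch, not necessarily the exact code from your patch), in
octave_base_matrix<MT>:

octave_idx_type numel (void) const { return matrix.numel (); }
octave_idx_type rows (void) const { return matrix.rows (); }
octave_idx_type columns (void) const { return matrix.columns (); }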

>
>>
>>>
>>> These, and dependent functions, are now also marginally faster. However, 
>>> when timing these I noticed some variation due to the changed branch 
>>> prediction behavior, as the patterns of calling virtual functions have 
>>> changed. From this viewpoint it would be nice if all numeric data types had 
>>> a real dim_vector, so that they could share the same implementation for 
>>> all these functions. This would require a common base class for numeric 
>>> types. Also, I had some earlier crashes related to the initialization 
>>> of static members in threaded code, I believe, so this would need some more 
>>> testing.
>>>
>>
>> I think most of the speed-ups can be achieved by much simpler changes,
>> mostly cleaning up old code. See below.
>>
>
> I agree here.
>
>>> dims() itself cannot be made thread safe without protecting the reference 
>>> counting, as for some types the dim_vector is created dynamically.
>>
>> Yes. The point is that dim_vector is just one class that is reference
>> counted, there are many more - idx_vector, Array, Sparse, SparseQR,
>> octave_shlib, octave_mutex, octave_value, graphics properties ...
>>
>
> There is a subtle, but important difference: none of the above are needed for 
> just *reading* the octave data, while in practice either numel(), columns(), 
> rows() or capacity() is needed to know how much data there is.

No, it isn't. Just like you reference the octave_base_matrix::matrix
members directly through pointer tricks, you can reference
octave_base_matrix::matrix::dims (), and it's thread-safe (thanks to
the const-ref being returned). octave_value::numel() is *not* a safe
way to determine what polymorphic instance you have, and never will
be. Admittedly, if Array::dims didn't return a const ref, you would
have no thread-safe way to get there, so it's good this has changed,
but I think it's not needed to go much farther.

> If these are implemented via octave_base_value::dims (), reference counting 
> gets involved. Also, dim_vector is used by most of the classes you refer to, 
> so there arguably are more reference counting actions being taken at the 
> dim_vector level, than on the level of the classes using it.

Yes. Mostly it's unavoidable, however, because the classes can't
control each other's lifetime, so they can't store just references.
The reference counting in dim_vector avoids copying the dimension data
for array operations that do not change dimensionality, such as
mapping or type conversions.


>
> Please do not get me wrong: I'm all for thread-safe reference counting in 
> Octave. That would allow also multi-threaded creation of Octave data types, 
> which would be awesome indeed. But on the other hand I would like Octave to 
> be as fast as possible, and avoid the (thread-safe) reference counting 
> overheads when just reading the data (within a function that is known not to 
> release control so that the data in question could be modified or deleted).
>
>
>>> There are minor efficiency increases all over the place, due to not 
>>> creating a local instance of dim_vector, where it is not needed.
>>
>> The difference between const dim_vector and const dim_vector& is just
>> the counter increment/decrement, and I doubt that is visible in most
>> places.
>>
>
> Right. I totally agree that when printing, saving, etc., the overhead is 
> completely absorbed by orders-of-magnitude bigger overheads. But maybe 
> keeping the changes I proposed would entice others to use the same idioms 
> later, when creating new code that might be more performance critical 
> (cut & paste)?

Cf. above arguments against adopting this as idiomatic. It's OK if it
feels natural to you and you can avoid the subtle problems (I suppose
you can easily), but otherwise I'm against "fixing" this in others'
code. Also, note that JWE often makes "style fixes" to the code and
he's even much less concerned by performance than me, so if you use a
const reference on purpose for performance reasons, better add an
explanation comment (such as the one for Array<T>::dims) to prevent
someone else from changing it later.
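
Something as simple as this would do:

// Using a const reference here on purpose to avoid the dim_vector
// refcount traffic -- please do not "simplify" this into a copy.
const dim_vector& dv = m.dims ();  // m is an Array-based object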

>
>>> Anyway, in the following you will see that the time it takes for cellfun () 
>>> to complete depends on the order of the data.
>>
>>
>> I checked in some simple changes (10651:10654) and tested. What I
>> don't understand about your benchmark code is why you use mean() which
>> has a nontrivial overhead, so I hacked together my own benchmark:
>>
>
> I had used it to make sure each call actually processes the same data. The 
> code being correct, it is obviously not needed -- I decided to include the 
> code at the last minute, so I overlooked this.
>
> Btw. I also noticed that cellfun() itself has some overhead. My own 
> "celltest.cc", which copied only the relevant parts of cellfun(), was faster.

Hmm. Maybe you omitted something that cellfun needs to do? If you
don't think so, please show me the code. Or if you can figure out how
cellfun can be further sped up (preferably without invasive changes),
that would be even better :)

>
> As I said, my aim was not speed per se, but to allow thread safe access to 
> octave_base_value derived data. With the changes I proposed Octave 
> documentation could list these functions as "thread safe for reading", which 
> is exactly what I need.
>
> Having said that, I would like for Octave to be as fast as possible :-)
>

That's what I have been after for the last couple of years :) But
honestly, regarding performance, there are much more interesting
projects than eliminating superfluous dim_vector reference counting.

I'm open to discussing the individual changes separately, if you split
it to smaller patches, but I'm against applying this as a whole.
(Besides, it probably needs to be refreshed thanks to the cleanups I
made yesterday).

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


