octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: fast set operations / verdict


From: Jaroslav Hajek
Subject: Re: fast set operations / verdict
Date: Sat, 9 Aug 2008 22:44:38 +0200

On Sat, Aug 9, 2008 at 10:09 PM, Levente Torok <address@hidden> wrote:
> Hi Jaroslav,
>
> Thanks for the fast repsonse.
>
> I copied out sources from octave.3.0.1 to make this comparision. I suffexed 
> them with number 3 for the experiment.

That's not of much use, I'm afraid. As I've said, the improvements
done to set functions themselves (the .m files) do not probably have
impact on performance. It is the new, improved "sort" function in
Octave 3.1.51 that could help.


> Please note that set complement is much much more expensive than intersection.
> Here are my measurements:
>
> l=10000000;
> octave:7> x= floor(l * rand(l,1)); y=floor(l * rand(l,1)) ; t=time; 
> z=intersection2( x, y ); dt=time -t
> dt = 37.698
> octave:8> x= floor(l * rand(l,1)); y=floor(l * rand(l,1)) ; t=time; 
> z=intersection3( x, y ); dt=time -t
> dt = 90.695
> ratio > 2.4
>
> octave:14> l=50000;x= floor(l * rand(l,1)); y=floor(l * rand(l,1)) ; t=time; 
> z=complement2( x, y ); dt=time -t
> dt = 0.12633
> octave:15> l=50000;x= floor(l * rand(l,1)); y=floor(l * rand(l,1)) ; t=time; 
> z=complement3( x, y ); dt=time -t
> dt = 57.126
>
> The reason why I hassle with is that I need I need to load data from a 
> database with bare columns (dataware house)
> and I need to make the joint of these columns here in octave unfortunately.
>

OK, it seems that `complement' really has problems. I have contributed
the improvements to set functions in 3.1.x, but I have not touched
`complement'. Given the existence of `setdiff', which basically does
the same, and is Matlab-compatible, we shall probably declare
complement obsolete. Use setdiff to get the desired effect.



> Levente
>
> On Saturday 09 August 2008, you wrote:
>> On Sat, Aug 9, 2008 at 4:40 PM, Levente Torok <address@hidden> wrote:
>> > Dear All,
>> >
>> > I was in the need to have a fast set operations such as intersection and 
>> > complement but the
>> > those that I could use with the current octave versions (<3.0) are very 
>> > very slow to my needs.
>>
>> Well, set operations are not a typical bottleneck in computations
>> people employ Octave for, so there's not much demand for improvement.
>> Still, 2.9.x series are quite old - benchmarking should be done
>> against development version (or a recent snapshot at least) to be of
>> any value. There have been improvements to set functions since (though
>> mostly for functionality rather than performance) and to sorting
>> (which may make a significant difference).
>>
>> > For this reason I wrote a version in STL C++.
>>
>> OK, but if I were to do that, I'd simply wrap set_intersection from 
>> <algorithm>.
>>
>> > May be not optimal in terms of strorage (I convert everything to STL 
>> > vector since it has fast iterators)
>> > however this is still 100-500 times faster for vectors with 10^5 elements, 
>> > at least,
>> > than the versions currently supplied by octave. I enclose the sources.
>> >
>>
>> If you have any interest, I can test this against development version
>> if I have the time. 100-500 is too much; if this is still true, we
>> should improve something.
>> Also note that if the set sizes are not asymptotically equal, then
>> your approach is sub-optimal. (Instead of O(N*log(N) + M*log(M)) you
>> can easily get O(N*log(N)+M))
> Sure.
> See the first 10 lines of my sources.
> They solve this problem, I guess, with worst case n log n optimality.

No, you did not understand me. If the sets are comparable in size,
it's OK. But if one is much smaller, it pays off to sort just one of
them and use a binary search rather than sorted merge.

cheers,

>
> Levente
>
>
> --
>
>



-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]