octave-patch-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-patch-tracker] [patch #8426] strings package: speeding up the ed


From: Carnë Draug
Subject: [Octave-patch-tracker] [patch #8426] strings package: speeding up the edit distance
Date: Fri, 08 Aug 2014 13:15:31 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 Iceweasel/30.0

Follow-up Comment #5, patch #8426 (project octave):

As I mentioned, we don't have at the moment a maintainer of the strings
package and I don't have the expertise to comment on the algorithms being
implemented so I'll just comment on the required Octave conventions and push
this if you can fix them:

* on the copyright, add a new line with your name and this year. DO not add
your name to the previous copyright owner (you are not the copyright owner of
this since 2006)


## Copyright (C) 2006 Muthiah Annamalai <address@hidden> and Max
Görner
## Copyright (C) 2014 Max Görner <address@hidden>


* since there's multiple ways to call this, note them as separate deftypefnx
lines on the header, i.e.,


## @deftypefn  {Function File} address@hidden, @var{L}] =} editdistance
(@var{str1}, @var{str2})
## @deftypefnx {Function File} address@hidden, @var{L}] =} editdistance
(@var{str1}, @var{str2}, @var{weights})
## @deftypefnx {Function File} address@hidden, @var{L}] =} editdistance
(@var{str1}, @var{str2}, @var{weights}, @var{modus})


* the first line of the help text should be a single sentence, since it is
used by functions such as `lookfor`. Use something like "Compute the
Levenshtein edit distance between two strings."

* if weights defaults to [1 0 1], set it on the function signature

* "(nargin >= 3 && length (weights) < 3)" deserves an error message on its
own. Also, do not use length it will return true for a matrix size [2 3].
Finally, shouldn't an empty weigths be allowed? The documentation says "For
the special case that @var{weights} is empty"


function [dist, L] = editdistance (str1, str2, weights = [1 0 1], modus = 0)
  if (nargin < 2 || nargin > 4)
    print_usage ();
  endif

  if (nargin > 2 && (numel (weights) != 3 && ! isempty (weigths))
    error ("editdistance: WEIGTHS must be a 3 element vector.")
  endif
-verbatim

* I can't figure out "(nargin >= 2 || weights(1) == weights(3) && weights(1)
== 1 && weights(2) == 0)" ? This will always return true because this function
is always called with at least 2 arguments. But I remove it, then the rest is
the same as `weigths = [1 0 1]`. What should this be doing? From the
documentation, it would seem it should be:


  saveMemory = nargout < 2;
  if (modus == 0 && saveMemory && isempty (weights)) 
    dist = berghel_roach (str1,str2);
    return
  endif


* while the algorithm seems like it's not really requires a char, it does
require the input to be a vector. You should check for that. For the same
reason mentioned above, do not use length in:


  L1 = length (str1) + 1;
  L2 = length (str2) + 1;



* use more whitespace please. For example, add a space between operators
(including "="), and after commas in a comma separated list

* can you avoid the nested sub-function? I can see that it can modify the
sparse matrix when it's called so this will avoid copying it for write. But
nested functions in Octave and Matlab have weird scoping rules and can lead to
weird bugs later on (and I'm not sure if we're completely Matlab compatible on
that respect). I guess that since is less prone to problems so it should be
fine if you can't find a way that would not impact performance.

* finally, if you clone the repository and submit a changeset, it's easier to
review and you'll keep authorship of the changes:


 hg clone http://hg.code.sf.net/p/octave/strings octave-forge-strings
 cd octave-forge-strings
 ## make your changes
 hg commit -m
 ## enter a commit message following commit message guidelines
http://wiki.octave.org/Commit_message_guidelines
 ## probably something like:
 ##
 ## editdistance: use Berghel and Roach algorithm for performance (patch
#8426)
 ## 
 ## * editdistance.m: blah blah blah multi-line comment with deeper details,
with
 ## each line not taking more than 80 characters.
 hg export -o ../speedup-editdistance.diff tip



    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?8426>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]