texinfo-devel
Re: Performance profiling of makeinfo


From: Patrice Dumas
Subject: Re: Performance profiling of makeinfo
Date: Fri, 26 Dec 2014 10:27:00 +0100
User-agent: Mutt/1.5.20 (2009-12-10)

On Wed, Dec 24, 2014 at 05:21:55PM +0000, Gavin Smith wrote:
> 
> 37432;5;1;29.9s;57.8s;Texinfo::Parser::_deep_copy
>
> I note a significant amount of time is spent in _deep_copy, which
> takes a dump of a data structure and then reads it back in again with
> "eval". Patrice, do you think it is worth investigating other ways of
> duplicating a data structure? I saw a page at
> http://stackoverflow.com/questions/388187/whats-the-best-way-to-make-a-deep-copy-of-a-data-structure-in-perl
> that suggested using the "Clone" module instead.

I remember that my idea was that it only needed to be done once, so it
was no big deal if it was not that fast.  I checked the code, and indeed
parser() should be called only once.  I am a bit puzzled here.  How
come it is called so often?

In tp/t/test_utils.pl I also use dclone, but it only works if there are
no in-tree references, if I recall correctly.
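
For comparison, here is a minimal sketch of the two approaches side by
side.  It is not the actual _deep_copy code: copy_with_eval is a
hypothetical name, and the Data::Dumper settings are my assumptions.
Storable (which provides dclone) ships with Perl, so it would not add a
dependency.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# Hypothetical helper: dump the structure to Perl source text and
# read it back with a string eval, as described in the quoted message.
sub copy_with_eval {
    my ($struct) = @_;
    local $Data::Dumper::Purity = 1;   # emit extra statements for nested refs
    local $Data::Dumper::Indent = 0;   # compact output, less text to parse
    my $VAR1;                          # Dumper names the dumped value $VAR1
    eval Data::Dumper->Dump([$struct]);
    die $@ if $@;
    return $VAR1;
}

# The same copy with Storable, with no source text generated or parsed.
sub copy_with_dclone {
    my ($struct) = @_;
    return dclone($struct);
}
```

Both functions return a structure that shares nothing with the original,
so either could be timed on a real parser configuration before deciding
whether a change is worth it.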

> The other surprise is how long it seems to take just refilling
> paragraphs, in add_text in Paragraph.pm (28% of the total run time).
> The detailed profiling information shows that Perl spends a lot of
> time just in regular expressions.

I am not surprised by this one, as it has to check every character to
see if it is InFullwidth or a space, and doing that with regexps may not
be very efficient.

> The only thing that maybe made a small difference was commenting out
> anything to do with "underlying text", which possibly isn't used for
> anything apart from debugging output (I think this is used if the case
> of the text was changed).

Underlying text is always used to determine the text case.  It is,
however, only useful if the text case was changed.  So there could be
some gain from ignoring it when the case is not changed, since changing
case is not very common.  Maybe, if it is not defined, instead of
defining it, $paragraph->{'word'} could be used on line 438, in

      } elsif ($paragraph->{'underlying_word'}
                 =~ /[$end_sentence_character][$after_punctuation_characters]*$/
           and $paragraph->{'underlying_word'}
                 !~ /[[:upper:]][$end_sentence_character$after_punctuation_characters]*$/) {
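
A rough sketch of that fallback, outside of the real code.  The
ends_sentence function is hypothetical, and the values given to the two
character-class variables are assumptions for illustration, not copied
from Paragraph.pm.

```perl
use strict;
use warnings;

# Assumed values; the real definitions live in Paragraph.pm.
my $end_sentence_character = quotemeta('.?!');
my $after_punctuation_characters = quotemeta(q{"')]});

# Hypothetical helper: same test as above, but falling back to 'word'
# when 'underlying_word' was never set (that is, the case is unchanged).
sub ends_sentence {
    my ($paragraph) = @_;
    my $word = $paragraph->{'underlying_word'} // $paragraph->{'word'};
    return 0 unless defined $word;
    if ($word =~ /[$end_sentence_character][$after_punctuation_characters]*$/
        and $word !~ /[[:upper:]][$end_sentence_character$after_punctuation_characters]*$/) {
        return 1;
    }
    return 0;
}
```

With this, 'underlying_word' would only need to be stored when the case
was actually changed, for example for a word that was upper-cased on
output but written in lower case in the source.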

> I figure the best chance to speed up the calls to add_text is to
> rewrite Paragraph.pm in C. This would also avoid the overhead of Perl
> function calls to add_next for each word or gap between words. It
> looks simple enough that the logic won't be that difficult. (Much
> simpler than replacing Parser.pm, anyway.) We could get the screen
> width of characters with the wcwidth function. I haven't made any
> progress yet on rewriting the algorithm in C; I've spent some time
> trying to figure out how to pass information between the main Perl
> program in Plaintext.pm and an XS module.

In that case, there may also be some gain if, instead of a regexp, an
explicit list of space characters was used.  Similarly, if something
faster than \p{InFullwidth} could be used to determine whether the next
character is InFullwidth, it could speed things up a lot.  In other
words, going one character at a time with explicit character classes
instead of regexps could probably save time.
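
To make the idea concrete, a small sketch in Perl (the same logic would
carry over to C).  The list of space characters and the code point
ranges used for "full width" are assumptions for illustration; the real
InFullwidth property may cover a different set, and classify_chars and
is_fullwidth are hypothetical names.

```perl
use strict;
use warnings;

# Assumed explicit list of space characters, checked by hash lookup.
my %is_space = map { $_ => 1 } (" ", "\t", "\n", "\f", "\r");

# Assumed ranges: the fullwidth characters of the Halfwidth and
# Fullwidth Forms block (U+FF01..U+FF60 and U+FFE0..U+FFE6).
sub is_fullwidth {
    my ($char) = @_;
    my $cp = ord($char);
    return 1 if $cp >= 0xFF01 && $cp <= 0xFF60;
    return 1 if $cp >= 0xFFE0 && $cp <= 0xFFE6;
    return 0;
}

# Walk the text one character at a time and label each character,
# without applying a regexp to every position.
sub classify_chars {
    my ($text) = @_;
    my @classes;
    for my $char (split //, $text) {
        if ($is_space{$char}) {
            push @classes, 'space';
        } elsif (is_fullwidth($char)) {
            push @classes, 'fullwidth';
        } else {
            push @classes, 'other';
        }
    }
    return @classes;
}
```

Whether this is really faster than the regexps in pure Perl would need
to be measured; the per-character loop has its own overhead, which is
part of why doing it in C looks attractive.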

-- 
Pat


