Re: [Chicken-users] Chicken vs Perl


From: Daishi Kato
Subject: Re: [Chicken-users] Chicken vs Perl
Date: Tue, 20 Sep 2011 22:08:16 +0900
User-agent: Wanderlust/2.14.0 (Africa) Emacs/21.4 Mule/5.0 (SAKAKI)

Hi,

My situation is pretty similar to yours: I used to use Perl
and later started using Chicken for my job.

Running your scripts on my machine produced a similar result
(about a 10x difference).

The -unsafe option in csc-4.6.0 made no difference.
-unsafe-libraries in csc-4.0.0 did help (a little faster),
but it's not available in csc-4.6.0 (does anybody know why?).

I also tried csc-4.7.0 and, guess what, it's a little slower
(at least on my test data; I partially crawled wiki.call-cc.org).
Peter, how could this happen?

My guess is that read-line is slower than <> in Perl.
(I think <> is highly optimized in Perl.)
This is only a guess and I can't guarantee it,
but how about comparing read-all in Chicken against $/=undef in Perl?
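
To make the Perl side of that comparison concrete, here is a minimal
sketch of a slurp-mode variant. It is not your original script: the
helper names extract_domains and slurp are made up for illustration,
the files are taken from @ARGV instead of File::Find, and the regex
uses /g with a lookahead instead of re-matching on the remainder of
the line. Note that with the whole file in one string, the host part
can in principle span a newline, which the line-by-line version never
allows.

```perl
#! /usr/bin/perl

use warnings;
use strict;

# Hypothetical helper: return the sorted, lower-cased "http://host"
# part of every http href found in $text.
sub extract_domains {
    my ($text) = @_;
    my %seen;
    # /g walks the whole string, so there is no need to re-match on
    # the remainder as the line-based loop does.
    while ($text =~ /href="(http:\/\/[^"\/?]+)(?=["\/?])/gi) {
        $seen{lc $1} = 1;
    }
    return sort keys %seen;
}

# Hypothetical helper: read a whole file in one go by undefining the
# input record separator ($/) for the duration of the read.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can not open file '$file': $!";
    local $/ = undef;
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

my %urls;
foreach my $file (@ARGV) {
    $urls{$_} = 1 for extract_domains(slurp($file));
}
print "$_\n" for sort keys %urls;
```

If this runs at roughly the same speed as the <HTML> loop, line
reading is probably not where the time goes on the Perl side; the
matching Chicken test would read each file with read-all and search
the resulting string.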

Best,
Daishi

At Tue, 20 Sep 2011 14:11:41 +0200,
Sascha Ziemann wrote:
> 
> I tried to use Chicken for a job I would normally use Perl for, to find
> out whether Chicken might be a useful alternative.
> 
> The job is: go through a web site mirror and report a unique list of
> all domains from all hrefs.
> 
> This is my Perl version:
> 
> #! /usr/bin/perl
> 
> use warnings;
> use strict;
> use File::Find;
> 
> my $dir = $ARGV[0] || '.';
> my @files;
> my %urls;
> 
> find ({wanted => sub { push @files, $_ if -f $_; },
>        no_chdir => 1}, $dir);
> 
> foreach my $file (@files) {
>     open (HTML, $file) || die "Can not open file '$file'";
>     while (<HTML>) {
>         while (/href="(http:\/\/[^"\/?]+)(["\/?].*)/i) {
>             $urls{lc $1} = 1;
>             $_ = $2; } }
>     close (HTML); }
> 
> foreach my $url (sort keys %urls) {
>     print $url, "\n"; }
> 
> The Perl version takes about two seconds on my test tree:
> 
> real  0m1.810s
> user  0m1.664s
> sys   0m0.140s
> 
> And this is my Chicken version:
> 
> #! /usr/local/bin/csi -s
> 
> (require-extension posix regex srfi-69)
> 
> (define dir (let ((args (command-line-arguments)))
>               (if (pair? args)
>                   (car args)
>                   ".")))
> (define files (find-files dir regular-file?))
> (define urls (make-hash-table))
> (define href (regexp "href=\"(http://[^\"/?]+)([\"/?].*)" #t))
> 
> (for-each
>  (lambda (filename)
>    (with-input-from-file filename
>      (lambda ()
>        (let next-line ((line (read-line)))
>          (if (not (eof-object? line))
>              (let next-href ((found (string-search href line)))
>                (if found
>                    (begin
>                      (hash-table-set! urls (string-downcase (cadr found)) #t)
>                      (next-href (string-search href (caddr found)))))
>                (next-line (read-line))))))))
>      files)
> 
> (for-each
>  (lambda (arg)
>    (printf "~a\n" arg))
>  (sort (hash-table-keys urls) string<?))
> 
> And now hold on tight! It takes more than one minute for the same data:
> 
> real  1m16.540s
> user  1m14.849s
> sys   0m0.664s
> 
> And compiling it gives almost no performance boost:
> 
> real  0m1.810s
> user  0m1.664s
> sys   0m0.140s
> 
> So the questions are:
> 
> - What is wrong with the Chicken code?
> - How can I profile the code?
> - Why is there no difference between csi and csc?
> 
