chicken-users

Re: [Chicken-users] how to speed up my text filter?


From: Xin Zheng
Subject: Re: [Chicken-users] how to speed up my text filter?
Date: Wed, 25 Sep 2013 09:48:00 -0400

Thank you guys for your help. I learned a lot. One more question: when splitting a line with many fields, would it be more efficient to have a variant of "string-split" that returns a vector?

Thanks,
Zin


On Wed, Sep 25, 2013 at 3:20 AM, Peter Bex <address@hidden> wrote:
On Tue, Sep 24, 2013 at 09:39:04PM -0700, Kevin Wortman wrote:
> It looks like the only part of the (process ...) procedure that does any
> significant allocation is the
> (string-split line "\t")
> expression, which will allocate a string object and a linked-list node
> for every token. However, you don't really need those objects; all you
> need is to see whether the ref field starts with * and in that case
> print the id field and nothing else.
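A minimal sketch of that idea for CHICKEN 4, assuming (hypothetically) that each line looks like id<TAB>ref<TAB>..., so the id is the first field and ref is the second. The procedure name starred-id is made up for illustration. Only one tab is located, and only the id substring is allocated, and only when the line matches.

```scheme
;; CHICKEN 4 sketch. Assumed (hypothetical) layout: "id<TAB>ref<TAB>..."
(use srfi-13)   ; string-index

;; Return the id field if the ref field starts with #\*, else #f.
;; No list or per-token strings are built; we just peek at one character.
(define (starred-id line)
  (let ((tab (string-index line #\tab)))       ; end of the id field
    (and tab
         (< (+ tab 1) (string-length line))    ; ref field is non-empty
         (char=? (string-ref line (+ tab 1)) #\*)
         (substring line 0 tab))))             ; allocate only on a match
```

A caller would then print matches with something like `(let ((id (starred-id line))) (when id (print id)))`.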

Also, if the lines are very long, you may want to avoid splitting each
line into substrings up front and keeping them all around.  Instead,
you could search for the next #\tab occurrence using string-index, keep
track of the previous position, and extract only the substring
currently under scrutiny.  I think SRFI-13's kmp-search procedures are
intended for exactly this, but I was never able to grok how to use them.
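The string-index approach can be sketched as a loop that remembers where the current field starts and jumps from tab to tab, copying only the field it is asked for. This is a CHICKEN 4 sketch using SRFI-13's optional start argument to string-index; the name nth-field is hypothetical.

```scheme
;; CHICKEN 4 sketch: fetch one tab-separated field without splitting the line.
(use srfi-13)   ; string-index with optional START argument

;; Return field N (0-based) of LINE, or #f if LINE has fewer fields.
(define (nth-field line n)
  (let loop ((start 0) (i 0))              ; START = first char of field I
    (let ((tab (string-index line #\tab start)))
      (cond ((= i n)                       ; found it: copy just this field
             (substring line start (or tab (string-length line))))
            (tab (loop (+ tab 1) (+ i 1))) ; skip past this tab
            (else #f)))))                  ; ran out of fields
```

For example, `(nth-field "a\tbb\tccc" 1)` yields `"bb"` and allocates nothing for the other two fields.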

Anyway, the next suggestion, using irregex, is much better, and you
probably want to do it that way.

> It may be faster to use a regular expression (
> http://wiki.call-cc.org/man/4/Unit%20irregex ) to search for a matching
> id field, and if and only if a match is found, retrieve the matching id
> field with irregex-match-substring.
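One way to express that search with irregex in CHICKEN 4, again assuming the hypothetical layout id<TAB>ref<TAB>... and that ref must start with "*". The pattern is compiled once outside the per-line procedure, and submatch 1 captures the id; starred-rx and starred-id/rx are illustrative names.

```scheme
;; CHICKEN 4 sketch. Assumed (hypothetical) layout: "id<TAB>ref<TAB>..."
(use irregex)

;; ^        start of line
;; ([^\t]*) submatch 1: the id field (no tabs)
;; \t\*     a tab followed by a literal "*" (start of the ref field)
(define starred-rx (irregex "^([^\t]*)\t\\*"))   ; compiled once

;; Return the id if the line matches, else #f.
(define (starred-id/rx line)
  (let ((m (irregex-search starred-rx line)))
    (and m (irregex-match-substring m 1))))   ; extract only on a match
```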

Yeah, that's probably better anyway.  Note that if you want to extract
submatches, you may get (much) better performance with chicken master,
or one of the dev snapshots, as it includes the latest upstream irregex
which had major improvements in exactly the submatch extraction part.

irregex also has a way of folding over submatches, which is very nice
if you need to extract more than one field.
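The folding procedure in the irregex unit is irregex-fold, which calls a procedure once per successive match and threads an accumulator through. A hedged CHICKEN 4 sketch that collects the fields of a line this way (line-fields is a hypothetical name); note that the pattern "[^\t]+" skips empty fields, so "a\t\tb" yields two fields, not three.

```scheme
;; CHICKEN 4 sketch: collect tab-separated fields with irregex-fold.
(use irregex)

(define field-rx (irregex "[^\t]+"))   ; one run of non-tab characters

(define (line-fields line)
  (irregex-fold field-rx
                ;; KONS: called per match with (from-index match seed)
                (lambda (from m acc)
                  (cons (irregex-match-substring m) acc))
                '()                           ; initial seed
                line
                ;; FINISH: called as (from-index seed); restore field order
                (lambda (from acc) (reverse acc))))
```

If only a few fields are needed, the KONS procedure could stop consing once it has collected them, which avoids building the full list.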

> Also note that I think you can simplify your main loop down to:
>
> (for-each-line process in)
> (write-line "done" (current-error-port))

This is a nice improvement.

Cheers,
Peter
--
http://www.more-magic.net

_______________________________________________
Chicken-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-users

