bug#22239: New Project

bug-grep

From:	Aaron Zhou Qian
Subject:	bug#22239: New Project
Date:	Sat, 7 Jan 2017 22:04:00 -0800

On Sat, Jan 7, 2017 at 9:29 AM, Paul Eggert <address@hidden> wrote:
>
> Could you remind us about the latest status of your proposal compared to
> Zev Weiss's? Does <http://bugs.gnu.org/24689> contain the latest thing
> you have? Zev Weiss's latest version is at <https://github.com/zevweiss/grep>.
> Comparing the two was the thing Jim Meyering asked for at
> <http://bugs.gnu.org/22239#8>, and you can follow up by sending email to
> address@hidden

Yes, that GitHub link is the latest version. I haven't made any changes to
it since September of last year.

Basically, the main thread traverses the file tree and assigns each file to
be searched to a worker thread. There is also a dynamic output buffer, so
the output is identical to that of the original grep program.
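The design described above can be sketched in C with POSIX threads. This is a minimal illustration under stated assumptions, not the code from the proposal: the "files" are in-memory strings, matching is a plain substring test, a mutex-protected counter stands in for the main thread handing files to workers, and the names `search_all`, `struct work`, and `struct dynbuf` are hypothetical. Each file gets its own growable buffer, and after joining the workers the main thread concatenates the buffers in traversal order, so the output matches a sequential run regardless of which thread finishes first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Growable output buffer; one per input file (hypothetical helper). */
struct dynbuf { char *data; size_t len, cap; };

static void dynbuf_append(struct dynbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (b->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (!p) abort();
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* State shared by all threads: the file list built by the main thread
   (in-memory strings here, so the sketch is self-contained). */
struct work {
    const char *const *names;
    const char *const *contents;
    size_t nfiles;
    const char *pattern;
    struct dynbuf *out;    /* out[i] collects the output for file i */
    size_t next;           /* index of the next unclaimed file */
    pthread_mutex_t lock;  /* protects next */
};

/* Append "name:line\n" to out[i] for every line of file i containing
   the pattern; only the thread that claimed i touches out[i]. */
static void search_one(struct work *w, size_t i)
{
    size_t plen = strlen(w->pattern);
    const char *line = w->contents[i];
    while (*line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t) (nl - line) : strlen(line);
        for (size_t j = 0; j + plen <= len; j++)
            if (memcmp(line + j, w->pattern, plen) == 0) {
                dynbuf_append(&w->out[i], w->names[i], strlen(w->names[i]));
                dynbuf_append(&w->out[i], ":", 1);
                dynbuf_append(&w->out[i], line, len);
                dynbuf_append(&w->out[i], "\n", 1);
                break;
            }
        line += len + (nl != NULL);
    }
}

/* Claim the next file index under the lock until none are left. */
static void *worker(void *arg)
{
    struct work *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        size_t i = w->next++;
        pthread_mutex_unlock(&w->lock);
        if (i >= w->nfiles)
            return NULL;
        search_one(w, i);
    }
}

/* Return the combined output (caller frees) in sequential-run order. */
char *search_all(const char *const *names, const char *const *contents,
                 size_t nfiles, const char *pattern, int nthreads)
{
    assert(nthreads > 0);
    struct work w = { .names = names, .contents = contents,
                      .nfiles = nfiles, .pattern = pattern, .next = 0 };
    w.out = calloc(nfiles ? nfiles : 1, sizeof *w.out);
    pthread_t *tids = malloc((size_t) nthreads * sizeof *tids);
    if (!w.out || !tids) abort();
    pthread_mutex_init(&w.lock, NULL);
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tids[t], NULL, worker, &w) != 0) abort();
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&w.lock);
    /* Concatenate the per-file buffers in traversal order. */
    struct dynbuf all = { 0 };
    dynbuf_append(&all, "", 0);  /* non-NULL result even with no matches */
    for (size_t i = 0; i < nfiles; i++) {
        if (w.out[i].len > 0)
            dynbuf_append(&all, w.out[i].data, w.out[i].len);
        free(w.out[i].data);
    }
    free(w.out);
    free(tids);
    return all.data;
}
```

For example, with a few in-memory files and the pattern `foo`, `search_all` returns the same string whether it runs with one thread or four; only the elapsed time would differ.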

I tested the program on a server. On a directory containing 4 files,
grep -r on that directory is 4 times faster than the original grep. On a
directory containing 8 files, it is 6 times faster, and on a directory
containing 12 files, it is 8.5 times faster.

I think the multithreaded case is fundamentally different from the
sequential one, and grep would not use multithreading all the time. When it
isn't in use, i.e. when other options are passed to grep, more callers reach
the functions whose signatures we changed. That is hard to keep track of,
because the program is fairly complicated.

If C had overloading as C++ does, I would overload those functions. Since it
doesn't, I made it very clear in the code which functions are the
counterparts of the original versions. I did this to contain potential
problems: if multithreading introduces a bug, it won't affect the
sequential program, whereas if we interleave the two scenarios we might
lose track of what's going on. At least, that was my initial thinking.
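A minimal sketch of the counterpart approach, with hypothetical names (`kw_compile`, `kw_execute`, and their `_mt` counterparts are illustrations, not grep's real functions, and the matcher is a plain substring test): the original versions keep compiled state in a file-scope static, while the multithreaded counterparts take that state explicitly. Because C has no overloading, the `_mt` suffix is what marks each counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compiled-pattern state (hypothetical; real grep matchers hold far more).
   The pattern is not copied, so the caller must keep it alive. */
struct matcher {
    const char *pattern;
    size_t plen;
};

/* Original sequential versions: state lives in a file-scope static,
   which is fine for one thread but unsafe to share between threads. */
static struct matcher seq_matcher;

void kw_compile(const char *pattern)
{
    assert(pattern != NULL);
    seq_matcher.pattern = pattern;
    seq_matcher.plen = strlen(pattern);
}

int kw_execute(const char *line, size_t len)
{
    for (size_t j = 0; j + seq_matcher.plen <= len; j++)
        if (memcmp(line + j, seq_matcher.pattern, seq_matcher.plen) == 0)
            return 1;
    return 0;
}

/* Multithreaded counterparts: same logic, but each thread owns its
   state, so the sequential code path above is never touched. */
void kw_compile_mt(struct matcher *m, const char *pattern)
{
    assert(pattern != NULL);
    m->pattern = pattern;
    m->plen = strlen(pattern);
}

int kw_execute_mt(const struct matcher *m, const char *line, size_t len)
{
    for (size_t j = 0; j + m->plen <= len; j++)
        if (memcmp(line + j, m->pattern, m->plen) == 0)
            return 1;
    return 0;
}
```

The cost of this approach is visible even in the sketch: the matching loop exists twice, and a fix to one copy has to be mirrored in the other.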

I saw some recent commits by Zev together with Jim, for example:

commit 9365ed6536d4fabf42ec17fef1bbe5d78884f950

* src/grep.c (compile_fp_t): Now returns an opaque pointer (the
compiled pattern).
(execute_fp_t): Now passed the pointer returned by a compile_fp_t.
All call sites updated accordingly.
(compiled_pattern): New static variable.
* src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL).
(EGexecute): Receive a void pointer argument (unused).
* src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL).
(Fexecute): Receive a void pointer argument (unused).
* src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL).
(Pexecute): Receive a void pointer argument (unused).
* src/search.h: Update compile/execute function prototypes.

So we have different approaches. They add extra pointer arguments for the
multithreaded case, with the pointer being NULL when multithreading is not
in effect. My approach instead replicates the functions, so that
counterparts of the original functions are used in the multithreaded
scenario. I did this to reduce the complexity of each function and make the
program less monolithic. I'll leave the decision to you.
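For comparison, a minimal sketch of the extra-pointer approach as described in the paragraph above, again with hypothetical names and a plain substring matcher rather than the real `Fexecute`/`EGexecute` signatures from the commit: one function serves both cases, using shared static state when the pointer is NULL (the sequential case) and the caller's state, for example a per-thread one, otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compiled-pattern state (hypothetical). The pattern is not copied,
   so the caller must keep it alive. */
struct matcher {
    const char *pattern;
    size_t plen;
};

/* State used when no pointer is supplied (the sequential case). */
static struct matcher default_matcher;

/* Compile into *m, or into the shared default when m is NULL. */
void kw_compile(struct matcher *m, const char *pattern)
{
    assert(pattern != NULL);
    if (m == NULL)
        m = &default_matcher;
    m->pattern = pattern;
    m->plen = strlen(pattern);
}

/* One function for both cases: NULL selects the shared default,
   a non-NULL pointer selects the caller's own state. */
int kw_execute(const struct matcher *m, const char *line, size_t len)
{
    if (m == NULL)
        m = &default_matcher;
    for (size_t j = 0; j + m->plen <= len; j++)
        if (memcmp(line + j, m->pattern, m->plen) == 0)
            return 1;
    return 0;
}
```

Here the matching logic exists only once, but every caller, sequential ones included, has to pass the extra argument, which is the interleaving of the two scenarios that the message is concerned about.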
