bug-grep
Re: Grep --include does not work


From: Bob Proulx
Subject: Re: Grep --include does not work
Date: Wed, 13 Jun 2007 03:06:25 -0600
User-agent: Mutt/1.5.9i

LeonM wrote:
> Each grep.exe as you are aware contains the word 'help'. I have also
> deliberately modified ./GnuWin32Grep/TestGrep.txt to have a line
> containing the word help. All other text files do not have this.

Seems like a good test case.

> When I run grep in GnuWin32 v2.5.1a without the --include but with the
> '.' as you recommended like this:
>  grep -R -P help .

I did not recommend that.  I recommended this:

  grep -R --include="*.txt" "include" .

> I have got this:
> [...all of the binaries match...]

That is as expected.  Right?

> Now if I run it with
> grep -R --include="*.txt" -P help .
> 
> I have got this:
> ./GnuWin32Grep/TestGrep.txt:    Hello with Tab+space help Perl

That is also as expected, right?  You said that you modified that file
to contain the word help.  It matches.  None of the other files match
the '*.txt' pattern and so were not considered as part of the grep.

So far all looks as expected.  Agreed?

> I am not disputing the fact that with or without --include a much
> larger dataset (file list) is generated. The fact is that all I care
> in the output are those in the text file. Moreover, the --help message
> for --include says this:
> --include=PATTERN     files that match PATTERN will be examined

Yes.

> So my reading tells me that it will generate a long list of files but
> only those that matches this pattern (in my case *.txt) will be
> examined. So without --include, everything will be examined and the
> result seems to agree with this interpretation.

Yes.

> If I replace the end '.' with '*' I get the same result.

Huh?  This is not what you reported previously.  Previously you said
that grep ignored the --include option and searched all files in the
directory tree and printed all matches.  But all of this depends upon
what files are in the current directory for the '*' to be expanded
into when running the command.  This is probably causing confusion.

> So I am confused. Ideally, I should be able to specify this:
> 
> grep -R -P help *.txt

Nope.  That won't do what you are wanting it to do.  "Ideally" has
different ideals depending upon what operating model is desired.  The
above is not ideal for a Unix-like operating model.  If it is a MS
native program with a MS native paradigm then sure.  But on a
Unix-like system it is the shell's job to expand wildcards like
'*.txt' (aka file globs).  Since grep is a Unix program I expect it to
behave Unix-like.  Something different could behave MS-like but then
it would be something different.
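
To make the difference concrete, here is a small sketch that can be run
in a scratch directory.  It assumes a POSIX-style shell such as bash or
dash with default glob settings (zsh, for one, reports an error on an
unmatched glob instead) and a grep that supports -R and --include, as
GNU grep does.  The directory layout and file names are made up for
illustration.

```shell
# Build a throwaway tree: the only .txt file lives in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'help is here\n' > "$demo/sub/notes.txt"
printf 'help in a non-txt file\n' > "$demo/sub/notes.dat"
cd "$demo" || exit 1

# Nothing in the current directory matches *.txt, so the shell hands the
# literal string '*.txt' to grep.  No file has that name, so grep
# complains on stderr and prints no matches.
glob_result=$(grep -R help *.txt 2>/dev/null) || true

# Here grep walks the tree itself and applies the pattern to the name of
# each file it finds.  The quotes keep the shell from touching it.
include_result=$(grep -R --include='*.txt' help .)

printf 'glob:    [%s]\n' "$glob_result"
printf 'include: [%s]\n' "$include_result"
```

The first form only ever sees files in the current directory because
the shell, not grep, decides what '*.txt' means.  The second form lets
grep do the name matching at every level of the tree.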

> To a Windows program, this is the most logical specification. But it
> does not work as it produces no results.

Grep was developed on the Unix system in 1973 and behaves as expected
on a Unix system.  MS-Windows had not been invented yet.  Grep is not
doing badly for a 34-year-old paradigm!

> Do you have a good tutorial site devoted for recursive search?

The find manual is a good place to start.  The find command is the
tool used to "find files".  On a GNU system the documentation will be
available here:

  info find

This is actually a common FAQ and it comes up for other commands
too.  Here is an entry that talks about 'rm' but applies to your
question as well.  (Full disclosure: I wrote that FAQ entry.)

  http://www.gnu.org/software/coreutils/faq/#Why-doesn_0027t-rm-_002dr-_002a_002epattern-recurse-like-it-should_003f

In summary the '*.txt' is expanded by the shell into a list of
matching files.  If no files match then a literal *.txt is passed to
the command.  But all it takes is one .txt file in the directory and
then the command shell will replace the *.txt with the file names of
all matching files.
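
A quick way to see this is to look at the arguments a command actually
receives.  The sketch below assumes a POSIX-style shell (bash, dash)
with default glob settings.  show_args is a hypothetical helper written
just for this demonstration, and the file names are made up.

```shell
# show_args (hypothetical helper): print the argument count, then each
# argument in angle brackets so that the boundaries are visible.
show_args() { printf '%d:' "$#"; printf ' <%s>' "$@"; printf '\n'; }

demo=$(mktemp -d)
cd "$demo" || exit 1

# No .txt files yet: the pattern is passed through unchanged.
none=$(show_args *.txt)

# One or more matches: the shell substitutes the sorted file names.
touch a.txt b.txt
some=$(show_args *.txt)

# Quoting suppresses expansion, so the command gets the pattern itself.
# This is why --include="*.txt" is written with quotes.
quoted=$(show_args '*.txt')

printf '%s\n' "$none" "$some" "$quoted"
```

The command never learns whether the user typed a glob or the file
names themselves.  It only sees the final argument list.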

The separation of file wildcard matching into two parts, the shell and
the application program, means that all programs get the same wildcard
matching.  It is outside of any application.  It is part of the
programmable command shell.  All applications then behave uniformly.

On MS the command line shell is not the same command line shell as I
am describing.  File globs (e.g. '*') are not natively expanded by the
command.com or cmd.exe shell.  There are limitations on the length of
the arguments passed to the application.  Native MS commands behave
differently.  (More like CP/M!)  The commands there must do the file
wildcard expansion themselves.  Therefore ports of GNU command line
applications to MS must adapt in some way.

I don't know the methods that GnuWin32 uses but I assume they have a
library that is linked with the application that simulates the Unix
behavior.  GNU grep does not natively expand file globs and so that
must be added to port this to MS.  This mapping of functionality can
be pretty good but it is still a mapping and there are always corner
cases where things don't work the same as on a GNU/Unix system.

> I have even used the Windows XP "for /R" command and then get grep to
> process file by file in the for command like this:
> for /R . %A in (*.txt) do type %A |grep -P help

Sure.  On a Unix or GNU system this is very similar to the following.
The 'find' command is used to find files.

  find . -name '*.txt' -print0 | xargs -r0 grep help

Here I am using zero terminated strings (the -print0 and -0 options)
and so arbitrary filenames (with spaces, newlines, etc.) are handled.
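
As a sketch of that point, the following builds a file name with
spaces in it and searches it two ways.  The -print0, -0 and -r options
are extensions found in GNU findutils and many BSDs.  The second form
uses 'find -exec ... {} +', which is specified by POSIX and needs no
pipe at all.  The -H option (always print the file name) is added here
only so that both commands produce identical output.  It is supported
by GNU and BSD grep.  The directory and file names are made up.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir with space"
printf 'need help\n' > "$demo/dir with space/read me.txt"
printf 'need help\n' > "$demo/skip.log"
cd "$demo" || exit 1

# NUL-delimited pipeline: names are separated by a zero byte, which
# cannot appear in a file name, so spaces and newlines are safe.
via_xargs=$(find . -name '*.txt' -print0 | xargs -r0 grep -H help)

# Alternative: find batches the names onto grep's command line itself.
via_exec=$(find . -name '*.txt' -exec grep -H help {} +)

printf '%s\n' "$via_xargs" "$via_exec"
```

Note that '*.txt' is quoted in both commands so that find, not the
shell, does the matching at every directory level.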

Bob



