bug-grep

bug#70511: Option to grep into compressed files


From: David G. Pickett
Subject: bug#70511: Option to grep into compressed files
Date: Tue, 23 Apr 2024 22:51:45 +0000 (UTC)

Shell scripting can take file names in from a find or ls with 'while read', or
by globbing with 'for f in pattern', and examine them one by one: run 'grep -q'
to find out whether the file, or the uncompressed stream from that file, has a
match, and if so 'echo' the file name out. If you want the matching lines, it
can 'while read l' the stream out of grep and prefix each line with the file
name in an 'echo'.

It helps to juggle streams rather than file names, and to create streams rather
than temp files, which have to be cleaned up and add delay. In bash, 'while
read' sometimes gets tricky when the loop is fed from a pipe: the loop runs in
a subshell, so variables set inside it are local to the loop, and wrapping the
loop and whatever uses those variables in parentheses sometimes helps. Both ksh
and bash also have the nice '<(command)' feature to turn a command's stdout
stream into an input file name, and '>(command)' to turn an output stream into
a file name. Bash has so many nice tricks that I often google for them, such as
pattern matching in an 'if' test.

If you do not trust extensions, you can '$(file filename)' to find out what you
have in hand:
$ echo $(file .profile)
.profile: ASCII text
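
The loop described above might look roughly like the following. This is a minimal sketch, not a tested tool: the helper names stream_of and list_matches are made up for illustration, it assumes file(1), gzip, bzip2 and xz are installed, and it relies on the first word of the 'file -b' description to pick a decompressor.

```shell
#!/bin/sh
# Sketch only: stream_of and list_matches are hypothetical helper names.

# Write the contents of file $1 to stdout, decompressing when file(1)
# reports a compressed format, whatever the file's extension says.
stream_of() {
    case "$(file -b -- "$1")" in
        gzip*)  gzip -dc < "$1" ;;
        bzip2*) bzip2 -dc < "$1" ;;
        XZ*)    xz -dc < "$1" ;;
        *)      cat < "$1" ;;
    esac
}

# list_matches PATTERN FILE...
# Print the name of each regular file whose (decompressed) stream
# contains a match. No temp files are created; data only flows
# through pipes. Note that pattern and f are plain global variables.
list_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        if stream_of "$f" | grep -q -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

It could then be called with a glob, for example: list_matches 'pattern123' ./*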


On Tuesday, April 23, 2024 at 11:21:26 AM EDT, Mary <marycada@proton.me> wrote:

> Thanks for the suggestion. You're right, this would be better than zgrep
> etc.
> 
> I have some qualms though, as the new option would increase the attack
> surface for 'grep', in that you could then execute arbitrary code by
> passing certain options to 'grep'. Is there some safer way to get what
> you want?


There is still the possibility of including the respective compression
libraries directly in grep and using `-Z` and `-J` as proposed, but this
wouldn't allow the use of less popular compression algorithms.

Another possibility would be to give grep a special arg0 to enable shell
commands, like `jgrep zcat pattern123 file.gz`, but I'm not sure it's worth
the trouble.


> One supposes that if the file extension is not trustworthy, one can taste
> the file the way the file command does, and use libraries like the gzip
> libraries to handle gzipped files as a stream.  There are so many others: zip
> files could be treated like directories, with all the files in them that
> match the glob being searched, and then there is bzip2, 7zip, ....  It
> becomes a popularity contest!  One can do all this with shell scripting, and
> leave poor old grep out of it!


I wanted to do this in grep directly because it's difficult
to implement this with shell scripting. I noticed that neither zgrep, bzgrep 
nor xzgrep support the `-r` option, among others, presumably because it's too 
difficult to implement in a portable way.
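
For the simplest case, a recursive search over gzip-compressed files can be approximated with find. The following is only a rough sketch under stated assumptions: rzgrep is a made-up helper name, it assumes GNU grep (for `-H` and `--label`) and gzip, and it handles none of grep's other options such as `--include`, `-l` or context lines, which is where the real difficulty lies.

```shell
#!/bin/sh
# rzgrep PATTERN DIR
# Hypothetical helper: search every regular file under DIR, decompressing
# gzip data on the fly, and prefix each matching line with the file name.
# The function's exit status is find's, not a grep-style match status.
rzgrep() {
    find "$2" -type f -exec sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            # With -f and -c, gzip copies input that is not in a format
            # it recognizes to stdout unchanged, so plain files work too.
            # --label makes grep report stdin under the real file name.
            gzip -dcf < "$f" | grep -H --label="$f" -e "$pattern"
        done
    ' sh "$1" {} +
}
```

For example, `rzgrep pattern123 /var/log` would print lines in the form `file:matching line`.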

I made my patch use a shell command specifically to provide maximum flexibility 
with minimum maintenance cost. But it does open the door to security risks, so 
I understand if it's not worth adding to grep.
  
