
bug#25146: grep unusable on mingw - SAME_INODE woes


From: Bruno Haible
Subject: bug#25146: grep unusable on mingw - SAME_INODE woes
Date: Fri, 09 Dec 2016 16:30:45 +0100
User-agent: KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

> grep snapshot:
>   http://meyering.net/grep/grep-ss.tar.xz      1.4 MB
>   http://meyering.net/grep/grep-ss.tar.xz.sig
>   http://meyering.net/grep/grep-2.26.39-ae3f.tar.xz

This release, built for mingw, is hardly usable:
  - 33 out of 107 tests fail,
  - A simple "grep.exe o xx > yy" fails with error
    grep.exe: input file 'xx' is also the output

More details:
- This happens both in a Cygwin mintty.exe window and in a cmd.exe window.
- It's the same for 32-bit mingw builds and 64-bit mingw builds
  (recipe:
  http://git.savannah.gnu.org/gitweb/?p=gperf.git;a=blob_plain;f=README.windows;hb=HEAD )
- The error is signalled in grep.c:1874.
  At this point, 'st' (of type 'struct _stat64') contains
    { st_dev = 0, st_ino = 0,
      st_mode = 0x81B6 = _S_IFREG | _S_IREAD | _S_IWRITE | 0x36,
      st_nlink = 1,
      st_uid = 0, st_gid = 0, st_rdev = 0, st_size = 4,
      st_atime = 1481099615, st_mtime = 1481099615, st_ctime = 1481099615 }
  Obviously, such a struct cannot reliably distinguish two different
  regular files.
  In other words, SAME_INODE cannot work.
- So, how do you determine identity of files in Windows?
  http://stackoverflow.com/questions/562701/best-way-to-determine-if-two-path-reference-to-same-file-in-windows
  But even this is not fully correct: a BY_HANDLE_FILE_INFORMATION is
  not sufficient, because it contains only a 64-bit file identifier
  (nFileIndexHigh and nFileIndexLow). See
  https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx
  The best approach is to use GetFileInformationByHandleEx to produce a
  FILE_ID_INFO, which carries a 128-bit file identifier.
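
To illustrate why SAME_INODE cannot work with the values shown above, here
is a minimal sketch in C. It assumes the classic definition of SAME_INODE
(comparing st_ino and st_dev); fake_mingw_stat and
distinct_files_look_identical are hypothetical helpers made up for this
illustration, not code from grep or gnulib.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Assumed to match the classic SAME_INODE comparison: two files are
   "the same" when both st_ino and st_dev agree.  */
#define SAME_INODE(a, b) \
  ((a).st_ino == (b).st_ino && (a).st_dev == (b).st_dev)

/* Hypothetical helper: fill *ST with the kind of values reported above
   for a regular file on mingw (st_dev = 0, st_ino = 0).  */
static void
fake_mingw_stat (struct stat *st, long size)
{
  assert (st != NULL);
  memset (st, 0, sizeof *st);   /* st_dev = 0, st_ino = 0 */
  st->st_mode = 0x81B6;         /* the st_mode value from the report */
  st->st_nlink = 1;
  st->st_size = size;
}

/* Hypothetical helper: returns nonzero, because two unrelated regular
   files are indistinguishable once st_dev and st_ino are both 0.  */
static int
distinct_files_look_identical (void)
{
  struct stat in, out;
  fake_mingw_stat (&in, 4);     /* stands for 'xx', 4 bytes */
  fake_mingw_stat (&out, 0);    /* stands for 'yy', just created */
  return SAME_INODE (in, out);
}
```

The sizes differ, but the comparison never looks at them, so the input
and the output are reported as the same file.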

Find attached a proof-of-concept patch. (Really rough - needs
-D_WIN32_WINNT=_WIN32_WINNT_WIN8, and lacks good error handling.)

With it, I get:
$ ./grep.exe o xx > yy
$ ./grep.exe o xx > xx
grep.exe: input file 'xx' is also the output

That is, now the detection of identical regular files works.

How can we go forward from here? I would propose a gnulib module that defines
a data structure that combines a 'struct stat' with the FILE_ID_INFO for native
Windows, and rebase the 'same-inode' module on it.
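
A rough sketch of what such a data structure could look like follows. It
is not the attached patch and not an existing gnulib API: the names
struct file_identity, get_file_identity and same_file are hypothetical.
The native Windows branch assumes a build with
-D_WIN32_WINNT=_WIN32_WINNT_WIN8 so that FileIdInfo is declared. Treating
an all-zero (st_dev, st_ino) pair as "no information" is likewise an
assumption of this sketch.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

#ifdef _WIN32
# include <windows.h>  /* FILE_ID_INFO, GetFileInformationByHandleEx */
# include <io.h>       /* _get_osfhandle */
#endif

/* Hypothetical type: a 'struct stat' plus the Windows file identity.  */
struct file_identity
{
  struct stat st;
  int has_file_id;                   /* nonzero if the fields below are set */
  unsigned long long volume_serial;  /* FILE_ID_INFO.VolumeSerialNumber */
  unsigned char file_id[16];         /* FILE_ID_INFO.FileId.Identifier */
};

/* Hypothetical helper: fill *ID for the open descriptor FD.
   Returns 0 on success, -1 if fstat fails.  */
static int
get_file_identity (int fd, struct file_identity *id)
{
  assert (id != NULL);
  memset (id, 0, sizeof *id);
  if (fstat (fd, &id->st) != 0)
    return -1;
#ifdef _WIN32
  {
    HANDLE h = (HANDLE) _get_osfhandle (fd);
    FILE_ID_INFO info;
    if (h != INVALID_HANDLE_VALUE
        && GetFileInformationByHandleEx (h, FileIdInfo, &info, sizeof info))
      {
        id->has_file_id = 1;
        id->volume_serial = info.VolumeSerialNumber;
        memcpy (id->file_id, info.FileId.Identifier, sizeof id->file_id);
      }
  }
#endif
  return 0;
}

/* Hypothetical replacement for SAME_INODE on two file_identity values.  */
static int
same_file (const struct file_identity *a, const struct file_identity *b)
{
  assert (a != NULL && b != NULL);
  if (a->has_file_id && b->has_file_id)
    return (a->volume_serial == b->volume_serial
            && memcmp (a->file_id, b->file_id, sizeof a->file_id) == 0);
  if (a->has_file_id != b->has_file_id)
    return 0;
  /* Neither side has a Windows identity: fall back to the classic
     comparison, but do not trust an all-zero (st_dev, st_ino) pair.  */
  if (a->st.st_ino == 0 && a->st.st_dev == 0)
    return 0;
  return a->st.st_ino == b->st.st_ino && a->st.st_dev == b->st.st_dev;
}
```

A caller would fill one struct file_identity per descriptor (for example
the input file and standard output) and pass both to same_file. Only
code that needs file identity pays for the extra Windows call.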

The other approach, to override mingw's 'struct stat' and stat/fstat/lstat()
functions, would imply a performance hit to all stat calls, even those that
don't want to access the st_ino field.

Bruno

Attachment: grep-same-inode-fix.diff
Description: Text Data

