bug-gnulib

Re: stdin seekable failure


From: Eric Blake
Subject: Re: stdin seekable failure
Date: Fri, 27 Apr 2007 22:28:53 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.5.666

According to Bruno Haible on 4/27/2007 2:37 PM:
> Eric Blake wrote:
>> However, the fflush module is still broken for mingw when reading files in
>> text mode that have plain LF line endings.  It appears that mingw's ftell
>> tries to compensate for \r\n line endings in text mode, but does the
>> compensation even when there is just a plain \n line ending, which means
>> that ftell can report the wrong offset (since the \r got stripped before
>> the read buffer was populated, there is no way to tell if the remaining \n
>> in the read buffer were coupled with a stripped \r or not).  I'm still
>> trying to think of ways to make fflush reliable in spite of that
>> limitation.
> 
> If you don't see a solution - and I don't see one - then maybe the fix is to
> treat streams opened in O_TEXT mode like non-seekable streams.

Hmm, that might actually be the simplest solution.  Although it may still
be possible to pull off, just with a lot of code and processing overhead.

The way cygwin does it is that when opening a FILE in text mode, it also
opens the underlying fd with O_TEXT.  But when it comes time to read from
the file, it temporarily swaps the fd to O_BINARY to fill the buffer,
restores the fd back to O_TEXT, then does the \r filtering in the fgetc
calls.  That way, ftell is always the underlying file position (and
fgetpos just uses ftell).

It looks like mingw, on the other hand, just leaves the fd in O_TEXT the
entire time.  When computing the ftell offset, it takes the known fd
position at the end of the buffer, then subtracts the number of characters
not yet processed by fgetc, along with the number of \n characters
encountered in the rest of the buffer (basically assuming that every \n
must have had a corresponding \r).  That means ftell can return a wrong
offset, or even a negative number, on a seekable file.

The workaround would be to see if lseek returns a non-negative number.  If
so, read the internals of the FILE to see how many bytes remain in the read
buffer (the default buffer size is 4k).  Then, since the file is seekable,
temporarily swap the fd into O_BINARY mode, reread the buffer, count how
many raw \n occur, and convert the fd back to O_TEXT before returning the
corrected answer.  The workaround would also have to handle ^Z, which text
mode treats as end of file.
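
To make the arithmetic concrete, here is a minimal sketch in portable C.  It
is only a model of the computation described above, not mingw's actual
source: the function names are hypothetical, the raw file contents are
passed in as an array instead of being reread through the fd, and ^Z
handling is left out.  It also assumes that the correction should count
only those raw \n bytes that are preceded by \r.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the estimate described above: start from the fd position at
   the end of the buffer, subtract the unread text-mode characters, and
   subtract one more for every '\n' among them, assuming each one had a
   '\r' in front of it in the file.  */
long
naive_text_offset (long fd_end, const char *unread, size_t n)
{
  long newlines = 0;
  size_t i;

  assert (unread != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (unread[i] == '\n')
      newlines++;
  return fd_end - (long) n - newlines;
}

/* Model of the workaround: RAW holds the bytes of the file as seen in
   binary mode, from offset 0 up to FD_END.  Walk backwards one step per
   unread text character, taking an extra step only where the raw bytes
   really contain "\r\n".  */
long
corrected_text_offset (const char *raw, long fd_end, size_t unread_count)
{
  long pos = fd_end;

  assert (raw != NULL && fd_end >= 0);
  while (unread_count > 0 && pos > 0)
    {
      pos--;
      if (raw[pos] == '\n' && pos > 0 && raw[pos - 1] == '\r')
        pos--;                  /* this '\n' did have a stripped '\r' */
      unread_count--;
    }
  return pos;
}
```

For a 4-byte file "a\nb\n" that is fully buffered with nothing consumed yet,
the naive estimate is 4 - 4 - 2 = -2, while the backward walk over the raw
bytes gives 0.  For the CRLF version of the same file (6 bytes) both
computations agree.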

> 
>> Also, in testing this, I discovered that cygwin stdin has a bug - when a
>> process starts, stdin is always in 32-bit mode (even though cygwin has
>> been compiling in 64-bit mode since cygwin 1.5.0).  But since cygwin
>> ftello really invokes ftello64 under the hood, and ftello64 fails unless
>> the file is open in 64-bit mode, it is possible for ftell(stdin) to return
>> non-negative while ftello(stdin) fails with EOF.
> 
> Can't you then just write an rpl_ftello that tries ftello and then ftell?
> Or is this impossible because it kills the failure detection when 4 GB have
> been read from a 32-bit stream?

I don't see any graceful way to get 64-bit offsets out of stdin on cygwin
1.5.x.  The stdio implementation uses different callbacks to distinguish
32-bit vs. 64-bit modes.  It IS possible to detect whether stdin has the
flag (FILE._flags & __SL64) set (implying stdin has been freopen'd, and
thus ftello will work because 64-bit offsets are now in effect).  But the
difference between ftello working and failing is whether FILE._seek is set
to __sseek or __sseek64, and since cygwin does not export either of these
functions, there is no easy way to fix stdin to overcome the 2GB limit if
__SL64 is clear.
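
For reference, the "try ftello, then ftell" idea from the quoted question
could look roughly like the sketch below.  The name ftello_with_fallback is
hypothetical (it is not an existing gnulib function), and the sketch does
nothing about the concern raised in the question: once the fallback is
taken, the result is limited to what ftell can report as a long.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Return the current offset of FP, preferring ftello and falling back
   to ftell when ftello fails.  Returns (off_t) -1 if both fail.  */
off_t
ftello_with_fallback (FILE *fp)
{
  off_t pos;
  long narrow;

  assert (fp != NULL);
  pos = ftello (fp);
  if (pos != (off_t) -1)
    return pos;

  /* ftello failed; ftell may still work, but only within the range
     of long.  */
  narrow = ftell (fp);
  return narrow == -1L ? (off_t) -1 : (off_t) narrow;
}
```

On a platform where ftello already works, the fallback is never taken and
the function behaves exactly like ftello.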

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
