coreutils

From: Reuti
Subject: Re: some concern about the fix of " tail: consistently output all data for truncated files"
Date: Wed, 9 Nov 2016 20:42:47 +0100

On 09.11.2016 at 09:00, Zizka, Jan (Nokia - CZ/Prague) wrote:

>> -----Original Message-----
>> From: Zhang, Bingxuan (Nokia - CN/Hangzhou)
>> Sent: Wednesday, November 09, 2016 8:51 AM
>> To: Zizka, Jan (Nokia - CZ/Prague) <address@hidden>;
>> Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>> Pádraig Brady <address@hidden>; address@hidden
>> Cc: Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>> 
>>> Can you tell any real use case where the changed tail behaviour would fail
>>> and print old content as you describe? I mean some real use case, not the
>>> behaviour caused by the GlusterFS bug.
>> 
>> Not from a real environment, but we can design a program that does this:
>>      A program writes a log file and always wants to keep its first 1 KB.
>>      When the file reaches its limit (e.g. 10 KB), it truncates the content
>> to 1 KB, then starts writing again.
>> 
>> In this case, with the new version, tail prints the first 1 KB again every
>> time the truncation happens.
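A minimal Python sketch of the writer described above, to make the scenario concrete. The helper `append_record` and the constant names are hypothetical; only the 1 KB / 10 KB sizes come from the example.

```python
import os

HEADER_SIZE = 1024    # bytes that must always survive at the start of the log
LIMIT = 10 * 1024     # once the file would grow past this, it is cut back


def append_record(path: str, record: bytes) -> None:
    """Append `record` to `path`; if that would push the file past LIMIT,
    first truncate it back to its first HEADER_SIZE bytes."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(0, os.SEEK_END)
        if f.tell() + len(record) > LIMIT:
            f.truncate(HEADER_SIZE)   # keep the first 1 KB, drop the rest
            f.seek(0, os.SEEK_END)    # continue writing right after the header
        f.write(record)
```

A follower that reprints from offset 0 whenever the size drops will emit the same first 1 KB after every cut; one that only waits at the new end of file will not.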
> 
> Yes, I'm sure one can always find some artificial case, but can you think of
> any real use case? I could not think of any kind of real use case.

I used `tail -f` in the past to feed the output of the logfile of IBM's Tivoli 
Storage Manager to a remote syslog.

ITSM can truncate the logfile to keep only, e.g., the last 8 days (no
rotation), so at some point in time the file gets shorter.

(Nowadays I have implemented this directly in syslog-ng, which reads the files
and forwards them to a remote syslog-ng server. And yes: syslog-ng also outputs
the first part of the file again when it gets truncated. But as I only look at
the output in case of a problem, it wasn't a reason for me to switch back.)

-- Reuti

(A more sophisticated behavior would be to remember the lines already output
and, when the file gets shorter, scan for a block of at least N matching lines
to synchronize again: no duplicate output, no missing lines.)
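The resynchronization idea above could be sketched in Python as follows. The function name `lines_to_emit` and the choice to match on the *last* N printed lines are assumptions for illustration; log lines can legitimately repeat, so a match is still only a heuristic.

```python
from typing import List


def lines_to_emit(already_output: List[str], current: List[str], n: int) -> List[str]:
    """After the followed file shrank, decide which of its current lines to print.

    Take the last `n` lines already printed and look for them as a contiguous
    block in the current content. If found, resume right after the most recent
    occurrence (nothing printed twice); otherwise treat the file as new content.
    """
    if n <= 0 or len(already_output) < n:
        return list(current)                  # not enough history to sync on
    anchor = already_output[-n:]
    # Scan from the end so the most recent occurrence of the block wins.
    for start in range(len(current) - n, -1, -1):
        if current[start:start + n] == anchor:
            return current[start + n:]
    return list(current)                      # no match: assume it was rewritten
```

For the ITSM case: if lines `d1` to `d5` were printed and the file is cut down to `d3 d4 d5 d6`, matching on N=2 resumes after `d4 d5` and prints only `d6`.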


> Moreover, with the old design, in case of file rotation that part of the data
> will be missing from the tail output. And that is a real use case.
> 
> Jan
> 
>> 
>> 
>> Br, Jimmy
>> 
>> -----Original Message-----
>> From: Zizka, Jan (Nokia - CZ/Prague)
>> Sent: Wednesday, November 09, 2016 3:41 PM
>> To: Zhang, Bingxuan (Nokia - CN/Hangzhou) <address@hidden>;
>> Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>> Pádraig Brady <address@hidden>; address@hidden
>> Cc: Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>> 
>>> -----Original Message-----
>>> From: Zhang, Bingxuan (Nokia - CN/Hangzhou)
>>> Sent: Wednesday, November 09, 2016 8:19 AM
>>> To: Zizka, Jan (Nokia - CZ/Prague) <address@hidden>;
>>> Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>>> Pádraig Brady <address@hidden>; address@hidden
>>> Cc: Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>>> 
>>> Hi,
>>> 
>>> Let's not mix 2 problems here.
>> 
>> yes and I was not mixing the two :)
>> 
>>> 
>>> 1. GlusterFS problem => We'll continue the investigation.
>>> 
>>> 2. tail problem: let's discuss it separately from the GlusterFS bug, purely
>>> in terms of its own design.
>>>     New version: when the file size shrinks, print the content from 0 to the
>>> reduced size.
>>>     Old version: when the file size shrinks, stay at the end of the
>>> reduced size and wait for new content.
>>> Both approaches have limitations; neither of them is perfect or precise.
>>> I just want to say that, in my understanding, the old version is better
>>> than the new one.
>>> According to the man page, the '-f' option is designed to print a file that
>>> is being appended to, not a file that might be truncated.
>>> "tail" should focus on what is added, not on data within the part of the
>>> file that was already printed.
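The two behaviours described above can be modelled as a pure function over byte offsets. This is a simplified Python sketch, not coreutils code; `on_size_change` and the policy names are hypothetical, and real tail works with file descriptors and the st_size it gets from fstat().

```python
from typing import Tuple


def on_size_change(offset: int, new_size: int, policy: str) -> Tuple[int, Tuple[int, int]]:
    """Return (new offset, (start, end) byte range to print) for one poll.

    `offset` is how far the file has already been printed; `new_size` is the
    size just observed. Only the shrink case differs between the policies.
    """
    if new_size >= offset:
        return new_size, (offset, new_size)     # growth: print the appended bytes
    if policy == "new":
        return new_size, (0, new_size)          # shrink: reprint from 0 to the reduced size
    if policy == "old":
        return new_size, (new_size, new_size)   # shrink: print nothing, wait at the new end
    raise ValueError(f"unknown policy: {policy!r}")
```

For the writer scenario (already printed up to 10240, new size 1024), the "new" policy prints bytes 0 to 1024 again and the "old" one prints nothing. When the file was overwritten with genuinely new content, the same two answers swap which one is correct.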
>> 
>> Yes, exactly. And in case the file is truncated or replaced, tail has to
>> assume that it contains new content which was added.
>> 
>> Can you tell any real use case where the changed tail behaviour would fail
>> and print old content as you describe? I mean some real use case, not the
>> behaviour caused by the GlusterFS bug.
>> 
>> Jan
>> 
>>> =============================
>>> # man tail
>>> TAIL(1)                          User Commands                         TAIL(1)
>>> 
>>> 
>>> NAME
>>>       tail - output the last part of files
>>> ...
>>>       -f, --follow[={name|descriptor}]
>>>              output appended data as the file grows;
>>> ...
>>> =============================
>>> 
>>> Br, Jimmy
>>> 
>>> -----Original Message-----
>>> From: Zizka, Jan (Nokia - CZ/Prague)
>>> Sent: Wednesday, November 09, 2016 3:08 PM
>>> To: Zhang, Bingxuan (Nokia - CN/Hangzhou) <address@hidden>;
>>> Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>>> Pádraig Brady <address@hidden>; address@hidden
>>> Cc: Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>>> 
>>>> -----Original Message-----
>>>> From: Zhang, Bingxuan (Nokia - CN/Hangzhou)
>>>> Sent: Wednesday, November 09, 2016 6:36 AM
>>>> To: Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Pádraig Brady <address@hidden>; address@hidden
>>>> Cc: Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Zizka, Jan (Nokia - CZ/Prague) <address@hidden>;
>>>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>>>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>>>> 
>>>> Hi,
>>>> 
>>>> I wonder what the original requirement of "tail" is: what is the purpose
>>>> of this tool? From the man page:
>>>>    tail - output the last part of files
>>>> 
>>>> Here, when "tail" finds that some file's length has become smaller, does
>>>> it really need to print old content?
>>> 
>>> But tail cannot know whether that is old content. The truncation detection
>>> was added to overcome the problem where someone overwrites the file being
>>> tailed, in which case it should indeed start dumping the file from the
>>> beginning.
>>> 
>>>> My opinion is that ignoring that old content is the better alternative.
>>> 
>>> OK, but how would you do that, as tail doesn't know that it is old content?
>>> 
>>>> 
>>>> It is possible that this "old content" was newly written (e.g. truncate to
>>>> 0, then write a small amount of content).
>>>> It is also possible that this "old content" is really old (e.g. truncate to
>>>> a smaller size).
>>>> 
>>>> So "tail" would need a perfect design here to trace every piece of data
>>>> written to the file.
>>>> But it should focus only on the latest data, given the current reality.
>>>> 
>>>> So in my opinion, "revert to the previous design" is a better choice than
>>>> the current one. What do you think?
>>> 
>>> If the change is reverted, you will get regressions in the cases for which
>>> it was added, so that is definitely not an option.
>>> 
>>> What should be fixed is GlusterFS, instead of trying to make workarounds for
>>> its misbehaviour. As Pádraig also noted:
>>> 
>>>> This stale st_size behavior, giving a smaller value _after_ a read,
>>>> seems quite problematic to lots of apps though, not just tail(1).
>>> 
>>> This will affect other applications and tools, not only tail. If you make
>>> some kind of workaround in tail for this and GlusterFS is not fixed, then
>>> this problem will stay hidden and will hit some other application sooner or
>>> later.
>>> 
>>> Jan
>>> 
>>> 
>>>> 
>>>> 
>>>> Br, Jimmy
>>>> 
>>>> -----Original Message-----
>>>> From: Lian, George (Nokia - CN/Hangzhou)
>>>> Sent: Wednesday, November 09, 2016 9:36 AM
>>>> To: Pádraig Brady <address@hidden>; address@hidden
>>>> Cc: Zhang, Bingxuan (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Zizka, Jan (Nokia - CZ/Prague) <address@hidden>;
>>>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>>>> Subject: RE: some concern about the fix of " tail: consistently output all data for truncated files"
>>>> 
>>>> Hi,
>>>>> What network file system type is this?
>>>> 
>>>> The file system is GlusterFS from Red Hat.
>>>> 
>>>>> This stale st_size behavior, giving a smaller value _after_ a read, seems
>>>>> quite problematic to lots of apps though, not just tail(1).
>>>> I agree, but I still suppose most applications get st_size first, then
>>>> seek and read, which will not go beyond the size of the file.
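The "get st_size first, then read" pattern George refers to can be sketched in Python like this. It is POSIX-only because of `os.pread`, and the helper name is hypothetical. Note that it does not by itself decide what to do when a later fstat() reports a smaller size, which is exactly the case under discussion.

```python
import os


def read_new_bytes(fd: int, offset: int) -> bytes:
    """Read from `offset` up to the size fstat() reported *before* the read.

    Bounding the read length by that earlier st_size means the reader never
    asks for bytes beyond what the file claimed to contain at that moment.
    """
    size = os.fstat(fd).st_size
    if size <= offset:
        return b""                               # nothing new (or the size went down)
    return os.pread(fd, size - offset, offset)   # positional read, offset unchanged
```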
>>>> 
>>>> We have also submitted the issue to the GlusterFS community, but so far
>>>> they haven't found the root cause in GlusterFS.
>>>> 
>>>> I still have a complaint about the "tail" application: even if there is some
>>>> issue in GlusterFS, "tail" eats all the disk space (through continual
>>>> pseudo-truncation of a large syslog file), so I suggest that "tail" make
>>>> some change to prevent this.
>>>> 
>>>> Thanks & Best Regards,
>>>> George
>>>> 
>>>> -----Original Message-----
>>>> From: Pádraig Brady [mailto:address@hidden]
>>>> Sent: Tuesday, November 08, 2016 7:29 PM
>>>> To: Lian, George (Nokia - CN/Hangzhou) <address@hidden>;
>>>> address@hidden
>>>> Cc: Zhang, Bingxuan (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Li, Deqian (Nokia - CN/Hangzhou) <address@hidden>;
>>>> Zizka, Jan (Nokia - CZ/Prague) <address@hidden>;
>>>> Bao, Xiaohui (Nokia - CN/Hangzhou) <address@hidden>
>>>> Subject: Re: some concern about the fix of " tail: consistently output all data for truncated files"
>>>> 
>>>> On 08/11/16 02:50, Lian, George (Nokia - CN/Hangzhou) wrote:
>>>>> Hi,
>>>>>>> One more suggestion: if we don't have a perfect solution that covers
>>>>>>> all truncation cases, could we add an option to tail, such as
>>>>>>> tail -no-truncate?
>>>>>>> If tail runs with this option, it would not consider any truncation
>>>>>>> case.
>>>>>>> 
>>>>>>> For example, I suppose the syslog output file will never be truncated in
>>>>>>> our environment, so tail could use this option to avoid the mistaken
>>>>>>> truncation case?
>>>>> 
>>>>>> Note for case 2) above, we only update fspec->size _after_ the read,
>>>>>> so I'm not sure how practical the race with reading a _smaller_ st_size
>>>>>> after that is. I.e. the heuristic is fairly good I think,
>>>>>> so an option may be overkill.
>>>>>> We'd have to see a demonstrable issue to consider such an option.
>>>>> 
>>>>> We have an issue now with tailing a syslog file stored on a network-based
>>>>> file system. An automated test case needs to tail the syslog for about one
>>>>> hour to capture the syslog of that period.
>>>>> Within that hour, an unexpected file truncation happened 6 times, so the
>>>>> tail output contains the full syslog file 6 times; the output file is
>>>>> huge and eats all of the disk space.
>>>>> The network-based file system may not be easy to change to suit the
>>>>> current implementation of the "tail" application.
>>>>> So I need your help :)
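A toy Python simulation of why a stale st_size is so costly, under loudly simplified assumptions: the real file length is constant, the sequence of observed sizes is invented, and the two branches only approximate the new and old behaviours discussed in this thread (this is not tail's code). The function name and parameter are hypothetical.

```python
from typing import Iterable


def bytes_emitted(observed_sizes: Iterable[int], reprint_on_shrink: bool = True) -> int:
    """Count bytes a simple follow loop prints for a sequence of observed sizes.

    The real file never changes length here; a value smaller than the current
    offset stands for a stale st_size returned by the file system.
    """
    offset = 0
    emitted = 0
    for size in observed_sizes:
        if size < offset:
            # Apparent truncation: reprint from 0 ("new" behaviour) or just
            # move to the reported end ("old" behaviour).
            offset = 0 if reprint_on_shrink else size
        if size > offset:
            emitted += size - offset   # print bytes [offset, size)
            offset = size
    return emitted
```

With a 1,000,000-byte file and six stale readings (`[S] + [S - 1, S] * 6`), the reprinting loop emits 7,000,000 bytes, i.e. the file seven times, matching the "6 times full syslog file" symptom; the waiting loop emits only 6 extra bytes.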
>>>>> 
>>>>> And what do you mean by demonstrable? The issue we encounter can easily be
>>>>> reproduced. Maybe the file system is not as strict as ext4,
>>>>> but I still suggest that the "tail" application could make some change to
>>>>> adapt to this kind of network-based file system?
>>>> 
>>>> It's important info that you have seen the issue.
>>>> What network file system type is this?
>>>> We might just revert this change if the issue is widespread enough.
>>>> 
>>>> This stale st_size behavior, giving a smaller value _after_ a read,
>>>> seems quite problematic to lots of apps though, not just tail(1).
>>>> 
>>>> thanks,
>>>> Pádraig.
> 
> 



