bug-coreutils

bug#17637: bug "cut of end-line is skipped"


From: Pádraig Brady
Subject: bug#17637: bug "cut of end-line is skipped"
Date: Fri, 30 May 2014 18:43:38 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 05/30/2014 03:34 AM, Pádraig Brady wrote:
> On 05/29/2014 11:53 PM, Eric Blake wrote:
>> On 05/29/2014 04:24 PM, Pádraig Brady wrote:
>>> tag 17637 notabug
>>> close 17637
>>> stop
>>
>> On the one hand, this feels a bit premature.
>>
>>>
>>> That change 
>>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=51ce0bf8
>>> was made in v8.21 to fix http://bugs.gnu.org/13498
>>
>> Are you sure you didn't mean the next commit:
>>
>> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=d302aed
> 
> Right sorry.

Or rather the two in combination.

>> But both of those commits are in coreutils 8.21, whereas the Fedora 20
>> build of coreutils 8.21 does not have that behavior.  Is downstream
>> patching things in a way to make it work, and if so, why can't we
>> backport what Fedora has added on top?
> 
> That's the i18n patch which has diverged here:
> 
>   $ seq 10 | LANG=C cut -s -f2 -d$'\n'
>   $ seq 10 | cut -s -f2 -d$'\n'
>   2
> 
> Unfortunately that means we have an inconsistency.
> Also many users might still be getting the old behavior
> (and thus not complaining about the new behavior)
> and Rudy may be hitting this only because the script is
> being run in the C locale?

It seems Debian/Ubuntu are closer to upstream and don't
apply the i18n patch. So it's good that many users were
exposed to this change and this is the first reported issue.

>>> It was made for a good reason, to handle the buffering issues detailed
>>> in the above bug. Your existing usage was a bit of an edge case and not
>>> supported with other cut implementations, and while we try to avoid
>>> changes like this it was thought the benefits outweighed the impact
>>> for the very few who use cut in this way.
>>
>> But while you documented the improved buffering behavior in NEWS, you
>> failed to document the corner-case change to -d$'\n'.
>>
>> On the other hand, I confirmed that both Solaris and FreeBSD cut behave
>> the same way as the new GNU cut behavior.
>>
>> $ nl='
>> '
>> $ printf 'a\t1\nb\t2\n' | cut -d"$nl" -f1
>> a       1
>> b       2
>>
>> So keeping the new behavior in the name of consistency makes sense,
>> although it still might be nice to add a retroactive NEWS entry.
> 
> Ugh I'm not sure now. Consistency is good if that consistent
> behavior is needed, though I suppose the use case of using -s -d$'\n'
> to suppress the last line if it has no trailing newline is a lot more
> esoteric than using cut like this for example:
> 
>   $ seq 10 | cut -f2,3,7 -d$'\n' --output-delimiter='|'
>   2|3|7
> 
> So I'm leaning towards restoring that behavior.
> (I notice cut consumes all input even if the last
> line (field) needed is output, so we could improve that too).
> 
> I'll sleep on it.

Arguments for reverting to old behavior:
  compat for coreutils-only scripts
  compat with existing i18n patch in various distros
  valid use cases, albeit achievable with other utils

Arguments for keeping new behavior:
  avoids users introducing unneeded GNU extensions to scripts
  avoids special casing last '\n' in file
  allows for more efficient implementation

Compat concerns win here I think, so
I'm 55:45 in favor of applying the attached patch
to reinstate the functionality.
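
For scripts that need to run on other cut implementations, the line
selection can be done with other utilities. The sketch below is one way to do
it with only POSIX sed and paste. The function name `select_lines` is
hypothetical, and it assumes the line numbers are given in ascending order. It
also quits after the last requested line, so the rest of the input is not read.

```shell
# Hypothetical helper, not part of coreutils: print the requested line
# numbers of stdin joined with '|'.
# Assumption: line numbers are passed in ascending order.
select_lines() {
    script=
    last=
    for n in "$@"; do
        script="${script}${n}p;"   # print line n
        last=$n
    done
    # Quit once the last requested line has been printed, then join the
    # selected lines into a single '|'-separated line.
    sed -n "${script}${last}q" | paste -s -d '|' -
}

seq 10 | select_lines 2 3 7
# 2|3|7
```

This mirrors the `--output-delimiter='|'` example quoted earlier without
relying on a newline delimiter in cut.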

thanks,
Pádraig.

Attachment: cut-line-select.patch
Description: Text Data

