bug#25832: split (v 8.25) with numeric suffixes beyond 89
From: Pádraig Brady
Subject: bug#25832: split (v 8.25) with numeric suffixes beyond 89
Date: Tue, 21 Feb 2017 19:32:02 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
unarchive 20874
forcemerge 20874 25832
stop
On 21/02/17 18:40, Assaf Gordon wrote:
> Hello,
>
>> On Feb 21, 2017, at 19:55, Holger Wolff <address@hidden> wrote:
>>
>> Incorrect numeric suffixes are sometimes produced when going beyond number
>> 89:
>> Assume a file "test.txt" with 1000 lines, and the command
>>
>> $ split -d -l 10 test.txt test_
>>
>> I expect files test_00 through test_99, but what I get are test_00 through
>> test_89 and test_9000 through test_9009.
>
> Thank you for the bug report.
>
> I can confirm this is reproducible in the latest revision.
>
> The immediate reason is that, without a starting value,
> coreutils' split has a feature to 'widen' the filename,
> but the widening logic follows the alphabetic scheme
> and doesn't work well for numeric suffixes.
>
> That is, when not using numeric suffixes,
> 'yz' (the last two-letter suffix) is followed by the widened 'zaaa':
>
> $ seq 1000 | split -l 1 - foo_
>
> will result in:
>
> ...
> foo_yy
> foo_yz
> foo_zaaa
> foo_zaab
> ...
>
> And you are seeing the last two-digit suffix ('89')
> widened by the same logic (to '9000').
>
>
> Technically, if 'numeric_suffix_start'
> is left as 'null' in the parsing of --numeric-suffixes:
> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n1455
>
> then the widening logic behaves as if those were letters, not digits
> in 'split.c:next_file_name()':
> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n403
>
>
>
> An immediate band-aid of defaulting to numeric_suffix_start=0
> would have an unintended consequence (a regression, perhaps):
> if more files need to be created, an explicit numeric start value prevents
> filename widening (this wasn't the case in your example because 1000 lines
> fit in 100 files of 10 lines):
>
> # Works, filenames will be widened to 9010.
> $ seq 1001 | split -l 10 --numeric-suffix - foo_
>
> # Widening is not allowed (from default of 2 digits), split fails:
> $ seq 1001 | split -l 10 --numeric-suffix=0 - foo_
> split: output file suffixes exhausted
>
>
> What do others think: default to no widening for numeric suffixes,
> or add code to 'next_file_name()' for numeric widening?
This was discussed at http://bugs.gnu.org/20874
I'm not sure anything needs to be done here:
the current auto-widening scheme is kept for backward compatibility
with concatenation operations that expect the output files to sort lexically.
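
To make the lexical-sort point concrete, here is a small Python sketch that models the suffix sequence described in this thread. It is only a simulation of the observed behaviour, not the code in split.c; the function name auto_widened_suffixes and its parameters are made up for illustration. The model assumes that the first suffix character never takes the last character of the alphabet; instead that character is moved into the prefix and the suffix grows by one.

```python
from itertools import islice, product
import string


def auto_widened_suffixes(alphabet="0123456789", start_width=2):
    """Yield suffixes in the order described in this thread (a model only).

    Hypothetical helper, not part of coreutils. At each widening level the
    last alphabet character is prepended once more and the counting part
    grows by one character, with its first character never reaching the
    last alphabet character.
    """
    last = alphabet[-1]
    level = 0
    while True:
        width = start_width + level
        # First position uses every character except the last one.
        for combo in product(alphabet[:-1], *[alphabet] * (width - 1)):
            yield last * level + "".join(combo)
        level += 1


digits = list(islice(auto_widened_suffixes(), 1000))
letters = list(islice(auto_widened_suffixes(string.ascii_lowercase), 1000))

print(digits[88:92])    # ['88', '89', '9000', '9001']
print(letters[648:652])  # ['yy', 'yz', 'zaaa', 'zaab']

# Widened names still sort lexically in creation order ...
print(digits == sorted(digits))  # True

# ... whereas a naive 00..99, 100, 101 scheme would not.
naive = [str(n).zfill(2) for n in range(101)]
print(naive == sorted(naive))    # False
```

With the modelled scheme, something like "cat test_*" sees the pieces in creation order, because the shell expands the glob in sorted order; a plain 98, 99, 100 sequence would put 100 between 10 and 11.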
cheers,
Pádraig