bug#25832: split (v 8.25) with numeric suffixes beyond 89

bug-coreutils


From:	Pádraig Brady
Subject:	bug#25832: split (v 8.25) with numeric suffixes beyond 89
Date:	Tue, 21 Feb 2017 19:32:02 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

unarchive 20874
forcemerge 20874 25832
stop

On 21/02/17 18:40, Assaf Gordon wrote:
> Hello,
> 
>> On Feb 21, 2017, at 19:55, Holger Wolff <address@hidden> wrote:
>>
>> Incorrect numeric suffixes are sometimes produced when going beyond number 
>> 89:
>> Assume a file "test.txt" with 1000 lines, and the command
>>
>> $ split -d -l 10 test.txt test_
>>
>> I expect files test_00 through test_99, but what I get are test_00 through 
>> test_89 and test_9000 through test_9009.
> 
> Thank you for the bug report.
> 
> I can confirm this is reproducible in the latest revision.
> 
> The immediate reason is that without a starting value,
> coreutils' split has a feature to 'widen' the filename,
> but the logic to widen it follows the alphabet widening
> and doesn't work well for numeric widening.
> 
> That is, when not using numeric suffixes,
> the suffix that follows 'yz' is widened to 'zaaa':
> 
>      $ seq 1000 | split -l 1 - foo_
> 
> will result in:
> 
>      ...
>      foo_yy
>      foo_yz
>      foo_zaaa
>      foo_zaab
>      ...
> 
> And you are seeing the suffix after '89'
> widened by the same logic (to '9000').
> 
> 
> Technically, if 'numeric_suffix_start'
> is left as 'null' in the parsing of --numeric-suffixes:
>  http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n1455
> 
> then the widening logic behaves as if those were letters, not digits
> in 'split.c:next_file_name()':
>  http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n403
> 
> 
> 
> An immediate band-aid of defaulting to numeric_suffix_start=0
> would have an unintended consequence (a regression, perhaps):
> if more files need to be created, an explicit numeric start value prevents
> filename widening (this wasn't the case in your example because 1000 lines
> fit in 100 files of 10 lines):
> 
>     # Works, filenames will be widened to 9010.
>     $ seq 1001 | split -l 10 --numeric-suffix - foo_
> 
>     # Widening is not allowed (from default of 2 digits), split fails:
>     $ seq 1001 | split -l 10 --numeric-suffix=0 - foo_
>     split: output file suffixes exhausted
> 
> 
> What do others think: default to no widening for numeric suffixes,
> or add code to 'next_file_name()' for numeric widening?
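
The widening rule described in the quoted message can be modelled in a few lines. This is only a sketch that reproduces the file names shown in this thread; it is not a transcription of split.c, and the name auto_suffixes is made up for the illustration. It assumes the first suffix position never takes the last character of the alphabet, and that once the other values are used up, that last character is kept as a fixed lead and the suffix grows by one position.

```python
from itertools import islice, product


def auto_suffixes(alphabet, start_width=2):
    """Yield suffixes in the auto-widening order described in this thread.

    Hypothetical helper: a model of the observed behaviour, not split's code.
    """
    last = alphabet[-1]
    lead = ""            # characters fixed by earlier widenings
    width = start_width  # number of positions that are still counting
    while True:
        # The first position skips the last character of the alphabet;
        # reaching it is what triggers the jump to a longer suffix.
        for first in alphabet[:-1]:
            for rest in product(alphabet, repeat=width - 1):
                yield lead + first + "".join(rest)
        lead += last
        width += 1


digits = "0123456789"
letters = "abcdefghijklmnopqrstuvwxyz"

# 100 output files, as in "split -d -l 10" on a 1000-line file.
numeric = list(islice(auto_suffixes(digits), 100))
print(numeric[88:92])  # ['88', '89', '9000', '9001']
print(numeric[-1])     # 9009

# Alphabetic suffixes around the same boundary.
alpha = list(islice(auto_suffixes(letters), 652))
print(alpha[648:652])  # ['yy', 'yz', 'zaaa', 'zaab']
```

Under this model the digit case is the letter case with a 10-character alphabet, which is why '89' is followed by '9000' in the same way that 'yz' is followed by 'zaaa'.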

This was discussed at http://bugs.gnu.org/20874

I'm not sure anything needs to be done here.
The current auto-widening scheme is kept for backward compatibility:
concatenation operations that expect the output files to sort lexically
depend on it.
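
The lexical-sort point can be checked with a small sketch. The first list holds the names from the original report; the second is a hypothetical alternative that simply keeps counting from 2 to 3 digits, which split does not do and which is shown only for comparison. Python's sorted() compares by code point, which is assumed here to stand in for the order a shell glob would produce.

```python
# Names produced by the current scheme for 100 output files (per the report):
# test_00 .. test_89, then test_9000 .. test_9009.
current = [f"test_{i:02d}" for i in range(90)]
current += [f"test_{9000 + i}" for i in range(10)]

# Hypothetical alternative: plain counting that grows from 2 to 3 digits.
plain = [f"test_{i:02d}" for i in range(101)]

# The current names are already in lexical order, so concatenating the
# files in sorted-name order reassembles the input.
print(sorted(current) == current)  # True

# With plain counting, 'test_100' sorts between 'test_10' and 'test_11'.
print(sorted(plain) == plain)      # False
print(sorted(plain)[10:13])        # ['test_10', 'test_100', 'test_11']
```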

cheers,
Pádraig
