bug-coreutils

bug#67593: `split --number=l/N` no longer splits evenly


From: Pádraig Brady
Subject: bug#67593: `split --number=l/N` no longer splits evenly
Date: Sun, 3 Dec 2023 13:17:33 +0000
User-agent: Mozilla Thunderbird

On 03/12/2023 09:37, Paul Eggert wrote:
> That's not a bug, in that 'split' is behaving as documented. The first
> input line is one byte shorter than the second one. 'Split' divides the
> input into two regions, and because the first region happens to be one
> byte longer than the second region both input lines are sent to the
> first output file.
>
> In older coreutils, 'split' used a different algorithm to compute region
> sizes, which worked better for your test case but considerably worse in
> others. For example, in older coreutils:
>
>   seq 50 >in
>   split -n l/71 in
>
> created 43 files of size 0, 9 files of size 2, 18 files of size 3, and
> one file of size 69. Current coreutils splits much better: it creates 21
> files of size 0, 9 files of size 2, and 41 files of size 3.
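
To make the region arithmetic above concrete, here is a minimal Python model, not the coreutils source. It assumes that each line is written to the region containing its first byte, that the older code gave all remainder bytes to the last region, and that the current code gives one remainder byte to each of the leading regions. The names `region_ends` and `split_sizes` are hypothetical. Under those assumptions the model reproduces the file counts quoted above.

```python
from collections import Counter

def region_ends(total, n, spread_remainder):
    """End offsets of n byte regions covering `total` bytes (simplified model)."""
    base, rem = divmod(total, n)
    ends, pos = [], 0
    for i in range(n):
        if spread_remainder:
            # Assumed current behaviour: leading regions get one extra byte each.
            pos += base + (1 if i < rem else 0)
        else:
            # Assumed older behaviour: the last region absorbs the whole remainder.
            pos += base + (rem if i == n - 1 else 0)
        ends.append(pos)
    return ends

def split_sizes(lines, n, spread_remainder=True):
    """Bytes written to each of n output files when every line goes to the
    region that contains the line's first byte."""
    ends = region_ends(sum(len(l) for l in lines), n, spread_remainder)
    sizes = [0] * n
    offset = region = 0
    for line in lines:
        # Skip regions that end at or before the start of this line.
        while region < n - 1 and offset >= ends[region]:
            region += 1
        sizes[region] += len(line)
        offset += len(line)
    return sizes

# Same input as `seq 50`: nine 2-byte lines and forty-one 3-byte lines.
seq50 = [f"{i}\n".encode() for i in range(1, 51)]
print(sorted(Counter(split_sizes(seq50, 71, spread_remainder=False)).items()))
print(sorted(Counter(split_sizes(seq50, 71, spread_remainder=True)).items()))
```

With a made-up two-line input such as `[b"a\n", b"bb\n"]` and `n=2`, the same model puts both lines in the first file when the remainder is spread, which is the effect described for the reported case.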

Related to this, I think it would be useful to add a new
`split --number=L/N` mode (note the capital L), which tries harder
to distribute lines evenly.
It would only be supported when we can determine the number of lines up front,
and so would not be supported when reading from a pipe, for example.
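
As a rough illustration of what such a mode might compute (a sketch only; no such mode exists in the text above, and the function name `even_line_counts` is hypothetical), once the total line count is known the per-file line counts could differ by at most one:

```python
def even_line_counts(num_lines, n):
    """Hypothetical sketch: spread num_lines over n files so that the
    per-file counts differ by at most one line."""
    base, extra = divmod(num_lines, n)
    # The first `extra` files take one additional line each.
    return [base + (1 if i < extra else 0) for i in range(n)]

print(even_line_counts(50, 71).count(1))  # files that receive one line
print(even_line_counts(7, 3))
```

This needs the line count before any output is written, which is why input that cannot be re-read, such as a pipe, would not fit this approach without buffering.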

cheers,
Pádraig.




