bug-coreutils

Re: split behavior


From: Roger McNichols
Subject: Re: split behavior
Date: Sun, 13 Sep 2009 20:54:35 -0500 (CDT)


Thanks for the feedback.


> Do you mean select the appropriate suffix length based on size,
> or do you mean the zzaa, zzab scheme? The former wouldn't
> help when processing a pipe for example so I'd probably
> stick with the latter method for consistency.

Currently, split (at least 5.2.1) DOES pick the suffix size based on the file
size when used as "split -<#> file" and the file size is known.  But as you
point out, that does not help with a pipe: you may still run out of suffixes
if the file size changes after invocation of split, and if split is used in
the "split -<#> -" (reads stdin) mode, a 2-letter suffix is all you get unless
you specify a length.
Now I suppose that maybe the discussion went something like:
  >> what if an unknown-sized input stream is the input?
  >> well then just use -a 100  and you will never* run out...
     (*note 26^100 is pretty big)
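
To put numbers on the suffix arithmetic above, here is a minimal Python sketch. It is only an illustration of the counting, not the code split 5.2.1 actually uses; the function names `suffix_capacity` and `min_suffix_length` are made up for this example, and it assumes a purely alphabetic (a-z) suffix with a default length of 2.

```python
ALPHABET_SIZE = 26  # assumption: lowercase a-z suffixes only


def suffix_capacity(length):
    """Number of distinct suffixes of exactly `length` letters."""
    return ALPHABET_SIZE ** length


def min_suffix_length(total_lines, lines_per_file, minimum=2):
    """Smallest suffix length (at least `minimum`) that can name every
    output file when the input size is known in advance."""
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be positive")
    # Ceiling division without floating point; always at least one file.
    files = max(1, -(-total_lines // lines_per_file))
    length = minimum
    while suffix_capacity(length) < files:
        length += 1
    return length


if __name__ == "__main__":
    print(suffix_capacity(2))                 # capacity of the default length
    print(min_suffix_length(1_000_000, 100))  # 10,000 files needed
    print(len(str(suffix_capacity(100))))     # digits in 26^100
```

With a pipe or stdin the total is not known up front, so a calculation like this cannot be made, which is the limitation discussed above.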

Anyway, I propose to develop a new command-line option that would invoke the
'old' suffix formation behavior.  And even though aa ... zaa ... zzaa ...
instead of aa .. zzaa ... zzzzaa (as well as many other schemes) would work
just as well, I propose to use the 'old' one for the added advantage of
backward compatibility.  That way any code that relied on the old scheme for
counting could be made to work again simply by adding a command-line argument.
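
As a rough illustration of the aa .. zzaa ... zzzzaa scheme, here is a short Python sketch. It encodes one reading of the description in this thread (each time a two-letter block is used up, prefix another "zz" and start over at "aa"); I have not checked it against the 5.2.1 source, and the name `extended_suffix` is hypothetical.

```python
import string

LETTERS = string.ascii_lowercase
BLOCK = len(LETTERS) ** 2  # 676 two-letter suffixes per block


def extended_suffix(index):
    """Suffix for the index-th output file (0-based), under the assumed
    scheme: aa..zz, then zzaa..zzzz, then zzzzaa.. and so on."""
    if index < 0:
        raise ValueError("index must be non-negative")
    block, offset = divmod(index, BLOCK)            # which block, where in it
    first, second = divmod(offset, len(LETTERS))    # the two trailing letters
    return "zz" * block + LETTERS[first] + LETTERS[second]


if __name__ == "__main__":
    # Show the names around each block boundary, using split's default
    # "x" output prefix.
    for i in (0, 1, 675, 676, 677, 1351, 1352):
        print(i, "x" + extended_suffix(i))
```

Under this reading the names never run out, and a plain lexical sort of the output files still matches the order in which they were written, since every name in a later block begins with the last name of the block before it.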

> if the suffix len is specified and is too small.
> Otherwise we use the zzaa, zzab method as described before.

This is also a good idea, but it might override the user's intention, which
could be to use split to detect a file that was more than 676*N lines long, or
to use it with the -1 option and only write out the first 676 lines of the
input (who knows why, but we're fixing a fix that broke something else,
right?).  So I propose to leave that failure in place and just add a
command-line method of invoking the 'old' on-the-fly suffix creation method.

I will take a stab at this when I get a chance, but if anyone else wants to move
forward with it sooner, I will not be offended.


Thanks again!

-roger



___________________________
Roger J. McNichols, Ph.D.
Chief Scientist
BioTex, Inc.
8058 El Rio St.
Houston, TX  77054
713.741.0111 (o)
713.741.0122 (f)
832.338.4371 (m)

----- Pádraig Brady <address@hidden> wrote:
> Eric Blake wrote:
> > According to Roger McNichols on 9/11/2009 6:51 PM:
> >> Currently using version 5.2.1 of coreutils 'split' command produces files
> >> with 'intelligent' suffixes.  That is, the number of letters (or digits)
> >> required is based on the known number of output files that will be required.
> > 
> > coreutils 5.2.1 is quite old; the latest stable version is 7.6.
> > 
> > Thanks for the report.  That said, POSIX requires that split stop
> > processing input and give an error after suffixes have been exhausted,
> > rather than using longer suffixes:
> > http://www.opengroup.org/onlinepubs/9699919799/utilities/split.html
> 
> That goes against the GNU "no limits" policy though.
> How about we fail only if POSIXLY_CORRECT is set or
> if the suffix len is specified and is too small.
> Otherwise we use the zzaa, zzab method as described before.
> 
> > But at least there is the -n option to specify a larger suffix.
> 
> right, -a.
> As an aside, I just noticed FreeBSD has -n to specify the number of
> chunks rather than the size. That's a very useful feature
> that was already on my TODO list.
> 
> > If you are still worried about running out of suffixes, then it would
> > probably be worth implementing a command line option that allows split to
> > use intelligent suffixes; we can't make it the default because of POSIX,
> > but we can at least provide it via a new option.  Would you like to submit
> > the patch?
> 
> Do you mean select the appropriate suffix length based on size,
> or do you mean the zzaa, zzab scheme? The former wouldn't
> help when processing a pipe for example so I'd probably
> stick with the latter method for consistency.
> 
> cheers,
> Pádraig.
