bug-coreutils

Re: [PATCH] split: --chunks option


From: Pádraig Brady
Subject: Re: [PATCH] split: --chunks option
Date: Thu, 26 Nov 2009 09:30:36 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Chen Guo wrote:
> Hi all,
>     This is mostly a step towards multithreaded sort the unix way, but as 
> Padraig mentioned, has its other uses.

Thanks again for looking at this.

> Parsing and I/O are not my strong suits, so I have a couple of questions:
> 
>     Are there more appropriate functions than open and pread to use here? I 
> usually see wrapper functions called in place of actual functions like fopen, 
> fread, etc, and it feels rather inappropriate for me to use open and pread 
> here.
> 
>     And are there any suggestions for parsing the --chunk option in a better 
> way? I feel having two separate options specifying both required values is 
> redundant, so I decided to separate the values by a comma, as Jim had in an 
> example he linked me. The way I wrote it, it feels like a hacked workaround, 
> but I'm not sure how else to get around that comma.

That's pretty much what I was thinking from the first mail I quoted:

  The `read_chunk` process above is currently awkward and
  inefficient to implement with dd and split. As a first step
  I think it would be very useful to add a --number option to
  `split`, like:
  --number=5 #split input into 5 chunks
  --number=2/5 #output chunk 2 of 5 [to stdout]
  In text mode, this should handle only splitting on a line
  boundary, and adding any extra data into the last chunk.

I do think --number is more general than --chunk, as it allows you to specify
only one number to get the behaviour described above. Also, I notice that
FreeBSD's split recently got a '-n chunk_count' option, so it would be good to
maintain compatibility with that if possible.
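
To make the text-mode rule concrete, here is a rough sketch in Python. It is
not the patch and not how split implements anything. The names `line_start`
and `text_chunk` are made up for illustration. It assumes the whole input is
already in memory, that chunks are numbered from 1, and that each nominal
boundary is pushed forward to the next line start, with the last chunk
taking whatever is left over.

```python
def line_start(data: bytes, pos: int) -> int:
    """Return the first line start at or after pos (hypothetical helper)."""
    if pos <= 0:
        return 0
    if pos >= len(data):
        return len(data)
    # A line starts at pos if the previous byte is a newline, so search
    # from pos - 1 and step past the newline that is found.
    idx = data.find(b"\n", pos - 1)
    return len(data) if idx == -1 else idx + 1


def text_chunk(data: bytes, k: int, n: int) -> bytes:
    """Return chunk k of n (1-based), split only on line boundaries."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    nominal = len(data) // n          # nominal chunk size in bytes
    start = line_start(data, (k - 1) * nominal)
    # The last chunk absorbs any extra data up to end of input.
    end = len(data) if k == n else line_start(data, k * nominal)
    return data[start:end]


if __name__ == "__main__":
    sample = b"a\nbb\nccc\ndddd\n"
    for k in range(1, 4):
        print(k, text_chunk(sample, k, 3))
```

Because chunk k ends exactly where chunk k+1 starts, concatenating all n
chunks gives back the original input, which is what sort would need.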

We also need to decide how to select between text and binary modes for --number.
Note that reading from non-seekable input complicates things.
For binary data I don't see how one could support --number.
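
The seekable versus non-seekable distinction could be checked roughly like
this. Again this is only a Python sketch under my own assumptions, with
made-up names (`seekable_size`, `read_range`). It assumes a POSIX system
where seeking on a pipe fails, and it uses pread because that leaves the
file offset untouched.

```python
import os


def seekable_size(fd: int):
    """Return the input size in bytes if fd is seekable, else None."""
    try:
        pos = os.lseek(fd, 0, os.SEEK_CUR)   # remember current offset
        size = os.lseek(fd, 0, os.SEEK_END)  # jump to the end to learn the size
        os.lseek(fd, pos, os.SEEK_SET)       # restore the offset
        return size
    except OSError:
        # Pipes and similar inputs cannot seek, so the total size (and
        # therefore the chunk size) is unknown without reading everything.
        return None


def read_range(fd: int, start: int, end: int) -> bytes:
    """Read bytes [start, end) from a seekable fd using pread."""
    parts = []
    while start < end:
        buf = os.pread(fd, end - start, start)
        if not buf:                          # hit end of file early
            break
        parts.append(buf)
        start += len(buf)
    return b"".join(parts)
```

When `seekable_size` returns None, the chunk boundaries cannot be computed
up front, which is the complication mentioned above.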

> 
>     Also, any opinions on how the lines should be output? As of now I just 
> have it as stdout, since that's how I see sort would use it. And of course, 
> anything else I missed/could've done better? Thanks a lot guys.

It makes sense to just send the single "chunk" to stdout.

cheers,
Pádraig.
