Re: feature request: gzip/bzip support for sort
From: Jim Meyering
Subject: Re: feature request: gzip/bzip support for sort
Date: Sun, 21 Jan 2007 22:41:11 +0100

Dan Hipschman <address@hidden> wrote:
> On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
>> Not to look the gift horse in the mouth, but it'd be nice
>> if you wrote ChangeLog entries, too. And even (gasp! :-)
>> a test case or two. Of course, we'd expect such a test case
>> (probably named tests/misc/sort-compress, and based on
>> tests/sample-test) to have this line in it:
>>
>> . $srcdir/../very-expensive
>>
>> If you don't have time for that, I'll take care of it, eventually.
>
> I'm not going to stop you :-) I just haven't had the time to look into
> it yet, so I've just been running the coreutils tests and then my own
> tests. I was planning on adding the tests, as you say, "eventually".
>
>> Default to just "gzip", not /bin/gzip. The latter may not exist;
>> your patch already handles that, but /bin/gzip may not be the first
>> gzip in PATH. Also, don't bother with the access-XOK check.
>> There's no point in incurring even such a small overhead in the
>> general case, when no temporary is used.
>
> The reason I put the access check in there is that if we default to gzip
> and it doesn't exist, then of course the exec will fail, the child will

This is a good argument for using libz by default, not a separate
gzip program. Why incur the overhead of an exec when we don't need to?
Now, I'm convinced that sort should provide built-in support for both
gzip and bzip2. How to select built-in vs. actually exec the program
is something to think about...
Of course it should still be possible to specify some other program.

> fail, and this will cause sort to fail when it really should just not do
> the compression, or try another default if something suitable exists
> (what about compress?). How about we just delay the determination of

compress? no thank you! :-)

> the compress program until it's actually needed (e.g., in create_temp
> right before "if (compress_program)" we have "if (compress_program_known)"
> and inside the body we check the environment variable and/or do access
> checks on possible defaults)?
>
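
The deferred lookup proposed above could be sketched roughly as follows.
This is only an illustration, not code from the patch: the helper
program_in_path, the function get_compress_program, and the list of
defaults are hypothetical names chosen here. The flag is tested in its
negated form, since the lookup should run only while the program is still
unknown. The environment-variable check is left out because this message
does not name the variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return true if NAME is executable in one of the
   colon-separated directories of PATH_LIST.  Empty components (which
   mean "current directory" in PATH) are skipped by strtok.  */
bool
program_in_path (const char *name, const char *path_list)
{
  assert (name && path_list);
  char *copy = strdup (path_list);   /* strtok modifies its argument */
  if (!copy)
    return false;

  bool found = false;
  for (char *dir = strtok (copy, ":"); dir && !found;
       dir = strtok (NULL, ":"))
    {
      char candidate[4096];
      int n = snprintf (candidate, sizeof candidate, "%s/%s", dir, name);
      if (0 < n && (size_t) n < sizeof candidate)
        found = access (candidate, X_OK) == 0;
    }
  free (copy);
  return found;
}

static const char *compress_program;   /* NULL means "do not compress" */
static bool compress_program_known;

/* Determine the compression program on first use only, so a sort that
   never creates a temporary file never pays for the access checks.  */
const char *
get_compress_program (void)
{
  if (!compress_program_known)
    {
      /* Assumed defaults, tried in order.  */
      static const char *const defaults[] = { "gzip", "bzip2" };
      const char *path = getenv ("PATH");
      compress_program = NULL;
      for (size_t i = 0; path && i < sizeof defaults / sizeof *defaults; i++)
        if (program_in_path (defaults[i], path))
          {
            compress_program = defaults[i];
            break;
          }
      compress_program_known = true;
    }
  return compress_program;
}
```

With something like this, create_temp would call get_compress_program ()
where it now tests compress_program, and a NULL result would simply mean
the temporary file is written uncompressed.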
>> But please address the FIXME I've added.
>
> If we can't fork a compression process, it's not the end of the world.
> We just don't compress that file and sort will still work. If we can't
> fork a decompression process, we can't continue the sort. So I figure
> we'll just try twice to fork compression processes, and if we can't do
> it after 1 sec, we're probably wasting more time waiting to fork than we
> would doing disk access. However, we really need to be able to fork
> decompression processes, so we can afford to wait a really long time for
> it. I was considering making the number of tries for decompression
> processes even larger (now, it'll wait about 2 min before giving up).
>
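
To make the retry policy described above concrete, here is a rough
sketch. The constants and the doubling schedule are assumptions, picked
only so that the totals match the figures quoted (about 1 second for
compression, about 2 minutes for decompression); the actual patch may use
different values. The names fork_with_retries and total_wait_seconds are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed limits: give up quickly on compression, which is optional,
   but be patient for decompression, which sort cannot do without.  */
enum { MAX_FORK_TRIES_COMPRESS = 2, MAX_FORK_TRIES_DECOMPRESS = 8 };

/* Total seconds slept if all TRIES fork attempts fail, with one sleep
   between consecutive attempts: 1 + 2 + 4 + ... = 2^(TRIES-1) - 1.  */
unsigned int
total_wait_seconds (int tries)
{
  assert (0 < tries && tries < 30);
  return (1u << (tries - 1)) - 1;
}

/* Call fork up to TRIES times, sleeping 1, 2, 4, ... seconds between
   attempts that fail with EAGAIN.  Return what the last fork returned:
   0 in the child, the child's pid in the parent, or -1 on failure.  */
pid_t
fork_with_retries (int tries)
{
  unsigned int wait = 1;
  pid_t pid = -1;
  for (int i = 0; i < tries; i++)
    {
      pid = fork ();
      if (pid >= 0 || errno != EAGAIN)
        break;               /* success, or an error not worth retrying */
      if (i + 1 < tries)
        {
          sleep (wait);
          wait *= 2;
        }
    }
  return pid;
}
```

A caller would treat the two cases differently: if
fork_with_retries (MAX_FORK_TRIES_COMPRESS) fails, the temporary file is
written uncompressed and the sort continues, whereas a failure of
fork_with_retries (MAX_FORK_TRIES_DECOMPRESS) is fatal.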
>> Have you considered using the gnulib hash module rather than
>> rolling your own? There are examples in many of coreutils/src/*.c.
>
> I'm not familiar with gnulib, so I didn't know a hash module existed or
> think to look for one. Looking at it now, though, it seems it will be
> slower because of its abstraction, unless the table fills up to the

Performance isn't the issue here, but code-reuse.
I would be very surprised if changing hash table implementations
has any measurable effect on sort's performance.

> point where it would be faster to access if it grew. I'd prefer (since
> it seems to be my time we're talking about), to leave it the way it is
> because it's simple, and see if the gnulib module is faster later.