[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Support in sort for human-readable numbers
From: |
Pádraig Brady |
Subject: |
Re: Support in sort for human-readable numbers |
Date: |
Tue, 6 Jan 2009 15:37:06 +0000 |
User-agent: |
Thunderbird 2.0.0.6 (X11/20071008) |
Vitali Lovich wrote:
> I've read the proposed patches that have been batted around on the mailing
> list (after coming up with my own implementation :D of course). My proposed
> solution is less generic, but I believe more robust, than the other
> approaches.
>
> I've proposed my reasoning below, but I've posted it as a bug on launchpad
> to track this issue 313152 <https://bugs.launchpad.net/bugs/313152>. The
> patch is against 6.10 instead of trunk mainly because I was too lazy to get
> the build-system set up on Ubuntu. That being said, I'm pretty sure the
> patch should still work against the trunk. In any case, if it's necessary,
> I could also do the diff against the trunk.
>
> Code review?
> What would I need to do to get this mainlined (aside from adding the
> documentation changes)?
>
> REASONING:
>
> One of my major assumption is that all the numbers are well formatted.
>
> In other words, there's an explicit demarcation in the number line (at least
> internal to the input being sorted) after which the suffix increases and the
> number starts again near 0. For instance, if M represents 1050 Kilobytes,
> then there's no 1051K - it's represented as 1.001M or something along those
> lines. Again, this would only rely on the input being internally consistent
> - sort needs no knowledge or hints of what those suffixes represent.
I like the idea.
So it doesn't support sorting these correctly for example:
999MB
998MiB
1GiB
1030MiB
I.E. a mixture of ^2 and ^10 are not supported,
nor overlapping number ranges.
That's probably OK though since a tool shouldn't
generate mixtures like that on a particular run.
You had a few questions in your patch...
+ /* FIXME: add support for aliases (i.e. k/K, m/M, g/G) */
Not necessary I think. You already handle case aliases with toupper().
+ /* FIXME: maybe add option to check for longer suffixes (i.e. gigabyte) */
You should allow at least G, GiB and GB formats.
Probably should print error if more than one of those
formats used, since that's not supported.
+ /* FIXME: maybe add option to allow for spacing between number and suffix */
I wouldn't bother. If we supported general input maybe, but not otherwise.
+ /* FIXME: maybe not use exponent if one of the strings uses an
+ unrecognized suffix that's not a blank */
I would flag that as an error
+ /* FIXME: a_order - b_order || raw_comparison can be used - would that
+ be faster? */
Yep if you're not supporting overlapping number ranges then
you can skip the number comparison totally if the suffixes don't match.
cheers,
Pádraig.
Received: from tombstone.lincor.com ([84.203.137.218]) by mail1.mxsweep.com
with Microsoft SMTPSVC(6.0.3790.3959);
Tue, 6 Jan 2009 15:37:32 +0000
Received: from [192.168.2.25] (crom.labs.lincor.com [192.168.2.25])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(No client certificate requested)
by tombstone.lincor.com (Postfix) with ESMTP id B1C25600C209;
Tue, 6 Jan 2009 15:37:06 +0000 (GMT)
Message-ID: <address@hidden>
Date: Tue, 6 Jan 2009 15:37:06 +0000
From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= <address@hidden>
User-Agent: Thunderbird 2.0.0.6 (X11/20071008)
MIME-Version: 1.0
To: Vitali Lovich <address@hidden>
CC: address@hidden
Subject: Re: Support in sort for human-readable numbers
References: <address@hidden>
In-Reply-To: <address@hidden>
X-Enigmail-Version: 0.95.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Return-Path: address@hidden
X-OriginalArrivalTime: 06 Jan 2009 15:37:34.0701 (UTC)
FILETIME=[B22229D0:01C97014]
x-MXSweep-KeywordsCount: 0
x-MXSweep-MetaScanResult: Clean
x-MXSweep-MetaScanThreat:
x-MXPurifier-SpamScore: 0
x-MXPurifier-VirusScore: 0
X-MXSweep-Threat: Clean
X-MXUniqueID: 738389df-d34d-4bcb-b40d-2038ba960ee6
Vitali Lovich wrote:
> I've read the proposed patches that have been batted around on the mailing
> list (after coming up with my own implementation :D of course). My proposed
> solution is less generic, but I believe more robust, than the other
> approaches.
>
> I've proposed my reasoning below, but I've posted it as a bug on launchpad
> to track this issue 313152 <https://bugs.launchpad.net/bugs/313152>. The
> patch is against 6.10 instead of trunk mainly because I was too lazy to get
> the build-system set up on Ubuntu. That being said, I'm pretty sure the
> patch should still work against the trunk. In any case, if it's necessary,
> I could also do the diff against the trunk.
>
> Code review?
> What would I need to do to get this mainlined (aside from adding the
> documentation changes)?
>
> REASONING:
>
> One of my major assumption is that all the numbers are well formatted.
>
> In other words, there's an explicit demarcation in the number line (at least
> internal to the input being sorted) after which the suffix increases and the
> number starts again near 0. For instance, if M represents 1050 Kilobytes,
> then there's no 1051K - it's represented as 1.001M or something along those
> lines. Again, this would only rely on the input being internally consistent
> - sort needs no knowledge or hints of what those suffixes represent.
I like the idea.
So it doesn't support sorting these correctly for example:
999MB
998MiB
1GiB
1030MiB
I.E. a mixture of ^2 and ^10 are not supported,
nor overlapping number ranges.
That's probably OK though since a tool shouldn't
generate mixtures like that on a particular run.
You had a few questions in your patch...
+ /* FIXME: add support for aliases (i.e. k/K, m/M, g/G) */
Not necessary I think. You already handle case aliases with toupper().
+ /* FIXME: maybe add option to check for longer suffixes (i.e. gigabyte) */
You should allow at least G, GiB and GB formats.
Probably should print error if more than one of those
formats used, since that's not supported.
+ /* FIXME: maybe add option to allow for spacing between number and suffix */
I wouldn't bother. If we supported general input maybe, but not otherwise.
+ /* FIXME: maybe not use exponent if one of the strings uses an
+ unrecognized suffix that's not a blank */
I would flag that as an error
+ /* FIXME: a_order - b_order || raw_comparison can be used - would that
+ be faster? */
Yep if you're not supporting overlapping number ranges then
you can skip the number comparison totally if the suffixes don't match.
cheers,
Pádraig.
Re: Support in sort for human-readable numbers,
Pádraig Brady <=
Message not available
- Re: Support in sort for human-readable numbers, Vitali Lovich, 2009/01/06
- Re: Support in sort for human-readable numbers, Pádraig Brady, 2009/01/06
- Re: Support in sort for human-readable numbers, Vitali Lovich, 2009/01/06
- Re: Support in sort for human-readable numbers, Pádraig Brady, 2009/01/06
- Re: Support in sort for human-readable numbers, Vitali Lovich, 2009/01/07
- Re: Support in sort for human-readable numbers, Jim Meyering, 2009/01/07
- Re: Support in sort for human-readable numbers, Vitali Lovich, 2009/01/07
Re: Support in sort for human-readable numbers, Matthew Woehlke, 2009/01/13
Re: Support in sort for human-readable numbers, Vitali Lovich, 2009/01/13
Re: Support in sort for human-readable numbers, Paul Eggert, 2009/01/07