From: Bruno Haible
Subject: Re: propose renaming gnulib memxfrm to amemxfrm (naming collision with coreutils)
Date: Wed, 4 Aug 2010 01:33:11 +0200
User-agent: KMail/1.9.9
Hi Paul,
> > Here is the generalization of 'strxfrm' to strings with embedded NUL bytes.
>
> Sorry, I didn't really notice this email until just now. As it happens,
> coreutils has had an memxfrm implementation since 2006, which
> it never exported to gnulib.
And I'm sorry that I overlooked yours in coreutils when I contributed
memxfrm to gnulib in 2009.
> The coreutils memxfrm is closer to how
> strxfrm behaves, in that it does not allocate memory: it relies on the
> caller to do memory allocation. The signatures differ as follows:
>
> // coreutils returns number of bytes that were translated,
> // (or would be translated if there were enough room).
> // It also sets errno on error.
> size_t memxfrm (char *restrict dst, size_t dstsize,
> char *restrict src, size_t srcsize);
>
> // gnulib returns pointer to destination, which is possibly-different if
> // the destination wasn't large enough. It updates *DSTSIZEPTR to
> // the newly allocated size, if it allocated storage. It returns
> // NULL (setting errno) on error.
> char *memxfrm (char *src, size_t srcsize, char *dst, size_t *dstsizeptr);
Indeed the algorithm is virtually identical, and the only difference is
the calling convention.
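To make the second convention concrete, here is a minimal sketch of a
function that returns a pointer and updates the size through a pointer
argument. It is not the gnulib source: the name sketch_axfrm is
hypothetical, it handles only NUL-terminated input (no embedded NUL
bytes), and it does not try to reproduce gnulib's allocation strategy.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the gnulib source.  Transforms the
   NUL-terminated string SRC with strxfrm.  If RESULTBUF (of *LENGTHP
   bytes) is large enough, the result is stored there; otherwise a
   fresh buffer is obtained with malloc.  On success, returns the
   buffer holding the result and stores its length, including the
   terminating NUL, in *LENGTHP.  On failure, returns NULL with errno
   set.  */
char *
sketch_axfrm (const char *src, char *resultbuf, size_t *lengthp)
{
  size_t avail = resultbuf != NULL ? *lengthp : 0;
  /* strxfrm reports the length it needs even if the buffer is too small.  */
  size_t need = strxfrm (avail ? resultbuf : NULL, src, avail) + 1;
  char *result = resultbuf;

  if (need > avail)
    {
      /* The caller's buffer was too small: allocate and translate again.  */
      result = malloc (need);
      if (result == NULL)
        {
          errno = ENOMEM;
          return NULL;
        }
      strxfrm (result, src, need);
    }
  *lengthp = need;
  return result;
}
```

With this convention the caller passes a stack buffer and its size, and
frees the result only if the returned pointer differs from that buffer.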
> So I propose that the gnulib memxfrm be renamed to something else, to
> reflect the fact that it allocates memory. I suggest the name
> "amemxfrm", as a leading "a" is the usual convention for variants that
> allocate memory (e.g., "asprintf").
>
> I guess the coreutils memxfrm could also be migrated into gnulib,
> afterwards.
This approach would make sense if the two functions had different
functionality. But they do effectively the same thing, only with
different calling conventions. Therefore I believe gnulib should have
only one of these functions: either the better of the two, or a
combination of the best properties of both.
> For coreutils, the coreutils interface is more memory-efficient,
> because malloc is invoked at most once when comparing two lines. If
> the small buffer on the stack isn't large enough to hold the
> translated output for both strings, the two calls to memxfrm will tell
> sort.c exactly how big the buffer should be, and it can invoke malloc
> just once and then invoke memxfrm again (twice) to successfully do the
> translation.
>
> The gnulib interface is more convenient for applications that don't
> care about this sort of memory optimization, and I expect that for
> some (large) cases it is faster because it sometimes avoids translating
> the same chunk twice. So it's useful as well.
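The calling pattern described in the first quoted paragraph can be
sketched roughly as follows. This is an illustration only, not code from
sort.c: the name sketch_xfrm_compare and the 64-byte stack buffer are
assumptions, and plain strxfrm stands in for the coreutils memxfrm, so
embedded NUL bytes are not handled. Overflow of the size sum is not
checked either.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparison routine, not taken from sort.c.  Returns a
   value <0, 0 or >0 with the same sign as strcoll (A, B).  If malloc
   fails, sets *FAILED to 1 and returns 0.  */
int
sketch_xfrm_compare (const char *a, const char *b, int *failed)
{
  char stackbuf[64];
  char *buf = stackbuf;
  size_t bufsize = sizeof stackbuf;
  *failed = 0;

  /* First pass: translate A into the start of the stack buffer and B
     into whatever room is left.  The return values are the lengths
     needed, whether or not the output actually fit.  */
  size_t alen = strxfrm (buf, a, bufsize) + 1;
  size_t broom = alen < bufsize ? bufsize - alen : 0;
  size_t blen = strxfrm (broom ? buf + alen : NULL, b, broom) + 1;

  if (alen + blen > bufsize)
    {
      /* Second pass: exactly one malloc, sized from the first pass,
         then both strings are translated again.  */
      bufsize = alen + blen;
      buf = malloc (bufsize);
      if (buf == NULL)
        {
          *failed = 1;
          return 0;
        }
      strxfrm (buf, a, alen);
      strxfrm (buf + alen, b, blen);
    }

  int result = strcmp (buf, buf + alen);
  if (buf != stackbuf)
    free (buf);
  return result;
}
```

Note that when the stack buffer is too small, both strings are
translated twice; that is the cost being weighed against the saved
malloc.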
Since you want to compare the two functions on performance, find
attached a program that exercises both with a small string, 3 times,
and then both with a large string, 3 times, with 1000 calls in each round.
Compiled like this:
$ gcc -O2 -Wall coreutils-memxfrm.c gnulib-memxfrm.c compare.c -I. -Drestrict=
I observe timings like this:
Time for gnulib_memxfrm: 0,036002
Time for coreutils_memxfrm: 0,036002
Time for gnulib_memxfrm: 0,036002
Time for coreutils_memxfrm: 0,036003
Time for gnulib_memxfrm: 0,032002
Time for coreutils_memxfrm: 0,036002
Time for gnulib_memxfrm: 2,65217
Time for coreutils_memxfrm: 3,45622
Time for gnulib_memxfrm: 1,97612
Time for coreutils_memxfrm: 3,42021
Time for gnulib_memxfrm: 1,98012
Time for coreutils_memxfrm: 3,42021
This means that when the stack buffer is sufficient (no mallocs needed on
either side), the timings are the same: 36 μsec per call on each side.
But when the stack buffer is not sufficient, the use of coreutils memxfrm
is 30% to 70% slower than the use of gnulib memxfrm, with a difference of
at least 700 μsec per call. You argue that the benefit of coreutils' memxfrm
is that it requires one less malloc. True, but a malloc of 40 KB is much
cheaper than a call to memxfrm on 40 KB (think of all the locale-dependent
processing that it must do). To get figures about this, I added an extra
strdup + free to the first loop in compare(). The timings are indistinguishable:
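(A rough way to check the malloc-versus-transformation cost in
isolation is sketched below. It is not part of the attached program: the
function names, the 40 KB size and the round count are assumptions chosen
to mirror the figures above, and plain strxfrm stands in for memxfrm. For
a meaningful comparison the caller would first select a real collation
locale with setlocale (LC_COLLATE, ""), since in the "C" locale strxfrm
is typically very cheap.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { SIZE = 40 * 1024, ROUNDS = 1000 };

/* CPU seconds elapsed since START.  */
static double
seconds_since (clock_t start)
{
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}

/* CPU seconds for ROUNDS malloc + free pairs of SIZE bytes,
   or -1.0 if an allocation fails.  */
double
time_malloc_free (void)
{
  clock_t start = clock ();
  for (int i = 0; i < ROUNDS; i++)
    {
      volatile char *p = malloc (SIZE);
      if (p == NULL)
        return -1.0;
      /* The volatile store is meant to discourage the compiler from
         eliding the malloc/free pair.  */
      p[0] = (char) i;
      free ((char *) p);
    }
  return seconds_since (start);
}

/* CPU seconds for ROUNDS calls of strxfrm on S into DST (DSTSIZE
   bytes), or -1.0 if DST is too small for the result.  */
double
time_strxfrm (const char *s, char *dst, size_t dstsize)
{
  clock_t start = clock ();
  for (int i = 0; i < ROUNDS; i++)
    if (strxfrm (dst, s, dstsize) >= dstsize)
      return -1.0;
  return seconds_since (start);
}
```

Comparing the two return values for a 40 KB string gives a direct
estimate of how much one extra malloc matters relative to one extra
transformation.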
$ ./a.out
Time for gnulib_memxfrm: 0,032002
Time for coreutils_memxfrm: 0,036002
Time for gnulib_memxfrm: 0,036002
Time for coreutils_memxfrm: 0,032002
Time for gnulib_memxfrm: 0,036002
Time for coreutils_memxfrm: 0,036003
Time for gnulib_memxfrm: 2,18814
Time for coreutils_memxfrm: 3,41621
Time for gnulib_memxfrm: 1,98012
Time for coreutils_memxfrm: 3,42021
Time for gnulib_memxfrm: 1,98012
Time for coreutils_memxfrm: 3,42021
In summary, I think that gnulib memxfrm is faster than coreutils memxfrm
when the stack buffer is too small, and equally fast otherwise. It is also
easier to use: 3 lines of code for gnulib memxfrm vs. 7 lines of code for
coreutils memxfrm.
I therefore suggest keeping the gnulib one and having coreutils start
using it (via a modified xmemxfrm wrapper).
Bruno
compare.tar.gz
Description: application/tgz