Re: string-map arg order
From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Fri, 7 Sep 2001 00:52:33 +0200 (MEST)
On 6 Sep 2001, Alex Shinn wrote:
> Dirk> Again: Both string representations can be used to perform
> Dirk> the same kind of tasks. Fixed width encodings can use up
> Dirk> more memory and may need additional effort converting from
> Dirk> and to them (given that the rest of the world uses a
> Dirk> different format). Iteration and a lot of other operations
> Dirk> can, however, be performed in a comparably efficient way.
> Dirk> Variable width encodings can be more space efficient, but
> Dirk> many (and many very common) operations are worse (by a
> Dirk> complexity factor of O(n)) with respect to execution time.
> Dirk> You could, however, save some conversion overhead if the
> Dirk> rest of the world uses the same format.
>
> All true. I'd like to add that conversions are not only forced by
> external considerations, but internal ones (e.g. appending different
> byte-sized strings). Also, performance is not the only concern.
> There's the simplicity of the API (Guile used to have multiple string
> representations, but no one used them), and the ease with which people
> can upgrade. There's also the complexity of the coding involved, and
> the fact that everything using strings (symbols, ports, regexps, etc.)
> will have to be able to handle all forms of strings.
I have been working with Guile for some years now, and I have heard that
argument before. The multiple string representation that is cited in
these discussions was never part of Guile during the time I have worked
with it, and I have never taken the time to look for it in old archives.
But I claim that such an interface can also be defined in a way that
lets people use _one_ piece of code for all representations.
Guile is today much cleaner than it was some years ago: We now have
SCM_STRING_CHARS and SCM_SYMBOL_CHARS where before we only had SCM_CHARS,
and in some places even SCM_VELTS had been used for strings (ugh!). We
have SCM_STRING_LENGTH, SCM_SYMBOL_LENGTH, SCM_VECTOR_LENGTH and some
more, which before were all merged into SCM_LENGTH. We now have only one
symbol type, where before we had ssymbols and msymbols. And so on...
Thus, switching to a different representation for most of Guile's internal
data types is much easier than before. In this respect I don't want to
base today's design decisions on claims about historical attempts to
implement some other string representation.
Probably the best solution would be to give both approaches a try, measure
the performance and memory implications and then, again, go into a new
round of discussions.
And, btw., there is an even more general solution: Virtualizing the string
interface to allow for _multiple_ string representations using a common
interface. The virtual function table could hold the following entries:
- read character #n, the result would be a scheme character object
- write a given scheme character object into character position #n
This should be sufficient to implement almost everything, but for the sake
of performance there could certainly be more functions that perform longer
operations in one go, like computing a substring, filling a string,
comparing strings, capitalizing, downcasing, upcasing, copying, appending,
converting into a predefined set of standard representations like UTF-8,
ASCII, ISO Latin, ...
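To make the idea a bit more concrete, here is a minimal, self-contained
sketch of such a virtual function table with just the two entries listed
above. All names in it (vchar, vstring, vstring_ops, narrow_ops, wide_ops
and so on) are made up for illustration and are not part of Guile; a plain
uint32_t stands in for a scheme character object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is Guile API. */
typedef uint32_t vchar;            /* stand-in for a scheme character object */

struct vstring;

/* The virtual function table: one entry per primitive operation. */
struct vstring_ops {
  vchar (*ref) (const struct vstring *s, size_t n);     /* read character #n  */
  void  (*set) (struct vstring *s, size_t n, vchar c);  /* write character #n */
};

struct vstring {
  const struct vstring_ops *ops;   /* selects the representation */
  size_t length;                   /* length in characters       */
  void *data;                      /* representation-specific    */
};

/* Representation 1: one byte per character (code points 0..255 only). */
vchar narrow_ref (const struct vstring *s, size_t n)
{
  return ((const unsigned char *) s->data)[n];
}

void narrow_set (struct vstring *s, size_t n, vchar c)
{
  assert (c <= 0xFF);              /* a real design would widen the string */
  ((unsigned char *) s->data)[n] = (unsigned char) c;
}

const struct vstring_ops narrow_ops = { narrow_ref, narrow_set };

/* Representation 2: four bytes per character. */
vchar wide_ref (const struct vstring *s, size_t n)
{
  return ((const uint32_t *) s->data)[n];
}

void wide_set (struct vstring *s, size_t n, vchar c)
{
  ((uint32_t *) s->data)[n] = c;
}

const struct vstring_ops wide_ops = { wide_ref, wide_set };

/* Generic code: written once, works for every representation. */
vchar vstring_ref (const struct vstring *s, size_t n)
{
  assert (n < s->length);
  return s->ops->ref (s, n);
}

void vstring_set (struct vstring *s, size_t n, vchar c)
{
  assert (n < s->length);
  s->ops->set (s, n, c);
}

int vstring_equal (const struct vstring *a, const struct vstring *b)
{
  size_t i;
  if (a->length != b->length)
    return 0;
  for (i = 0; i != a->length; ++i)
    if (vstring_ref (a, i) != vstring_ref (b, i))
      return 0;
  return 1;
}
```

vstring_equal never looks at the representation, so it can compare a
narrow string with a wide one. The price is one indirect call per
character, which is why the table would likely grow bulk entries for the
longer operations.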
Comparing strings, for example, would work as follows:
  /* This function is called if s1 is known to be a utf8 string.  Nothing
   * is known about the string type of s2.  It is just known to be a string.
   * Generic checks, like whether the two strings have the same length, have
   * been performed before the type dispatch. */
  SCM
  string_equal_utf8_unknown_p (SCM s1, SCM s2)
  {
    if (SCM_STRING_ENCODING_UTF8_P (s2))
      return string_equal_utf8_utf8 (s1, s2);  /* this should be fast! */
    else {
      const char *utf_ptr = SCM_STRING_CHARS (s1);
      size_t i;
      for (i = 0; i != SCM_STRING_LENGTH (s1); ++i) {
        /* here, we can access the utf8 characters of s1 with maximum
         * performance, but may have to use the virtual character access
         * function for each character in s2; on the first mismatch,
         * return SCM_BOOL_F */
        ...
      }
      return SCM_BOOL_T;  /* no mismatch found */
    }
  }
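
For anyone who wants to play with this outside of Guile, the same
dispatch can be sketched in plain C. This is only an illustration under
some assumptions: the UTF-8 input is well formed and in shortest form,
the character count is already known, and the second string is reached
through a hypothetical per-character accessor (char_ref_fn). None of
these names exist in Guile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accessor: returns character #n of some opaque string. */
typedef uint32_t (*char_ref_fn) (const void *str, size_t n);

/* Decode the UTF-8 sequence at *p and advance *p past it.
 * Assumes well-formed input; no validation is done. */
uint32_t utf8_next (const unsigned char **p)
{
  const unsigned char *s = *p;
  uint32_t c;
  int extra, i;

  if (s[0] < 0x80)      { c = s[0];        extra = 0; }
  else if (s[0] < 0xE0) { c = s[0] & 0x1F; extra = 1; }
  else if (s[0] < 0xF0) { c = s[0] & 0x0F; extra = 2; }
  else                  { c = s[0] & 0x07; extra = 3; }

  for (i = 1; i <= extra; ++i)
    c = (c << 6) | (s[i] & 0x3F);

  *p = s + extra + 1;
  return c;
}

/* Both strings are UTF-8 with the same byte length: for well-formed,
 * shortest-form UTF-8, equal characters means equal bytes. */
int equal_utf8_utf8 (const unsigned char *a, const unsigned char *b,
                     size_t nbytes)
{
  return memcmp (a, b, nbytes) == 0;
}

/* s1 is UTF-8 with nchars characters; s2 has the same number of
 * characters but an unknown representation, reached through ref2. */
int equal_utf8_unknown (const unsigned char *utf8, size_t nchars,
                        const void *s2, char_ref_fn ref2)
{
  const unsigned char *p = utf8;
  size_t i;

  assert (utf8 != NULL && ref2 != NULL);
  for (i = 0; i != nchars; ++i)
    /* s1 is walked with a pointer, so each step costs O(1); only s2
     * pays for the virtual character access. */
    if (utf8_next (&p) != ref2 (s2, i))
      return 0;
  return 1;
}

/* Example accessor for a fixed-width string of 32-bit characters. */
uint32_t utf32_ref (const void *str, size_t n)
{
  return ((const uint32_t *) str)[n];
}
```

The point is that the UTF-8 side is never indexed by character number:
it is decoded sequentially, so the variable width does not cost an extra
factor of O(n) here.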
Best regards
Dirk Herrmann