chicken-users

Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From: Graham Fawcett
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Mon, 17 Mar 2008 23:01:53 -0400

On Mon, Mar 17, 2008 at 10:29 PM, Alex Shinn <address@hidden> wrote:
> >>>>> "Graham" == Graham Fawcett <address@hidden> writes:
>
>     Graham> On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <address@hidden> wrote:
>
>     Graham> The Factor language borrowed from Larceny a
>     Graham> clever mechanism for representing Unicode
>     Graham> strings efficiently. Perhaps such a system is
>     Graham> feasible for Chicken, and might eliminate some
>     Graham> of these issues (at the cost of distancing our
>     Graham> string type a bit more from C char arrays):
[snip]
>  This only adds new issues, and solves none of the old ones.
>  The representation itself is interesting, though it may in
>  fact be a pessimisation in many cases (utf8 is about the
>  fastest approach for parsing and regex matching, which are
>  the string operations where speed is the biggest issue to
>  begin with).

Fair enough.

Here's another thought. It seems to me that if we were to represent
strings as composite values, e.g. a two-slot record whose first slot
is an encoding (the symbol 'utf8, or #f for 'byte' encoding), and
whose second slot contains the string data, then the various string
functions could dispatch on the type, and there would be no need to
monkey-patch core string functions to get the desired semantics. A
proper protocol for handling string encodings could be designed, utf8
being one of those encodings.

I don't imagine the dispatch overhead would be significant in any but
the tightest inner loops, in which case one could resort to
fully-specified functions (e.g. byte-string-length or
utf8-string-length).
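
To make the idea concrete, here is a rough sketch of what I have in
mind. Everything in it is hypothetical: the estring record and
estring-length are names made up for illustration, not anything in
Chicken or the utf8 egg, and the payload is a plain vector of byte
values rather than a native string, only so that the example does not
depend on how a given Scheme stores its strings.

```scheme
;; Hypothetical sketch: none of these names exist in Chicken or the utf8 egg.
;; The payload is a plain vector of byte values (0-255), an assumption made
;; so the example stays independent of any native string representation.

(define-record-type estring
  (make-estring encoding data)
  estring?
  (encoding estring-encoding)   ; the symbol utf8, or #f for plain bytes
  (data estring-data))          ; vector of byte values

;; Fully specified variants, callable directly from tight inner loops.
(define (byte-string-length s)
  (vector-length (estring-data s)))

(define (utf8-string-length s)
  ;; Count every byte that is not a UTF-8 continuation byte (#x80-#xBF).
  (let ((bytes (estring-data s)))
    (let loop ((i 0) (n 0))
      (if (= i (vector-length bytes))
          n
          (let ((b (vector-ref bytes i)))
            (loop (+ i 1)
                  (if (and (>= b #x80) (<= b #xBF)) n (+ n 1))))))))

;; Generic entry point: dispatch on the encoding slot.
(define (estring-length s)
  (case (estring-encoding s)
    ((utf8) (utf8-string-length s))
    ((#f)   (byte-string-length s))
    (else   (error "unknown string encoding" (estring-encoding s)))))

;; "héllo" encoded as UTF-8: six bytes, five characters.
(define hello-bytes (vector #x68 #xC3 #xA9 #x6C #x6C #x6F))
(define as-utf8  (make-estring 'utf8 hello-bytes))
(define as-bytes (make-estring #f hello-bytes))
```

With those definitions, (estring-length as-utf8) gives 5 and
(estring-length as-bytes) gives 6, while code that already knows its
encoding can call utf8-string-length or byte-string-length and skip
the dispatch.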

Graham