Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From: Graham Fawcett
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Mon, 17 Mar 2008 11:33:08 -0400

On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <address@hidden> wrote:
> Summary: I want a byte-string API. I want string integrations. I want
>  global UTF8 strings.

The Factor language borrowed from Larceny a clever mechanism for
representing Unicode strings efficiently. Perhaps such a system is
feasible for Chicken, and might eliminate some of these issues (at the
cost of distancing our string type a bit more from C char arrays):

http://factor-language.blogspot.com/2008_01_01_archive.html

"The new representation is quite clever, and comes from Larceny
Scheme. The idea is that strings are ASCII strings, but have an extra
slot pointing to an 'auxiliary vector'. If no auxiliary vector is set,
the nth character of the string is just the nth byte. If an auxiliary
vector is set, then the nth character has the nth byte as the least
significant 8 bits, and the most significant 13 bits come from the nth
double-byte in the auxiliary vector. Storing a non-ASCII character
into the string creates an auxiliary vector if necessary. This reduces
space usage for ASCII strings, it can represent every Unicode code
point, and for strings with high code points in them, it still uses
less space than the other alternative, UTF-32."

So, a byte string would simply be a string with a null auxiliary vector.
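
To make the layout concrete, here is a rough Python sketch of the scheme as I read the quoted description. It is not Larceny's or Factor's actual code; the class name AuxString and its methods are made up for illustration. One assumption to flag: the sketch allocates the extra vector only when a character has non-zero high bits (code points above 0xFF), so that arbitrary bytes can be stored without it, whereas the quoted text says any non-ASCII character triggers the allocation.

```python
from array import array


class AuxString:
    """Hypothetical fixed-length string: one byte per character, plus an
    optional vector of 16-bit cells holding the high bits of each code point."""

    def __init__(self, length):
        self.low = bytearray(length)  # low 8 bits of every character
        self.aux = None               # high 13 bits, allocated on demand

    def __len__(self):
        return len(self.low)

    def __getitem__(self, n):
        code = self.low[n]
        if self.aux is not None:
            # Combine the low byte with the high bits from the extra vector.
            code |= self.aux[n] << 8
        return chr(code)

    def __setitem__(self, n, ch):
        code = ord(ch)
        high = code >> 8  # at most 0x10FF, which fits in 13 bits
        if high and self.aux is None:
            # Assumption: allocate only when the high bits are non-zero.
            self.aux = array("H", [0]) * len(self.low)
        self.low[n] = code & 0xFF
        if self.aux is not None:
            self.aux[n] = high

    def is_byte_string(self):
        """True while no extra vector has been allocated."""
        return self.aux is None


s = AuxString(3)
s[0] = "a"
s[1] = "b"
pure_bytes = s.is_byte_string()  # still a plain byte string here
s[2] = "\u03bb"                  # storing a high code point allocates the vector
```

The cost is three bytes per character once the vector exists, against four for UTF-32, and indexing stays constant-time in both states.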

Graham
