Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

chicken-users

From:	Graham Fawcett
Subject:	Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date:	Mon, 17 Mar 2008 11:33:08 -0400

On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <address@hidden> wrote:
> Summary: I want a byte-string API. I want string integrations. I want
>  global UTF8 strings.

The Factor language borrowed from Larceny a clever mechanism for
representing Unicode strings efficiently. Perhaps such a system is
feasible for Chicken, and might eliminate some of these issues (at the
cost of distancing our string type a bit more from C char arrays):

http://factor-language.blogspot.com/2008_01_01_archive.html

"The new representation is quite clever, and comes from Larceny
Scheme. The idea is that strings are ASCII strings, but have an extra
slot pointing to an 'auxiliary vector'. If no auxiliary vector is set,
the nth character of the string is just the nth byte. If an auxiliary
vector is set, then the nth character has the nth byte as the least
significant 8 bits, and the most significant 13 bits come from the nth
double-byte in the auxiliary vector. Storing a non-ASCII character
into the string creates an auxiliary vector if necessary. This reduces
space usage for ASCII strings, it can represent every Unicode code
point, and for strings with high code points in them, it still uses
less space than the other alternative, UTF-32."

So, a byte string would simply be a string with a null auxiliary vector.
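
To make the quoted description concrete, here is a rough sketch of the idea in Python, for illustration only. The names (AuxString, ref, set, is_byte_string) are hypothetical and are not taken from Factor, Larceny, or Chicken. It also assumes the auxiliary vector is allocated only when a stored code point exceeds 0xFF, so that any plain byte value fits without one; the quoted text says "non-ASCII", so the real implementations may draw that line at 0x7F instead.

```python
from array import array


class AuxString:
    """Hypothetical fixed-length string: one byte per character,
    plus an optional auxiliary vector holding the upper 13 bits."""

    def __init__(self, length):
        self.low = bytearray(length)  # least significant 8 bits of each character
        self.aux = None               # created lazily; None means "byte string"

    def __len__(self):
        return len(self.low)

    def ref(self, n):
        """Return the nth character."""
        code = self.low[n]
        if self.aux is not None:
            # Combine the nth double-byte (high bits) with the nth byte (low bits).
            code |= self.aux[n] << 8
        return chr(code)

    def set(self, n, ch):
        """Store a character, creating the auxiliary vector if necessary."""
        code = ord(ch)
        high = code >> 8  # at most 13 bits, since code points stop at 0x10FFFF
        if high and self.aux is None:
            # Assumption: allocate only when the value does not fit in one byte.
            self.aux = array("H", [0] * len(self.low))
        self.low[n] = code & 0xFF
        if self.aux is not None:
            self.aux[n] = high

    def is_byte_string(self):
        """True while no auxiliary vector has been allocated."""
        return self.aux is None


s = AuxString(3)
s.set(0, "a")
print(s.is_byte_string())  # still a plain byte string at this point
s.set(1, "\u03bb")         # GREEK SMALL LETTER LAMDA forces the auxiliary vector
s.set(2, "\U0001F600")     # a code point outside the Basic Multilingual Plane
print(s.is_byte_string(), [s.ref(i) for i in range(len(s))])
```

The property that matters for the byte-string question is that indexing stays constant-time in both cases, and a string that never receives a wide character costs one byte per character.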

Graham
