[Pika-dev] so... string work
From: Tom Lord
Subject: [Pika-dev] so... string work
Date: Thu, 22 Jan 2004 09:58:50 -0800 (PST)
So, to sum up the implications for Pika of that long thing:
Strings will be "vtable objects" containing a pointer to a `t_udstr'
(see src/hackerlab/strings).
`t_udstr's hold strings with an explicitly recorded length (measured
in encoding values -- e.g., bytes for utf-8, uint16s for utf-16) and an
explicitly recorded encoding form (e.g. "this is a utf-8 string").
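
To make the "length measured in encoding values" point concrete, here is a minimal sketch in C. It is not the real `t_udstr' layout: the names `sketch_form', `sketch_str', `sketch_unit_size' and `sketch_byte_size' are hypothetical, invented only for this illustration. It shows why the encoding form has to travel with the length: the same length means a different number of bytes depending on the form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not libhackerlab's. */
enum sketch_form
{
  SKETCH_UTF8,   /* encoding unit is one byte */
  SKETCH_UTF16   /* encoding unit is one uint16 */
};

struct sketch_str
{
  enum sketch_form form;  /* explicitly recorded encoding form */
  size_t len;             /* length in encoding units, not bytes */
  const void *data;       /* len units of the size implied by form */
};

/* Size in bytes of one encoding unit for the given form. */
size_t
sketch_unit_size (enum sketch_form form)
{
  switch (form)
    {
    case SKETCH_UTF8:
      return sizeof (uint8_t);
    case SKETCH_UTF16:
      return sizeof (uint16_t);
    }
  return 0;  /* not reached for valid forms */
}

/* Storage occupied by the string's data, in bytes. */
size_t
sketch_byte_size (const struct sketch_str *s)
{
  assert (s != NULL);
  return s->len * sketch_unit_size (s->form);
}
```

With this layout, a length of 3 means 3 bytes for a utf-8 string but 6 bytes for a utf-16 string.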
I want pika to use these encodings in such a way that Scheme strings
(mostly) use the narrowest representation they can without needing to
encode a character as more than one encoding unit. So, a 7-bit ASCII
string will usually be in UTF-8 (one byte per character) and most
other strings will be in UTF-16 (one uint16 per character) and some
strings (e.g., those containing characters with buckybits set; those
containing Unicode characters outside the "basic multilingual plane")
will be stored in UTF-32.
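
The selection rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `example_encoding' and `example_narrowest_encoding' are hypothetical, the input is assumed to be an array of valid characters held as 32-bit values, and buckybits are assumed to live somewhere above bit 15, so any value over 0xFFFF forces the widest form. How Pika actually stores buckybits is not settled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not libhackerlab's. */
enum example_encoding
{
  EXAMPLE_UTF8,   /* one byte per character: 7-bit ASCII only */
  EXAMPLE_UTF16,  /* one uint16 per character: rest of the BMP */
  EXAMPLE_UTF32   /* one uint32 per character: everything else */
};

/* Return the narrowest form in which every character fits in a
 * single encoding unit.
 *
 * Assumption: any value above 0xFFFF (a character outside the basic
 * multilingual plane, or one with high bits such as buckybits set)
 * needs UTF-32.
 */
enum example_encoding
example_narrowest_encoding (const uint32_t *chars, size_t n_chars)
{
  enum example_encoding best = EXAMPLE_UTF8;
  size_t i;

  assert (chars != NULL || n_chars == 0);

  for (i = 0; i < n_chars; ++i)
    {
      if (chars[i] > 0xFFFF)
        return EXAMPLE_UTF32;   /* nothing is wider; stop early */
      if (chars[i] > 0x7F)
        best = EXAMPLE_UTF16;   /* keep scanning for wider chars */
    }
  return best;
}
```

A string primitive that stores a wider character into a narrow string (for example, string-set! on an ASCII string) would have to re-run a check like this and widen the representation, which is presumably why the rule is only "mostly" the narrowest form.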
So what needs to be done in libhackerlab is:
~ add uni_utf32 to the list of encoding forms
~ extend t_udstr for utf32
~ write test code for t_udstr
~ make sure there's enough in libhackerlab to implement the standard
Scheme string procedures
and in Pika:
~ implement the representation for strings
~ implement the string primitives
~ write some tests
Any of that grab you as something you'd like to work on?
I should warn that the C macrology and inlining foo for adding UTF-32
is a little bit twisted. I find that when I change it, it takes me a
while to figure out where I put everything :-). But it's not as bad
as it might look at first.
-t