[Pika-dev] so... string work

From:	Tom Lord
Subject:	[Pika-dev] so... string work
Date:	Thu, 22 Jan 2004 09:58:50 -0800 (PST)


So, to sum up the implications for Pika of that long thing:

Strings will be "vtable objects" containing a pointer to a `t_udstr'
(see src/hackerlab/strings).

`t_udstr's hold strings with an explicitly recorded length (measured
in encoding values -- e.g., bytes for utf-8, int16s for utf-16) and an
explicitly recorded encoding form (e.g. "this is a utf-8 string").
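
To make the "length in encoding values" idea concrete, here is a
minimal sketch in C.  Every name in it (`sketch_str',
`sketch_encoding', and so on) is hypothetical and chosen only for
illustration; it is not the actual `t_udstr' layout in libhackerlab.
It just shows a string that carries its own length and encoding form,
and how a byte count would be derived from them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the real t_udstr layout. */
enum sketch_encoding { SKETCH_UTF8, SKETCH_UTF16, SKETCH_UTF32 };

struct sketch_str
{
  enum sketch_encoding enc;   /* explicitly recorded encoding form */
  size_t length;              /* length in encoding values, not bytes */
  void * data;                /* the encoding values themselves */
};

/* Size in bytes of one encoding value for the given form. */
static size_t
sketch_unit_size (enum sketch_encoding enc)
{
  switch (enc)
    {
    case SKETCH_UTF8:  return sizeof (uint8_t);
    case SKETCH_UTF16: return sizeof (uint16_t);
    case SKETCH_UTF32: return sizeof (uint32_t);
    }
  assert (0 && "unknown encoding form");
  return 0;
}

/* Storage needed for the string's data, in bytes. */
static size_t
sketch_str_bytes (const struct sketch_str * s)
{
  assert (s != NULL);
  return s->length * sketch_unit_size (s->enc);
}
```

Under these assumptions, a string of 5 encoding values occupies 5
bytes as utf-8, 10 as utf-16, and 20 as utf-32.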

I want pika to use these encodings in such a way that Scheme strings 
(mostly) use the narrowest representation they can without needing to
encode a character as more than one encoding unit.   So, a 7-bit ASCII
string will usually be in UTF-8 (one byte per character) and most
other strings will be in UTF-16 (one uint16 per character) and some
strings (e.g., those containing characters with buckybits set;  those
containing Unicode characters outside the "basic multilingual plane")
will be stored in UTF-32.
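
The selection rule above could be sketched roughly as follows.  This
is an illustration only: the names (`sketch_form',
`sketch_narrowest_form') are hypothetical, and it assumes characters
arrive as 32-bit values in which any buckybits sit above bit 15, so
that such characters fall into the UTF-32 case along with everything
else outside the basic multilingual plane.  It does not treat the
surrogate range specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not part of libhackerlab or Pika. */
enum sketch_form { SKETCH_FORM_UTF8, SKETCH_FORM_UTF16, SKETCH_FORM_UTF32 };

/* Return the narrowest form in which every character of CHARS fits
 * in exactly one encoding unit.
 *
 * Assumption: each element is a character value, possibly with
 * buckybits stored somewhere above bit 15.
 */
static enum sketch_form
sketch_narrowest_form (const uint32_t * chars, size_t n)
{
  enum sketch_form form = SKETCH_FORM_UTF8;
  size_t i;

  assert (chars != NULL || n == 0);

  for (i = 0; i < n; ++i)
    {
      if (chars[i] > 0xFFFF)
        return SKETCH_FORM_UTF32;   /* needs a full 32-bit unit */
      if (chars[i] > 0x7F)
        form = SKETCH_FORM_UTF16;   /* too wide for one utf-8 byte */
    }
  return form;
}
```

With this rule a string would be widened only when a character that
does not fit is stored into it, which is what keeps 7-bit ASCII
strings at one byte per character.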

So what needs to be done in libhackerlab is:

~ add uni_utf32 to the list of encoding forms
~ extend t_udstr for utf32
~ write test code for t_udstr
~ make sure there's enough in libhackerlab to implement the standard
  Scheme string procedures

and in Pika:

~ implement the representation for strings
~ implement the string primitives
~ write some tests


Any of that grab you as something you'd like to work on?

I should warn that the C macrology and inlining foo for adding UTF-32
is a little bit twisted.   I find that when I change it, it takes me a
while to figure out where I put everything :-).    But it's not as bad
as it might look at first.

-t
