[Pika-dev] string foo
From: Tom Lord
Subject: [Pika-dev] string foo
Date: Mon, 17 May 2004 12:38:14 -0700 (PDT)
I've been reviewing the ustr patches and starting work on them.
Um... hmm... I've basically rewritten ustr.c (see
hackerlab--devo--1.0--patch-69). It was very helpful having it
roughed out but I've reimplemented everything and tweaked it a bit.
* an api note on ustr_length_in_encoding
I changed the return value meaning of ustr_length_in_encoding and
friends. They return -1 for a string truncated in the middle of a
character and -2 for a string that contains characters which cannot
be represented in the target encoding.
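To illustrate that return-value convention, here is a minimal sketch. It is not the code from ustr.c: the function name, the macro names, and the choice of a UTF-16 source with a Latin-1 target are all assumptions picked for illustration. Only the meanings of -1 and -2 come from the note above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the values -1 and -2 come from the note. */
#define SKETCH_ERR_TRUNCATED       (-1)
#define SKETCH_ERR_UNREPRESENTABLE (-2)

/* Hypothetical helper: how many Latin-1 bytes would be needed to hold
 * the UTF-16 string STR of N coding values?  Returns the length, or a
 * negative code following the convention described above.
 */
static long
sketch_utf16_length_in_latin1 (const uint16_t *str, size_t n)
{
  long out = 0;
  size_t i = 0;

  while (i < n)
    {
      uint32_t c = str[i++];

      /* A high surrogate with nothing after it means the string
       * stops in the middle of a character.
       */
      if (c >= 0xD800 && c <= 0xDBFF && i >= n)
        return SKETCH_ERR_TRUNCATED;

      /* Latin-1 holds only U+0000..U+00FF.  Surrogates (paired or
       * stray) also land here; this sketch lumps them together.
       */
      if (c > 0xFF)
        return SKETCH_ERR_UNREPRESENTABLE;

      ++out;
    }
  return out;
}
```

A caller can then tell "need more input" (-1) apart from "pick a different target encoding" (-2) without a separate error channel.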
* an api note on ustr_fill_*
I haven't worked on the file containing the fill functions yet but I
did have to tweak them due to an API change in ustr.c. So, I noticed
the ustr_fill* functions.
Those can't possibly be right. A conceptual `fill' operation could
change the length of a unicode string in several of the encoding
forms. But ustr_* functions can't change the length of their
argument. I think these have to go.
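A small sketch of why a fill can change the storage length, using UTF-8 as the example encoding form. The helper names are hypothetical and are not part of hackerlab.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 bytes needed to encode the
 * code point CP (assumed to be a valid scalar value).
 */
static size_t
sketch_utf8_width (uint32_t cp)
{
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;
}

/* Hypothetical helper: bytes needed for a UTF-8 string made of
 * N_CHARS copies of CP.
 */
static size_t
sketch_utf8_fill_size (size_t n_chars, uint32_t cp)
{
  return n_chars * sketch_utf8_width (cp);
}
```

Filling the three-character string "abc" (3 bytes) with U+00E9 needs 6 bytes and with U+20AC needs 9, so a function that cannot resize its argument cannot do this in place.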
* an api note on the unit of `length' parameters
There is some awkwardness in the very low level routines (the
uni_*_iscan and uni_*_scan routines, for example). They come
in two flavors where the two flavors treat string lengths
differently. For example:
    uni_utf16_scan (...)
      --- scan one character from a UTF16 string,
          incrementing a string index value
          The string index is measured in coding values
          (16-bit integers)
    uni_utf16be_iscan (...)
      --- scan one character from a (possibly unaligned)
          UTF16be string, incrementing a string index value
          The string index is measured in bytes.
The inconsistency actually makes sense in the low-level routines.
The iscan and iput routines are there for when you're dealing with raw
data (as in an I/O buffer) for which alignment isn't assured and most
of the surrounding code is just seeing a stream of bytes. The scan
and put routines are for aligned unicode strings which surrounding
code is more likely to see as an array of coding values.
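The two flavors can be sketched side by side. The argument lists of the real uni_* routines are elided in this message, so the signatures below are assumptions and the sketch_ names are hypothetical; the only point is that one index counts 16-bit coding values and the other counts bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Aligned flavor: *POS and LEN count 16-bit coding values.
 * Returns the code point, or -1 if the string ends mid-character.
 * The low surrogate is not validated in this sketch.
 */
static long
sketch_utf16_scan (const uint16_t *str, size_t *pos, size_t len)
{
  size_t i = *pos;
  uint32_t hi, lo;

  if (i + 1 > len)
    return -1;
  hi = str[i++];
  if (hi < 0xD800 || hi > 0xDBFF)
    {
      *pos = i;
      return (long) hi;
    }
  if (i + 1 > len)
    return -1;
  lo = str[i++];
  *pos = i;
  return (long) (0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

/* Byte flavor: *POS and LEN count bytes, and BUF needs no alignment
 * because each coding value is assembled from two big-endian bytes.
 */
static long
sketch_utf16be_iscan (const unsigned char *buf, size_t *pos, size_t len)
{
  size_t i = *pos;
  uint32_t hi, lo;

  if (i + 2 > len)
    return -1;
  hi = ((uint32_t) buf[i] << 8) | buf[i + 1];
  i += 2;
  if (hi < 0xD800 || hi > 0xDBFF)
    {
      *pos = i;
      return (long) hi;
    }
  if (i + 2 > len)
    return -1;
  lo = ((uint32_t) buf[i] << 8) | buf[i + 1];
  i += 2;
  *pos = i;
  return (long) (0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}
```

Scanning the same two characters advances the first index by 1 and then 2 coding values, and the second by 2 and then 4 bytes.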
In the ustr_* functions, I want to:
~ still support explicit-endian encoding forms and
unaligned strings
~ nevertheless, emphasize the string view rather than the
byte view
Accordingly, I'm changing the ustr api so that the *_n forms of
functions accept lengths expressed in coding values rather than bytes.
That creates a slight impedance mismatch between ustr_* and the
low-level uni_* functions --- there's a need to sometimes convert
between byte-lengths and coding value lengths.
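That conversion is small. A minimal sketch follows, with a hypothetical enum and helper names standing in for whatever ustr.c actually uses:

```c
#include <stddef.h>

/* Hypothetical list of encoding forms, for illustration only. */
enum sketch_encoding
{
  SKETCH_UTF8,
  SKETCH_UTF16,
  SKETCH_UTF16BE,
  SKETCH_UTF16LE,
  SKETCH_UTF32
};

/* Size in bytes of one coding value in the encoding form ENC. */
static size_t
sketch_coding_value_size (enum sketch_encoding enc)
{
  switch (enc)
    {
    case SKETCH_UTF8:
      return 1;
    case SKETCH_UTF16:
    case SKETCH_UTF16BE:
    case SKETCH_UTF16LE:
      return 2;
    case SKETCH_UTF32:
      return 4;
    }
  return 0;                     /* unknown encoding form */
}

/* Coding-value length -> byte length (what a byte-indexed routine wants). */
static size_t
sketch_bytes_from_coding_values (enum sketch_encoding enc, size_t n_values)
{
  return n_values * sketch_coding_value_size (enc);
}

/* Byte length -> coding-value length, or -1 if N_BYTES is not a
 * whole number of coding values.
 */
static long
sketch_coding_values_from_bytes (enum sketch_encoding enc, size_t n_bytes)
{
  size_t size = sketch_coding_value_size (enc);

  if (size == 0 || n_bytes % size != 0)
    return -1;
  return (long) (n_bytes / size);
}
```

An *_n function taking a length in coding values would multiply before calling a byte-indexed routine and divide when reporting a position back to its caller.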
* an implementation note
Jao: use fewer macros. The main goal of the
macrology-underlying-inline-functions in the low-level functions was
to avoid the need for hairy macrology in the higher level functions.
In the rewrite of ustr.c I essentially don't use macros at all.
-t