[Chicken-users] Re: unicode and chicken


From: Joerg F. Wittenberger
Subject: [Chicken-users] Re: unicode and chicken
Date: 14 Nov 2002 12:57:13 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)

Felix Winkelmann <address@hidden> writes:

> Joerg F. Wittenberger wrote:
> > I'd really like to see/help get something going into that direction.
> 
> > Unicode is of major importance and quite an argument to switch towards
> > Java (argh).
> > I guess a generally Unicode-based chicken would be a bit
> > slower... What penalty would you expect?
> 
> Too heavy a penalty. Instead I propose the following:
> 
> - Extending the character handling to allow 16-bit character
>    codes. Generally the effect of using non-Latin1 characters
>    with standard procedures is undefined.
> 
> - A new library unit `unicode':
> 
>    A new data type `ucs-2-string', which stores a ucs-2
>    representation in native byte order.
> 
>    procedures:
> 
>      ucs-2-string
>      ucs-2-string-append
>      make-ucs-2-string
>      ucs-2-string->list
>      list->ucs-2-string
>      ucs-2-string-ref
>      ucs-2-string-set!
> 
>    read-syntaxes:
> 
>      #utf-8"..."

Important, but one remark: Gauche reads single characters as shown
below, and I remember Bigloo being similar (?).  I'd prefer a
compatible way.

#\u0041   => #\A       ; ASCII letter 'A', specified by UCS
#\u3042   => #\あ      ; Hiragana letter A, specified by UCS
#\u0002a6b2 => ; JISX0213 Kanji 2-94-86, specified by UCS4
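
To make the reader syntax above concrete, here is a minimal sketch in
portable Scheme of how the part after `#\` could be turned into a code
point.  The names `hex-digit?`, `all-hex-digits?` and
`u-escape->code-point` are made up for illustration and are not part of
Chicken, Gauche or Bigloo; the sketch also assumes that exactly four or
eight hex digits follow the `u`, as in the examples above.

```scheme
;; Hypothetical helpers, for illustration only.

;; #t if C is an ASCII hexadecimal digit.
(define (hex-digit? c)
  (or (and (char>=? c #\0) (char<=? c #\9))
      (and (memv (char-downcase c) '(#\a #\b #\c #\d #\e #\f)) #t)))

;; #t if every character of STR from index START on is a hex digit.
(define (all-hex-digits? str start)
  (let loop ((i start))
    (or (= i (string-length str))
        (and (hex-digit? (string-ref str i))
             (loop (+ i 1))))))

;; NAME is the text after "#\", e.g. "u0041".  Returns the code point
;; as an integer, or #f if NAME is not "u" plus 4 or 8 hex digits
;; (the 4-or-8 rule is an assumption taken from the examples).
(define (u-escape->code-point name)
  (let ((len (string-length name)))
    (and (or (= len 5) (= len 9))
         (char=? (string-ref name 0) #\u)
         (all-hex-digits? name 1)
         (string->number (substring name 1 len) 16))))

(u-escape->code-point "u0041")      ; => 65
(u-escape->code-point "u0002a6b2")  ; => 173746
(u-escape->code-point "space")      ; => #f
```

A real reader would then still have to decide what to do with code
points that do not fit the 16-bit character type.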

>      #lsb-ucs-2"..."
>      #msb-ucs-2"..."
>      #ucs-2"..."       (native byte order)

these can wait.

I'd propose to keep some compatibility here.  Can anyone add knowledge
about other Scheme implementations?  Bigloo has:
(from
http://www-sop.inria.fr/mimosa/personnel/Manuel.Serrano/bigloo/doc/bigloo-5.1.html#container1412)

ucs2?  ucs2=?  ucs2<?  ucs2>?  ucs2<=?  ucs2>=?  ucs2-ci=?  ucs2-ci<?
ucs2-ci>?  ucs2-ci<=?  ucs2-ci>=?

ucs2-alphabetic?  ucs2-numeric?  ucs2-whitespace?  ucs2-upper-case?
ucs2-lower-case?

ucs2->integer integer->ucs2


ucs2-string? make-ucs2-string ucs2-string

ucs2-string-length ucs2-string-ref ucs2-string-set!

ucs2-string=?  ucs2-string-ci=?  ucs2-string<?  ucs2-string>?
ucs2-string<=?  ucs2-string>=?  ucs2-string-ci<?  ucs2-string-ci>?
ucs2-string-ci<=?  ucs2-string-ci>=?

subucs2-string ucs2-string-append ucs2-string->list list->ucs2-string
ucs2-string-copy

ucs2-string->utf8-string utf8-string->ucs2-string
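
As a rough idea of what a conversion in the style of
ucs2-string->utf8-string has to do, here is a minimal sketch in
portable Scheme.  It is not Bigloo's implementation: it works on plain
lists of integers (16-bit code units in, bytes out) rather than on a
real string type, and the procedure names are made up for
illustration.  Since UCS-2 has no surrogate pairs, every code unit is
encoded on its own in one to three bytes.

```scheme
;; Hypothetical helpers, for illustration only.

;; Encode one UCS-2 code unit (0 .. #xFFFF) as a list of UTF-8 bytes.
(define (ucs2-code->utf8-bytes code)
  (cond ((< code #x80)                    ; 1 byte:  0xxxxxxx
         (list code))
        ((< code #x800)                   ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xC0 (quotient code 64))
               (+ #x80 (remainder code 64))))
        (else                             ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         (list (+ #xE0 (quotient code 4096))
               (+ #x80 (remainder (quotient code 64) 64))
               (+ #x80 (remainder code 64))))))

;; Encode a list of UCS-2 code units as one flat list of UTF-8 bytes.
(define (ucs2-codes->utf8-bytes codes)
  (apply append (map ucs2-code->utf8-bytes codes)))

;; "A", a-umlaut (#xE4) and Hiragana A (#x3042):
(ucs2-codes->utf8-bytes '(#x41 #xE4 #x3042))
;; => (65 195 164 227 129 130)
```

The reverse direction needs more care, because it has to reject
malformed byte sequences and anything above #xFFFF.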

>    A parameter `print-unicode-as-utf-8' (defaults to ?)
>    which controls printing (either as ucs-2 literal or as utf-8)

It should default to UTF-8.  Maybe also expose low-level routines for
printing UCS-2 strings to ports, such as display-utf8 and display-ucs2,
which ignore the parameter.

> - Adapting SRFI-13 and the (remaining) string-routines in `extras'
>    to handle ucs-2-strings.
> 
> - Adding the ucs-2 character set to SRFI-14
> 
> Any comments are welcome.

I'm with you so far.

Maybe for a start the whole code could be "stolen" from Gauche
(it's also under a BSD license) and incorporated.  Then let's see how
much of a penalty it actually is... and optimize later.

so short

/Jörg

-- 
The worst of harm may often result from the best of intentions.



