guile-devel

Re: i18n? unicode?


From: Alex Shinn
Subject: Re: i18n? unicode?
Date: 13 Feb 2002 12:42:02 +0900
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.1

>>>>> "Simon" == Simon Josefsson <address@hidden> writes:

    Simon> Is anyone working on Unicode and/or support for various other
    Simon> encodings for guile strings?

    Simon> I guess this would be one major issue that needs to be done
    Simon> before a guile emacs can happen.

Some work has been done off and on, but it's not a simple problem.  One
of the big catches is that Guile wants both to replace Emacs Lisp and to
extend well with C.  For efficient multibyte strings, Emacs Lisp has
its own string representation, and the obvious idea would be to do
likewise (probably using Unicode instead of MULE), but then you don't
play well with C libraries and have to do conversions everywhere.

Another annoyance is that R5RS pretty clearly treats strings as
character arrays, but many multibyte encodings are variable-width and
cannot be indexed like arrays, so procedures like string-ref and
string-set! become slow (typically linear in the index rather than
constant time).
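
To make the cost concrete, here is a minimal sketch in Python (not
Guile code) of what a string-ref has to do when the string is stored as
UTF-8.  The function name utf8_string_ref is made up for illustration,
and UTF-8 is only one example of a variable-width encoding; the point is
that the k-th character can only be found by walking over the k
characters before it.

```python
def utf8_string_ref(buf: bytes, k: int) -> str:
    """Return the k-th character of UTF-8 encoded buf (hypothetical helper).

    The scan starts at byte 0 every time, so the cost grows with k.
    """
    if k < 0:
        raise IndexError(k)
    i = 0  # current byte offset
    n = 0  # number of characters skipped so far
    while i < len(buf):
        lead = buf[i]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        if n == k:
            return buf[i:i + width].decode("utf-8")
        i += width
        n += 1
    raise IndexError(k)
```

With a fixed-width representation the same lookup is a single offset
computation, which is what R5RS code tends to assume.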

The only Scheme I know of that has decent multibyte support is Gauche,
and that is at the expense of performance on string-ref and the like.
To make up for this it provides string pointers to loop through strings.
A C API for extensions would presumably need to do explicit conversions.
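
As a rough illustration of the string pointer idea (this is a Python
sketch, not Gauche's actual API; the class Utf8Cursor and its method
names are invented here), a cursor remembers its byte offset, so
stepping to the next character is cheap and a whole-string loop stays
linear instead of re-scanning from the start for every index.

```python
class Utf8Cursor:
    """Hypothetical cursor over a UTF-8 byte string."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.offset = 0  # byte offset of the current character

    def _width(self) -> int:
        # Width in bytes of the character at the current offset.
        lead = self.buf[self.offset]
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        if lead >> 3 == 0b11110:
            return 4
        raise ValueError("invalid UTF-8 lead byte at offset %d" % self.offset)

    def at_end(self) -> bool:
        return self.offset >= len(self.buf)

    def peek(self) -> str:
        # Decode only the current character; nothing before it is rescanned.
        return self.buf[self.offset:self.offset + self._width()].decode("utf-8")

    def advance(self) -> None:
        self.offset += self._width()


def chars(buf: bytes) -> list:
    """Collect every character by stepping a cursor through buf once."""
    cur = Utf8Cursor(buf)
    out = []
    while not cur.at_end():
        out.append(cur.peek())
        cur.advance()
    return out
```

Code written against a cursor like this never asks for "character
number k", which is why it sidesteps the slow string-ref.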

Bigloo has limited UCS-2 support, but it is not really unified: you have
to know which kind of string you're working with.  Kawa is implemented
in Java, so its Unicode support is as good as Java's.  But then you're
tied to Java.

There are some preliminary charset conversion routines at

  http://synthcode.com/gumm/packages/a/ams/guile-charset-0.01.tar.gz

which only do uninteresting 8-bit conversions at the moment.  One
potential use for this, though, is to implement multibyte string
handling entirely in Scheme, and redefine the basic string/port
procedures as generic methods that handle different string types.  It's
kind of a hack (btw, this is how Perl5 does it), but it could get people
started writing multibyte string apps, and after the upgrade (internal
support for different strings in the string procedures) they won't have
to change their code.
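
A minimal sketch of the generic-method idea, in Python rather than
Scheme and using functools.singledispatch as a stand-in for generic
methods.  The Utf8String class and the string_ref / string_length names
are hypothetical, and decoding the whole buffer on each call is just the
simplest thing that works, not a proposal for the real implementation.

```python
from functools import singledispatch


class Utf8String:
    """Hypothetical multibyte string type: raw UTF-8 bytes in a wrapper."""

    def __init__(self, data: bytes):
        self.data = data


@singledispatch
def string_ref(s, k):
    # Fallback for types no method has been registered for.
    raise TypeError("string_ref: unsupported string type %r" % type(s))


@string_ref.register(str)
def _string_ref_native(s, k):
    # The built-in string type keeps its fast path.
    return s[k]


@string_ref.register(Utf8String)
def _string_ref_utf8(s, k):
    # Slow but correct: decode, then index by character.
    return s.data.decode("utf-8")[k]


@singledispatch
def string_length(s):
    raise TypeError("string_length: unsupported string type %r" % type(s))


@string_length.register(str)
def _string_length_native(s):
    return len(s)


@string_length.register(Utf8String)
def _string_length_utf8(s):
    # Length in characters, not bytes.
    return len(s.data.decode("utf-8"))
```

Callers only ever see string_ref and string_length, so if the
multibyte type later moves into the core, the calling code stays the
same.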

-- 
Alex


