chicken-users

Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From: Tobia Conforto
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Tue, 18 Mar 2008 18:57:09 +0100

John Cowan wrote:
> The difference between restricted and unrestricted strings may not be as large as the distinction between pairs and fixnums, but it's the same *kind* of difference.

I beg to differ.

A pair is no fixnum, and vice-versa.  They're two disjoint domains.

On the other hand, a UTF-8 string is at the same time both a sequence of Unicode characters and a sequence of bytes, and in many circumstances it must be treated as both during its lifespan (for example, using Unicode-aware operations to compose it and then byte operations to split it into network packets or to compute an MD5 digest from it).
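
To make the two views concrete, here is a minimal sketch in Python, used purely for illustration (it is not Chicken code, and the sample text and PACKET_SIZE are made up). Note that Python keeps text and bytes as separate types, so encode() copies the contents.

```python
import hashlib

# Character view: compose the text with Unicode-aware string operations.
text = " ".join(["naïve", "café"])
char_count = len(text)  # counts characters, not bytes

# Byte view: the UTF-8 encoding of the same text.
# In Python, encode() copies the contents into a new bytes object.
data = text.encode("utf-8")
byte_count = len(data)  # larger than char_count: two characters take 2 bytes each

# Split into fixed-size "network packets" (hypothetical, tiny size on purpose).
# Packet boundaries are byte offsets and may fall inside a multi-byte character.
PACKET_SIZE = 3
packets = [data[i:i + PACKET_SIZE] for i in range(0, len(data), PACKET_SIZE)]

# Compute an MD5 digest over the bytes.
digest = hashlib.md5(data).hexdigest()

print(char_count, byte_count, len(packets), digest)
```

The same content is 10 characters long in one view and 12 bytes long in the other, and the first packet ends in the middle of the "ï".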

This discussion has convinced me that, from a *practical* point of view, it makes a lot of sense to use the same underlying object for both kinds of operation, instead of copying the contents every time you want to switch between the two views (as I suppose happens, for example, in Java with strings and byte arrays).

Having the string API operate on UTF-8 characters and having a new API to operate on bytes, *both on the same underlying string objects*, will let us have the cake and eat it too, at the expense of changing the meaning of the string API for all existing applications.
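
As a rough sketch of what "two APIs on the same underlying object" could look like, here is a hypothetical DualString class in Python (the class and all of its method names are invented for illustration; this is not the utf8 egg's API). One buffer holds the UTF-8 bytes; the character API decodes it on demand, and the byte API hands out a view of the same buffer without copying.

```python
class DualString:
    """Hypothetical type: one UTF-8 buffer, a character API and a byte API."""

    def __init__(self, text: str) -> None:
        # The single underlying object: a mutable buffer of UTF-8 bytes.
        self._buf = bytearray(text.encode("utf-8"))

    # Character-level API (Unicode-aware).
    def char_length(self) -> int:
        return len(self._buf.decode("utf-8"))

    def text(self) -> str:
        return self._buf.decode("utf-8")

    # Byte-level API (operates on the very same buffer).
    def byte_length(self) -> int:
        return len(self._buf)

    def byte_view(self) -> memoryview:
        # A memoryview shares the buffer's memory; nothing is copied.
        return memoryview(self._buf)


s = DualString("più")
view = s.byte_view()
view[0] = ord("P")  # a byte-level write, visible through the character API
print(s.char_length(), s.byte_length(), s.text())
```

Here the byte-level write changes what the character-level API sees, because both operate on one buffer.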

The dynamic nature of Scheme suggests that it will all work seamlessly, until someone tries to call a (now Unicode-aware) string-length on a string whose UTF-8 structure has been corrupted by byte-level operations, at which point a runtime error will kindly signal the situation ;-)
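
The failure mode can be sketched as follows, again in Python for illustration only. The helper char_length is a made-up stand-in for a Unicode-aware string-length; how Chicken would actually report the error is not specified here.

```python
def char_length(raw: bytes) -> int:
    """Count characters in UTF-8 bytes; raises UnicodeDecodeError if malformed."""
    return len(raw.decode("utf-8"))


intact = "café".encode("utf-8")  # the final "é" occupies two bytes
truncated = intact[:-1]          # byte-level cut through the middle of "é"

print(char_length(intact))

try:
    char_length(truncated)
    corrupted_detected = False
except UnicodeDecodeError as err:
    # The runtime error that signals the broken UTF-8 structure.
    corrupted_detected = True
    print("runtime error:", err.reason)
```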


Tobia



