Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From:	Tobia Conforto
Subject:	Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date:	Tue, 18 Mar 2008 18:57:09 +0100

John Cowan wrote:

The difference between restricted and unrestricted strings may not be as large as the distinction between pairs and fixnums, but it's the same *kind* of difference.


I beg to differ.

A pair is no fixnum, and vice-versa.  They're two disjoint domains.

On the other hand, a UTF-8 string is at the same time both a sequence of Unicode characters and a sequence of bytes, and in many circumstances it must be treated as both during its lifespan (for example, using Unicode-aware operations to compose it and then byte operations to split it into network packets, or to compute an MD5 digest from it, etc.).
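To make the two views concrete, here is a rough sketch in Python (chosen only for brevity; it is not Chicken code, and the sample text and packet size are made up for illustration). Note that Python copies the contents when encoding, so this shows the two kinds of operation rather than a shared object.

```python
import hashlib

# Compose the text with Unicode-aware operations (escapes keep the literal unambiguous).
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")      # byte view of the same text (Python copies here)

char_count = len(text)           # counts characters
byte_count = len(data)           # counts bytes; the two accented letters take two bytes each

# Byte-level operations: an MD5 digest and fixed-size "network packets".
digest = hashlib.md5(data).hexdigest()
packet_size = 3                  # arbitrary size, picked so a packet boundary cuts a character
packets = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
```

The first packet ends in the middle of a two-byte character, which is harmless as long as packets are only treated as bytes and reassembled before any character-level operation.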

This discussion has convinced me that, from a *practical* point of view, it makes a lot of sense to use the same underlying object for both kinds of operation, instead of copying the contents every time you want to switch between the two views (as I suppose happens, for example, in Java with strings and byte arrays).

Having the string API operate on UTF-8 characters and having a new API operate on bytes, *both on the same underlying string objects*, will let us have our cake and eat it too, at the expense of changing the meaning of the string API for all existing applications.
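A minimal sketch of the idea, again in Python rather than Scheme. The class `Utf8String` and all of its method names are hypothetical, invented for this illustration; they are not the utf8 egg's API.

```python
class Utf8String:
    """Hypothetical object: one byte buffer, exposed through two APIs."""

    def __init__(self, text: str):
        self.buf = bytearray(text.encode("utf-8"))

    # Character-level API
    def char_length(self) -> int:
        # Count every byte that is not a continuation byte (10xxxxxx).
        return sum(1 for b in self.buf if (b & 0xC0) != 0x80)

    def char_ref(self, i: int) -> str:
        # Simplest possible lookup: decode, then index (this copies; fine for a sketch).
        return self.buf.decode("utf-8")[i]

    # Byte-level API, operating on the very same buffer
    def byte_length(self) -> int:
        return len(self.buf)

    def byte_ref(self, i: int) -> int:
        return self.buf[i]


s = Utf8String("caf\u00e9")
lengths = (s.char_length(), s.byte_length())   # (4, 5)
```

Both sets of operations read the same `buf`, so switching between the character view and the byte view never requires converting the string.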

The dynamic nature of Scheme suggests that it will all work seamlessly, until someone tries to call a (now Unicode-aware) string-length on a string whose UTF-8 structure has been corrupted by byte-level operations. At which point a runtime error will kindly signal the situation ;-)
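The failure mode would look roughly like this, sketched in Python under the assumption that the Unicode-aware length validates the string as it counts. The function `char_length` is a hypothetical stand-in, not Chicken's string-length.

```python
def char_length(buf) -> int:
    """Hypothetical Unicode-aware length: strict decoding rejects malformed UTF-8."""
    return len(buf.decode("utf-8"))

buf = bytearray("caf\u00e9".encode("utf-8"))
ok_length = char_length(buf)     # well-formed: 4 characters

del buf[-1]                      # byte-level edit: drop the second byte of the last character

try:
    char_length(buf)
    failed = False
except UnicodeDecodeError:
    # The runtime error that signals the corrupted UTF-8 structure.
    failed = True
```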



Tobia
