Re: [Chicken-users] UTF-8 support in eggs


From: John Cowan
Subject: Re: [Chicken-users] UTF-8 support in eggs
Date: Mon, 7 Jul 2014 23:59:05 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

Alex Shinn scripsit:

> On Tue, Jul 8, 2014 at 5:58 AM, Mario Domenech Goulart
> <address@hidden> wrote:
> 
> It might help the discussion if we had a list of eggs which
> are known to break on UTF-8 inputs.

Indeed.

> > 1. Have <egg> and <egg>-utf8 variants.  Or, more generally, <egg> and
> >    <egg>-<encoding> variants.  That would turn our coop into a disgusting
> >    mess and would be a nightmare to egg authors.

I don't think the extra generality is required.  If the egg needs to be
able to correctly handle arbitrary characters, UTF-8 is the appropriate
internal representation.  If not, ASCII/Latin-1 is appropriate.  Anything [...]
Eggs that do conversion will need to convert between arbitrary encodings
in byte vectors and UTF-8 strings.  So at worst, some eggs might need
to be split in two.  This is already the case for SRFI 13 and 14.
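
As a rough illustration of the conversion step described above (arbitrary
encodings in byte vectors on one side, UTF-8 on the other), here is a
minimal sketch in Python rather than Scheme.  The function name
transcode_to_utf8 is hypothetical and does not correspond to any egg's API;
it only shows the shape of the operation.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper for illustration only; not part of any egg.
    """
    return raw.decode(source_encoding).encode("utf-8")


# "cafe" with an acute accent on the e, stored as Latin-1: one byte per character.
latin1_bytes = "caf\u00e9".encode("latin-1")   # b'caf\xe9'

# The same text as UTF-8: the accented character now takes two bytes.
utf8_bytes = transcode_to_utf8(latin1_bytes, "latin-1")   # b'caf\xc3\xa9'
```

Only the code at the boundary needs to know the source encoding; everything
past it can assume UTF-8.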

> The same approaches also apply to eggs needing the full
> numeric tower, though with UTF-8 there's less chance of
> breakage when mixing eggs which do and don't use the utf8 egg.

I would say that UTF-8 has *more* chance of causing undetected
breakage, because UTF-8 strings have an interpretation as core
strings, whereas bignums, ratnums, compnums etc. don't look
like numbers to the core, and errors will be thrown.
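
A small sketch of the "undetected breakage" case, again in Python purely for
illustration (the variable and function names are made up for this example).
It assumes a byte-oriented string operation, standing in for a core string
procedure, applied to UTF-8 data: the call succeeds, but the result is no
longer valid UTF-8.

```python
text = "na\u00efve"            # five characters; the third is i with diaeresis
utf8 = text.encode("utf-8")    # the byte-string view of the same text

char_count = len(text)         # 5 characters
byte_count = len(utf8)         # 6 bytes: the accented character takes two

# A byte-oriented "take the first 3 characters" cuts the two-byte
# sequence in half.  No error is raised at this point.
truncated = utf8[:3]           # b'na\xc3'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here is_valid_utf8(utf8) is true and is_valid_utf8(truncated) is false, yet
the truncation itself raised nothing.  The damage only shows up later, if
something happens to validate the bytes.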

-- 
John Cowan          http://www.ccil.org/~cowan        address@hidden
Police in many lands are now complaining that local arrestees are insisting
on having their Miranda rights read to them, just like perps in American TV
cop shows.  When it's explained to them that they are in a different country,
where those rights do not exist, they become outraged.  --Neal Stephenson


