From: Matt Gushee
Subject: Re: [Chicken-users] Basic abnf usage?
Date: Wed, 13 May 2015 20:25:57 -0600
sorry for the late reply, got busy :-)
> On Sat, Mar 28, 2015 at 5:33 AM, Moritz Heidkamp <address@hidden>
> wrote:
> Maybe in that case it would be good if the API doc said something like:
> "comparse is compatible with UTF-8, but many of the built-in combinators do
> not work with UTF-8 characters, so you may need to construct your own. For
> example: ..."
Sure, we can do that! I haven't mentioned it so far because this
property is implicit in how CHICKEN core strings work.
It's not quite that simple: characters may be encoded in many ways.
UTF-8 is far from the only widely used encoding and is not ideal in all
cases, e.g. the algorithmic complexity of some operations (such as
indexing by character position) on UTF-8 encoded strings is worse than
that of the same operations on UTF-32 encoded strings. And it will
remain so even until the year 2050. No guarantees on what happens after
that, though!
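
To make the complexity point concrete, here is a minimal sketch in
Python (not CHICKEN code; the function names are made up for this
illustration). Finding the n-th code point in UTF-8 data means scanning
from the start, whereas in UTF-32 it is plain offset arithmetic:

```python
def nth_codepoint_utf8(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning: O(n)."""
    if n < 0:
        raise IndexError(n)
    seen = -1      # index of the most recent code point start we passed
    start = None   # byte offset where the n-th code point begins
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:  # not a continuation byte: a new code point starts
            seen += 1
            if seen == n:
                start = i
            elif seen == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")  # n-th code point was the last one


def nth_codepoint_utf32(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data by offset: O(1)."""
    if n < 0 or 4 * n + 4 > len(data):
        raise IndexError(n)
    return data[4 * n : 4 * n + 4].decode("utf-32-le")
```

Both functions return the same character for the same index; only the
amount of work differs. The price of UTF-32 is four bytes per code
point, which is why neither encoding is ideal in all cases.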
> I'm of the opinion (shared by many I18n experts, if I'm not mistaken)
> that a high-level language in the 21st century should have in its core
> a rock-solid character abstraction that is never, ever conflated with
> a byte.
The character abstraction actually is rock-solid even in CHICKEN 4
already: A character object represents a Unicode codepoint in an
encoding-independent way.
> There are a lot of things I love about Chicken, but the (IMHO
> obsolete) string implementation is not one of them.
Yeah, strings being equivalent to u8vectors / blobs is a bit messy at
times. I think this is something worth addressing in CHICKEN 5. It would
be a rather invasive change, though, and so far nobody seems inclined to
put in the effort.
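
As a rough illustration of why treating strings as byte vectors gets
messy, here is a small Python sketch (again not CHICKEN code, and the
helper names are hypothetical). It compares byte length with code point
length and shows that cutting at an arbitrary byte offset can split a
character:

```python
def byte_and_codepoint_lengths(text: str) -> tuple:
    """Return (number of UTF-8 bytes, number of code points) for text."""
    return len(text.encode("utf-8")), len(text)


def byte_prefix_is_valid(text: str, byte_count: int) -> bool:
    """True if the first byte_count UTF-8 bytes of text decode cleanly."""
    try:
        text.encode("utf-8")[:byte_count].decode("utf-8")
        return True
    except UnicodeDecodeError:
        # The cut landed in the middle of a multi-byte character.
        return False


sample = "na\u00efve \u03bb"  # "naive" with a diaeresis on the i, then a lambda
print(byte_and_codepoint_lengths(sample))
print(byte_prefix_is_valid(sample, 2), byte_prefix_is_valid(sample, 3))
```

Any API that counts or slices in bytes has to deal with both effects,
which is what a character-aware layer on top of byte strings is for.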
>> We could create a comparse-utf8 egg to facilitate this. It's not
>> currently on my agenda but I will put it in my Comparse notes for future
>> reference. If you feel inclined to create one, I'm happy to provide you
>> with code review and feedback!
>
> I was thinking about that.
That'd be great! I would suggest making it a separate egg so that we
keep the utf8 dependency optional.