
Re: [Chicken-users] Basic abnf usage?


From: Matt Gushee
Subject: Re: [Chicken-users] Basic abnf usage?
Date: Wed, 13 May 2015 20:25:57 -0600

Hi, Moritz--

On Thu, Apr 16, 2015 at 2:35 PM, Moritz Heidkamp <address@hidden> wrote:

sorry for the late reply, got busy :-)

And I'm sorry for the even later reply, got scared :-)
No, really! It's stupid, but I am often scared of people's reactions when I make even mildly critical remarks (I hasten to point out that that feeling has nothing to do with any behavior I've observed on this mailing list - just my own neurosis, I guess).

> On Sat, Mar 28, 2015 at 5:33 AM, Moritz Heidkamp <address@hidden> wrote:
 
> Maybe in that case it would be good if the API doc said something like:
> "comparse is compatible with UTF-8, but many of the built-in combinators do
> not work with UTF-8 characters, so you may need to construct your own. For
> example: ..."

Sure, we can do that! I didn't mention it so far as this property is
implicit with how CHICKEN core strings work.

I think you are assuming too much background knowledge. Maybe in a perfect world, everybody would get a CS degree, then learn all the fundamentals of Scheme, then master the Chicken core, then start working with extensions and building practical software ... but of course that's often not how it works in reality. And in my opinion - as someone with no formal education in the field, but who came to Scheme with a few years of practical experience in other languages - Scheme in general, and any given implementation in particular, has a really steep learning curve.

The documentation is good in that almost everything you need to know is covered somewhere, but the information can be really hard to find because of the extreme modularity and very bazaar-like culture of Scheme: docs are split up between r*rs, an implementation manual, and extension docs written by different people following very different conventions. Additionally - and this is not anyone's fault in particular, more a side effect of the small community working on Chicken - since the documentation as a whole is not rigorously maintained, contradictions inevitably creep in, and there are often several ways to do the same or similar things, with no indication of which is best or recommended.

Sorry to rant ... I'm rather passionate about documentation. Must be my Python background ;-)
 
It's not quite that simple: Characters may be encoded in many ways,
UTF-8 is far from the only widely used one and not ideal in all cases,
e.g. the algorithmic complexity of some operations on UTF-8 encoded
strings is objectively worse than those on UTF-32 encoded strings. And
it will remain so even till the year 2050. No guarantees on what
happens after that, though!

I'm certainly aware of different encodings - I started programming when I lived in Japan in the 90s, before Unicode 1.0 was finalized, and there were 3 major encodings in common use just for Japanese. But there is such a thing as 'reasonable defaults' - if you can't support all the encodings, what is more interoperable than UTF-8?
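
To make the complexity point above concrete, here is a minimal sketch in Python (not CHICKEN, and not code from this thread). The function names `codepoint_at_utf8` and `codepoint_at_utf32` are made up for illustration. It shows one operation, fetching the n-th code point, which needs a scan from the start of a UTF-8 buffer but only an offset calculation in a UTF-32 buffer.

```python
def codepoint_at_utf8(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 data by scanning from the start (O(n))."""
    if index < 0:
        raise IndexError(index)
    count = -1    # number of code points seen so far, minus one
    start = None  # byte offset where the requested code point begins
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                # The next code point begins here, so the requested one ends here.
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    # The requested code point is the last one in the buffer.
    return data[start:].decode("utf-8")


def codepoint_at_utf32(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-32-LE data by direct offset (O(1))."""
    offset = 4 * index  # every code point occupies exactly four bytes
    if index < 0 or offset + 4 > len(data):
        raise IndexError(index)
    return data[offset:offset + 4].decode("utf-32-le")
```

The trade-off runs the other way for storage: text that is mostly ASCII takes roughly four times as many bytes in UTF-32 as in UTF-8, which is part of why UTF-8 is the usual choice for interchange.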
 
> I'm of the opinion (shared by many I18n experts, if I'm not mistaken)
> that a high-level language in the 21st century should have in its core
> a rock-solid character abstraction that is never, ever conflated with
> a byte.

The character abstraction actually is rock-solid even in CHICKEN 4
already: A character object represents a Unicode codepoint in an
encoding independent way.

Okay, but that isn't reflected very well in the core API.

Here's what bugs me about the laissez-faire approach to strings (i.e. "yes, the underlying implementation is Unicode-aware, but you are free to treat strings as sequences of bytes"): I think it's very bad practice from the standpoint of promoting adoption of the language. And *that* matters, IMHO, because programming languages tend to either grow or die (though it appears to me that, unusually, Scheme as a whole is gradually declining, but Chicken is kind of in a holding pattern). And - to simplify a bit - people/companies planning to build real applications choose languages based on their whole ecosystems (core language + tools + platform support + libraries). And if you're concerned about interoperability in a global context, the fact that a language does nothing to enforce the use of one or more interoperable string encodings is surely a big black mark. IOW, you can't trust that any given Chicken extension is Unicode-aware.
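
As a rough illustration of what goes wrong when code treats a string as a sequence of bytes, here is a small Python sketch (again not CHICKEN; the helper `is_valid_utf8` and the sample word are assumptions chosen for the example). It compares counting and reversing by character with the same operations done on the raw UTF-8 bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "na\u00efve"           # five characters; U+00EF takes two bytes in UTF-8
raw = text.encode("utf-8")

char_count = len(text)        # counts code points
byte_count = len(raw)         # counts bytes, so it is one larger here

reversed_chars = text[::-1]   # reversing code points keeps each character intact
reversed_bytes = raw[::-1]    # reversing bytes splits the two-byte sequence

print(char_count, byte_count)
print(is_valid_utf8(raw), is_valid_utf8(reversed_bytes))
```

A byte-oriented library gets the length wrong by one and produces invalid UTF-8 when it reverses the data, and nothing in the types warns the caller about it.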
 
> There are a lot of things I love about Chicken, but the (IMHO
> obsolete) string implementation is not one of them.

I think I meant API rather than implementation. Oh well.
 
Yeah, strings being equivalent to u8vectors / blobs is a bit messy at
times. I think this is something worth addressing in CHICKEN 5. It would
be a rather invasive change, though, and so far nobody seems inclined to
put in the effort.

>> We could create a comparse-utf8 egg to facilitate this. It's not
>> currently on my agenda but I will put it in my Comparse notes for future
>> reference. If you feel inclined to create one, I'm happy to provide you
>> with code review and feedback!
>
> I was thinking about that.

That'd be great! I would suggest to make it a separate egg so that we
keep the utf8 dependency optional.

That was my thought, too. And now my laptop battery is low, and I need to head home, so I'll stop here. Working on comparse is still on my potential to-do list, but I do have other projects that are higher priorities at the moment.
 
Best regards,
Matt
