Re: [Chicken-users] Re: UTF-8 support
From: Alex Shinn
Subject: Re: [Chicken-users] Re: UTF-8 support
Date: Fri, 14 Dec 2007 08:43:35 +0900
On Dec 13, 2007 11:32 PM, Tobia Conforto <address@hidden> wrote:
>
> Can you (or anybody else) give an example of different behaviour with
> the option turned on and off? I did a couple of tests and can't see any
> difference, but I admit I have yet to look at the source code.
The only two differences are:
1) . matches a full utf-8 character with the option on,
whereas with the option off it would match one byte
of a utf-8 char (thus in Zbigniew's example you get
the two bytes \316 and \273 instead of the λ)
2) character classes treat their contents as utf-8 encoded
   characters with the option on, and as sequences of bytes with it off
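Both differences can be sketched with Python's `re` module as a stand-in. This is an analogy chosen for illustration, not Chicken's regex API: a `str` pattern matched against a `str` treats each code point as one unit, roughly like the option turned on, while a `bytes` pattern matched against UTF-8 `bytes` works one byte at a time, like the option turned off. The helper names `first_dot` and `class_hits` are hypothetical.

```python
import re

# Analogy (assumption, not Chicken's API): a str regex on a str works on
# code points, like the utf-8 option ON; a bytes regex on UTF-8 bytes works
# on single bytes, like the option OFF.

def first_dot(utf8_on: bool, s: str):
    """Return what a single '.' matches at the start of s."""
    if utf8_on:
        return re.match(".", s).group()
    return re.match(b".", s.encode("utf-8")).group()

def class_hits(utf8_on: bool, cls: str, s: str):
    """Return every match of the character class cls in s."""
    if utf8_on:
        return re.findall(cls, s)
    return re.findall(cls.encode("utf-8"), s.encode("utf-8"))

# 1) '.' takes the whole character, or only its first byte (\316 == 0xCE).
on_dot = first_dot(True, "λ")     # 'λ'
off_dot = first_dot(False, "λ")   # b'\xce'

# 2) Byte-wise, [λ] is the byte set {0xCE, 0xBB}, so it also hits the
#    lead byte of ν (U+03BD, encoded as CE BD).
on_cls = class_hits(True, "[λ]", "ν")    # []
off_cls = class_hits(False, "[λ]", "ν")  # [b'\xce']
```

The second case shows why byte-wise classes mostly bite in non-English text: every Greek letter from U+0380 to U+03BF shares the lead byte 0xCE, so a class written for one of them quietly matches fragments of the others.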
Case 1 is surprisingly rare in practice: you usually write .* or .+,
which turn out to match the same text with the option on or off.
Case 2 is mostly an issue in non-English linguistic applications.
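A quick check of the .* / .+ claim, under the same assumed Python `re` analogy (not Chicken's API); `same_match` and the sample strings are hypothetical. The reason it holds: '.' excludes only the newline, and no byte inside a multi-byte UTF-8 sequence can equal 0x0A, so a greedy run of '.' stops on the same character boundary whether it counts bytes or characters.

```python
import re

# Same analogy as an assumption: str regex ~ option on, bytes regex ~ off.
# '.' excludes only '\n', and no byte inside a multi-byte UTF-8 sequence
# equals 0x0A, so a greedy run of '.' ends on the same character boundary
# in both views.

def same_match(pattern: str, s: str) -> bool:
    """True if the str and bytes versions of pattern match the same text."""
    on = re.search(pattern, s)
    off = re.search(pattern.encode("utf-8"), s.encode("utf-8"))
    if on is None or off is None:
        return on is None and off is None
    return on.group().encode("utf-8") == off.group()

samples = ["λx", "naïve café", "αβγ\nδε", "", "xλy"]

# .* and .+ give identical matches on every sample ...
greedy_ok = all(same_match(p, s) for p in (".*", ".+", "x.*y") for s in samples)

# ... while a lone '.' next to a non-ASCII character does not:
# the bytes version matches b'\xbby' (the last byte of λ, then y).
lone_dot_ok = same_match(".y", "λy")   # False
```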
--
Alex