chicken-users

Re: [Chicken-users] Re: UTF-8 support


From: Alex Shinn
Subject: Re: [Chicken-users] Re: UTF-8 support
Date: Fri, 14 Dec 2007 08:43:35 +0900

On Dec 13, 2007 11:32 PM, Tobia Conforto <address@hidden> wrote:
>
> Can you (or anybody else) give an example of different behaviour with
> the option turned on and off?  I did a couple of tests and can't see any
> difference, but I admit I have yet to look at the source code.

The only two differences are:

  1) . matches a full UTF-8 character with the option on,
      whereas with the option off it matches one byte
      of a UTF-8 char (thus in Zbigniew's example you get
      the two bytes \316 and \273 instead of the λ)

  2) character classes treat the characters as UTF-8 encoded
      with the option on, and as sequences of bytes with it off
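A rough analogy, not CHICKEN's regex API: Python's re module separates bytes patterns, which see the input one byte at a time (like the option off), from str patterns, which see whole characters (like the option on). The sketch below assumes that mapping and reproduces both differences on λ, whose UTF-8 encoding is the two bytes \316\273 (0xCE 0xBB). The class [λμ] and the letter ξ are illustrative choices of mine, not part of Zbigniew's example. As bytes, [λμ] collapses to the byte set {0xCE, 0xBB, 0xBC}, which is why it matches half-characters and even a letter it never listed.

```python
import re

text = "λ"                   # U+03BB; UTF-8 bytes \316\273 (0xCE 0xBB)
raw = text.encode("utf-8")   # b'\xce\xbb'

# "Option off" analogue: a bytes pattern, so . and [...] work on single bytes.
dot_bytes = re.findall(b".", raw)                          # [b'\xce', b'\xbb']
class_bytes = re.findall("[λμ]".encode("utf-8"), raw)      # [b'\xce', b'\xbb']

# "Option on" analogue: a str pattern, so . and [...] work on whole characters.
dot_chars = re.findall(".", text)                          # ['λ']
class_chars = re.findall("[λμ]", text)                     # ['λ']

# Byte-level classes can also give false positives: ξ (U+03BE, bytes CE BE)
# is not in [λμ], but it shares the lead byte 0xCE with both λ and μ.
xi_bytes = re.findall("[λμ]".encode("utf-8"), "ξ".encode("utf-8"))  # [b'\xce']
xi_chars = re.findall("[λμ]", "ξ")                                   # []
```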

Difference 1 matters surprisingly rarely: you usually write .* or .+,
and those match exactly the same text with the option on or off.
Difference 2 is only common in non-English linguistic applications.
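Continuing the same Python analogy (a bytes pattern standing in for the option off, a str pattern for the option on; this is not CHICKEN's actual API), the sketch below checks the point about .* and .+ on a made-up input. The greedy runs cover the same text in both modes because a run never stops inside λ when the pattern around it is made of whole characters; only a lone . between fixed characters tells the two modes apart.

```python
import re

text = "aλb\nz"             # hypothetical input; . never matches the newline
raw = text.encode("utf-8")  # b'a\xce\xbbb\nz'

# .* and .+ consume a whole run, so stepping by byte or by character
# does not change where the match starts or ends here.
greedy_chars = re.match(".*", text).group()      # 'aλb'
greedy_bytes = re.match(b".*", raw).group()      # b'a\xce\xbbb'
plus_chars = re.search("a.+b", text).group()     # 'aλb'
plus_bytes = re.search(b"a.+b", raw).group()     # b'a\xce\xbbb'

assert greedy_bytes.decode("utf-8") == greedy_chars
assert plus_bytes.decode("utf-8") == plus_chars

# A single . is where the modes diverge: it covers λ whole, or only half of it.
single_chars = re.search("a.b", text)   # matches 'aλb'
single_bytes = re.search(b"a.b", raw)   # None, since λ is two bytes
```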

-- 
Alex



