Re: [Chicken-users] How to search UTF-8 multibyte characters with regex?


From: Chunyang Xu
Subject: Re: [Chicken-users] How to search UTF-8 multibyte characters with regex?
Date: Thu, 09 Nov 2017 16:55:38 +0800
User-agent: mu4e 0.9.18; emacs 27.0.50

Christian Kellermann writes:

> * Chunyang Xu <address@hidden> [171109 05:42]:
>> Hello list,
>> 
>> I'm new to Chicken Scheme. I need to check if a string contains some
>> multibyte characters. In Emacs Lisp, I use:
>> 
>> (string-match "[一二]" "三四")
>>      => nil
>> 
>> (string-match "[??????]" "????????????")
>>      => 2
>> 
>> and it works fine, however, the following Chicken code doesn't:
>> 
>> (irregex-search "[一二]" "三四")
>>      => #<regexp-match (0 submatches)>
>> 
>> I expect it to return #f since "三四" doesn't contain "一" or "二".
>> 
>> Any tips?
>
> Did you load the utf8 egg?

I don't think I did.

> # chicken-install utf8
>
> Then in your code (use utf8).
>
> http://api.call-cc.org/doc/utf8
>
> This includes string-match that is unicode aware.

It raises this error:

  #;1> (use utf8)
  ; loading /usr/local/Cellar/chicken/4.12.0/lib/chicken/8/utf8.import.so ...
  ; loading /usr/local/Cellar/chicken/4.12.0/lib/chicken/8/utf8.so ...
  #;5> (string-match "[一二]" "三四")
  
  Error: (sre-length-ranges) unknown sre: ()
  
        Call history:
  
        <syntax>          (string-match "[一二]" "三四")
        <eval>    (string-match "[一二]" "三四")    <--
  #;14> 

It does work, though, when the regexp contains no multibyte characters:

  #;64> (string-match "[abc]" "三四")
  #f
  #;66> (string-match "..[abc]" "三四a")
  ("三四a")
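
A possible explanation for the original irregex-search result, offered as a
guess rather than a confirmed diagnosis: without the utf8 egg, CHICKEN 4
treats a string as a sequence of bytes, so a class such as "[一二]" may be
read as a set of six individual bytes instead of two characters. The sketch
below only inspects the bytes; it assumes a plain csi session in which utf8
has not been loaded, and utf8-bytes and shared-bytes are hypothetical helper
names, not part of any egg.

```scheme
;; Assumes CHICKEN 4 WITHOUT (use utf8), so string->list yields one
;; char per byte of the UTF-8 encoded source text.

;; Hypothetical helper: the byte values that make up a string.
(define (utf8-bytes s)
  (map char->integer (string->list s)))

;; Hypothetical helper: bytes of SUBJECT that also occur among the
;; bytes of CLASS-CHARS, in the order they appear in SUBJECT.
(define (shared-bytes class-chars subject)
  (let ((class (utf8-bytes class-chars)))
    (let loop ((bytes (utf8-bytes subject)) (acc '()))
      (cond ((null? bytes) (reverse acc))
            ((memv (car bytes) class)
             (loop (cdr bytes) (cons (car bytes) acc)))
            (else (loop (cdr bytes) acc))))))

(print (utf8-bytes "一二"))          ; (228 184 128 228 186 140)
(print (utf8-bytes "三四"))          ; (228 184 137 229 155 155)
(print (shared-bytes "一二" "三四")) ; (228 184)
```

The characters differ, but "三" begins with the same two bytes (228 184) as
"一", so a byte-wise character class would find a match in "三四" even
though neither "一" nor "二" occurs in it.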

> Kind regards,
>
> Christian



