Re: [Chicken-users] How to search UTF-8 multibyte characters with regex?
From: Chunyang Xu
Subject: Re: [Chicken-users] How to search UTF-8 multibyte characters with regex?
Date: Thu, 09 Nov 2017 16:55:38 +0800
User-agent: mu4e 0.9.18; emacs 27.0.50

Christian Kellermann writes:
> * Chunyang Xu <address@hidden> [171109 05:42]:
>> Hello list,
>>
>> I'm new to Chicken Scheme. I need to check if a string contains some
>> multibyte characters. In Emacs Lisp, I use:
>>
>> (string-match "[??????]" "??????")
>> => nil
>>
>> (string-match "[??????]" "????????????")
>> => 2
>>
>> and it works fine, however, the following Chicken code doesn't:
>>
>> (irregex-search "[??????]" "??????")
>> => #<regexp-match (0 submatches)>
>>
>> I expect it to return #f since "??????" doesn't contain "???" or "???".
>>
>> Any tips?
>
> Did you load the utf8 egg?

I don't think I did.

> # chicken-install utf8
>
> Then in your code (use utf8).
>
> http://api.call-cc.org/doc/utf8
>
> This includes string-match that is unicode aware.

After loading utf8, string-match raises this error:

#;1> (use utf8)
; loading /usr/local/Cellar/chicken/4.12.0/lib/chicken/8/utf8.import.so ...
; loading /usr/local/Cellar/chicken/4.12.0/lib/chicken/8/utf8.so ...
#;5> (string-match "[一二]" "三四")
Error: (sre-length-ranges) unknown sre: ()
Call history:
<syntax> (string-match "[一二]" "三四")
<eval> (string-match "[一二]" "三四") <--
#;14>

It works if the regexp doesn't contain multibyte characters:

#;64> (string-match "[abc]" "三四")
#f
#;66> (string-match "..[abc]" "三四a")
("三四a")
> Kind regards,
>
> Christian
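
One possible workaround, sketched here as an assumption rather than something proposed in this thread: UTF-8 is self-synchronizing, so a plain byte-wise substring search for a complete character cannot match in the middle of another character, and the check does not strictly need a Unicode-aware regex. The sketch assumes CHICKEN 4 (where substring-index comes from the data-structures unit) and UTF-8 encoded source text; contains-any? is a hypothetical helper name, and "三四一" is made-up test data.

```scheme
;; CHICKEN 4 sketch: substring-index is provided by the data-structures unit.
(use data-structures)

;; Hypothetical helper: returns #t if any string in NEEDLES occurs in STR.
;; Each needle is compared byte-wise; for complete UTF-8 characters this
;; can only match at a character boundary.
(define (contains-any? str needles)
  (let loop ((ns needles))
    (cond ((null? ns) #f)
          ((substring-index (car ns) str) #t)
          (else (loop (cdr ns))))))

(contains-any? "三四" '("一" "二"))    ; => #f
(contains-any? "三四一" '("一" "二"))  ; => #t
```

This only answers "does the string contain one of these characters"; it does not give a character index the way Emacs Lisp's string-match does.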