Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts
From: Mark Majeres
Subject: Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts
Date: Sat, 21 Mar 2015 18:08:02 -0700
On 3/21/15, Benno Schulenberg <address@hidden> wrote:
>
> On Sat, Mar 21, 2015, at 20:54, Mark Majeres wrote:
>> >> > Strangely this happens only for characters in the range
>> >> > U+00A0 to U+00BF, which baffles me. (I've tested it with
>> >> > ?, ?, ?, ?, ? -- they all get doubly found -- but also
>> >> > with ?, ?, ?, ?, ?, ?, and ? -- they all are found once.)
>> >
>> > Hmm. Your mail is UTF8-encoded, but these characters haven't
>> > come across properly. Did you see them okay?
>>
>> They came in the first time just like they look now :)
>
> How strange. You are sending now via gmail, but maybe you use
> a non-utf8 locale on your machine?
It is possible my other email needs to be configured for UTF-8. I
funnel it all through Gmail.
>> Yes, shifting one multi-byte char to the left is a little bit of work.
>> It's not very pretty.
>
> Well, I have a patch for that. Attached. I can't notice any
> performance improvement anywhere, though. It could still
> be made quite a bit faster by looking at the bytes themselves,
> but that requires some thinking and won't look pretty.
How about just providing a better initial starting point?
    size_t before = pos - mb_cur_max();
--Mark
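
A minimal sketch of the two ideas discussed above, not the attached patch
and not nano's actual code. It assumes the buffer holds valid UTF-8 and
that pos sits on a character boundary. The names clamped_start() and
utf8_step_left() are hypothetical and exist only for illustration. The
first helper guards the suggested starting point against unsigned
wrap-around when pos is smaller than the maximum character length. The
second shows what "looking at the bytes themselves" could look like:
UTF-8 continuation bytes always have the form 10xxxxxx, so stepping left
means skipping back over them until a lead byte is reached.

```c
#include <stddef.h>

/* Hypothetical helper: a starting offset at most max_len bytes before pos.
 * size_t is unsigned, so a bare "pos - max_len" would wrap around to a
 * huge value whenever pos < max_len; clamp to 0 instead. */
size_t clamped_start(size_t pos, size_t max_len)
{
    return (pos > max_len) ? pos - max_len : 0;
}

/* Hypothetical helper: true for a UTF-8 continuation byte (10xxxxxx). */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Return the index of the character that ends just before pos.
 * Assumes valid UTF-8 and that pos is on a character boundary. */
size_t utf8_step_left(const char *buf, size_t pos)
{
    size_t before = pos;
    int steps = 0;

    if (pos == 0)
        return 0;

    /* Move back one byte, then keep going while we are still on a
     * continuation byte.  A UTF-8 character is at most four bytes long,
     * so stop after four steps even if the input is malformed. */
    do {
        before--;
        steps++;
    } while (before > 0 && steps < 4 &&
             is_utf8_continuation((unsigned char)buf[before]));

    return before;
}
```

Note that the clamped starting point is not guaranteed to fall on a
character boundary, so a forward scan from it would first have to skip
any continuation bytes to resynchronise. The byte-based version avoids
the forward scan entirely, but it only works for UTF-8 and not for other
multibyte locales.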